R: Word Cloud Case Study
By Jacky Poon, originally published in Actuaries Digital as Analytics Snippet: In the Library.
Libraries and Packages
In R, we will be using:
plotly for the pie charts,
dplyr for data manipulation,
tm for text mining,
wordcloud for word clouds,
and RColorBrewer for a touch of colour.
library("plotly")
library("dplyr")
library("tm")
library("wordcloud")
library("RColorBrewer")
Loading required package: ggplot2
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: NLP
Attaching package: ‘NLP’
The following object is masked from ‘package:ggplot2’:
annotate
Loading required package: RColorBrewer
If you have not installed these packages before, you will need to run install.packages first (e.g. install.packages("plotly")).
Reading the Data
Let’s download the data from the [Brisbane City Open Data Portal](https://www.data.brisbane.qld.gov.au/data/dataset/library-checkouts-branch-date), where it is publicly available under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/):
temp <- tempfile()
download.file("https://www.data.brisbane.qld.gov.au/data/dataset/53d02339-1818-43df-9845-83808e5e247e/resource/ed431a68-15f2-430e-b140-4c603597680a/download/library-checkouts-all-branches-december-2017.csv.zip", temp, mode="wb")
data <- read.csv(unz(temp, "Library Checkouts all Branches December 2017.csv"))
Inspecting the data, we see each item's title, author, item type, age group and the library branch it was checked out from, as well as a date and various IDs:
data
Title | Author | Call.number | Item.id | Item.type | Status | Language | Age | Library | Date |
---|---|---|---|---|---|---|---|---|---|
<chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> |
Ashes to ashes / Jenny Han & Siobhan Vivian | Han, Jenny | YA-PBK HAN | 34000092366921 | YA-PBK | ON-SHELVES |  | YA | CDE | 201712070001 |
Silicon chip |  | AD-MAGS V.30 NO.11 NOV 2017 | 34000103390480 | AD-MAGS | ON-SHELVES |  | ADULT | HPK | 201712070006 |
Nepal / written and researched by Bradley Mayhew, Lindsay Brown, Trent Holden | Mayhew, Bradley | 915.496 MAY | 34000089583934 | NONFICTION | ON-SHELVES |  | ADULT | CNL | 201712070017 |
Trekking in the Nepal Himalaya / written and researched by Bradley Mayhew, Lindsay Brown, Stuart Butler | Mayhew, Bradley | 915.496 MAY | 34000099615932 | NONFICTION | ON-SHELVES |  | ADULT | CNL | 201712070017 |
The destroyers / Christopher Bollen | Bollen, Christopher, 1975- | AD-PBK BOL | 34000102849353 | AD-PBK | ON-SHELVES |  | ADULT | BSQ | 201712070035 |
The lonely city : adventures in the art of being alone / Olivia Laing | Laing, Olivia | 700.19 LAI | 34000100432228 | NONFICTION | ON-SHELVES |  | ADULT | BSQ | 201712070035 |
Too many elephants in this house / Ursula Dubosarsky ; pictures by Andrew Joyner | Dubosarsky, Ursula, 1961- | PICTURE-BK DUB | 34000092809425 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SBK | 201712070135 |
Lost cities of the ancients [dvd] | Clifton, Dan | 930.1 LOS | 34000084965854 | DVD | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Blind faith / Rebecca Zanetti | Zanetti, Rebecca | AD-PBK ZAN | 34000096686480 | AD-PBK | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Divided / Sharon M. Johnston | Johnston, Sharon M | AD-PBK JOH | 34000102465747 | AD-PBK | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Battlestorm / Susan Krinard | Krinard, Susan | AD-PBK KRI | 34000102655537 | AD-PBK | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Blind date / Bella Jewel | Jewel, Bella | AD-PBK JEW | 34000103727624 | AD-PBK | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Mist / Susan Krinard | Krinard, Susan | AD-PBK KRI | 34000092716661 | AD-PBK | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Black ice / Susan Krinard | Krinard, Susan | AD-PBK KRI | 34000095495792 | AD-PBK | ON-SHELVES |  | ADULT | SGT | 201712070138 |
Islands : a New Zealand journey / [written by] Bruce Ansley and [illustrated by] Jane Ussher | Ansley, Bruce | 919.399 ANS | 34000101727212 | NONFICTION | ON-SHELVES |  | ADULT | SBK | 201712070203 |
Great southern land / Ivan O'Mahoney & Steve Bibb | O'Mahoney, Ivan | 919.4 OMA | 34000091354852 | NONFICTION | ON-SHELVES |  | ADULT | SBK | 201712070203 |
What a wonderful life : with positive psychology / Sarah Zobel Koelpin ; translated by Hans Wrang and Martin Aitken | Koelpin, Sarah Zobel | 158 KOE | 34000084428622 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
The everyday entrepreneur : apply the triple threat of amibition, confidence, and conviction for success on your own terms / Rob Basso with Adina Genn | Basso, Rob | 658.111 BAS | 34000087303970 | NONFICTION | ON-SHELVES | UNKNOWN | ADULT | GCY | 201712070208 |
Employee to entrepreneur : how to ditch the day job and start your own business / Chris Garden and Catherine Blackburn | Garden, Chris | 658.11 GAR | 34000091056747 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
It starts with passion : how to love what you do and do what you love / Keith Abraham | Abraham, Keith | 158.1 ABR | 34000092840107 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
Love karma : use your intuition to find, create, and nuture love in your life / Char Margolis | Margolis, Char | 131 MAR | 34000092935758 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
The five principles of spiritual reality / Paul Guggenheimer | Guggenheimer, Paul | 248.4 GUG | 34000093192375 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
The future is yours! : how to effectively manage the whole world, including life, family, and business, and remain true to yourself / Rolf U. Kramer, MA | Kramer, Rolf U | 658 KRA | 34000093777076 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
Chicken soup for the soul : touched by an angel : 101 miraculous stories of faith, divine intervention, and answered prayers / Amy Newmark ; foreword by Gabrielle Bernstein | Newmark, Amy | 235.3 CHI | 34000098320765 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
I know how to live, I know how to die : the teachings of Dadi Janki : a warm, radical, and life-affirming iew of who we are, where we come from, and what time is calling us to do / Neville Hodgkinson | Hodgkinson, Neville | 294.544 JAN | 34000097752158 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
9 ways to a resilient child / Dr. Justin Coulson | Coulson, Justin | 649.1 COU | 34000102689148 | NONFICTION | ON-SHELVES |  | ADULT | GCY | 201712070208 |
Harry Potter and the Order of the Phoenix [dvd] | Grint, Rupert, 1988- | DVD | 34000094318805 | DVD | ON-SHELVES |  | ADULT | CDE | 201712070228 |
Million dollar baby [dvd] | Eastwood, Clint, 1930- | DVD | 34000094719796 | DVD | ON-SHELVES |  | ADULT | CDE | 201712070228 |
Test your brain [dvd] |  | 153 TES | 34000099772220 | DVD | ON-SHELVES |  | ADULT | CDE | 201712070228 |
The hunger games [dvd]. Mockingjay. Part 1 | Craig, Peter, 1969- | DVD | 34000098370794 | DVD | ON-SHELVES |  | ADULT | CDE | 201712070228 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
The Detective Dog / written by Julia Donaldson ; illustrated by Sara Ogilvie | Donaldson, Julia | PICTURE-BK DON | 34000100677152 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SBK | 201712092210 |
May Gibbs' tales from the billabong / May Gibbs, Jane Massam ; illustrated by Caroline Keys | Gibbs, May, 1877-1969 | PICTURE-BK GIB | 34000101041499 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SBK | 201712092210 |
A bus called heaven / Bob Graham | Graham, Bob, 1942- | PICTURE-BK GRA | 34000101790475 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SBK | 201712092210 |
Zogk / by Julia Donaldson & illustrated by Axel Scheffler | Donaldson, Julia | PICTURE-BK DON | 34000101124097 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SBK | 201712092210 |
Too many elephants in this house / Ursula Dubosarsky ; pictures by Andrew Joyner | Dubosarsky, Ursula, 1961- | PICTURE-BK DUB | 34000093607851 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SBK | 201712092210 |
My husband next door / Catherine Alliott | Alliott, Catherine | LARGEPRINT | 34000093335826 | LARGEPRINT | ON-SHELVES |  | ADULT | IPY | 201712092216 |
Oskar and Mo / Britta Teckentrup | Teckentrup, Britta | PICTURE-BK TEC | 34000103971511 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SCR | 201712092217 |
When Luke Skywalker became an X-wing pilot / [written by Trey King] | King, Trey | PICTURE-BK KIN | 34000100725670 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SCR | 201712092217 |
Miffy at school / Dick Bruna | Bruna, Dick | PICTURE-BK BRU | 34000102491511 | PICTURE-BK | ON-SHELVES |  | JUVENILE | SCR | 201712092217 |
The art of keeping secrets / Rachael Johns | Johns, Rachael | AD-PBK JOH | 34000101131142 | AD-PBK | ON-SHELVES |  | ADULT | MIT | 201712092223 |
Good night, sleep tight / written by Mem Fox ; iIllustrated by Judy Horacek | Fox, Mem, 1946- | PICTURE-BK FOX | 34000089911861 | PICTURE-BK | ON-SHELVES | UNKNOWN | JUVENILE | IPY | 201712092226 |
Ollie and the wind / Ronojoy Ghosh | Ghosh, Ronojoy | PICTURE-BK GHO | 34000098533367 | PICTURE-BK | ON-SHELVES |  | JUVENILE | IPY | 201712092226 |
Harry the dirty dog / by Gene Zion ; Pictures by Margaret Bloy Graham | Zion, Gene | PICTURE-BK ZIO | 34000100673573 | PICTURE-BK | ON-SHELVES |  | JUVENILE | IPY | 201712092226 |
Gracie and Josh / Susanne Gervay | Gervay, Susanne, 1951- | PICTURE-BK GAR | 34000091320861 | PICTURE-BK | ON-SHELVES | UNKNOWN | JUVENILE | IPY | 201712092226 |
Conception, pregnancy and birth / Miriam Stoppard ; [Australian consultant : Jonathan Morris] | Stoppard, Miriam, 1937- | 618.2 STO | 34000090407669 | NONFICTION | ON-SHELVES | UNKNOWN | ADULT | CDE | 201712092229 |
The one man / Andrew Gross | Gross, Andrew, 1952- | AD-PBK GRO | 34000101056893 | AD-PBK | ON-SHELVES |  | ADULT | CDE | 201712092229 |
The caveman / Jorn Lier Horst ; translated by Anne Bruce | Horst, J²rn Lier, 1970- | AD-PBK HOR | 34000097764484 | AD-PBK | ON-SHELVES |  | ADULT | CDE | 201712092229 |
Bumpology : a myth-busting guide for curious parents-to-be / Linda Geddes | Geddes, Linda | 618.2 GED | 34000090987520 | NONFICTION | ON-SHELVES | UNKNOWN | ADULT | CDE | 201712092229 |
How to conceive naturally : and have a healthy pregnancy after 30 / Christa Orecchio, CN and Willow Buckley, CCH, CD (DONA) ; foreword by Sara Gottfried, MD | Orecchio, Christa | 618.178 ORE | 34000099596801 | NONFICTION | ON-SHELVES |  | ADULT | CDE | 201712092229 |
Mastering civility : a manifesto for the workplace / Christine Porath | Porath, Christine Lynne | 650.13 POR | 34000102617065 | NONFICTION | ON-SHELVES |  | ADULT | GNG | 201712092236 |
New scientist (Australasian ed.) |  | AD-MAGS V.235 NO.3143 SEP 16, 2017 | 34000103548699 | AD-MAGS | ON-SHELVES |  | ADULT | GNG | 201712092236 |
Sorting the beef from the bull : the science of food fraud forensics / Richard Evershed & Nicola Temple | Evershed, Richard | 664.07 EVE | 34000103729729 | NONFICTION | ON-SHELVES |  | ADULT | GNG | 201712092236 |
GQ Australia |  | AD-MAGS JUN/JUL 2017 | 34000102347754 | AD-MAGS | ON-SHELVES |  | ADULT | GNG | 201712092236 |
The Tea Chest / Josephine Moon | Moon, Josephine | AD-PBK MOO | 34000093833853 | AD-PBK | ON-SHELVES | UNKNOWN | ADULT | CDA | 201712092249 |
Power play / Catherine Coulter | Coulter, Catherine | LARGEPRINT | 34000093370294 | LARGEPRINT | ON-SHELVES |  | ADULT | CDA | 201712092253 |
NYPD Red. 3 / James Patterson and Marshall Karp | Patterson, James, 1947- | LARGEPRINT | 34000098959505 | LARGEPRINT | ON-SHELVES |  | ADULT | CDA | 201712092253 |
The good life : over 160 easy, delicious recipes for a healthy, lean lifestyle / Sally Obermeder + Maha Koraiem | Obermeder, Sally | 641.563 OBE | 34000100617844 | NONFICTION | ON-SHELVES |  | ADULT | CRA | 201712092325 |
How to land a jumbo jet : a visual exploration of travel facts, figures and ephemera / edited by Nigel Holmes | Holmes, Nigel, 1942- | 910.202 HOW | 34000088234091 | NONFICTION | ON-SHELVES | UNKNOWN | ADULT | BSQ | 201712092355 |
Land of the turquoise mountains : journeys across Iran / Cyrus Massoudi | Massoudi, Cyrus | 915.5 MAS | 34000097303937 | NONFICTION | ON-SHELVES |  | ADULT | BSQ | 201712092355 |
Thinking about it only makes it worse : and other lessons from modern life / David Mitchell | Mitchell, David, 1974- | 827.92 MIT | 34000096553490 | NONFICTION | ON-SHELVES |  | ADULT | BSQ | 201712092355 |
There is a Language column, but it contains many blanks and UNKNOWN values, so it does not appear to be very useful.
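One way to put a number on how sparse the Language column is would be a small base-R helper that reports the share of values that are blank, NA or UNKNOWN. This is a minimal sketch. The function name uninformative_share is hypothetical, and the toy vector is made up for illustration rather than taken from the library data.

```r
# Hypothetical helper: fraction of entries that are NA, blank (after
# trimming spaces) or the literal string "UNKNOWN".
uninformative_share <- function(x) {
  x <- as.character(x)  # also works if the column was read as a factor
  mean(is.na(x) | trimws(x) %in% c("", "UNKNOWN"))
}

# Toy example (made-up values, not from the library data).
uninformative_share(c("", "UNKNOWN", "FRENCH", NA))  # returns 0.75
```

On the loaded data, the call would be uninformative_share(data$Language).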
What sort of items are being checked out?
plot_ly(width = 900, height = 450) %>%
  # Right-hand pie: share of checkouts by age group
  add_pie(data = count(data, Age), labels = ~Age, values = ~n,
          type = 'pie', textinfo = 'label+percent', domain = list(x = c(0.6, 1), y = c(0, 1))) %>%
  # Left-hand pie: share of checkouts by item type
  add_pie(data = count(data, Item.type), labels = ~Item.type, values = ~n,
          type = 'pie', textinfo = 'label+percent', domain = list(x = c(0, 0.4), y = c(0, 1))) %>%
  # Hide the legend and the Cartesian axes, which carry no meaning for pie charts
  layout(showlegend = FALSE,
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
Although one might commonly associate libraries with books, 12% of checked-out items are DVDs.
We also see that children’s literature seems to be quite popular, with 39% of items being from the “Juvenile” category.
It also becomes apparent that the data is not perfect: the number of Young Adult items seems unusually low, and there are some items of type 2017, which is not a valid item code.
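The pie-chart percentages and the stray 2017 item type could also be checked directly, without plotting. The sketch below uses base R only. The helper names category_percent and numeric_codes are hypothetical, and the toy vector is invented for illustration.

```r
# Hypothetical helper: percentage share of each category, largest first
# (the same quantity the pie charts label as "percent").
category_percent <- function(x) {
  round(100 * sort(prop.table(table(x)), decreasing = TRUE), 1)
}

# Hypothetical check: item types that are purely numeric (such as "2017")
# rather than alphabetic codes like "DVD" or "AD-PBK".
numeric_codes <- function(x) {
  x <- as.character(x)
  unique(x[grepl("^[0-9]+$", x)])
}

# Toy example (made-up values, not the library data).
types <- c("DVD", "AD-PBK", "AD-PBK", "2017")
category_percent(types)
numeric_codes(types)  # returns "2017"
```

Applied to the real data, category_percent(data$Item.type) and category_percent(data$Age) should agree with the pie-chart labels up to rounding. numeric_codes(data$Item.type) lists any purely numeric item types.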
Titles
Finally, we will wrap up with a word cloud to visualise the content borrowed over the pre-holiday period.
When we inspected the data earlier, we saw that many titles also include the author after a slash. In addition, many titles include the media type, such as “dvd”. Both will need to be removed. We will also need to clean the text for common issues by stripping out common words (like “the”), numbers, and punctuation.
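Before turning to tm, the cleaning steps just described can be sketched in plain base R, which makes each transformation explicit. This is an illustration under stated assumptions and not tm's implementation. clean_titles is a hypothetical helper, the short stop_words default only stands in for tm's stopwords("english"), and the example titles come from the table above.

```r
# Hypothetical base-R sketch of the title cleaning. The stop_words default
# is a short stand-in for tm::stopwords("english"), not the real list.
clean_titles <- function(titles,
                         stop_words = c("the", "of", "a", "an", "and", "to", "in"),
                         extra_words = c("dvd", "book", "sound", "recording", "novel")) {
  x <- sub("/.*$", "", titles)        # drop the author statement after "/"
  x <- tolower(x)
  x <- gsub("[[:digit:]]+", "", x)    # remove numbers
  words <- paste(c(stop_words, extra_words), collapse = "|")
  # Remove listed words only where they stand as whole words
  x <- gsub(paste0("\\b(", words, ")\\b"), "", x, perl = TRUE)
  x <- gsub("[[:punct:]]+", "", x)    # remove punctuation
  trimws(gsub("[[:space:]]+", " ", x))  # collapse and trim whitespace
}

clean_titles(c("Ashes to ashes / Jenny Han & Siobhan Vivian",
               "Lost cities of the ancients [dvd]",
               "9 ways to a resilient child / Dr. Justin Coulson"))
# returns "ashes ashes" "lost cities ancients" "ways resilient child"
```

In the case study itself, the tm package performs these steps through ready-made transformations such as removeNumbers, removeWords, removePunctuation and stripWhitespace: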
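# Keep only the part of each title before the first "/" (drops the author statement)
titles_corpus <- sapply(strsplit(as.character(data$Title), "/"), `[`, 1) %>%
  VectorSource() %>%
  Corpus() %>%
  tm_map(content_transformer(tolower)) %>%                                  # lower case
  tm_map(removeNumbers) %>%
  tm_map(removeWords, stopwords("english")) %>%                              # common words such as "the"
  tm_map(removeWords, c("dvd", "book", "sound", "recording", "novel")) %>%  # media-type words
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
# Draw up to 100 words; random.order = FALSE places the most frequent words first, near the centre
wordcloud(titles_corpus, max.words = 100, scale = c(5, 0.5), random.order = FALSE, colors = brewer.pal(8, "Dark2"))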
Warning message in tm_map.SimpleCorpus(., content_transformer(tolower)):
“transformation drops documents”
Warning message in tm_map.SimpleCorpus(., removeNumbers):
“transformation drops documents”
Warning message in tm_map.SimpleCorpus(., removeWords, stopwords("english")):
“transformation drops documents”
Warning message in tm_map.SimpleCorpus(., removeWords, c("dvd", "book", "sound", :
“transformation drops documents”
Warning message in tm_map.SimpleCorpus(., removePunctuation):
“transformation drops documents”
Warning message in tm_map.SimpleCorpus(., stripWhitespace):
“transformation drops documents”
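A word cloud sizes each word by how often it occurs. The counting step can be sketched in base R as follows. word_counts is a hypothetical helper and the input strings are made-up toy titles. The min_length = 3 default imitates tm's TermDocumentMatrix(), which by default ignores words shorter than three characters, but the counts are not guaranteed to match wordcloud() exactly.

```r
# Hypothetical helper: tally words across cleaned titles, most frequent first.
# min_length = 3 imitates tm's default minimum word length; shorter tokens
# (including empty strings left by leading spaces) are dropped.
word_counts <- function(cleaned_titles, min_length = 3) {
  words <- unlist(strsplit(cleaned_titles, "[[:space:]]+"))
  words <- words[nchar(words) >= min_length]
  sort(table(words), decreasing = TRUE)
}

# Toy input (made-up strings, not the library data).
counts <- word_counts(c("christmas stories", "australia", "christmas australia"))
counts
```

For the SimpleCorpus built above, content(titles_corpus) should return the cleaned titles as a character vector that could be passed to such a helper. Also note that wordcloud() only draws words whose frequency reaches its min.freq argument (3 by default), so rare words never appear.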
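The “transformation drops documents” warnings come from how tm's SimpleCorpus (which Corpus() creates here) checks document names after each transformation. In this case they do not mean that any titles were lost, and building the corpus with VCorpus() instead avoids them.
Unsurprisingly, Australia appears to be a popular topic, and Christmas features prominently for the month of December. This concludes our trip to the library for now!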