How to build Animated Charts like Hans Rosling — doing it all in R

A Small Educative Project for Learning Data Visualisation Skills leveraging 2 libraries (gganimate and plot.ly) — UPDATED with new gganimate version

Tristan Ganry
Towards Data Science


Hans Rosling was a statistics guru. He spent his entire life promoting the use of data with animated charts to explore development issues and to share a fact-based view of the world. His most popular TED Talk, “The best stats you’ve ever seen”, has more than 12 million views. It’s also one of the Top 100 most viewed TED Talks in the world.

Have you ever dreamed of building animated charts like his most popular one for free, from scratch, in under 5 minutes, and in different formats you can share (GIF, HTML for your website, etc.)?

What we will be building with gganimate ;)

This article will show you how to build animated charts with R using 2 approaches:

  • R + gganimate library that will create a GIF file
  • R + plot.ly that will generate an HTML file that you can embed into your website (see the plot.ly version below)

What we will be building with plot.ly ;)

The assumption is that you already have RStudio installed and know some of the basics, as this article will only cover the steps and the code to generate the visuals. The full code is available on GitHub for educational purposes.

STEP 1: Search and download the datasets you need

The first stop is gapminder.org to download the 3 data sets needed. The Gapminder foundation was created by the Rosling family. It has fantastic visualizations and data sets that everyone should check out to “fight ignorance with a fact-based worldview that everyone can understand”.

We will download 3 Excel files (xlsx):

  1. Children per woman (total fertility)
  2. Population, total
  3. Life expectancy (years)

Once the files are downloaded and saved in your working folder, it’s time to clean and merge the data sets.

STEP 2: Clean and merge the data

a1) Loading the data with the xlsx library (adjust the file paths to point to your folder)

# Please note that loading xlsx in R is really slow compared to csv
library(xlsx)
population_xls <- read.xlsx("indicator gapminder population.xlsx", encoding = "UTF-8", stringsAsFactors = FALSE, sheetIndex = 1, as.data.frame = TRUE, header = TRUE)
fertility_xls <- read.xlsx("indicator undata total_fertility.xlsx", encoding = "UTF-8", stringsAsFactors = FALSE, sheetIndex = 1, as.data.frame = TRUE, header = TRUE)
lifeexp_xls <- read.xlsx("indicator life_expectancy_at_birth.xlsx", encoding = "UTF-8", stringsAsFactors = FALSE, sheetIndex = 1, as.data.frame = TRUE, header = TRUE)
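
A quick aside on column names: the cleaning code in this article selects columns such as 'Total.population' and 'X1962', which are not valid spreadsheet-style headers. This is presumably because R sanitises headers with make.names() when it builds a data frame. The small sketch below shows that behaviour on its own; the raw header strings are assumed for illustration and may not match the Gapminder files exactly.

```r
# Headers as they might appear in the spreadsheet (assumed for illustration)
raw_headers <- c("Total population", "1962", "1963")

# make.names() is what data.frame() applies when check.names = TRUE (the default)
clean_headers <- make.names(raw_headers)
print(clean_headers)

# Same pattern the article uses to build the year column names
myvars <- paste("X", 1962:1963, sep = "")
print(myvars %in% clean_headers)
```

If a column selection fails on your machine, running names() on the loaded data frame is a quick way to see what the headers were turned into.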

a2) UPDATE: Install the new gganimate version on R version 3.5+

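# devtools is used to install packages straight from GitHub
library(devtools)
library(RCurl)
library(httr)
# Turns off SSL certificate verification for httr requests in this session (a workaround; skip it if the install works without it)
set_config( config( ssl_verifypeer = 0L ) )
devtools::install_github("RcppCore/Rcpp")
devtools::install_github("thomasp85/gganimate", force = TRUE)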
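
Since this update was written, gganimate has also been published on CRAN, so a plain install.packages() call is an alternative to the GitHub route. Here is a minimal sketch that lists which of the packages used in this article are still missing on your machine. The helper name missing_packages is made up for this example, and the install line is left commented out so nothing is installed by accident.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
# system.file() returns "" for a package that cannot be found, without loading it.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

# Packages used across the steps of this article
needed <- c("xlsx", "reshape", "gapminder", "dplyr", "ggplot2",
            "gganimate", "gifski", "plotly", "htmlwidgets")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```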

b) Clean and merge the data with reshape and dplyr libraries

# Load libraries
library(reshape)
library(gapminder)
library(dplyr)
library(ggplot2)
# Create a variable to keep only years 1962 to 2015
myvars <- paste("X", 1962:2015, sep="")
# Create 3 data frames with only years 1962 to 2015
population <- population_xls[c('Total.population',myvars)]
fertility <- fertility_xls[c('Total.fertility.rate',myvars)]
lifeexp <- lifeexp_xls[c('Life.expectancy',myvars)]
# Rename the first column as "Country"
colnames(population)[1] <- "Country"
colnames(fertility)[1] <- "Country"
colnames(lifeexp)[1] <- "Country"
# Remove the empty rows at the end, keeping only the first 275 rows
lifeexp <- lifeexp[1:275,]
population <- population[1:275,]
# Use reshape library to move the year dimension as a column
population_m <- melt(population, id=c("Country"))
lifeexp_m <- melt(lifeexp, id=c("Country"))
fertility_m <- melt(fertility, id=c("Country"))
# Give a different name to each KPI (e.g. pop, life, fert)
colnames(population_m)[3] <- "pop"
colnames(lifeexp_m)[3] <- "life"
colnames(fertility_m)[3] <- "fert"
# Merge the 3 data frames into one
mydf <- merge(lifeexp_m, fertility_m, by=c("Country","variable"))
mydf <- merge(mydf, population_m, by=c("Country","variable"))
# The only piece of the puzzle missing is the continent name for each country for the color - use gapminder library to bring it
continent <- gapminder %>% distinct(country, continent)
continent <- data.frame(lapply(continent, as.character), stringsAsFactors=FALSE)
# Rename by name rather than by position so the right column is always picked
colnames(continent)[colnames(continent)=="country"] <- "Country"
# Filter out all countries that do not exist in the continent table
mydf_filter <- mydf %>% filter(Country %in% unique(continent$Country))
# Add the continent column to finalize the data set
mydf_filter <- merge(mydf_filter, continent, by=c("Country"))
# Do some extra cleaning (e.g. replace N/A values with 0, remove factors, and convert KPIs into numerical values)
mydf_filter[is.na(mydf_filter)] <- 0
mydf_filter <- data.frame(lapply(mydf_filter, as.character), stringsAsFactors=FALSE)
mydf_filter$variable <- as.integer(as.character(gsub("X","",mydf_filter$variable)))
colnames(mydf_filter)[colnames(mydf_filter)=="variable"] <- "year"
mydf_filter$pop <- round(as.numeric(as.character(mydf_filter$pop))/1000000,1)
mydf_filter$fert <- as.numeric(as.character(mydf_filter$fert))
mydf_filter$life <- as.numeric(as.character(mydf_filter$life))
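
If the reshape-then-merge logic above feels abstract, here is a tiny sketch of the same idea on made-up data. It uses base R only rather than the reshape library, so it is not the exact method used in this article; the helper wide_to_long and all the values are invented for illustration.

```r
# Toy wide tables standing in for the Gapminder spreadsheets (values are made up)
life_wide <- data.frame(Country = c("A", "B"), X1962 = c(50, 60), X1963 = c(51, 61),
                        stringsAsFactors = FALSE)
fert_wide <- data.frame(Country = c("A", "B"), X1962 = c(6.0, 3.0), X1963 = c(5.9, 2.9),
                        stringsAsFactors = FALSE)

# Hypothetical helper: turn one column per year into one row per country and year
wide_to_long <- function(wide, value_name) {
  year_cols <- setdiff(names(wide), "Country")
  long <- data.frame(
    Country = rep(wide$Country, times = length(year_cols)),
    year = rep(as.integer(sub("^X", "", year_cols)), each = nrow(wide)),
    value = unlist(wide[year_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long)[3] <- value_name
  long
}

life_long <- wide_to_long(life_wide, "life")
fert_long <- wide_to_long(fert_wide, "fert")

# One row per country and year, one column per KPI
toy <- merge(life_long, fert_long, by = c("Country", "year"))
print(toy)
```

The merged toy table has the same shape as mydf_filter, minus the population and continent columns.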

STEP 3 (UPDATED for the new version of gganimate): Build the chart with gganimate and generate a GIF file to share with your friends

Now that we have a clean data set that contains the 3 KPIs (population, fertility and life expectancy) and the 3 dimensions (country, year, continent), we can generate the visual with gganimate.
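
One thing worth checking first: the cleaning step replaces missing values with 0, so a country-year with no data shows up as 0 rather than being removed. A life expectancy of 0 falls outside the 30 to 100 axis limits used below, and ggplot2 drops such points with a warning. The sketch below, on a made-up stand-in for mydf_filter, shows one way to count and filter those rows. This is an optional extra, not part of the original code.

```r
# Toy stand-in for mydf_filter (values are made up); 0 marks a value that was missing
toy <- data.frame(
  Country = c("A", "A", "B", "B"),
  year = c(1962L, 1963L, 1962L, 1963L),
  life = c(50, 51, 0, 61),
  fert = c(6.0, 5.9, 3.0, 0),
  pop = c(10.5, 10.8, 0, 2.1),
  stringsAsFactors = FALSE
)

# Rows where any KPI is 0, i.e. was NA before the cleaning step
# (a true population below 50,000 would also round to 0 million)
was_missing <- toy$life == 0 | toy$fert == 0 | toy$pop == 0
print(table(toy$year[was_missing]))

# Keep only complete rows before plotting
toy_complete <- toy[!was_missing, ]
print(toy_complete)
```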

# Load libraries
library(ggplot2)
library(gganimate)
#library(gifski)
#library(png)
# Add a global theme
theme_set(theme_grey()+ theme(legend.box.background = element_rect(),legend.box.margin = margin(6, 6, 6, 6)) )
# OLD VERSION
# Create the plot with years as frame, limiting y axis from 30 years to 100
# p <- ggplot(mydf_filter, aes(fert, life, size = pop, color = continent, frame = variable)) +
# geom_point()+ ylim(30,100) + labs(x="Fertility Rate", y = "Life expectancy at birth (years)", caption = "(Based on data from Hans Rosling - gapminder.com)", color = 'Continent',size = "Population (millions)") +
# scale_color_brewer(type = 'div', palette = 'Spectral')
# gganimate(p, interval = .2, "output.gif")
# NEW VERSION
# Create the plot, limiting y axis from 30 years to 100 (the frames now come from transition_time below, so no frame aesthetic is needed)
p <- ggplot(mydf_filter, aes(fert, life, size = pop, color = continent)) +
labs(x="Fertility Rate", y = "Life expectancy at birth (years)", caption = "(Based on data from Hans Rosling - gapminder.org)", color = 'Continent',size = "Population (millions)") +
ylim(30,100) +
geom_point() +
scale_color_brewer(type = 'div', palette = 'Spectral') +
# gganimate code
ggtitle("Year: {frame_time}") +
transition_time(year) +
ease_aes("linear") +
enter_fade() +
exit_fade()
# animate
animate(p, width = 450, height = 450)
# save as a GIF
anim_save("output.gif")
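
By default animate() renders 100 frames at 10 frames per second. If you would rather control the pace, for example to mimic the old interval = .2 setting, you can pass nframes and fps yourself. The sketch below works out those numbers under the assumption of one frame per year shown for 0.2 seconds each; the rendering call only runs if gganimate is installed and the plot p from above exists.

```r
# Years covered by the data set in this article
years <- 1962:2015

# Assumption for illustration: show each year for 0.2 seconds
seconds_per_year <- 0.2

n_frames <- length(years)        # one frame per year
fps <- 1 / seconds_per_year      # frames per second
duration <- n_frames / fps       # total length of the GIF in seconds
print(c(nframes = n_frames, fps = fps, duration = duration))

# Render only if gganimate is installed and the plot `p` from above exists
if (requireNamespace("gganimate", quietly = TRUE) && exists("p")) {
  gganimate::animate(p, nframes = n_frames, fps = fps, width = 450, height = 450)
}
```

With one frame per year the bubbles jump from year to year; a larger nframes gives smoother movement because gganimate interpolates the in-between positions.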

Now you can enjoy your well-deserved GIF animation and share it with your friends.

STEP 4: Build the chart with plot.ly and generate an HTML file to embed in your website

# Load libraries
library(plotly)
library(ggplot2)
# Create the plot
p <- ggplot(mydf_filter, aes(fert, life, size = pop, color = continent, frame = year)) +
geom_point()+ ylim(30,100) + labs(x="Fertility Rate", y = "Life expectancy at birth (years)", color = 'Continent',size = "Population (millions)") +
scale_color_brewer(type = 'div', palette = 'Spectral')
# Generate the Visual and a HTML output
ggp <- ggplotly(p, height = 900, width = 900) %>%
animation_opts(frame = 100,
easing = "linear",
redraw = FALSE)
ggp
htmlwidgets::saveWidget(ggp, "index.html")

The code is available on GitHub. Thank you for reading my post; if you enjoyed it, please clap. Feel free to contact me if you want to make animated charts within your organization.
