Every time I had to convert dataset from long to the wide format, I needed to Google for the solution. Although I found an answer every time, yet it was impossible to remember when I needed since I did not fully understand how transforming the dataset works. I prefer to use tidyverse package for this task, but I know that reshape package works as well.

First, I create a dataset with 9 rows and 3 IDs in long format to take as an example.

library(tidyverse)
set.seed(519)
longdata1 <- data.frame(ID = 1:3,
    expand.grid(Name = c("Dora", "John", "Rob"), Year = 2012:2014),
    BMI = round(runif(9, 18, 35), 0)
    )
longdata1
##   ID Name Year BMI
## 1  1 Dora 2012  24
## 2  2 John 2012  33
## 3  3  Rob 2012  33
## 4  1 Dora 2013  34
## 5  2 John 2013  27
## 6  3  Rob 2013  27
## 7  1 Dora 2014  19
## 8  2 John 2014  25
## 9  3  Rob 2014  34

The dataset includes 3 people who have 3 different measurements of BMI for each year from 2012-2014. To transform the dataset longdata1 from long to wide is a simple task. In spread function I put the variables I want to separate (Year, BMI).

Transform dataset from long to wide

wide = longdata1 %>% 
  spread(Year, BMI)
wide
##   ID Name 2012 2013 2014
## 1  1 Dora   24   34   19
## 2  2 John   33   27   25
## 3  3  Rob   33   27   34

Below, I will convert dataset from wide to long with gather function, in which I include variables I like to put in one column as Year, and besides that the column with the values of BMI.

Transform dataset from wide to long.

longdata2 = wide %>%
  gather("2012", "2013", "2014", key = Year, value = BMI)
longdata2
##   ID Name Year BMI
## 1  1 Dora 2012  24
## 2  2 John 2012  33
## 3  3  Rob 2012  33
## 4  1 Dora 2013  34
## 5  2 John 2013  27
## 6  3  Rob 2013  27
## 7  1 Dora 2014  19
## 8  2 John 2014  25
## 9  3  Rob 2014  34

Compare the orginal old long format dataset with the newly created

identical(longdata2$BMI, longdata1$BMI)
## [1] TRUE

Convert data from long to wide format with multiple columns

What I presented above is easy, because the dataset has only 2 variables for transforming, but often the dataset has more than 2 variable for conversion.

I will add new variable to the dataset

set.seed(520)
long3 <- data.frame(ID = 1:3,
    expand.grid(Name = c("Dora", "John", "Rob"), Year = 2012:2014),
    BMI = round(runif(9, 18, 35), 0),
    Cholesterol = round(runif(9, 200, 300), 0)
    )
long3
##   ID Name Year BMI Cholesterol
## 1  1 Dora 2012  20         244
## 2  2 John 2012  26         299
## 3  3  Rob 2012  20         218
## 4  1 Dora 2013  19         227
## 5  2 John 2013  20         211
## 6  3  Rob 2013  28         204
## 7  1 Dora 2014  27         272
## 8  2 John 2014  21         277
## 9  3  Rob 2014  23         245

Confirming that the spread() won't work

wide2 = long3 %>% 
  spread(Year, BMI)
wide2
##   ID Name Cholesterol 2012 2013 2014
## 1  1 Dora         227   NA   19   NA
## 2  1 Dora         244   20   NA   NA
## 3  1 Dora         272   NA   NA   27
## 4  2 John         211   NA   20   NA
## 5  2 John         277   NA   NA   21
## 6  2 John         299   26   NA   NA
## 7  3  Rob         204   NA   28   NA
## 8  3  Rob         218   20   NA   NA
## 9  3  Rob         245   NA   NA   23

Or, the code spread(Year, BMI, Cholesterol) will not work.

Now lets see the solution, step by step:

In order to use spread() we need to have 2 variables in long format. So first, I will gather the data.

long3 %>% 
  gather("Year", "BMI", "Cholesterol", key = variable, value = number)
##    ID Name    variable number
## 1   1 Dora        Year   2012
## 2   2 John        Year   2012
## 3   3  Rob        Year   2012
## 4   1 Dora        Year   2013
## 5   2 John        Year   2013
## 6   3  Rob        Year   2013
## 7   1 Dora        Year   2014
## 8   2 John        Year   2014
## 9   3  Rob        Year   2014
## 10  1 Dora         BMI     20
## 11  2 John         BMI     26
## 12  3  Rob         BMI     20
## 13  1 Dora         BMI     19
## 14  2 John         BMI     20
## 15  3  Rob         BMI     28
## 16  1 Dora         BMI     27
## 17  2 John         BMI     21
## 18  3  Rob         BMI     23
## 19  1 Dora Cholesterol    244
## 20  2 John Cholesterol    299
## 21  3  Rob Cholesterol    218
## 22  1 Dora Cholesterol    227
## 23  2 John Cholesterol    211
## 24  3  Rob Cholesterol    204
## 25  1 Dora Cholesterol    272
## 26  2 John Cholesterol    277
## 27  3  Rob Cholesterol    245

The variable has similar name which unable the spread function. So I need to create another variable which will identify each of them.

long3 %>% 
  group_by(ID) %>% 
  mutate(Visit = 1:n())
## # A tibble: 9 x 6
## # Groups:   ID [3]
##      ID Name   Year   BMI Cholesterol Visit
##   <int> <fct> <int> <dbl>       <dbl> <int>
## 1     1 Dora   2012    20         244     1
## 2     2 John   2012    26         299     1
## 3     3 Rob    2012    20         218     1
## 4     1 Dora   2013    19         227     2
## 5     2 John   2013    20         211     2
## 6     3 Rob    2013    28         204     2
## 7     1 Dora   2014    27         272     3
## 8     2 John   2014    21         277     3
## 9     3 Rob    2014    23         245     3
long3 %>% 
  group_by(ID) %>% 
  mutate(Visit = 1:n()) %>% 
  gather("Year", "BMI", "Cholesterol", key = variable, value = number)
## # A tibble: 27 x 5
## # Groups:   ID [3]
##       ID Name  Visit variable number
##    <int> <fct> <int> <chr>     <dbl>
##  1     1 Dora      1 Year       2012
##  2     2 John      1 Year       2012
##  3     3 Rob       1 Year       2012
##  4     1 Dora      2 Year       2013
##  5     2 John      2 Year       2013
##  6     3 Rob       2 Year       2013
##  7     1 Dora      3 Year       2014
##  8     2 John      3 Year       2014
##  9     3 Rob       3 Year       2014
## 10     1 Dora      1 BMI          20
## # … with 17 more rows

Next step is to combine variables together and prepare for spread() function

long3 %>% 
  group_by(ID) %>% 
  mutate(Visit = 1:n()) %>% 
  gather("Year", "BMI", "Cholesterol", key = variable, value = number) %>% 
  unite(combi, variable, Visit)
## # A tibble: 27 x 4
## # Groups:   ID [3]
##       ID Name  combi  number
##    <int> <fct> <chr>   <dbl>
##  1     1 Dora  Year_1   2012
##  2     2 John  Year_1   2012
##  3     3 Rob   Year_1   2012
##  4     1 Dora  Year_2   2013
##  5     2 John  Year_2   2013
##  6     3 Rob   Year_2   2013
##  7     1 Dora  Year_3   2014
##  8     2 John  Year_3   2014
##  9     3 Rob   Year_3   2014
## 10     1 Dora  BMI_1      20
## # … with 17 more rows

Each variable in combi has unique name for same ID and is ready for spread.

long3 %>% 
  group_by(ID) %>% 
  mutate(Visit = 1:n()) %>% 
  gather("Year", "BMI", "Cholesterol", key = variable, value = number) %>% 
  unite(combi, variable, Visit) %>% 
  spread(combi, number)
## # A tibble: 3 x 11
## # Groups:   ID [3]
##      ID Name  BMI_1 BMI_2 BMI_3 Cholesterol_1 Cholesterol_2 Cholesterol_3
##   <int> <fct> <dbl> <dbl> <dbl>         <dbl>         <dbl>         <dbl>
## 1     1 Dora     20    19    27           244           227           272
## 2     2 John     26    20    21           299           211           277
## 3     3 Rob      20    28    23           218           204           245
## # … with 3 more variables: Year_1 <dbl>, Year_2 <dbl>, Year_3 <dbl>

Now the long dataset with more than 2 columns is in wide format.