This is a quick tutorial that introduces some of the basic tidyverse functions using simulated (fake) data.
The tutorial is designed to build an intuitive understanding of the “pipe” operator: ‘%>%’
####Load the tidyverse library
library(tidyverse)
####Create vectors to use as variables
var1<-rep(seq(1, 10, 1), 10)
#Create a variable consisting of the sequence 1:10 repeated 10 times
var2<-rep(seq(0, 1, 1), 50)
#Create a binary 0/1 variable
####Combine these variables into a tibble
dat<-as_tibble(cbind(var1, var2))
# (a 'tibble' is similar to a data frame)
####View the first ten rows.
head(dat, 10)
## # A tibble: 10 x 2
## var1 var2
## <dbl> <dbl>
## 1 1 0
## 2 2 1
## 3 3 0
## 4 4 1
## 5 5 0
## 6 6 1
## 7 7 0
## 8 8 1
## 9 9 0
## 10 10 1
####Alright, we have now created a tibble (saved as “dat”). It consists of 2 columns and 100 rows; head() only printed the first 10
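####As a cross-check on the steps above, here is a minimal sketch that builds the same data with tibble() instead of as_tibble(cbind(...)). The name ‘dat_alt’ is hypothetical and is not used elsewhere in this tutorial

```r
library(tidyverse)

# Same two vectors as in the tutorial
var1 <- rep(seq(1, 10, 1), 10)  # the sequence 1..10 repeated 10 times
var2 <- rep(seq(0, 1, 1), 50)   # the pair 0, 1 repeated 50 times

# tibble() takes the columns directly; dat_alt is an illustrative name
dat_alt <- tibble(var1 = var1, var2 = var2)

# Number of rows, then number of columns
dim(dat_alt)
```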
##Tidyverse part 1: The pipe operator and the ‘select’ + ‘arrange’ functions
####Let’s say we want var1 to be our ID variable, so that each of the values 1-10 represents a unique individual
####To do this, we can start by changing the name of “var1” to “id” within the select function
dat<-dat %>% select(id=var1, var2)
#Here we are introducing the pipe operator ' %>% ', which passes the object on its left to the function on its right as its first argument
dat
## # A tibble: 100 x 2
## id var2
## <dbl> <dbl>
## 1 1 0
## 2 2 1
## 3 3 0
## 4 4 1
## 5 5 0
## 6 6 1
## 7 7 0
## 8 8 1
## 9 9 0
## 10 10 1
## # … with 90 more rows
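####To make the pipe less mysterious, here is a small sketch comparing the piped call with the same call written without the pipe. The names ‘dat_demo’, ‘piped’ and ‘nested’ are made up for this illustration

```r
library(tidyverse)

# Rebuild the starting data so this example runs on its own
dat_demo <- tibble(var1 = rep(seq(1, 10, 1), 10),
                   var2 = rep(seq(0, 1, 1), 50))

# Piped form: dat_demo becomes the first argument of select()
piped <- dat_demo %>% select(id = var1, var2)

# The same call written without the pipe
nested <- select(dat_demo, id = var1, var2)

# Compare the two results
all.equal(piped, nested)
```

####Both forms do the same work; the pipe simply lets you read the steps from left to right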
####We can also change the name of var2 in a similar manner: let’s name it “group”, which will act as a binary variable
dat<-dat %>% select(id, group=var2)
#Notice that the new name comes first in the expression
dat
## # A tibble: 100 x 2
## id group
## <dbl> <dbl>
## 1 1 0
## 2 2 1
## 3 3 0
## 4 4 1
## 5 5 0
## 6 6 1
## 7 7 0
## 8 8 1
## 9 9 0
## 10 10 1
## # … with 90 more rows
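####If you only want to rename a column, dplyr also provides rename(), which uses the same new_name = old_name order but keeps every column you do not mention. A minimal sketch, with ‘dat_demo’ and ‘renamed’ as illustrative names

```r
library(tidyverse)

# Stand-in for the tutorial's data after the first rename
dat_demo <- tibble(id = rep(seq(1, 10, 1), 10),
                   var2 = rep(seq(0, 1, 1), 50))

# rename() changes var2 to group and leaves id untouched
renamed <- dat_demo %>% rename(group = var2)

names(renamed)
```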
####Ok, now let’s say we want to sort the rows by id so that each individual’s rows are ‘stacked’ on top of each other
dat<-dat %>% arrange(id)
#This is the arrange function, which sorts the rows by id (ascending by default)
dat
## # A tibble: 100 x 2
## id group
## <dbl> <dbl>
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
## 5 1 0
## 6 1 0
## 7 1 0
## 8 1 0
## 9 1 0
## 10 1 0
## # … with 90 more rows
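####arrange() can also sort in descending order with desc(), or by more than one column. A short sketch under the same data setup; ‘dat_demo’, ‘by_id_desc’ and ‘by_group_id’ are illustrative names

```r
library(tidyverse)

dat_demo <- tibble(id = rep(seq(1, 10, 1), 10),
                   group = rep(seq(0, 1, 1), 50))

# Sort by id from highest to lowest
by_id_desc <- dat_demo %>% arrange(desc(id))

# Sort by group first, then by id within each group
by_group_id <- dat_demo %>% arrange(group, id)

head(by_id_desc, 3)
head(by_group_id, 3)
```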
##Tidyverse part 2: The ‘mutate’ function and the ‘group_by’ function
####Now let’s say we want to create a new variable. Here we will introduce the mutate function
dat_not_grouped<-dat %>% mutate(time = seq(0, 99))
#Here mutate is creating a new variable called "time"
####If you inspect the data in “dat_not_grouped”, you will see that we have created a time variable that goes from 0-99
####What we actually want to do here is create a time variable that goes in order from 0-9 for EACH ID
####In order to do this, we will combine the mutate function with the group_by function
dat_grouped<-dat %>% group_by(id) %>% mutate(time = seq(0, 9))
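####To see what group_by changes, here is a sketch that uses row_number() instead of seq(), so the length does not have to be typed in by hand. This is an alternative to the code above, not a replacement for it; ‘dat_demo’, ‘ungrouped’ and ‘grouped’ are illustrative names

```r
library(tidyverse)

dat_demo <- tibble(id = rep(seq(1, 10, 1), 10),
                   group = rep(seq(0, 1, 1), 50)) %>%
  arrange(id)

# Without group_by(), row_number() counts across the whole tibble
ungrouped <- dat_demo %>% mutate(time = row_number() - 1)

# With group_by(), the count restarts at 0 for each id
grouped <- dat_demo %>%
  group_by(id) %>%
  mutate(time = row_number() - 1)

range(ungrouped$time)
range(grouped$time)
```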
####Alright, now let’s create a continuous random variable called ‘score’, generated separately for each person
dat<-dat_grouped %>%
  group_by(id) %>%
  mutate(score = rnorm(10, 0, 1.5)) %>%
  mutate(score=round(score, 2))
#You can stack mutate functions vertically for better readability
dat
## # A tibble: 100 x 4
## # Groups: id [10]
## id group time score
## <dbl> <dbl> <int> <dbl>
## 1 1 0 0 0.01
## 2 1 0 1 -0.77
## 3 1 0 2 0.13
## 4 1 0 3 -0.96
## 5 1 0 4 -3.24
## 6 1 0 5 1.07
## 7 1 0 6 1.17
## 8 1 0 7 -0.13
## 9 1 0 8 1.09
## 10 1 0 9 -0.09
## # … with 90 more rows
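####Because rnorm draws random numbers, your scores will differ from the ones printed above. If you want repeatable results, one option is to call set.seed() first. A minimal sketch; the helper ‘make_scores’ and the seed value 123 are arbitrary choices for this illustration

```r
library(tidyverse)

# Hypothetical helper: builds ids 1..10 (10 rows each) and draws scores
make_scores <- function(seed) {
  set.seed(seed)  # fix the state of the random number generator
  tibble(id = rep(1:10, each = 10)) %>%
    group_by(id) %>%
    mutate(score = round(rnorm(n(), 0, 1.5), 2))  # n() is the group size
}

a <- make_scores(123)
b <- make_scores(123)

# The same seed gives the same scores
identical(a$score, b$score)
```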
##Tidyverse part 3: The filter function
####Here, let’s view id == 5. We will do this using the filter function
dat %>% filter(id==5)
## # A tibble: 10 x 4
## # Groups: id [1]
## id group time score
## <dbl> <dbl> <int> <dbl>
## 1 5 0 0 1.63
## 2 5 0 1 -1.35
## 3 5 0 2 -0.07
## 4 5 0 3 -1.39
## 5 5 0 4 0.92
## 6 5 0 5 0.26
## 7 5 0 6 -1.51
## 8 5 0 7 0.96
## 9 5 0 8 -2.6
## 10 5 0 9 1.8
####We can also use filter to keep only the rows where “group == 1”
group_1_dat<-dat %>% filter(group==1)
#This stores the data for group 1 only; type group_1_dat to view it
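####filter can also take several conditions at once, or match against a set of values with %in%. A short sketch; ‘dat_demo’, ‘early_group_1’ and ‘ids_2_and_4’ are illustrative names

```r
library(tidyverse)

# Rebuild id, group and time as in the earlier steps
dat_demo <- tibble(id = rep(seq(1, 10, 1), 10),
                   group = rep(seq(0, 1, 1), 50)) %>%
  arrange(id) %>%
  group_by(id) %>%
  mutate(time = seq(0, 9))

# Comma-separated conditions are combined with AND
early_group_1 <- dat_demo %>% filter(group == 1, time < 3)

# %in% keeps rows whose id matches any value in the vector
ids_2_and_4 <- dat_demo %>% filter(id %in% c(2, 4))

nrow(early_group_1)
nrow(ids_2_and_4)
```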
##Review: ‘select’, ‘arrange’, ‘mutate’, ‘group_by’, ‘filter’
####Above we created a tibble (“dat”) and then applied functions to ‘dat’ using the pipe operator
####For review of these functions, I will create a ‘group_0_dat’ by chaining all of these functions together in a single pipeline
dat2<-as_tibble(cbind(var1, var2))
#This is the same as the original dat variable
group_0_dat<-dat2 %>%
  select(id=var1, group=var2) %>%
  arrange(id) %>%
  group_by(id) %>%
  mutate(time = seq(0, 9)) %>%
  mutate(score = round(rnorm(10, 0, 1.5), 2)) %>%
  filter(group==0)
head(group_0_dat)
## # A tibble: 6 x 4
## # Groups: id [1]
## id group time score
## <dbl> <dbl> <int> <dbl>
## 1 1 0 0 0.93
## 2 1 0 1 -1.93
## 3 1 0 2 -2.16
## 4 1 0 3 -0.41
## 5 1 0 4 -0.03
## 6 1 0 5 0.5
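####A quick sanity check on the result: because var1 cycles every 10 rows and var2 alternates every row, the odd ids always land in group 0. The sketch below repeats the non-random part of the pipeline to confirm this; ‘group_0_check’ is an illustrative name

```r
library(tidyverse)

var1 <- rep(seq(1, 10, 1), 10)
var2 <- rep(seq(0, 1, 1), 50)

# Same steps as above, without the random score column
group_0_check <- as_tibble(cbind(var1, var2)) %>%
  select(id = var1, group = var2) %>%
  arrange(id) %>%
  filter(group == 0)

# Half of the 100 rows remain, all with odd ids
nrow(group_0_check)
unique(group_0_check$id)
```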
##Writing out the tibble “dat”
####We can now save our “dat” tibble as a csv file using ‘write.csv’
write.csv(dat, file="Tidy1_dat.csv", row.names = FALSE)
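####The tidyverse also has its own csv functions in the readr package: write_csv and read_csv. Here is a minimal round-trip sketch that writes to a temporary file, so it does not touch your working directory. The small ‘dat_demo’ tibble and the names ‘out_file’ and ‘dat_back’ are made up for this illustration

```r
library(tidyverse)

# A tiny stand-in for the tutorial's data
dat_demo <- tibble(id = c(1, 1, 2),
                   group = c(0, 0, 1),
                   time = c(0, 1, 0),
                   score = c(0.01, -0.77, 1.2))

# Write to a temporary file; write_csv never adds row names
out_file <- tempfile(fileext = ".csv")
write_csv(dat_demo, out_file)

# Read it back; col_types = cols() guesses the types without printing a message
dat_back <- read_csv(out_file, col_types = cols())

dat_back
```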