BDSI R Training I
Biological Data Science Institute
3rd November 2023
dplyr
is a core package in tidyverse
dplyr
(first on CRAN in 2014-01-29) was implemented in plyr
(first on CRAN in 2008-10-08)dplyr
has been evolving frequently but dplyr
v1.0.0 was released on CRAN in 2020-05-29dplyr
are maturing and thus the user interface is unlikely to changeHadley Wickham, Romain François, Lionel Henry and Kirill Müller (2020). dplyr: A Grammar of Data Manipulation. R package version 1.0.2.
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29.
dplyr
“verbs”dplyr
include:
arrange
|
select
|
mutate
|
rename
|
group_by
|
summarise
|
dplyr
generally have the form:verb(data, args)
data
is a data.frame
objectrename(mtcars, miles_per_gallon = mpg)
arrange(mtcars, wt)
Almost all tidyverse packages import the magrittr
package to use %>%
x %>% f(y)
is the same as f(x, y)
x %>% f(y) %>% g(z)
is the same as g(f(x, y), z)
When you see the pipe operator %>%
, read it as “and then”
mtcars %>% # take mtcars data, and then
rename(miles_per_gallon = mpg) %>% # rename mpg as miles_per_gallon, and then
arrange(wt) # arrange row by wt
miles_per_gallon cyl disp hp drat
Lotus Europa 30.4 4 95.1 113 3.77
Honda Civic 30.4 4 75.7 52 4.93
Toyota Corolla 33.9 4 71.1 65 4.22
Fiat X1-9 27.3 4 79.0 66 4.08
Porsche 914-2 26.0 4 120.3 91 4.43
Fiat 128 32.4 4 78.7 66 4.08
Datsun 710 22.8 4 108.0 93 3.85
Toyota Corona 21.5 4 120.1 97 3.70
Mazda RX4 21.0 6 160.0 110 3.90
Ferrari Dino 19.7 6 145.0 175 3.62
Volvo 142E 21.4 4 121.0 109 4.11
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Merc 230 22.8 4 140.8 95 3.92
Ford Pantera L 15.8 8 351.0 264 4.22
Merc 240D 24.4 4 146.7 62 3.69
Hornet 4 Drive 21.4 6 258.0 110 3.08
AMC Javelin 15.2 8 304.0 150 3.15
Hornet Sportabout 18.7 8 360.0 175 3.15
Merc 280 19.2 6 167.6 123 3.92
Merc 280C 17.8 6 167.6 123 3.92
Valiant 18.1 6 225.0 105 2.76
Dodge Challenger 15.5 8 318.0 150 2.76
Duster 360 14.3 8 360.0 245 3.21
Maserati Bora 15.0 8 301.0 335 3.54
Merc 450SL 17.3 8 275.8 180 3.07
Merc 450SLC 15.2 8 275.8 180 3.07
Camaro Z28 13.3 8 350.0 245 3.73
Pontiac Firebird 19.2 8 400.0 175 3.08
Merc 450SE 16.4 8 275.8 180 3.07
Cadillac Fleetwood 10.4 8 472.0 205 2.93
Chrysler Imperial 14.7 8 440.0 230 3.23
Lincoln Continental 10.4 8 460.0 215 3.00
wt qsec vs am gear carb
Lotus Europa 1.513 16.90 1 1 5 2
Honda Civic 1.615 18.52 1 1 4 2
Toyota Corolla 1.835 19.90 1 1 4 1
Fiat X1-9 1.935 18.90 1 1 4 1
Porsche 914-2 2.140 16.70 0 1 5 2
Fiat 128 2.200 19.47 1 1 4 1
Datsun 710 2.320 18.61 1 1 4 1
Toyota Corona 2.465 20.01 1 0 3 1
Mazda RX4 2.620 16.46 0 1 4 4
Ferrari Dino 2.770 15.50 0 1 5 6
Volvo 142E 2.780 18.60 1 1 4 2
Mazda RX4 Wag 2.875 17.02 0 1 4 4
Merc 230 3.150 22.90 1 0 4 2
Ford Pantera L 3.170 14.50 0 1 5 4
Merc 240D 3.190 20.00 1 0 4 2
Hornet 4 Drive 3.215 19.44 1 0 3 1
AMC Javelin 3.435 17.30 0 0 3 2
Hornet Sportabout 3.440 17.02 0 0 3 2
Merc 280 3.440 18.30 1 0 4 4
Merc 280C 3.440 18.90 1 0 4 4
Valiant 3.460 20.22 1 0 3 1
Dodge Challenger 3.520 16.87 0 0 3 2
Duster 360 3.570 15.84 0 0 3 4
Maserati Bora 3.570 14.60 0 1 5 8
Merc 450SL 3.730 17.60 0 0 3 3
Merc 450SLC 3.780 18.00 0 0 3 3
Camaro Z28 3.840 15.41 0 0 3 4
Pontiac Firebird 3.845 17.05 0 0 3 2
Merc 450SE 4.070 17.40 0 0 3 3
Cadillac Fleetwood 5.250 17.98 0 0 3 4
Chrysler Imperial 5.345 17.42 0 0 3 4
Lincoln Continental 5.424 17.82 0 0 3 4
Stefan Milton Bache and Hadley Wickham (2020). magrittr: A Forward-Pipe Operator for R. R package version 2.0.1.
All the same result as below
mpg cyl
Mazda RX4 21.0 6
Mazda RX4 Wag 21.0 6
Datsun 710 22.8 4
Hornet 4 Drive 21.4 6
Hornet Sportabout 18.7 8
Valiant 18.1 6
Duster 360 14.3 8
Merc 240D 24.4 4
Merc 230 22.8 4
Merc 280 19.2 6
Merc 280C 17.8 6
Merc 450SE 16.4 8
Merc 450SL 17.3 8
Merc 450SLC 15.2 8
Cadillac Fleetwood 10.4 8
Lincoln Continental 10.4 8
Chrysler Imperial 14.7 8
Fiat 128 32.4 4
Honda Civic 30.4 4
Toyota Corolla 33.9 4
Toyota Corona 21.5 4
Dodge Challenger 15.5 8
AMC Javelin 15.2 8
Camaro Z28 13.3 8
Pontiac Firebird 19.2 8
Fiat X1-9 27.3 4
Porsche 914-2 26.0 4
Lotus Europa 30.4 4
Ford Pantera L 15.8 8
Ferrari Dino 19.7 6
Maserati Bora 15.0 8
Volvo 142E 21.4 4
mpg
Mazda RX4 21.0
Mazda RX4 Wag 21.0
Datsun 710 22.8
Hornet 4 Drive 21.4
Hornet Sportabout 18.7
Valiant 18.1
Duster 360 14.3
Merc 240D 24.4
Merc 230 22.8
Merc 280 19.2
Merc 280C 17.8
Merc 450SE 16.4
Merc 450SL 17.3
Merc 450SLC 15.2
Cadillac Fleetwood 10.4
Lincoln Continental 10.4
Chrysler Imperial 14.7
Fiat 128 32.4
Honda Civic 30.4
Toyota Corolla 33.9
Toyota Corona 21.5
Dodge Challenger 15.5
AMC Javelin 15.2
Camaro Z28 13.3
Pontiac Firebird 19.2
Fiat X1-9 27.3
Porsche 914-2 26.0
Lotus Europa 30.4
Ford Pantera L 15.8
Ferrari Dino 19.7
Maserati Bora 15.0
Volvo 142E 21.4
pull(mpg)
. mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
slice
to subset by index and filter
by logical vectorx %>% f(y, g(.))
is the same as f(x, y, g(x))
x %>% f(y, .)
is the same as f(y, x)
tibble::rownames_to_column()
to convert rownames to a column to make into a tidy data
mtcars %>%
mutate(gpm = 1 / mpg,
wt = gpm^2,
wt = if_else(cyl==6, 10, wt),
hp = case_when(cyl==6 ~ 11,
cyl==4 ~ 10,
TRUE ~ 3))
mpg cyl disp hp drat wt qsec vs am gear carb gpm
Mazda RX4 21.0 6 160.0 11 3.90 1.000000e+01 16.46 0 1 4 4 0.04761905
Mazda RX4 Wag 21.0 6 160.0 11 3.90 1.000000e+01 17.02 0 1 4 4 0.04761905
Datsun 710 22.8 4 108.0 10 3.85 1.923669e-03 18.61 1 1 4 1 0.04385965
Hornet 4 Drive 21.4 6 258.0 11 3.08 1.000000e+01 19.44 1 0 3 1 0.04672897
Hornet Sportabout 18.7 8 360.0 3 3.15 2.859676e-03 17.02 0 0 3 2 0.05347594
Valiant 18.1 6 225.0 11 2.76 1.000000e+01 20.22 1 0 3 1 0.05524862
Duster 360 14.3 8 360.0 3 3.21 4.890215e-03 15.84 0 0 3 4 0.06993007
Merc 240D 24.4 4 146.7 10 3.69 1.679656e-03 20.00 1 0 4 2 0.04098361
Merc 230 22.8 4 140.8 10 3.92 1.923669e-03 22.90 1 0 4 2 0.04385965
Merc 280 19.2 6 167.6 11 3.92 1.000000e+01 18.30 1 0 4 4 0.05208333
Merc 280C 17.8 6 167.6 11 3.92 1.000000e+01 18.90 1 0 4 4 0.05617978
Merc 450SE 16.4 8 275.8 3 3.07 3.718025e-03 17.40 0 0 3 3 0.06097561
Merc 450SL 17.3 8 275.8 3 3.07 3.341241e-03 17.60 0 0 3 3 0.05780347
Merc 450SLC 15.2 8 275.8 3 3.07 4.328255e-03 18.00 0 0 3 3 0.06578947
Cadillac Fleetwood 10.4 8 472.0 3 2.93 9.245562e-03 17.98 0 0 3 4 0.09615385
Lincoln Continental 10.4 8 460.0 3 3.00 9.245562e-03 17.82 0 0 3 4 0.09615385
Chrysler Imperial 14.7 8 440.0 3 3.23 4.627701e-03 17.42 0 0 3 4 0.06802721
Fiat 128 32.4 4 78.7 10 4.08 9.525987e-04 19.47 1 1 4 1 0.03086420
Honda Civic 30.4 4 75.7 10 4.93 1.082064e-03 18.52 1 1 4 2 0.03289474
Toyota Corolla 33.9 4 71.1 10 4.22 8.701630e-04 19.90 1 1 4 1 0.02949853
Toyota Corona 21.5 4 120.1 10 3.70 2.163332e-03 20.01 1 0 3 1 0.04651163
Dodge Challenger 15.5 8 318.0 3 2.76 4.162331e-03 16.87 0 0 3 2 0.06451613
AMC Javelin 15.2 8 304.0 3 3.15 4.328255e-03 17.30 0 0 3 2 0.06578947
Camaro Z28 13.3 8 350.0 3 3.73 5.653231e-03 15.41 0 0 3 4 0.07518797
Pontiac Firebird 19.2 8 400.0 3 3.08 2.712674e-03 17.05 0 0 3 2 0.05208333
Fiat X1-9 27.3 4 79.0 10 4.08 1.341760e-03 18.90 1 1 4 1 0.03663004
Porsche 914-2 26.0 4 120.3 10 4.43 1.479290e-03 16.70 0 1 5 2 0.03846154
Lotus Europa 30.4 4 95.1 10 3.77 1.082064e-03 16.90 1 1 5 2 0.03289474
Ford Pantera L 15.8 8 351.0 3 4.22 4.005768e-03 14.50 0 1 5 4 0.06329114
Ferrari Dino 19.7 6 145.0 11 3.62 1.000000e+01 15.50 0 1 5 6 0.05076142
Maserati Bora 15.0 8 301.0 3 3.54 4.444444e-03 14.60 0 1 5 8 0.06666667
Volvo 142E 21.4 4 121.0 10 4.11 2.183597e-03 18.60 1 1 4 2 0.04672897
mutate
is done sequentially based on input order
mutate
but computational performance is usually better if done within the same mutate
call
mtcars %>%
mutate(gpm = 1 / mpg) %>%
mutate(wt = gpm^2)
am carb cyl disp drat gear hp mpg qsec vs wt
Mazda RX4 1 4 6 160.0 3.90 4 110 21.0 16.46 0 2.620
Mazda RX4 Wag 1 4 6 160.0 3.90 4 110 21.0 17.02 0 2.875
Datsun 710 1 1 4 108.0 3.85 4 93 22.8 18.61 1 2.320
Hornet 4 Drive 0 1 6 258.0 3.08 3 110 21.4 19.44 1 3.215
Hornet Sportabout 0 2 8 360.0 3.15 3 175 18.7 17.02 0 3.440
Valiant 0 1 6 225.0 2.76 3 105 18.1 20.22 1 3.460
Duster 360 0 4 8 360.0 3.21 3 245 14.3 15.84 0 3.570
Merc 240D 0 2 4 146.7 3.69 4 62 24.4 20.00 1 3.190
Merc 230 0 2 4 140.8 3.92 4 95 22.8 22.90 1 3.150
Merc 280 0 4 6 167.6 3.92 4 123 19.2 18.30 1 3.440
Merc 280C 0 4 6 167.6 3.92 4 123 17.8 18.90 1 3.440
Merc 450SE 0 3 8 275.8 3.07 3 180 16.4 17.40 0 4.070
Merc 450SL 0 3 8 275.8 3.07 3 180 17.3 17.60 0 3.730
Merc 450SLC 0 3 8 275.8 3.07 3 180 15.2 18.00 0 3.780
Cadillac Fleetwood 0 4 8 472.0 2.93 3 205 10.4 17.98 0 5.250
Lincoln Continental 0 4 8 460.0 3.00 3 215 10.4 17.82 0 5.424
Chrysler Imperial 0 4 8 440.0 3.23 3 230 14.7 17.42 0 5.345
Fiat 128 1 1 4 78.7 4.08 4 66 32.4 19.47 1 2.200
Honda Civic 1 2 4 75.7 4.93 4 52 30.4 18.52 1 1.615
Toyota Corolla 1 1 4 71.1 4.22 4 65 33.9 19.90 1 1.835
Toyota Corona 0 1 4 120.1 3.70 3 97 21.5 20.01 1 2.465
Dodge Challenger 0 2 8 318.0 2.76 3 150 15.5 16.87 0 3.520
AMC Javelin 0 2 8 304.0 3.15 3 150 15.2 17.30 0 3.435
Camaro Z28 0 4 8 350.0 3.73 3 245 13.3 15.41 0 3.840
Pontiac Firebird 0 2 8 400.0 3.08 3 175 19.2 17.05 0 3.845
Fiat X1-9 1 1 4 79.0 4.08 4 66 27.3 18.90 1 1.935
Porsche 914-2 1 2 4 120.3 4.43 5 91 26.0 16.70 0 2.140
Lotus Europa 1 2 4 95.1 3.77 5 113 30.4 16.90 1 1.513
Ford Pantera L 1 4 8 351.0 4.22 5 264 15.8 14.50 0 3.170
Ferrari Dino 1 6 6 145.0 3.62 5 175 19.7 15.50 0 2.770
Maserati Bora 1 8 8 301.0 3.54 5 335 15.0 14.60 0 3.570
Volvo 142E 1 2 4 121.0 4.11 4 109 21.4 18.60 1 2.780
mpg wt gear cyl disp hp drat qsec vs am carb
Mazda RX4 21.0 2.620 4 6 160.0 110 3.90 16.46 0 1 4
Mazda RX4 Wag 21.0 2.875 4 6 160.0 110 3.90 17.02 0 1 4
Datsun 710 22.8 2.320 4 4 108.0 93 3.85 18.61 1 1 1
Hornet 4 Drive 21.4 3.215 3 6 258.0 110 3.08 19.44 1 0 1
Hornet Sportabout 18.7 3.440 3 8 360.0 175 3.15 17.02 0 0 2
Valiant 18.1 3.460 3 6 225.0 105 2.76 20.22 1 0 1
Duster 360 14.3 3.570 3 8 360.0 245 3.21 15.84 0 0 4
Merc 240D 24.4 3.190 4 4 146.7 62 3.69 20.00 1 0 2
Merc 230 22.8 3.150 4 4 140.8 95 3.92 22.90 1 0 2
Merc 280 19.2 3.440 4 6 167.6 123 3.92 18.30 1 0 4
Merc 280C 17.8 3.440 4 6 167.6 123 3.92 18.90 1 0 4
Merc 450SE 16.4 4.070 3 8 275.8 180 3.07 17.40 0 0 3
Merc 450SL 17.3 3.730 3 8 275.8 180 3.07 17.60 0 0 3
Merc 450SLC 15.2 3.780 3 8 275.8 180 3.07 18.00 0 0 3
Cadillac Fleetwood 10.4 5.250 3 8 472.0 205 2.93 17.98 0 0 4
Lincoln Continental 10.4 5.424 3 8 460.0 215 3.00 17.82 0 0 4
Chrysler Imperial 14.7 5.345 3 8 440.0 230 3.23 17.42 0 0 4
Fiat 128 32.4 2.200 4 4 78.7 66 4.08 19.47 1 1 1
Honda Civic 30.4 1.615 4 4 75.7 52 4.93 18.52 1 1 2
Toyota Corolla 33.9 1.835 4 4 71.1 65 4.22 19.90 1 1 1
Toyota Corona 21.5 2.465 3 4 120.1 97 3.70 20.01 1 0 1
Dodge Challenger 15.5 3.520 3 8 318.0 150 2.76 16.87 0 0 2
AMC Javelin 15.2 3.435 3 8 304.0 150 3.15 17.30 0 0 2
Camaro Z28 13.3 3.840 3 8 350.0 245 3.73 15.41 0 0 4
Pontiac Firebird 19.2 3.845 3 8 400.0 175 3.08 17.05 0 0 2
Fiat X1-9 27.3 1.935 4 4 79.0 66 4.08 18.90 1 1 1
Porsche 914-2 26.0 2.140 5 4 120.3 91 4.43 16.70 0 1 2
Lotus Europa 30.4 1.513 5 4 95.1 113 3.77 16.90 1 1 2
Ford Pantera L 15.8 3.170 5 8 351.0 264 4.22 14.50 0 1 4
Ferrari Dino 19.7 2.770 5 6 145.0 175 3.62 15.50 0 1 6
Maserati Bora 15.0 3.570 5 8 301.0 335 3.54 14.60 0 1 8
Volvo 142E 21.4 2.780 4 4 121.0 109 4.11 18.60 1 1 2
🎯 Calculate the average weight (wt
) of a car for each gear type in (gear
) mtcars
🎯 Calculate the median weight (wt
) of a car for each gear (gear
) and engine (vs
) type in mtcars
across
across
, you can more easily apply a function to multiple columns# A tibble: 6 × 11
# Groups: gear [3]
gear vs mpg cyl disp hp drat wt qsec am carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 0 15.0 8 358. 194. 3.12 4.10 17.1 0 3.08
2 3 1 20.3 5.33 201. 104 3.18 3.05 19.9 0 1
3 4 0 21 6 160 110 3.9 2.75 16.7 1 4
4 4 1 25.2 4.4 116. 85.4 4.07 2.59 19.4 0.6 2
5 5 0 19.1 6.5 229. 216. 3.95 2.91 15.3 1 5
6 5 1 30.4 4 95.1 113 3.77 1.51 16.9 1 2
where
across
with the selection helper where
# A tibble: 6 × 8
# Groups: gear [3]
gear vs mpg disp hp drat wt qsec
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 0 15.0 358. 194. 3.12 4.10 17.1
2 3 1 20.3 201. 104 3.18 3.05 19.9
3 4 0 21 160 110 3.9 2.75 16.7
4 4 1 25.2 116. 85.4 4.07 2.59 19.4
5 5 0 19.1 229. 216. 3.95 2.91 15.3
6 5 1 30.4 95.1 113 3.77 1.51 16.9
c_across
mtcars %>%
rowwise() %>%
summarise(disp = disp, hp = hp, drat = drat, wt = wt,
score = sum(c_across(disp:wt)))
# A tibble: 32 × 5
disp hp drat wt score
<dbl> <dbl> <dbl> <dbl> <dbl>
1 160 110 3.9 2.62 277.
2 160 110 3.9 2.88 277.
3 108 93 3.85 2.32 207.
4 258 110 3.08 3.22 374.
5 360 175 3.15 3.44 542.
6 225 105 2.76 3.46 336.
7 360 245 3.21 3.57 612.
8 147. 62 3.69 3.19 216.
9 141. 95 3.92 3.15 243.
10 168. 123 3.92 3.44 298.
# ℹ 22 more rows
rowwise
?rowwise
?mtcars %>%
#rowwise() %>%
summarise(disp = disp, hp = hp, drat = drat, wt = wt,
score = sum(c_across(disp:wt)))
disp hp drat wt score
1 160.0 110 3.90 2.620 12295.14
2 160.0 110 3.90 2.875 12295.14
3 108.0 93 3.85 2.320 12295.14
4 258.0 110 3.08 3.215 12295.14
5 360.0 175 3.15 3.440 12295.14
6 225.0 105 2.76 3.460 12295.14
7 360.0 245 3.21 3.570 12295.14
8 146.7 62 3.69 3.190 12295.14
9 140.8 95 3.92 3.150 12295.14
10 167.6 123 3.92 3.440 12295.14
11 167.6 123 3.92 3.440 12295.14
12 275.8 180 3.07 4.070 12295.14
13 275.8 180 3.07 3.730 12295.14
14 275.8 180 3.07 3.780 12295.14
15 472.0 205 2.93 5.250 12295.14
16 460.0 215 3.00 5.424 12295.14
17 440.0 230 3.23 5.345 12295.14
18 78.7 66 4.08 2.200 12295.14
19 75.7 52 4.93 1.615 12295.14
20 71.1 65 4.22 1.835 12295.14
21 120.1 97 3.70 2.465 12295.14
22 318.0 150 2.76 3.520 12295.14
23 304.0 150 3.15 3.435 12295.14
24 350.0 245 3.73 3.840 12295.14
25 400.0 175 3.08 3.845 12295.14
26 79.0 66 4.08 1.935 12295.14
27 120.3 91 4.43 2.140 12295.14
28 95.1 113 3.77 1.513 12295.14
29 351.0 264 4.22 3.170 12295.14
30 145.0 175 3.62 2.770 12295.14
31 301.0 335 3.54 3.570 12295.14
32 121.0 109 4.11 2.780 12295.14
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
R employs what is called lazy evaluation for function inputs
Non-standard evaluation uses this feature to capture the input expression within the function and evaluate only when requested
rlang
rlang
filter
, the Tidyverse version of subset
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
filter
, replace it with dplyr::filter
dplyr::filter
is a conflict with stats::filter
and it may be using stats::filter
instead… I’ve fallen into this trap so many times!Lionel Henry and Hadley Wickham (2020). rlang: Functions for Base Types and Core R and ‘Tidyverse’ Features. R package version 0.4.8.
mtcars
for a given condition speed dist
1 25 85
speed dist
1 15 26
2 15 54
3 16 32
4 16 40
5 17 32
6 17 40
7 17 50
8 18 42
9 18 56
10 18 76
11 18 84
12 19 36
13 19 46
14 19 68
15 20 32
16 20 48
17 20 52
18 20 56
19 20 64
20 22 66
21 23 54
22 24 70
23 24 92
24 24 93
25 24 120
26 25 85
.data
is a special pronoun referring to variables in the data parsed in the first argument.env
is a special pronoun referring to variables in the environment (i.e. not in the data parsed in the first argument)tidyselect
package for variable/column selectionimmunity <- readr::read_csv("https://github.com/anu-bdsi/training-R1/raw/main/data/marcato-2021-immunity.csv")
str(immunity)
spc_tbl_ [1,024 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Timepoint : chr [1:1024] "CC" "CC" "CC" "CC" ...
$ Batch : num [1:1024] 1 1 1 1 1 1 1 1 1 1 ...
$ Animal : num [1:1024] 1 2 3 4 5 6 7 8 9 10 ...
$ Eartag : num [1:1024] 48454 4796 85085 884 59787 ...
$ Uplo : chr [1:1024] "up" "up" "low" "low" ...
$ Bafr : chr [1:1024] "back" "back" "back" "back" ...
$ Pen : num [1:1024] 4 4 6 6 7 7 2 2 10 10 ...
$ Corridor : num [1:1024] 1 1 1 1 1 1 1 1 2 2 ...
$ Diet : chr [1:1024] "E" "E" "M" "M" ...
$ Duration : num [1:1024] 6 6 6 6 6 6 6 6 6 6 ...
$ Type : chr [1:1024] "U" "U" "C" "C" ...
$ Bilirubin_1 : chr [1:1024] "12.95" "18.51" "26.11" "9.84" ...
$ Cortisol_1 : chr [1:1024] "7.58" "16.55" "17.49" "12.37" ...
$ Haptoglobin : chr [1:1024] "0.31" "0.28" "0.18" "0.55" ...
$ Titer_IgG : chr [1:1024] "4.7" "3.9" "4.4" "4.9" ...
$ Titer_IgM : chr [1:1024] "6.6" "4.0" "5.3" "3.8" ...
$ wbc : num [1:1024] 6.79 13.31 11.68 5.29 6.87 ...
$ neutro1 : num [1:1024] 2.77 6.02 4.17 1.6 2.22 4.94 3 8.65 3.81 6.93 ...
$ lympho1 : num [1:1024] 2.69 4.75 3.8 2.33 3.58 1.79 4.65 5.55 4.87 3.77 ...
$ mono1 : num [1:1024] 1.15 1.87 2.2 1.04 0.9 1.1 1.59 2.07 1.48 1.09 ...
$ eo1 : num [1:1024] 0.07 0.16 0.46 0.14 0.08 0.73 0.41 0.51 0.06 0.15 ...
$ baso1 : num [1:1024] 0.11 0.52 1.06 0.18 0.1 0.4 0.14 0.17 0.23 0.19 ...
$ neutro2 : num [1:1024] 40.8 45.3 35.6 30.2 32.3 ...
$ lympho2 : num [1:1024] 39.6 35.6 32.5 44 52.1 ...
$ mono2 : num [1:1024] 16.9 14.1 18.8 19.8 13.1 ...
$ eo2 : num [1:1024] 1.05 1.15 3.95 2.6 1.15 8.15 4.15 3.05 0.55 1.2 ...
$ baso2 : num [1:1024] 1.55 3.85 9.1 3.5 1.4 4.5 1.4 1 2.2 1.5 ...
$ CD8+ : num [1:1024] 7.05 5.73 10.7 5.15 8.08 12.1 15.5 5.19 14.4 11.5 ...
$ CD8+ perf+ : num [1:1024] 0.67 1.31 1.41 2.08 0.78 1.24 0 0.34 0.86 0.68 ...
$ NK+ : num [1:1024] 4.52 6.52 13.3 3.16 14.4 13.8 8.3 5.22 8.28 12.3 ...
$ NK+ perf+ : num [1:1024] 1.64 2.86 1.3 1.43 2.17 0.94 0.82 1.39 1.66 2.83 ...
$ CD4+ : num [1:1024] 17.9 11.2 0.1 0.09 8.47 22.5 14.9 8.19 24.9 11.5 ...
$ CD4+ perf+ : num [1:1024] 0.02 0.25 4.17 11.8 0.19 0.12 0 0.06 0.49 0.12 ...
$ gamma delta+ : num [1:1024] 41.6 48.9 23.5 55.9 36.9 18.9 3.17 63.5 0.11 37 ...
$ gamma delta+ perf+: num [1:1024] 0.06 0.33 0.12 0.23 0.07 0.19 0 0.12 0 0.09 ...
$ CD14+ large : num [1:1024] 35.2 34.2 44.7 24.1 19.1 61.1 3.93 43.7 28.3 25.1 ...
$ CD21+ large : num [1:1024] 8.04 12.5 12.8 6.85 16.3 4.3 29.1 3.4 11.5 6.92 ...
- attr(*, "spec")=
.. cols(
.. Timepoint = col_character(),
.. Batch = col_double(),
.. Animal = col_double(),
.. Eartag = col_double(),
.. Uplo = col_character(),
.. Bafr = col_character(),
.. Pen = col_double(),
.. Corridor = col_double(),
.. Diet = col_character(),
.. Duration = col_double(),
.. Type = col_character(),
.. Bilirubin_1 = col_character(),
.. Cortisol_1 = col_character(),
.. Haptoglobin = col_character(),
.. Titer_IgG = col_character(),
.. Titer_IgM = col_character(),
.. wbc = col_double(),
.. neutro1 = col_double(),
.. lympho1 = col_double(),
.. mono1 = col_double(),
.. eo1 = col_double(),
.. baso1 = col_double(),
.. neutro2 = col_double(),
.. lympho2 = col_double(),
.. mono2 = col_double(),
.. eo2 = col_double(),
.. baso2 = col_double(),
.. `CD8+` = col_double(),
.. `CD8+ perf+` = col_double(),
.. `NK+` = col_double(),
.. `NK+ perf+` = col_double(),
.. `CD4+` = col_double(),
.. `CD4+ perf+` = col_double(),
.. `gamma delta+` = col_double(),
.. `gamma delta+ perf+` = col_double(),
.. `CD14+ large` = col_double(),
.. `CD21+ large` = col_double()
.. )
- attr(*, "problems")=<externalptr>
tidyselect
syntax :
can be used to select contiguous columns in the data.# A tibble: 1,024 × 4
`CD8+ perf+` `NK+ perf+` `CD4+ perf+` `gamma delta+ perf+`
<dbl> <dbl> <dbl> <dbl>
1 0.67 1.64 0.02 0.06
2 1.31 2.86 0.25 0.33
3 1.41 1.3 4.17 0.12
4 2.08 1.43 11.8 0.23
5 0.78 2.17 0.19 0.07
6 1.24 0.94 0.12 0.19
7 0 0.82 0 0
8 0.34 1.39 0.06 0.12
9 0.86 1.66 0.49 0
10 0.68 2.83 0.12 0.09
# ℹ 1,014 more rows
:
for selecting contiguous variables!
for taking complement set of variables&
or |
for selecting intersection or union of two sets of variables, e.g. tibble [1,024 × 1] (S3: tbl_df/tbl/data.frame)
$ Titer_IgG: chr [1:1024] "4.7" "3.9" "4.4" "4.9" ...
c()
for combining selectionseverything()
to select all variableslast_col()
to select last variable, with option of an offsetstarts_with()
selects columns with the given prefixends_with()
selects columns with the given suffixcontains()
selects columns with a literal stringmatches()
selects columns that match the regular expressionnum_range()
selects columns with a numerical rangeall_of()
for selecting columns based on a character vectorany_of()
is the same as all_of()
but no error when variables do not existwhere()
selects based on where given function return TRUEanu-bdsi.github.io/training-R1/