Basics of R Programming

BDSI R Training I

Emi Tanaka

Biological Data Science Institute

2nd November 2023

Simple algebraic operations

  • You can use R like a calculator
3 + 2 * (2 - 6 / 3) 
[1] 3
  • Assigning values to objects:
    • You can assign values to objects using <- or =
    • The name of the object can be anything so long as it is syntactically valid (no spaces or most special characters, and the name cannot start with a digit)
a <- 3
b = 2
2 * a - b 
[1] 4
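
As a rough sketch of the naming rule above, base R's make.names() shows what a syntactically valid version of a name would look like, so comparing a name with its converted form is a common way to check validity. The candidate names here are made up for illustration.

```r
# Candidate object names (hypothetical examples, not from the slides)
candidates <- c("total", "my_value", "my value", "2nd_try", "rate%")

# make.names() converts each string into a syntactically valid name;
# a name that comes back unchanged was already valid
is_valid <- candidates == make.names(candidates)
names(is_valid) <- candidates
is_valid

# An invalid name can still be used if it is wrapped in backticks
`my value` <- 3
`my value` * 2
```

Backticks work, but a valid name such as my_value is usually easier to type and read.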

Vectors

  • We can combine scalars to form vectors using c():
a <- c(1, 2, 3)
a / 2
[1] 0.5 1.0 1.5
  • This is a vector of length 3
length(a)
[1] 3
  • This vector is stored as type double, and its class is numeric
typeof(a)
[1] "double"
class(a)
[1] "numeric"

Vector types

  • There are four primary types of atomic vectors: logical, integer, double and character.
logical_vec <- c(TRUE, FALSE, T, F)
integer_vec <- c(1L, 2L, 3L, 4L)
double_vec <- c(1, 2, 3, 4)
character_vec <- c("A", "B", 'C', 'D')
  • The integer and double vectors are collectively called numeric vectors.
  • All elements of an atomic vector must be of the same type.
  • If you attempt to combine mismatched types, R will coerce all values to the same type.
c(TRUE, F, "3", 10, 'X')
[1] "TRUE"  "FALSE" "3"     "10"    "X"    
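
The coercion above follows a fixed order: logical, then integer, then double, then character. A minimal sketch that checks the resulting type at each step (the variable names are only for illustration):

```r
# Each combination is coerced to the most flexible type present
t1 <- typeof(c(TRUE, 2L))            # logical and integer
t2 <- typeof(c(TRUE, 2L, 3.5))       # add a double
t3 <- typeof(c(TRUE, 2L, 3.5, "a"))  # add a character
c(t1, t2, t3)

# Because TRUE is coerced to 1 and FALSE to 0,
# sum() counts the number of TRUE values
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true
```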

Lists

  • Lists allow you to combine elements of different types.
l <- list(c(1, 2, 3), 
          c(2.5, 3.0),
          c(TRUE, FALSE),
          c("a", "b"))
  • You can use str() to see the internal structure of an object in R.
str(l)
List of 4
 $ : num [1:3] 1 2 3
 $ : num [1:2] 2.5 3
 $ : logi [1:2] TRUE FALSE
 $ : chr [1:2] "a" "b"

Data frames

  • A data.frame is a special type of named list where each element (column) is a vector of the same length.
df <- data.frame(grade = c("A", "B", "C"),
                     n = c(10, 14, 30))

df
  grade  n
1     A 10
2     B 14
3     C 30
colnames(df)
[1] "grade" "n"    
nrow(df)
[1] 3
ncol(df)
[1] 2
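
The claim that a data frame is a named list can be checked directly. A short sketch using the same df as above:

```r
df <- data.frame(grade = c("A", "B", "C"),
                     n = c(10, 14, 30))

typeof(df)   # the underlying storage type
is.list(df)  # a data frame is a list ...
length(df)   # ... so its length is the number of columns
names(df)    # and its element names are the column names

# Each element (column) has the same length, which is the number of rows
lengths(df)
```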

Subsetting vectors Part 1

  • Positive integers select elements at the specified positions:
x <- c(1.1, 2.2, 3.3, 4.4, 5.5)
x[c(3, 1)]
[1] 3.3 1.1
x[c(2, 2)]
[1] 2.2 2.2
x[c(2.3, 2.8)] # doubles are silently truncated to integers
[1] 2.2 2.2
  • Negative integers exclude elements at the specified positions:
x[-c(3, 1)]
[1] 2.2 4.4 5.5
x[c(-3, 1)] # you can't mix positive and negative integers
Error in x[c(-3, 1)]: only 0's may be mixed with negative subscripts

Subsetting vectors Part 2

  • Logical vectors select the elements where the corresponding logical value is TRUE.
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
[1] 1.1 3.3 5.5
x[x > 3]
[1] 3.3 4.4 5.5
  • If the logical vector used for subsetting is shorter than the vector being subset, the logical vector is recycled to match its length.
x[c(TRUE, FALSE)] # same result as x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
[1] 1.1 3.3 5.5
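
One way to see the recycling is to build the full-length logical vector by hand with rep_len() and compare the results. This is only a sketch; mask is a name introduced here for illustration.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5)

# Recycle the short logical vector to the length of x by hand
mask <- rep_len(c(TRUE, FALSE), length(x))
mask

# Subsetting with the short vector and with the recycled one agree
identical(x[c(TRUE, FALSE)], x[mask])

# which() converts a logical vector into the positions that are TRUE
which(x > 3)
```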

Subsetting named vectors

  • Character vectors select elements by name (if the vector has names); a name that does not exist returns NA:
y <- c("a" = 1.1, "b" = 2.2, "c" = 3.3, "d" = 4.4, "e" = 5.5)
y
  a   b   c   d   e 
1.1 2.2 3.3 4.4 5.5 
y[c("c", "a", "a", "f")]
   c    a    a <NA> 
 3.3  1.1  1.1   NA 

Subsetting lists

str(l)
List of 4
 $ : num [1:3] 1 2 3
 $ : num [1:2] 2.5 3
 $ : logi [1:2] TRUE FALSE
 $ : chr [1:2] "a" "b"
l[1]
[[1]]
[1] 1 2 3
l[[1]]
[1] 1 2 3
l[c(1, 2)]
[[1]]
[1] 1 2 3

[[2]]
[1] 2.5 3.0
l[[c(1, 2)]] # what's happened here?!
[1] 2
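
The last result comes from recursive indexing: when [[ is given a vector of length greater than one, it indexes step by step into nested elements. A small sketch with the same list l:

```r
l <- list(c(1, 2, 3),
          c(2.5, 3.0),
          c(TRUE, FALSE),
          c("a", "b"))

# [[c(1, 2)]] means: take element 1 of the list,
# then take element 2 of that result
l[[c(1, 2)]]
l[[1]][[2]]

identical(l[[c(1, 2)]], l[[1]][[2]])

# To pick several list elements, use single brackets instead
length(l[c(1, 2)])
```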

Subsetting named lists

l2 <- list(A = c(1, 2, 3),
           log = c(TRUE, FALSE),
           who = c("Terry", "Jon"))

l2$A
[1] 1 2 3
l2[c("A", "log")]
$A
[1] 1 2 3

$log
[1]  TRUE FALSE
l2[["A"]]
[1] 1 2 3
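
The three operators differ in ways that are easy to miss. The sketch below, using the same l2, shows that $ accepts an unambiguous prefix of a name while [[ needs the full name by default, and that [ keeps the list structure while [[ extracts the element itself.

```r
l2 <- list(A = c(1, 2, 3),
           log = c(TRUE, FALSE),
           who = c("Terry", "Jon"))

# $ does partial matching on list names
l2$w

# [[ requires the full name by default and returns NULL otherwise
l2[["w"]]
l2[["who"]]

# [ returns a list, [[ returns the element
class(l2["A"])
class(l2[["A"]])
```

Relying on partial matching can hide typos, so writing the full name is generally the safer habit.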

Subsetting data frames

df
  grade  n
1     A 10
2     B 14
3     C 30
df[1, ]
  grade  n
1     A 10
df[, 1]
[1] "A" "B" "C"
df[, 1, drop = FALSE]
  grade
1     A
2     B
3     C
df$n
[1] 10 14 30
df[["n"]]
[1] 10 14 30
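
The logical subsetting shown earlier for vectors also works on the rows of a data frame. A brief sketch with the same df (the threshold of 10 is arbitrary):

```r
df <- data.frame(grade = c("A", "B", "C"),
                     n = c(10, 14, 30))

# Rows where n is greater than 10, keeping all columns
df[df$n > 10, ]

# The same rows, but only the grade column (simplified to a vector)
df[df$n > 10, "grade"]

# A single index inside [ treats the data frame as a list of columns,
# so the result is still a data frame
df["n"]
```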

Communicating your problem

🆘 Asking for help 1 Part 1

  • What do you think about the question below?

🆘 Asking for help 1 Part 2

  • What do you think now?

I am looking to adjust the size of two separate ggplots within the same R chunk in Rmarkdown. These plots must be different when outputted as a pdf, so defining the dimensions at the beginning of the chunk doesn’t work. Does anyone have any ideas? My code is below.

```{r, fig.height = 3, fig.width = 3}
ggplot(df, aes(weight, height)) +
  geom_point()

ggplot(df, aes(height, volume)) +
  geom_point()
```

🆘 Asking for help 1 Part 3

  • Is this better?

I am looking to adjust the size of two separate ggplots within the same R chunk in Rmarkdown. These plots must be different when outputted as a pdf, so defining the dimensions at the beginning of the chunk doesn’t work. Does anyone have any ideas? My code is below.

```{r, fig.height = 3, fig.width = 3}
library(ggplot2)
ggplot(df, aes(weight, height)) +
  geom_point()

ggplot(df, aes(height, volume)) +
  geom_point()
```

🆘 Asking for help 1 Part 4

  • Okay better now?

I am looking to adjust the size of two separate ggplots within the same R chunk in Rmarkdown. These plots must be different when outputted as a pdf, so defining the dimensions at the beginning of the chunk doesn’t work. Does anyone have any ideas? My code is below.

```{r, fig.height = 3, fig.width = 3}
library(ggplot2)
df <- read.csv("mydata.csv")
ggplot(df, aes(weight, height)) +
  geom_point()

ggplot(df, aes(height, volume)) +
  geom_point()
```

🆘 Asking for help 1 Part 5

  • Are we done now?

I am looking to adjust the size of two separate ggplots within the same R chunk in Rmarkdown. These plots must be different when outputted as a pdf, so defining the dimensions at the beginning of the chunk doesn’t work. Does anyone have any ideas? My code is below.

```{r, fig.height = 3, fig.width = 3}
library(ggplot2)
ggplot(trees, aes(Girth, Height)) +
  geom_point()

ggplot(trees, aes(Height, Volume)) +
  geom_point()
```
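
When a built-in dataset such as trees cannot stand in for your own data, one common option is to share a small piece of the data as code with dput(). The following is a minimal sketch; my_data and its values are hypothetical.

```r
# my_data is a hypothetical stand-in for the data in your question
my_data <- data.frame(weight = c(60, 72, 81),
                      height = c(160, 175, 182))

# dput() prints R code that recreates the object, so it can be pasted
# into a question instead of a file that others do not have
dput(my_data)

# Check the round trip: evaluating the printed code rebuilds the object
code <- paste(capture.output(dput(my_data)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(my_data, rebuilt)
```

For larger data, dput(head(my_data)) keeps the pasted code short.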

❓ How to ask questions?

Checklist (note: not an exhaustive checklist)

If the question is asked in a public forum or similar:

If the problem is computer system related

If the problem is based on data

🆘 Asking for help 1 Check

🆘 Asking for help 2

  • How about the question on the right?
  • What makes it hard or easy for people to answer this question?

Session Information

You can easily get the session information in R using sessionInfo().
Scroll to see the packages used to make these slides.

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Sydney
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_4.3.1    fastmap_1.1.1     cli_3.6.1         tools_4.3.1      
 [5] htmltools_0.5.6   rstudioapi_0.15.0 yaml_2.3.7        rmarkdown_2.25   
 [9] knitr_1.43        jsonlite_1.8.7    xfun_0.40         digest_0.6.33    
[13] rlang_1.1.1       evaluate_0.21    

🎁 Reproducible Example with reprex LIVE DEMO

  • Copy your minimal reproducible example, then run
reprex::reprex(session_info = TRUE)