We already learned that everything that exists in R is an
*object*. You most likely noted that there are different types of
objects: `2`, for instance, was a number, but `assign` was a
function.^{1} As you might have guessed, there are many
more types of objects. Understanding the fundamental object types in R
is an essential prerequisite for mastering more complicated programming
challenges than those we have encountered so far. Thus, this post is
among those that introduce you to the most important object types
that you will encounter in R.

These data types are summarized in the following figure:

This post will be about the most common types of vectors. See the previous post for a treatment of
functions, and the upcoming one
for more advanced types of vectors, such as `factor`, `matrix`, and `data.frame`.

Vectors are the most important object type in R - almost all data
that we will work with in R are vectors of some sort. Within the class
of vectors, the most important distinction is that between
**atomic vectors** and **lists**, which are
sometimes also called *generic vectors*.^{2} Both atomic vectors
and lists consist of one or more other objects. What distinguishes the
two is that while *atomic vectors are composed only of objects of the
same type*, lists can comprise objects of different types.

This makes it easy to classify atomic vectors in more detail: we usually say that the type of atomic vector is the type of the object it encompasses. Four major types of atomic vectors in this sense exist:

- `logical` (logical values): there are only two relevant logical values, `TRUE` and `FALSE`.^{3}
- `integer` (whole numbers): this type should be self-explanatory. Less intuitive is the rule that, in order to define an integer in R, you need to type the number followed by the letter `L` so that R interprets the number as an integer.^{4} Examples are `1L`, `-400L` or `10L`.
- `double` (decimal numbers): these should be self-explanatory as well. Examples are `1.5`, `0.0`, or `-500.32`.
  - Whole and decimal numbers are often summarized in the category `numeric`. However, the use of `numeric` is almost always confusing, and many functions show unintuitive behavior when this category is used. I recommend that you never use it.
- `character` (words): these can contain all kinds of tokens and are characterized by the fact that they always start and end with `"`. Examples would be `"Hello"`, `"500"` or `"1_2_Three"`.

As indicated above, an atomic vector only comprises elements of the
same type. In this context we should mention, however, the data type
`NA`, which looks ‘strange’ at first sight and denotes a missing
value:^{5}
whenever an element of a vector is missing, e.g. when the vector is used
to store observations of subjects, and for some subjects the observation
is missing, we will use `NA`.^{6}

In the following we will study the different types of atomic vectors
and their typical behavior in more detail. But before doing so we should
introduce the function `typeof()`: it helps us to identify
the type of an object in the first place. To see how, let's call the
function with the object (or the name of the object) we are interested
in:

`typeof(2L)`

`## [1] "integer"`

```
x <- 22.0
typeof(x)
```

`## [1] "double"`
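As a quick check of the four basic types listed above, the following sketch calls `typeof()` on one literal of each type. The values are arbitrary examples chosen for this illustration:

```r
# One example value per basic atomic type
typeof(TRUE)         # "logical"
typeof(10L)          # "integer"
typeof(-500.32)      # "double"
typeof("1_2_Three")  # "character"

# A number without the suffix L is a double, even if it looks like a whole number
typeof(10)           # "double"
```

The last line is the case that most often surprises newcomers: only the suffix `L` makes a number an `integer`.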

There is also a family of functions that allows us to test whether an
object is actually of a certain type. The general syntax here is
`is.*()`. For instance:

```
x <- 1.0
is.integer(x)
```

`## [1] FALSE`

`is.double(x)`

`## [1] TRUE`

These functions always return an object of type `logical`:

```
y <- is.double(x)
typeof(y)
```

`## [1] "logical"`

We can also try to transform objects from one type into another. We
call this process ‘coercion’ and the general syntax is
`as.*()`. For instance:

```
x <- "2"
print(
  typeof(x)
)
```

`## [1] "character"`

```
x <- as.double(x)
print(
  typeof(x)
)
```

`## [1] "double"`

Such a transformation is, however, not always possible:

`as.double("Hello")`

`## Warning: NAs introduced by coercion`

`## [1] NA`

Since R does not know how to turn the word ‘Hello’ into a decimal
number, it transforms it into a ‘missing value’ - `NA`.

For the basic types discussed above there is a logical hierarchy of
feasible transformations: `logical` → `integer` → `double` →
`character`, meaning that you can
always transform a decimal number into a word, but not vice versa.
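The hierarchy can be sketched with a few one-line coercions. This is only an illustration with arbitrary example values; `suppressWarnings()` is used here merely to keep the output free of the coercion warning:

```r
# Moving "up" the hierarchy always works
as.integer(TRUE)    # 1
as.double(1L)       # 1
as.character(1.5)   # "1.5"

# Moving "down" only works if the value has a sensible counterpart
as.double("1.5")                    # 1.5
suppressWarnings(as.double("one"))  # NA, since "one" cannot be read as a number
```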

Digression: Why change the types of objects anyway? Data types are extremely important for a programming language because otherwise it would remain unclear how mathematical operations could be applied to different objects such as numbers or words. You will transform objects yourself mainly when you want to use an operation that is only defined for a certain type of object, and the object you are dealing with has been stored as a different type. This can happen, for example, when you read in data or translate words into numerical values yourself. If unexpected errors with cryptic error messages occur in your code, it is always a good idea to check the types of the objects used and transform them if necessary.

```
x <- 2
y <- as.character(x)
print(y)
```

`## [1] "2"`

```
z <- as.double(y) # This works
print(z)
```

`## [1] 2`

`k <- as.double("Hello") # This does not work`

`## Warning: NAs introduced by coercion`

`print(k)`

`## [1] NA`

When transforming logical values, `TRUE` counts as `1` and `FALSE` as `0`, a fact that
will come in handy later on:

```
x <- TRUE
as.integer(x)
```

`## [1] 1`

```
y <- FALSE
as.integer(y)
```

`## [1] 0`

Since it is not always clear when R issues a warning for transformations that are incompatible with the hierarchy just introduced and when it does not, you should always be cautious!

Moreover, transformations might change the properties of the transformed objects implicitly in unexpected ways. For instance, a transformation from a decimal number to a whole number can lead to unexpected rounding behavior:

```
x <- 1.99
as.integer(x)
```

`## [1] 1`

Another example is the following:

```
z <- as.logical(99)
print(z)
```

`## [1] TRUE`

Such implicit changes of the object properties do not necessarily come with a warning message, so one should always be careful when transforming objects!
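To make the rounding issue concrete: `as.integer()` does not round to the nearest whole number but cuts off the decimal part. The following sketch contrasts this with rounding first via `round()`; the variable names are placeholders for this illustration:

```r
x <- c(1.99, -1.99)

# as.integer() drops the decimal part (truncation towards zero)
truncated <- as.integer(x)
truncated   # 1 -1

# Rounding first yields the nearest whole number
rounded <- as.integer(round(x))
rounded     # 2 -2
```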

In many cases, functions perform the necessary transformations of their arguments automatically. In most cases this is very practical:

```
x <- 1L # Integer
y <- 2.0 # Double
z <- x + y
typeof(z)
```

`## [1] "double"`

But it can be dangerous in some cases as well. When adding up logical values they are transformed to numbers:

```
x <- TRUE
y <- FALSE
z <- x + y # TRUE becomes 1, FALSE becomes 0
print(z)
```

`## [1] 1`

This is useful if you want to know, for instance, how many elements of a vector meet a certain logical criterion:

```
x <- c(1,2,3,4,5)
sum(x > 3)
```

`## [1] 2`

In all these cases it is very important to stay informed about the types of objects you are dealing with. To help you out, the following table contains an overview of the most important transformation and test functions:

Type | Test | Transformation |
---|---|---|
logical | `is.logical` | `as.logical` |
double | `is.double` | `as.double` |
integer | `is.integer` | `as.integer` |
character | `is.character` | `as.character` |
function | `is.function` | `as.function` |
NA | `is.na` | NA |
NULL | `is.null` | `as.null` |

A final remark on **scalars**: with scalar we usually
refer to ‘single numbers’, such as `2`. There is no such
concept in R: `2` is a vector with one element (or: of
length 1). Thus, we do not distinguish between the type of a vector with
one element and that of a vector with more than one element.

**Note:** As you might have guessed already, we use the
function `c()` to create longer vectors:

```
x <- c(1, 2, 3)
x
```

`## [1] 1 2 3`

We can also use this function to concatenate vectors:

```
x <- 1:3 # Shortcut for: x <- c(1, 2, 3)
y <- 4:6
z <- c(x, y)
z
```

`## [1] 1 2 3 4 5 6`

Since atomic vectors can only contain objects of the same type, one might expect the following code, which tries to concatenate objects of different types, to produce an error:

`x <- c(1, "Hello")`

But this is not what happens! R transforms the objects according to
the hierarchy discussed above: `logical` → `integer` → `double` →
`character`. Because
no error or warning message is issued, such operations are a frequent
source of mistakes.
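A short sketch of this silent coercion, using `typeof()` to inspect what `c()` returns for a few arbitrary mixed inputs:

```r
typeof(c(TRUE, 2L))      # "integer":   the logical is turned into 1L
typeof(c(1L, 2.5))       # "double":    the integer is turned into a double
typeof(c(1, "Hello"))    # "character": the number is turned into "1"

# With more than two types, everything ends up as the 'highest' type involved
mixed <- c(TRUE, 2.5, "Hello")
mixed                    # "TRUE" "2.5" "Hello"
```

Checking `typeof()` right after building a vector from mixed sources is a cheap way to catch such surprises early.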

**Note:** The length of a vector corresponds to its
number of elements. We can ‘measure’ its length using the function
`length()`:

```
x <- c(1, 2, 3)
len_x <- length(x)
len_x
```

`## [1] 3`

Digression: How large can an integer become? In R, objects of type `integer` are stored as 32-bit values. This means that for each single `integer`, 32 bits of storage are available on your computer. This implies that really large numbers cannot be stored as integers, simply because the 32 bits are not sufficient:

```
x <- 2147483647L
typeof(x)
```

`## [1] "integer"`

```
y <- 2147483648L
typeof(y)
```

`## [1] "double"`

As you can see, the largest number that we can store as a 32-bit integer is 2147483647. Larger numbers must be stored as `double`. The drawback of saving numbers in this type is, however, the risk of a loss of precision. If you want to avoid this you could try to save the number as a 64-bit integer. This possibility was added to R later in order to store large numbers as integers (something that is needed sooner than you might think). To do so we must use the package^{7} `bit64`:

```
z <- bit64::as.integer64(2147483648)
bit64::is.integer64(z)
```

`## [1] TRUE`

Because this is a data type that has been added to R later, several functions do not work with 64-bit integers if the package `bit64` is not installed. Moreover, several standard functions return rather confusing output:

`typeof(z)`

`## [1] "double"`

For this reason, and because `bit64` is not part of the standard installation of R, you should avoid storing large numbers as `integer64` whenever possible. Very large numbers should be stored as `double` or, when precision is a serious issue, you should scale them down and then save them as `integer`.
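The 32-bit limit can also be inspected directly. The sketch below relies on the built-in list `.Machine`, whose element `integer.max` holds the largest representable integer; `max_int` and `overflow` are placeholder names, and `suppressWarnings()` only hides the overflow warning R would otherwise print:

```r
max_int <- .Machine$integer.max
max_int                  # 2147483647

# Integer arithmetic beyond the limit yields NA (plus a warning)
overflow <- suppressWarnings(max_int + 1L)
overflow                 # NA

# Adding a double instead silently switches the result to type double
typeof(max_int + 1)      # "double"
```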

The logical values `TRUE` and `FALSE` are often
the result of logical operations, such as ‘Is 2 larger than 1?’. Such
logical operations occur very frequently and it's a good idea to
familiarize yourself with the logical operators. You can find an
overview in the following table.

Operator | Function in R | Example |
---|---|---|
larger | `>` | `2>1` |
smaller | `<` | `2<4` |
equal | `==` | `4==3` |
larger or equal | `>=` | `8>=8` |
smaller or equal | `<=` | `5<=9` |
not equal | `!=` | `4!=5` |
and | `&` | `x<90 & x>55` |
or | `\|` | `x<90 \| x>55` |
either or | `xor()` | `xor(2<1, 2>1)` |
not | `!` | `!(x==2)` |
is true | `isTRUE()` | `isTRUE(1>2)` |

The result of such logical operations is always a logical value:

```
x <- 4
y <- x == 8
typeof(y)
```

`## [1] "logical"`

You may also test longer vectors:

```
x <- 1:3
x < 2
```

`## [1] TRUE FALSE FALSE`

Tests can also be chained:

```
x <- 1L
x>2 | x<2 & (is.double(x) & x!=0)
```

`## [1] FALSE`

Since many mathematical operations interpret `TRUE` as `1`, it is easy to check how often a certain condition is
met:

```
x <- 1:50
smaller_20 <- x<20
print(
  sum(smaller_20) # How many elements are smaller than 20?
)
```

`## [1] 19`

```
print(
  sum(smaller_20/length(x)) # What is the share of these elements?
)
```

`## [1] 0.38`

The operation we just saw is an example of
*vectorizing* an operation. This means that the same operation is
applied to many elements, all of which are concatenated as a vector. For
instance, if you want to compute the square root of the numbers 5, 6 and
7 you could do:

`sqrt(5)`

`## [1] 2.236068`

`sqrt(6)`

`## [1] 2.44949`

`sqrt(7)`

`## [1] 2.645751`

Or you vectorize the operation:

`sqrt(c(5,6,7))`

`## [1] 2.236068 2.449490 2.645751`

Vectorizing operations is very useful since it speeds up the
computations considerably. Vectorized operations are *far* more
efficient and faster than applying the operation to each element of the
vector separately. Thus, whenever you need to apply a certain operation
more than once you should always think about using vectorization.^{8}
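To see that the vectorized call really does the same work as an element-by-element approach, the following sketch computes the square roots once with an explicit `for` loop (loops are covered later in the course) and once in vectorized form. The names `roots_loop` and `roots_vec` are placeholders for this illustration:

```r
x <- c(5, 6, 7)

# Element-by-element version: one call to sqrt() per element
roots_loop <- double(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized version: a single call for the whole vector
roots_vec <- sqrt(x)

identical(roots_loop, roots_vec)   # TRUE
```

Both versions give the same result, but the vectorized one is shorter and, for long vectors, typically much faster.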

Words are distinguished by the fact that their beginning and their
end are indicated by the symbol `'` or `"`:

```
x <- "Hello"
typeof(x)
```

`## [1] "character"`

```
y <- 'Bye!'
typeof(y)
```

`## [1] "character"`

Just like other kinds of atomic vectors, they can be concatenated using
`c()`:

```
z <- c(x, "and", y)
z
```

`## [1] "Hello" "and"   "Bye!"`

A useful function in this context is `paste()`, which
transforms and combines elements of several vectors:

```
x <- 1:10
y <- paste("Try nb.", x)
y
```

```
## [1] "Try nb. 1" "Try nb. 2" "Try nb. 3" "Try nb. 4" "Try nb. 5"
## [6] "Try nb. 6" "Try nb. 7" "Try nb. 8" "Try nb. 9" "Try nb. 10"
```

The function `paste()` also accepts an optional argument
`sep`, which allows us to specify a token that should be
placed between the elements to be combined (the default is
`sep=" "`):

```
day_nr <- 1:10
x_axis <- paste("Day", day_nr, sep = ": ")
x_axis
```

```
## [1] "Day: 1" "Day: 2" "Day: 3" "Day: 4" "Day: 5" "Day: 6" "Day: 7"
## [8] "Day: 8" "Day: 9" "Day: 10"
```

**Note:** Here we have an example of what is called ‘recycling’. Since the vector `c("Day")` was shorter than the vector `day_nr`, `c("Day")` is simply copied until it matches the length of `day_nr`, so that the operation with `paste()` makes sense. Recycling is useful, but it can be harmful when you think that you are using two vectors of the same length and this is actually not the case. In such a case no error message is printed, and the fact that the two vectors differ in length remains concealed. An example is the following code, in which the intention is clearly to connect all weekdays to numbers, but one weekday was simply forgotten:

```
days <- paste("Day ", 1:7, ":", sep="")
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
paste(days, day_names)
```

```
## [1] "Day 1: Monday" "Day 2: Tuesday" "Day 3: Wednesday" "Day 4: Thursday"
## [5] "Day 5: Friday" "Day 6: Saturday" "Day 7: Monday"
```
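One simple safeguard against unintended recycling is to compare the lengths of the vectors before combining them. The sketch below repeats the weekday example and only adds such a check; it is one possible precaution, not a built-in feature of `paste()`:

```r
days <- paste("Day ", 1:7, ":", sep = "")
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

length(days)                         # 7
length(day_names)                    # 6

# FALSE signals that paste() would have to recycle one of the vectors
same_length <- length(days) == length(day_names)
same_length                          # FALSE
```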

As indicated above, missing values are encoded as `NA`.
This is particularly useful in statistical contexts, where a particular
element of a vector cannot simply be removed if it is unavailable.

Example: The vector `x` contains logical values that indicate whether a person has correctly answered the questions on a questionnaire. If the person did not answer the third question on the questionnaire, this should be indicated by `NA`. Simply omitting the value makes it impossible to determine afterwards *which* question the person did not answer.

Most operations that get `NA` as an input will also give
`NA` as an output, because it is unclear what the result of
the operation would be for different values of the missing
value:

`5 + NA`

`## [1] NA`

The only exception is an operation that yields a certain value
completely independent from what you would substitute for
`NA`:

`NA | TRUE # Always TRUE, no matter what you substitute for NA`

`## [1] TRUE`

To test whether a vector `x` contains missing values you
should always use the function `is.na`, never
`x==NA`:

```
x <- c(NA, 5, NA, 10)
print(x == NA) # Always NA: comparing with an unknown value gives an unknown result
```

`## [1] NA NA NA NA`

```
print(
is.na(x)
)
```

`## [1] TRUE FALSE TRUE FALSE`
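Because `is.na()` returns a logical vector, it combines well with the counting trick introduced earlier. A brief sketch using the same example vector; `which()` and `anyNA()` are base R functions not discussed in the text above:

```r
x <- c(NA, 5, NA, 10)

sum(is.na(x))     # 2:   number of missing values
which(is.na(x))   # 1 3: positions of the missing values
anyNA(x)          # TRUE: is there at least one missing value?
```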

Whenever an operation yields a value that cannot be defined, the
result is not `NA` but `NaN` (*not a
number*):

`0 / 0`

`## [1] NaN`
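The two special values can be told apart with `is.na()` and `is.nan()`. As the sketch below shows, `is.na()` treats `NaN` as missing too, whereas `is.nan()` is only `TRUE` for `NaN`:

```r
is.na(NaN)    # TRUE:  NaN also counts as a missing value
is.nan(NaN)   # TRUE
is.nan(NA)    # FALSE: NA is missing, but not 'not a number'
```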

Another special element is `NULL`. `NULL` is in
fact a data type in itself (i.e. it is not a vector), but in practice
it's best thought of as a vector of length zero:

```
x <- NULL
length(x)
```

`## [1] 0`

`NULL` is frequently used to indicate that something does
not exist. An empty vector, for instance, is `NULL`:

```
x <- c()
x
```

`## NULL`

`length(x)`

`## [1] 0`

This is different to a vector with one (or more) missing
*elements*:

```
y <- NA
length(y)
```

`## [1] 1`
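The difference becomes visible when both values are placed inside `c()`. In this small sketch, `NULL` contributes nothing to the resulting vector, while `NA` occupies a position:

```r
length(c(1, NULL, 3))   # 2: NULL simply disappears
length(c(1, NA, 3))     # 3: NA keeps its place as a missing element
```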

When you define your own functions, you might use `NULL`
as the default value for optional arguments. We will learn about such
more advanced strategies later in this course. For now, it's best to
think of `NULL` as a vector of length zero.

We can extract single elements of a vector using square brackets:

```
x <- c(2,4,6)
x[1]
```

`## [1] 2`

This also allows us to modify specific elements:

```
x <- c(2,4,6)
x[2] <- 99
x
```

`## [1] 2 99 6`

But we can also extract more than one element:

`x[1:2]`

`## [1] 2 99`

Negative indices eliminate the respective elements:

`x[-1]`

`## [1] 99 6`

To get the last element of a vector you might combine this idea with
the function `length()`:

`x[length(x)]`

`## [1] 6`

Here we shall mention a few functions that are particularly useful in
the context of atomic vectors,^{9} especially when it comes to producing such
vectors or performing arithmetic operations with them.

**Creating atomic vectors**:

A sequence of whole numbers is something that we use very frequently.
To create such sequences, the shortcut `:` comes in
handy:

```
x <- 1:10
x
```

`## [1] 1 2 3 4 5 6 7 8 9 10`

```
y <- 10:1
y
```

`## [1] 10 9 8 7 6 5 4 3 2 1`

To build more complex sequences we can use `seq()`, which
in its simplest case is equivalent to `:`:

```
x <- seq(1, 10)
print(x)
```

`## [1] 1 2 3 4 5 6 7 8 9 10`

The function `seq()`, however, allows for a number of
useful optional arguments. For instance, `by` allows us to
control the space between the numbers:

```
y <- seq(1, 10, by = 0.5)
print(y)
```

```
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
## [16] 8.5 9.0 9.5 10.0
```

If we want to specify the desired length of the resulting vector and
let R choose the necessary space between the elements, we should use
`length.out`:

```
z <- seq(2, 8, length.out = 4)
print(z)
```

`## [1] 2 4 6 8`

And if we want to create a vector with the same length as another vector,
the argument `along.with` comes in handy. This is often used
for creating index vectors. (An index vector `x` to any
vector `y` with `N` elements contains the integers
from 1 to `N`. The *n*th value of `x` thus corresponds
to the index of the *n*th value of `y`.) In such a
case we do not have to specify the index numbers directly:

```
z_index <- seq(along.with = z)
print(z_index)
```

`## [1] 1 2 3 4`
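Base R also offers the shortcut `seq_along()` for exactly this task. The sketch below recreates `z` from the example above and checks that both ways of building the index vector agree:

```r
z <- seq(2, 8, length.out = 4)

# seq_along(z) is a shortcut for seq(along.with = z)
identical(seq(along.with = z), seq_along(z))   # TRUE

# It works for any vector, not only for numbers
seq_along(c("a", "b", "c"))                    # 1 2 3
```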

Another common task is to repeat a certain vector. This can be done
with `rep()`:

```
x <- rep(NA, 5)
print(x)
```

`## [1] NA NA NA NA NA`

**Operations**

There are a number of operations that we use very frequently together
with vectors. Often we are interested in the **length** of
a vector. For this we can use the function `length()`:

```
x <- c(1,2,3,4)
length(x)
```

`## [1] 4`

If we are looking for the **smallest** and
**largest** value of a vector we can use `min()` and `max()`:

`min(x)`

`## [1] 1`

`max(x)`

`## [1] 4`

Both functions (and many more similar functions) have the optional
argument `na.rm`, which can be either `TRUE` or
`FALSE`. In the case of `TRUE`, all
`NA` values are removed before the operation gets
applied:

```
y <- c(1,2,3,4,NA)
min(y)
```

`## [1] NA`

`min(y, na.rm = TRUE)`

`## [1] 1`

The **mean** or the **variance/standard
deviation** of the elements can be computed with
`mean()`, `var()`, and `sd()`, respectively, all of
which also have the optional argument `na.rm`:

`mean(x)`

`## [1] 2.5`

`var(y)`

`## [1] NA`

`var(y, na.rm = TRUE)`

`## [1] 1.666667`

Finally, we often want to compute the **sum** or the
**product** of all the elements of the vector. Here the
functions `sum()` and `prod()` are useful:

`sum(x)`

`## [1] 10`

`prod(y, na.rm = TRUE)`

`## [1] 24`
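The functions `var()` and `sd()` are closely linked, since the standard deviation is the square root of the variance. The following sketch verifies this for the vector `y` from above; `var_y` and `sd_y` are placeholder names, and `all.equal()` compares the numbers up to a small numerical tolerance:

```r
y <- c(1, 2, 3, 4, NA)

var_y <- var(y, na.rm = TRUE)
sd_y  <- sd(y, na.rm = TRUE)

# The standard deviation is the square root of the variance
all.equal(sd_y, sqrt(var_y))   # TRUE
```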

In contrast to atomic vectors, lists can contain objects of different
types. We create lists via the function `list()`:

```
l_1 <- list(
  "a",
  c(1,2,3),
  FALSE
)
typeof(l_1)
```

`## [1] "list"`

`l_1`

```
## [[1]]
## [1] "a"
##
## [[2]]
## [1] 1 2 3
##
## [[3]]
## [1] FALSE
```

Lists can become very complex. The function `str()` (short
for “structure”) helps us to get a quick overview of a list and its
elements:

`str(l_1)`

```
## List of 3
## $ : chr "a"
## $ : num [1:3] 1 2 3
## $ : logi FALSE
```

We can name the elements of lists:^{10}

```
l_2 <- list(
  "first_element" = "a",
  "second_element" = c(1,2,3),
  "third_element" = FALSE
)
```

We can retrieve the names of all elements of the list with
`names()`:

`names(l_2)`

`## [1] "first_element" "second_element" "third_element"`

There are two very important differences in the handling of vectors and lists:

- Vectorization does not work for lists
- Indexing works differently

The first issue can be illustrated easily:

```
vec_expl <- c(1,2,3)
list_expl <- list(1,2,3)
sqrt(vec_expl)
```

`## [1] 1.000000 1.414214 1.732051`

But:

`sqrt(list_expl)`

`## Error in sqrt(list_expl): non-numeric argument to mathematical function`
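If all elements of a list are single numbers, one way around this error is to apply the function to each element separately, or to flatten the list into an atomic vector first. The sketch below uses the base R functions `sapply()` and `unlist()`, which are not introduced in the text above; treat it as one possible workaround rather than the recommended approach:

```r
list_expl <- list(1, 2, 3)

# Flatten the list into an atomic vector, then use the vectorized function
roots_unlist <- sqrt(unlist(list_expl))

# Or apply sqrt() to every element of the list separately
roots_sapply <- sapply(list_expl, sqrt)

roots_unlist   # 1.000000 1.414214 1.732051
```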

The second issue is due to the more complex structure of lists. For
vectors we extracted single elements via `[`. For lists,
there is a difference between `[` and `[[`. The
former always returns a list:

`l_1[2]`

```
## [[1]]
## [1] 1 2 3
```

The second then returns a vector and is more similar to the behavior
of `[` in the context of atomic vectors:

`l_1[[2]]`

`## [1] 1 2 3`

Then, to extract an element of this vector, we can chain the brackets:

`l_1[[2]][3]`

`## [1] 3`

We can also extract elements by their name:

`l_2[[1]]`

`## [1] "a"`

`l_2[["first_element"]]`

`## [1] "a"`
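The difference between `[` and `[[` can be checked with `typeof()`. The sketch below rebuilds the named list from above and also shows the `$` operator, a common shortcut for extracting a named element that is not covered in the text above:

```r
l_2 <- list(
  "first_element" = "a",
  "second_element" = c(1, 2, 3),
  "third_element" = FALSE
)

typeof(l_2[2])       # "list":   single brackets return a (shorter) list
typeof(l_2[[2]])     # "double": double brackets return the element itself

# $ is a shortcut for l_2[["second_element"]]
l_2$second_element   # 1 2 3
```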

Lists are fundamental to many more complex structures that we will encounter later. They are more flexible than atomic vectors, but this flexibility also makes them more difficult to use and less efficient for tasks where this flexibility is not needed. As a rule of thumb, whenever you can represent something as an atomic vector, you should do so. For using lists, you should always have a good reason.

1. In fact, we will learn below that `2` is not really a number, but a vector of length 1. Only in a next step, `2` counts as a ‘number’, or, more precisely, as a ‘double’.
2. The only object type that is of relevance to us aside from these two is `NULL`. We will learn about it towards the end of this post.
3. While you can abbreviate the two with `T` and `F`, respectively, I recommend against using these sometimes ambiguous abbreviations.
4. This syntax has historical reasons: when the type `integer` was introduced in R, the developers were guided by the type `long integer` of the programming language `C`. In `C` the suffix for such an integer was ‘l’ or ‘L’. The R developers just transferred this practice into R, only they did not adopt the lowercase ‘l’, to avoid a possible confusion between ‘l’ and ‘i’, which look very similar in many fonts (the suffix ‘i’ in R is used for the imaginary component of complex numbers).
5. In principle there are different kinds of missing values, such as `NA_integer_` or `NA_character_`, but they are irrelevant in practice: any `NA` value in an atomic vector automatically ‘mimics’ the type of the atomic vector.
6. `NULL`, on the other hand, is used to represent an absent vector, not an absent element of a vector. We will come back to `NULL` towards the end of the post.
7. We will learn about packages in a later session.
8. We learn more about this later in the course when delving into the topic of *iteration*.
9. For many common tasks there is already a predefined function in R. The easiest way to find them is by googling.
10. We can actually also do this with vectors, but it is more common in the context of lists.