3 – Vectors

Creating vectors

A vector is a collection of elements. Importantly, a vector is the most basic (atomic) data type in R: even a single number is represented as a vector. We can use the function c() (“combine”, “concatenate”) to create a vector consisting of multiple elements:

(y = c(1, 2, 3.1415, -100))

[1]    1.0000    2.0000    3.1415 -100.0000

Note

Wrapping an assignment in parentheses is a useful shortcut to also display the assigned value in the console. We could re-write the previous example as follows:

y = c(1, 2, 3.1415, -100)
y

The function length() returns the number of elements in a vector:

length(y)

[1] 4

Notice that a single number is just a vector with one element:

a = 6
length(a)

[1] 1

In fact, c() simply concatenates (flattens) all its arguments into a single vector:

c(666, y, 666, c(23, 24))

[1]  666.0000    1.0000    2.0000    3.1415 -100.0000  666.0000   23.0000   24.0000

In this example, we pass four arguments to the function: 666, y (itself a vector consisting of four elements), 666, and c(23, 24) (a two-element vector).

Types

A vector is a homogeneous data structure, which means that it consists of elements of the same type. We have already seen numeric vectors (where all elements are numbers), but R has several other types of vectors. In addition to numeric vectors, we will often work with logical vectors and character vectors (later on, we will also see factors and datetime vectors).

Let’s discuss each vector type in turn.

Numeric vectors

Numeric vectors consist of numbers:

c(2, 13, 15, 17)

[1]  2 13 15 17

The function class() returns the type of its argument, so you can use it to check the data type of a given object:

class(c(2, 13, 15, 17))

[1] "numeric"

z = 2
class(z)

[1] "numeric"

Logical vectors

Logical vectors consist of the values TRUE and FALSE (note the all-uppercase spelling):

class(TRUE)

[1] "logical"

class(c(FALSE, FALSE, TRUE))

[1] "logical"

Note

It is possible to abbreviate TRUE and FALSE with T and F, respectively. However, using these short names is discouraged because it decreases readability.
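There is a second reason to prefer the long spelling, sketched below: T and F are ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words. The reassignment and the helper name t_overwritten are purely illustrative.

```r
# T and F are regular variables that initially hold TRUE and FALSE
T = 0                        # legal, but T no longer means TRUE
isTRUE(T)
# [1] FALSE
t_overwritten = !isTRUE(T)   # remember that T was changed (illustration only)

rm(T)                        # delete our variable; T refers to TRUE again
isTRUE(T)
# [1] TRUE

# In contrast, an assignment such as TRUE = 0 is rejected with an error.
```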

Logical vectors are one of the most frequently used vector types in R, because they are the result of a comparison:

x = c(0.5, 55, -10, 6)  # numeric vector
class(x)

[1] "numeric"

x < 1  # comparison

[1]  TRUE FALSE  TRUE FALSE

As we can see, the comparisons are performed element by element, resulting in a logical vector:

class(x < 1)  # logical vector

[1] "logical"

R supports the following comparison operators: >, >=, <, <=, == (equals) and != (is not equal). Comparisons can also be chained (combined) with | (or) and & (and) as well as negated (inverted) with !. It is also possible (and useful) to group expressions with parentheses. Here are some examples:

(3 > 5) & (4 == 4)

[1] FALSE

(TRUE == TRUE) | (TRUE == FALSE)

[1] TRUE

((111 >= 111) | !(TRUE)) & ((4 + 1) == 5)

[1] TRUE
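The examples above combine single values, but the same operators also work element by element on longer vectors. Here is a small sketch; the vector temp and its values are made up for illustration:

```r
temp = c(-5, 12, 25, 31, 18)   # hypothetical temperatures

temp > 0 & temp < 20           # both conditions must hold
# [1] FALSE  TRUE FALSE FALSE  TRUE

temp < 0 | temp > 30           # at least one condition must hold
# [1]  TRUE FALSE FALSE  TRUE FALSE

!(temp > 20)                   # negation, same as temp <= 20
# [1]  TRUE  TRUE FALSE FALSE  TRUE
```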

Important

Make sure to use == if you want to test for equality and not = (which is the assignment operator)!

Character vectors

Every element of a character vector is a string, that is, a sequence of characters (letters, digits, and symbols). Each element of a character vector must be enclosed by single or double quotes. Here are some example character vectors:

(s = c("What's", 'your', "name?"))

[1] "What's" "your"   "name?"

class(s)

[1] "character"

class("Hello!")

[1] "character"

class("42")

[1] "character"

Unsurprisingly, length() returns the number of elements in a character vector. If you want to determine the number of characters of each element in the vector, use nchar():

length(c('Hello', 'world!'))

[1] 2

nchar(c('Hello', 'world!'))

[1] 5 6

Coercion

Recall that vectors are homogeneous data structures (all elements have the same type). For example, all elements of a numeric vector are numbers. If we use c() to create a vector from elements of different types, R automatically coerces these elements to a common type that can represent them all instead of throwing an error. This means that mixing numbers and characters results in a character vector, and mixing numbers and logicals results in a numeric vector:

(x = c(1, 2.14, "5", 6))

[1] "1"    "2.14" "5"    "6"

class(x)

[1] "character"

(y = c(1, TRUE, 2, FALSE, -7))

[1]  1  1  2  0 -7

class(y)

[1] "numeric"

Note

Did you notice that TRUE and FALSE were converted to 1 and 0, respectively? This property of logical values is often used in calculations, and we will see some examples later on.
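As a brief sketch of how the two coercion rules combine: logical values can be represented as numbers, and numbers as character strings, so a single character element turns the whole vector into a character vector. The variable names mixed and n_true are chosen only for this illustration.

```r
mixed = c(TRUE, 2, "three")   # logical, numeric, and character elements
mixed
# [1] "TRUE"  "2"     "three"
class(mixed)
# [1] "character"

# Logical values behave like 1 and 0 in arithmetic
n_true = TRUE + TRUE + FALSE
n_true
# [1] 2
```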

There are also functions that explicitly convert a vector into a desired type:

as.numeric()
as.integer()
as.logical()
as.character()

For example, a character vector can be converted to a numeric vector if its elements can be represented (interpreted) as numbers:

as.numeric(c("1", "2.12", "66"))

[1]  1.00  2.12 66.00

Wherever this is not possible, R creates a missing value NA (“Not Available”) and issues a warning:

as.numeric(c("1", "2.12", "X"))

Warning: NAs introduced by coercion

[1] 1.00 2.12   NA
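The other conversion functions listed above work in the same way. The following sketch uses made-up input values to show a few conversions and what happens when a value cannot be interpreted:

```r
int_vals = as.integer(c(3.9, -3.9))   # the fractional part is dropped, not rounded
int_vals
# [1]  3 -3

lgl_vals = as.logical(c(0, 1, 2))     # zero becomes FALSE, any other number TRUE
lgl_vals
# [1] FALSE  TRUE  TRUE

chr_vals = as.character(c(1, 2.5, 100))
chr_vals
# [1] "1"   "2.5" "100"

lgl_from_chr = as.logical(c("TRUE", "false", "yes"))   # "yes" is not recognized
lgl_from_chr
# [1]  TRUE FALSE    NA
```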

Working with vectors

Of course, R can perform calculations with numeric vectors. These operations are performed elementwise (separately for each element):

c(1, 2, 3, 4) * 100 + 2

[1] 102 202 302 402

We already know the basic arithmetic operators +, -, *, and / for addition, subtraction, multiplication, and division. The operators ^ and ** both perform exponentiation (e.g. 5**2 means five to the power of two). Integer division and remainder can be computed with %/% and %%, respectively. Other useful mathematical functions include abs() (absolute value), sqrt() (square root), log() (natural logarithm), and exp() (exponential function).
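A few quick examples of these operators and functions, with arbitrarily chosen numbers:

```r
5^2                  # exponentiation; 5**2 gives the same result
# [1] 25
17 %/% 5             # integer division: 5 fits into 17 three times
# [1] 3
17 %% 5              # remainder: 17 - 3 * 5
# [1] 2
abs(c(-3, 2))        # functions are applied elementwise, too
# [1] 3 2
sqrt(c(4, 9, 16))
# [1] 2 3 4
exp(0)
# [1] 1
log(exp(1))          # log() computes the natural logarithm by default
# [1] 1
```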

Recycling

Two vectors in an operation can even have different lengths. For example, R can compute the following addition with a 2-element vector and a 4-element vector:

c(1, 2) + c(6, 7, 8, 9)

[1]  7  9  9 11

Here, R repeats the elements of the shorter vector to match the length of the longer vector. In other words, this is what really happens behind the scenes:

c(1, 2, 1, 2) + c(6, 7, 8, 9)

[1]  7  9  9 11

This (automatic) process is called recycling, because the shorter vector is recycled to generate a vector that has the same number of elements as the longer vector. In fact, recycling also happens in the example we saw previously:

c(1, 2, 3, 4) * 100 + 2

[1] 102 202 302 402

We multiply a vector with four elements (c(1, 2, 3, 4)) with a vector with one element (100), and then add another vector with one element (2). Therefore, R recycles all shorter vectors to length four as follows:

c(1, 2, 3, 4) * c(100, 100, 100, 100) + c(2, 2, 2, 2)

[1] 102 202 302 402

Here’s another example that adds a 4-element vector to a 2-element vector:

c(1, 2, 3, 4) + c(0, 10)

[1]  1 12  3 14

c(1, 2, 3, 4) + c(0, 10, 0, 10)

[1]  1 12  3 14

If the longer vector’s length is not an exact multiple of the shorter vector’s length, R will still recycle the shorter vector (but will issue a warning):

c(1, 2, 3, 4, 5) + c(0, 10, 8)

Warning in c(1, 2, 3, 4, 5) + c(0, 10, 8): longer object length is not a multiple of shorter object length

[1]  1 12 11  4 15

c(1, 2, 3, 4, 5) + c(0, 10, 8, 0, 10)

[1]  1 12 11  4 15
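Recycling is often used on purpose. As a small illustration (the vector v is made up for this sketch), multiplying by c(1, -1) flips the sign of every second element, and comparisons recycle in the same way:

```r
v = 1:6

v * c(1, -1)     # c(1, -1) is recycled to c(1, -1, 1, -1, 1, -1)
# [1]  1 -2  3 -4  5 -6

v > c(0, 10)     # c(0, 10) is recycled to c(0, 10, 0, 10, 0, 10)
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
```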

Creating sequences

Sequences of numbers are frequently needed, so R provides shortcuts to create such vectors using either : or the function seq(). In both cases, we provide start and stop values. The step size for a sequence generated by : is always 1 (or -1 if the start value is greater than the stop value), whereas seq() can generate number sequences with arbitrary step sizes.

Here are some examples:

1:20

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

pi:10

[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593

9:2

[1] 9 8 7 6 5 4 3 2

seq(1, 20)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq(20, 1)

 [1] 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

seq(0, 8, by=0.5)

 [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0

seq(8, 0, by=-0.5)  # note the negative step size

 [1] 8.0 7.5 7.0 6.5 6.0 5.5 5.0 4.5 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0

seq(0, 20, 2)  # even numbers

 [1]  0  2  4  6  8 10 12 14 16 18 20

seq(1, 20, 2)  # odd numbers

 [1]  1  3  5  7  9 11 13 15 17 19

seq(from=1, to=3, length.out=10)

 [1] 1.000000 1.222222 1.444444 1.666667 1.888889 2.111111 2.333333 2.555556 2.777778 3.000000
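Two details of seq() are easy to miss, so here is a short sketch with arbitrary values: with by, the stop value is only included if the steps land on it exactly, and with length.out, the step size is derived from the requested number of elements.

```r
s_by = seq(1, 10, by=4)           # 10 is not reached; the sequence stops at 9
s_by
# [1] 1 5 9

s_len = seq(0, 1, length.out=5)   # five equally spaced values, step size 0.25
s_len
# [1] 0.00 0.25 0.50 0.75 1.00
length(s_len)
# [1] 5
```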

The related function rep() creates a vector by repeating elements. For example, we can create a vector consisting of 90 zeros:

rep(0, 90)

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[59] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Tip

Now we can finally solve the mystery of [1] that we see in front of every output: this number in square brackets corresponds to the index (the position in the vector) of the element right next to it. In the example, this means that the first 0 in the first line corresponds to index 1, whereas the first 0 in the second line corresponds to index 59 (the fifty-ninth element in the vector). This is convenient to determine indexes in long vectors without having to count from the first element.

The following lines show some additional usage examples of the rep() function:

rep(c(0, 1, 2), times=10)

 [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2

rep(c(0, 1, 2), each=10)

 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

rep(c(0, 1, 2), times=c(10, 10, 10))  # identical to each=10

 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
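The arguments of rep() can also be combined or given different counts per element. The following sketch uses made-up values; when each and times are used together, each is applied first and the result is then repeated times times.

```r
r1 = rep(c("a", "b"), each=2, times=3)
r1
#  [1] "a" "a" "b" "b" "a" "a" "b" "b" "a" "a" "b" "b"

r2 = rep(c(0, 1), times=c(3, 1))      # a separate count for each element
r2
# [1] 0 0 0 1

r3 = rep(c(1, 2, 3), length.out=7)    # repeat until the result has 7 elements
r3
# [1] 1 2 3 1 2 3 1
```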

Indexing

Very often, we need to extract only specific values from a vector (for example the first 10 elements of a long vector). The position of an element in a vector is called its index. R starts counting at 1 (in contrast to many other programming languages that start counting at 0), so the first element has index 1, the second element has index 2, and so on.

Let’s look at an example:

(x = 1:10)

 [1]  1  2  3  4  5  6  7  8  9 10

To extract one element of a vector, we can use square brackets containing the desired index:

x[4]  # fourth element

[1] 4

We can also extract more than one element with a corresponding vector of indexes inside the square brackets:

x[1:5]  # elements 1-5

[1] 1 2 3 4 5

x[c(1, 4, 8)]  # elements 1, 4, and 8

[1] 1 4 8

Important

We use c() to create a vector of indexes in the second example. Importantly, x[1, 4, 8] would not work, since we must provide a single vector inside the square brackets!

Negative indexes mean “all elements except those at the given positions”:

x[c(-1, -10)]

[1] 2 3 4 5 6 7 8 9

x[-c(1, 10)]

[1] 2 3 4 5 6 7 8 9
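A few further cases are worth trying out. This sketch reuses x = 1:10 from above; the names beyond, dropped, and repeated are chosen only for illustration.

```r
x = 1:10

beyond = x[11]             # there is no 11th element, so we get a missing value
beyond
# [1] NA

dropped = x[-(1:3)]        # drop the first three elements
dropped
# [1]  4  5  6  7  8  9 10

repeated = x[c(2, 2, 9)]   # an index may occur more than once
repeated
# [1] 2 2 9

# Note the parentheses in -(1:3): without them, -1:3 means (-1):3, which
# mixes negative and positive indexes and therefore results in an error.
```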

Finally, logical vectors are also permitted as index vectors. Here, the result consists of all elements where the index vector is TRUE. This is useful because we can use comparisons (which result in logical vectors) to filter specific elements of a vector:

x

 [1]  1  2  3  4  5  6  7  8  9 10

x > 5

 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

x[x > 5]

[1]  6  7  8  9 10
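Because comparisons can be combined with & and |, filters can express several conditions at once. The sketch below again uses x = 1:10; it also shows the base R function which(), which returns the positions of the TRUE values instead of the values themselves.

```r
x = 1:10

big_even = x[(x > 3) & (x %% 2 == 0)]   # even numbers greater than 3
big_even
# [1]  4  6  8 10

positions = which(x > 5)                # indexes where the condition is TRUE
positions
# [1]  6  7  8  9 10
```

In this particular vector, values and positions coincide, so which(x > 5) returns the same numbers as x[x > 5]; for other vectors the two results generally differ.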

Named vectors

Optionally, individual elements of a vector can be named, which allows us to access elements not only by index (position), but also by name. We create named vectors by passing argument names along with their values in the function call to c():

(vect = c(a=11, b=2, c=NA))

 a  b  c 
11  2 NA

vect[2]

b 
2

vect["b"]

b 
2

The function names() returns the names of a vector. If the elements are not named, it returns NULL (which is a special value in R to denote “nothing”).

names(vect)

[1] "a" "b" "c"

We can also use names() to assign element names to an existing vector:

x = 1:3
names(x)

NULL

names(x) = c("test", "value", "x")
x

 test value     x 
    1     2     3
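Names can be used like any other index vector, and names() can itself be indexed. A minimal sketch with a made-up vector temps:

```r
temps = c(mon=20, tue=22, wed=19)   # hypothetical temperatures

temps[c("mon", "wed")]              # several names at once
# mon wed 
#  20  19 

warm_days = names(temps)[temps > 19]   # names of all elements greater than 19
warm_days
# [1] "mon" "tue"
```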

Missing values

R represents missing values as NA (“not available”):

(vect = c(15, 1.12, NA, 12, NA, 33.22))

[1] 15.00  1.12    NA 12.00    NA 33.22

The function is.na() tests whether each element of a vector is missing. The result is a logical vector.

is.na(vect)

[1] FALSE FALSE  TRUE FALSE  TRUE FALSE

Important

Finding missing values with the equality operator == does not work:

vect == NA

[1] NA NA NA NA NA NA

We need to use is.na() instead.

Using logical negation !, we can determine all elements that are not missing and subsequently drop all missing values:

!is.na(vect)

[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

vect[!is.na(vect)]

[1] 15.00  1.12 12.00 33.22

Tip

We can also use the fact that TRUE is interpreted as 1 and FALSE as 0 in calculations. Therefore, the sum of a logical vector corresponds to the number of TRUE values, and hence the number of missing values in the following example:

sum(is.na(vect))  # number of missing values

[1] 2
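In the same spirit, a few more summaries of the missing values in vect can be sketched as follows. The functions which() and anyNA() are part of base R; the variable names are chosen for illustration.

```r
vect = c(15, 1.12, NA, 12, NA, 33.22)

na_positions = which(is.na(vect))   # indexes of the missing values
na_positions
# [1] 3 5

na_share = mean(is.na(vect))        # proportion of missing values (2 out of 6)
na_share
# [1] 0.3333333

anyNA(vect)                         # is there at least one missing value?
# [1] TRUE
```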

Exercises

1. Calculate the area and circumference of a circle with radius 5. Use variables for these three values when calculating the results.

2. Create a vector x consisting of numbers 4, 18, -7, 16, 4, 29, 8, and -44. Then generate a vector y which contains the squared elements of x. Finally, create a vector z by combining/concatenating x and y. How can you determine the number of elements in z?

3. Which elements of the following vector x are even? Which elements are odd?
   ```
   x = c(44, 23, -56, 98, 99, 32, 45, 19, 22)
   ```
   Note

   When dividing even numbers by two, the remainder is zero. Odd numbers have a remainder of one. Use the remainder operator %% to compute these results.

4. Create the following three vectors:
   - A sequence of numbers from 15 to 40.
   - A sequence of descending numbers from 80 to 60 in steps of 3.
   - A vector consisting of 77 equally-spaced numbers between 14 and 39.

5. Create a character vector with the following elements: the first 15 elements should be "placebo", the next 15 elements should be "group 1", and the last 15 elements should be "group 2".

6. Create a vector k with even numbers from 0 to 40. Then use appropriate indexing to create new vectors consisting of the following subsets:
   - All elements of k except the eighth and ninth.
   - The first five elements.
   - Elements 2, 5, and 26.
   - All elements of k which are greater than 10.

7. Generate the following vector t:
   ```
   t = c(10, 20, NA, 30, 40)
   ```
   Use mean() to compute the arithmetic mean. What is the result and how can you ignore the missing value?

8. Consider a vector consisting of six standard deviations:
   ```
   std = c(1, 2.22, 11.3, 7.8, 3.4, 6)
   ```
   How can you compute a vector of variances?