Clean up “everything” in RStudio

This is a tip for how to clean up your RStudio windows.

For workspace:

You can use rm() to remove all objects in the current environment:

rm(list=ls())

Or, if you only want to remove a specific object, or only the objects created after a certain point, try the following:

rm(list='obj_name')
obj.list <- ls()  #Save the names of the existing objects
....
rm(list=setdiff(ls(), obj.list))  #Remove any new generated objects
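
Here is a small runnable sketch of the snapshot pattern above. The object names (keep_me, tmp_a, tmp_b) are made up for illustration. One detail worth noticing: the snapshot is taken before obj.list itself exists, so obj.list is removed together with the new objects.

```r
keep_me <- 1                         # an object that exists before the snapshot
obj.list <- ls()                     # snapshot; "obj.list" itself is not in it yet
tmp_a <- 2                           # objects created after the snapshot
tmp_b <- 3
rm(list = setdiff(ls(), obj.list))   # removes tmp_a, tmp_b and also obj.list
ls()                                 # keep_me is still there
```

If you want the snapshot to survive the cleanup, one option is to save it as obj.list <- c(ls(), "obj.list").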

 

For console:

You can press Ctrl+L manually. Of course, it would be nicer to do this programmatically, so try this:

cat("\014")  # or cat("\f")

 

For plot windows:

Use dev.off(). Called with no arguments, it closes the current graphics device; graphics.off() closes all open devices and leaves only the null device (device 1). If you have other graphics devices open (e.g. pdf or png) and don’t want them to be closed, you can use dev.list() to figure out which graphics device is RStudio’s and close only that one.

dev.off(dev.list()["RStudioGD"])
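
If no RStudio device is open, looking it up by name yields NA and dev.off() fails. A slightly more defensive sketch is below; close_rstudio_device is a helper name I made up, not something RStudio provides.

```r
# Hypothetical helper: close the RStudio plot device only if it is open.
close_rstudio_device <- function() {
  devs <- grDevices::dev.list()            # NULL when no device is open
  if ("RStudioGD" %in% names(devs)) {
    grDevices::dev.off(devs[["RStudioGD"]])
    return(TRUE)                           # a device was closed
  }
  FALSE                                    # nothing to close
}

close_rstudio_device()
```

Other devices such as pdf or png are left untouched.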

 

Read data from Clipboard into R

I bet you have had the same experience I have: wanting to copy data straight into R. Here is the quickest solution when you’re in a hurry: simply use the read.table() function to read the data on the clipboard directly.
df <- read.table("clipboard")
If the first row holds the column names, add the header = TRUE option.
df <- read.table("clipboard", header = TRUE)
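
Clipboard contents cannot be shown in a post, so in the sketch below a tab-separated string (clip_text, made up for illustration) stands in for what you would copy from a spreadsheet. It only shows what header = TRUE changes.

```r
# Assumed stand-in for clipboard contents copied from a spreadsheet.
clip_text <- "name\tscore\nann\t90\nbob\t85"

# Without header: the first row is treated as data (columns V1, V2).
df_no_header <- read.table(text = clip_text, sep = "\t")

# With header: the first row becomes the column names.
df_header <- read.table(text = clip_text, sep = "\t", header = TRUE)

df_header

# The "clipboard" name is what I use on Windows; on macOS I believe the
# equivalent is read.table(pipe("pbpaste"), header = TRUE).
```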

Character string functions provided by base R

Each entry below gives the function, its description, and an example.

Basic character string functions

nchar(x)
    Returns the string length.
    nchar("Hello")  # 5

toupper(x)
    Converts the string to upper case.
    toupper("hello world")  # "HELLO WORLD"

tolower(x)
    Converts the string to lower case.

strtrim(x, width)
    Trims character strings to the specified display widths.
    strtrim("Hello", 2)  # "He"

paste(…, sep = " ")
    Concatenates vectors after converting them to character.
    paste("x", 1:3, sep = "")  # "x1" "x2" "x3"
    paste(c("x", "y", "z"), 1:3, sep = "M")  # "xM1" "yM2" "zM3"
    paste("Hello", "World", sep = " ")  # "Hello World"

substr(x, start, stop) or substr(x, start, stop) <- value
    Extracts or replaces substrings in a character vector.
    substr("Hello World", 1, 5)  # "Hello"
    x <- "Hello World"
    substr(x, 1, 5) <- "Goodbye"
    x  # "Goodb World" (only the five characters in the range are replaced)

Functions that also work with regular expression patterns (set fixed = TRUE to match the pattern literally)

sub(pattern, replacement, x) or gsub(pattern, replacement, x)
    sub() replaces the first match and gsub() replaces all matches.
    sub("\\s", ".", "Hello World")  # "Hello.World"

strsplit(x, split)
    Splits the elements of a character vector x into substrings according to the matches to split within them. The result is a list.
    strsplit("a.b.c", ".", fixed = TRUE)  # "a" "b" "c"

grep(pattern, x)
    Searches for matches to pattern within each element of a character vector and returns the indices of the matching elements.
    grep("foo", c("arm", "foot"))  # 2
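
To see why the fixed argument matters, here is a short sketch comparing regular expression matching with literal matching. The variable names are mine; the functions are all base R.

```r
# In a regular expression "." matches any character, so every character is replaced.
as_regex   <- gsub(".", "-", "a.b.c")                 # "-----"

# With fixed = TRUE the pattern is taken literally.
as_literal <- gsub(".", "-", "a.b.c", fixed = TRUE)   # "a-b-c"

# Escaping the dot gives the same result without fixed = TRUE.
as_escaped <- gsub("\\.", "-", "a.b.c")               # "a-b-c"

# strsplit() returns a list; [[1]] extracts the character vector.
parts <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]    # "a" "b" "c"

# grepl() returns a logical vector instead of indices.
hit_regex   <- grepl("a.c", c("abc", "a.c"))                 # TRUE TRUE
hit_literal <- grepl("a.c", c("abc", "a.c"), fixed = TRUE)   # FALSE TRUE
```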

In R, how do you test a vector to see if it contains a given element?

Here are three ways to test whether a vector contains a given element. There is no need to traverse the vector yourself from the first element to the last; use one of these functions instead.

1. match(): returns the position of the first match, or NA if the element does not exist

> vt <- c('a', 'b', 'c')
> match('b', vt)
[1] 2
> match('d', vt)
[1] NA

 

2. %in%: returns a logical value (TRUE or FALSE)

> vt <- c('a', 'b', 'c')
> 'a' %in% vt
[1] TRUE
> 'd' %in% vt
[1] FALSE

 

3. any(): given a logical vector, reports whether at least one of its values is TRUE

> vt <- c('a', 'b', 'c')
> any(vt=='a')
[1] TRUE
> any(vt=='d')
[1] FALSE

 

When the vector is big, the time cost needs to be considered. I ran some simulations, and they show the following efficiency ranking for these three approaches (fastest first):

any() > match() > %in%
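
If you want to check the ranking on your own machine, a rough timing sketch could look like the one below. The vector size, the number of repetitions, and the choice of target are my own assumptions, and the timings will vary with hardware and R version, so treat the ranking as something to measure rather than a fixed rule.

```r
set.seed(42)
big <- sample.int(1e6)          # a random permutation of 1..1e6
target <- big[length(big)]      # the last element of the vector
n_rep <- 10                     # repeat each test to get measurable times

t_any   <- system.time(for (i in seq_len(n_rep)) r_any   <- any(big == target))
t_match <- system.time(for (i in seq_len(n_rep)) r_match <- match(target, big))
t_in    <- system.time(for (i in seq_len(n_rep)) r_in    <- target %in% big)

# Elapsed seconds for each approach
c(any     = t_any[["elapsed"]],
  match   = t_match[["elapsed"]],
  `%in%`  = t_in[["elapsed"]])
```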

 

 

Magic number 2.220446e-16

If you have seen one of my old posts, Interesting unequal math equation, you know there is an accuracy problem in R. I gave an explanation in that post: “Most float number has no exact representation in binary format, just approximation”. Here I decided to dig a little bit deeper.

Let’s look at some examples first.

> 1.37+0.12-1.49
[1] 2.220446e-16
> 1.38+0.12-1.5
[1] 0
> 1.39+0.12-1.51
[1] -2.220446e-16

Notice the number there, 2.220446e-16. Do you think it’s just a coincidence?
Of course not.

Thanks to Google, I found a detailed explanation of this problem.

Real numbers in R are stored in double precision, which means that base 2 floating point arithmetic with a 53 bit mantissa is used. This may be seen from

> 1 + 2^-52 == 1
[1] FALSE
> 1 + 2^-53 == 1
[1] TRUE

The number 1 + 2^-52 is exactly representable with a 53 bit mantissa, while 1 + 2^-53 would need a 54 bit mantissa and is rounded to 1. The smallest difference between two consecutive representable numbers in the interval [1, 2) is 2^-52, which is about 2.220446e-16.
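
R exposes this value directly, so you do not have to remember it. The sketch below uses the built-in .Machine list; the variable names are mine.

```r
eps <- .Machine$double.eps            # 2.220446e-16
eps_is_power <- (eps == 2^-52)        # TRUE
mantissa_bits <- .Machine$double.digits   # 53

eps
eps_is_power
mantissa_bits
```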

Double precision is the standard for numerical calculations where speed is required. It cannot exactly represent irrational numbers, nor rational numbers whose denominator is not a power of 2. In particular, numbers with a finite number of decimal digits need not have a finite expansion as a binary number. This is the reason for the following

> 0.1 + 0.2 - 0.3
[1] 5.551115e-17

Similar effects may be demonstrated using decimal numbers. The reason for the above is similar to the reason why 2/3 - 1/3 - 1/3 is not 0 if 1/3 and 2/3 are rounded to a finite number of decimal digits. With 5 digits, we get 0.66667 - 0.33333 - 0.33333 = 0.00001.

The fact that numbers like 0.1 are not represented exactly does not mean that we cannot get the correct result, at least in simple cases, if the calculations are done with care. In particular, for correcting errors of addition and subtraction of fractional decimal numbers, the functions round() and signif() may be used.
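
As a small illustration of the rounding idea, the sketch below cleans up the 0.1 + 0.2 - 0.3 example with round(). The choice of 10 decimal places is my own assumption; pick a precision that suits your data.

```r
d <- 0.1 + 0.2 - 0.3
d                           # 5.551115e-17

cleaned <- round(d, 10)     # 0
cleaned

# Rounding both sides before comparing makes the comparison behave as expected.
same <- round(0.1 + 0.2, 10) == round(0.3, 10)   # TRUE
same
```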

 

Reference:

http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

http://stackoverflow.com/questions/6970705/why-cant-i-get-a-p-value-smaller-than-2-2e-16-in-r

Use the superassignment operator

One of the most important functional programming principles is that functions do not change non-local variables; that is, generally speaking, the code in a function only has read access to its non-local variables. This is an important feature that protects higher-level variables from being changed by local functions. See the example below.

> x <- 10
> test <- function(x) {
+   x <- x - 5
+   print(x)
+ }
> test(x)
[1] 5
> x
[1] 10

However, sometimes you may wish to write to a global variable, or to any variable at a higher level than the one at which your write statement exists. The superassignment operator, <<-, or the assign() function is what you want. Let’s look at the superassignment operator first.
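
A minimal sketch of both approaches follows. The names counter, increment, set_total, and total are made up for illustration.

```r
counter <- 0

increment <- function() {
  # <<- looks for counter in the enclosing environments and modifies it there
  counter <<- counter + 1
  invisible(counter)
}

increment()
increment()
counter                               # 2

set_total <- function(value) {
  # assign() writes to the environment you name explicitly
  assign("total", value, envir = globalenv())
}

set_total(42)
get("total", envir = globalenv())     # 42
```

Note that <<- searches upward through the enclosing environments, while assign() targets exactly the environment passed in envir.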

get() function in R

The get() function might be one of the most useful utilities in R. However, I had never used this function before I wrote this page. Shame on me.

Well, let’s get to the point. The job of the get() function is actually quite simple: given the name of an object, it fetches the object itself. See the example below:

> x <- c(1:3)
> x
[1] 1 2 3
> get("x")
[1] 1 2 3

It’s easy to imagine how useful this function is.
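
One case where I can imagine it paying off is when the object names themselves are stored in a character vector. The sketch below (heights, weights, and var_names are made-up names) computes a mean for each object by name.

```r
heights <- c(150, 170)
weights <- c(50, 70)

var_names <- c("heights", "weights")

# Fetch each object by its name and summarise it
means <- sapply(var_names, function(nm) mean(get(nm)))
means
# heights weights
#     160      60
```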

 

Reference: The Art of R Programming by Norman Matloff

The difference between using subset() function and ordinary filtering

At first, I thought there was no difference between these two methods, and I normally used them interchangeably when I wrote R code.

But there is actually a small difference in how NA values are handled.

> x <- c(6, 1, NA, 10)
> x
[1]  6  1 NA 10
> x[x > 5]
[1]  6 NA 10
> subset(x, x > 5)
[1]  6 10

So when your data have missing values, as survey data often do, choose between subset() and ordinary filtering carefully. This tiny difference can cause unexpected mistakes that normally take a lot of time to debug.
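
If you prefer ordinary filtering but want the NA values dropped, as subset() does, two common options are sketched below (the variable names are mine).

```r
x <- c(6, 1, NA, 10)

# Exclude NA explicitly in the condition
kept_explicit <- x[!is.na(x) & x > 5]   # 6 10

# which() returns only the positions where the condition is TRUE, so NA is skipped
kept_which <- x[which(x > 5)]           # 6 10
```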

 

Reference: The Art of R Programming by Norman Matloff

Interesting unequal math equation

Well, I saw an interesting problem this morning. See the code below.

> 1.37+0.12 == 1.49
[1] FALSE
> 1.36+0.12 == 1.48
[1] TRUE

It looks weird, right? I googled this problem and found an explanation like this: “Most float number has no exact representation in binary format, just approximation”. The explanation isn’t very clear, but at least we know what’s going on.

> 1.37+0.12-1.49
[1] 2.220446e-16
> 1.36+0.12-1.48
[1] 0

So, if you need this kind of comparison in an if statement, you may run into trouble. One solution is to write the condition this way: 1.37+0.12-1.49 > -1e-10 and 1.37+0.12-1.49 < 1e-10. It looks ugly, but it works.

There is also a better way to handle this in R: the all.equal() function, which tests whether two objects are nearly equal. When the objects differ, it returns a character description of the difference rather than FALSE, so wrap it in isTRUE() when you use it in an if condition.

> if (1.37+0.12 == 1.49) {cat('Match')}
> if (-1e-10 < 1.37+0.12-1.49 && 1.37+0.12-1.49 < 1e-10)
+ {cat('Match')}
Match
> if (isTRUE(all.equal(1.37+0.12, 1.49))) {cat('Match')}
Match
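
The sketch below shows what all.equal() returns in the equal and unequal cases, and why it is safer to wrap it in isTRUE() inside an if condition. The variable names are mine.

```r
res_equal <- all.equal(1.37 + 0.12, 1.49)   # TRUE
res_diff  <- all.equal(1, 1.1)              # a character string describing the difference

is.character(res_diff)                      # TRUE
isTRUE(res_diff)                            # FALSE, so it is safe inside if ()

# The manual tolerance check, written with abs()
near <- abs((1.37 + 0.12) - 1.49) < 1e-10   # TRUE
```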

Recursive or non-recursive list

In R, lists can be recursive, which means that you can have lists within lists.

> c(list(a=1, b=2, c=list(d=4, e=5)))
$a
[1] 1

$b
[1] 2

$c
$c$d
[1] 4

$c$e
[1] 5

The code above creates a three-component list, with the c component of the main list itself being another list.

However, sometimes you may want a single flat structure instead of a recursive list. You can get this by setting the optional argument recursive of the c() function to TRUE; the result is a named vector in which the nested components are flattened. (It’s weird that setting recursive to TRUE actually gives you a non-recursive result.)

> c(list(a=1, b=2, c=list(d=4, e=5)), recursive=TRUE)
  a   b c.d c.e
  1   2   4   5
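
As far as I can tell, unlist() is the more commonly used way to flatten a nested list. The sketch below uses it on the same data (the variable names are mine) and shows how to turn the result back into a flat list if that is what you need.

```r
nested <- list(a = 1, b = 2, c = list(d = 4, e = 5))

flat <- unlist(nested)      # named numeric vector with names a, b, c.d, c.e
flat
is.list(flat)               # FALSE: the result is an atomic vector

flat_list <- as.list(flat)  # a flat (non-recursive) list with the same names
```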

 

Reference: The Art of R Programming by Norman Matloff