Clean up “everything” in RStudio

This is a tip for how to clean up your RStudio windows.

For workspace:

You can use rm(list = ls()) to remove all objects in the current environment:

rm(list = ls())
Or, if you only want to remove specific objects, such as the group of objects created after a certain point, try the following:

obj.list <- ls()  # Save the names of the existing objects
# ... code that creates new objects ...
rm(list = setdiff(ls(), obj.list))  # Remove only the newly created objects
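The sketch below runs the same setdiff() idea safely. It works on a scratch environment (`e`, created with new.env() just for this demo) rather than your global workspace, so nothing you care about is removed:

```r
# Scratch environment so the demo does not touch the global workspace.
e <- new.env()
assign("keep_me", 1, envir = e)

obj.list <- ls(envir = e)            # names that existed before
assign("tmp1", 2, envir = e)         # objects created afterwards
assign("tmp2", 3, envir = e)

# Remove only the names that were not in the saved list.
rm(list = setdiff(ls(envir = e), obj.list), envir = e)
ls(envir = e)                        # "keep_me"
```

In your own session, drop the `envir` arguments and the same two lines operate on the global environment.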


For console:

You can press Ctrl+L manually, but it would be nice to do this programmatically. The RStudio console clears itself when it prints the form-feed character, so try this:

cat("\014")  # or cat("\f")


For plot windows:

Try graphics.off(). It closes all open graphics devices and keeps only the null device (device 1). If you have other graphics devices open (e.g. pdf or png) and don't want them to be closed, use dev.list() to find out which device is RStudio's and close only that one:

dev.off(dev.list()["RStudioGD"])
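The one-liner dev.off(dev.list()["RStudioGD"]) fails when no RStudio device is open, because the lookup then returns NULL or NA. Below is a hedged sketch of a more forgiving version. The helper name `close_devices_by_name` is hypothetical and not part of base R. It closes every open device whose name is in `which` and leaves all other devices alone:

```r
# Hypothetical helper: close only the graphics devices whose names are listed
# in `which`, leaving every other open device (e.g. pdf or png files) alone.
close_devices_by_name <- function(which = "RStudioGD") {
  devs <- dev.list()                  # named integer vector, or NULL if none are open
  targets <- devs[names(devs) %in% which]
  for (d in targets) dev.off(d)       # device numbers stay stable while closing
  invisible(targets)                  # the device numbers that were closed
}
```

Inside RStudio, calling close_devices_by_name() with no arguments clears only the plot pane. Outside RStudio it simply does nothing.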


Read data from Clipboard into R

I bet you have had the same experience as I have: trying to copy data directly into R. Here is the quickest solution when you're in a hurry: use the read.table() function to read the data on the clipboard directly.
df <- read.table("clipboard")
If the first row contains column names, add the header = TRUE option.
df <- read.table("clipboard", header = TRUE)
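The clipboard itself is hard to demonstrate in a script, so the sketch below stands in for it with read.table()'s `text =` argument. The two-column sample data is made up for illustration. It shows what header = TRUE changes:

```r
# Stand-in for copied clipboard contents (made-up sample data).
copied <- "id score
1 90
2 85"

no_header   <- read.table(text = copied)                 # first row read as data
with_header <- read.table(text = copied, header = TRUE)  # first row used as names

names(no_header)    # "V1" "V2"
names(with_header)  # "id" "score"
nrow(with_header)   # 2
```

Without the header, the first row becomes a data row, and both columns are read as character because "id" and "score" are not numbers.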

Character string functions provided by base R

Basic character string functions
Each entry gives the function, what it does, and an example:

nchar(x): Return the number of characters in a string.
  nchar("Hello")  # 5
toupper(x): Convert a string to upper case.
  toupper("hello world")  # "HELLO WORLD"
tolower(x): Convert a string to lower case.
  tolower("Hello World")  # "hello world"
strtrim(x, width): Trim character strings to the specified display widths.
  strtrim("Hello", 2)  # "He"
paste(..., sep = " "): Concatenate vectors after converting them to character.
  paste("x", 1:3, sep = "")  # "x1" "x2" "x3"
  paste(c("x", "y", "z"), 1:3, sep = "M")  # "xM1" "yM2" "zM3"
  paste("Hello", "World", sep = " ")  # "Hello World"
substr(x, start, stop) or substr(x, start, stop) <- value: Extract or replace substrings in a character vector.
  substr("Hello World", 1, 5)  # "Hello"
  x <- "Hello World"
  substr(x, 1, 5) <- "Goodbye"
  x  # "Goodb World": only stop - start + 1 = 5 characters are replaced
sub(pattern, replacement, x) or gsub(pattern, replacement, x): sub() replaces the first match and gsub() replaces all matches.
  sub("\\s", ".", "Hello World")  # "Hello.World"
strsplit(x, split): Split the elements of a character vector x into substrings at the matches to split.
  strsplit("a.b.c", ".", fixed = TRUE)  # a list containing "a" "b" "c"
grep(pattern, x): Return the indices of the elements of x that match pattern.
  grep("foo", c("arm", "foot"))  # 2

sub(), gsub(), strsplit() and grep() also work with regular expression patterns; set fixed = TRUE to match the pattern literally.
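The `fixed = TRUE` option matters as soon as the pattern contains a regular expression metacharacter such as ".". The sketch below contrasts the two modes on small inputs chosen for illustration:

```r
# With regex matching (the default), "." matches any character, so every
# character acts as a separator and only empty strings remain.
regex_split <- strsplit("a.b.c", ".")[[1]]
fixed_split <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
regex_split   # "" "" "" "" ""
fixed_split   # "a" "b" "c"

# The same applies to sub(): as a regex, "." matches the first character.
sub(".", "-", "a.b")                # "-.b"
sub(".", "-", "a.b", fixed = TRUE)  # "a-b"
```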

The correct way of hardcoding

Sometimes, even after clinical data management has made a good attempt at cleaning and coding the data, you may still find undesired values. You may then need to hardcode corrections that override the data until the values can be fixed in the data management system.

However, hardcoding is dangerous, and it is better to avoid it whenever you can. One big reason is that data change over time, and a hardcode written today may not be appropriate in the future. A hardcode is easily forgotten, and the leftover code can lead to unpredictable errors when you analyze the data later.

If hardcoding must be done, some programming can reduce the risk. In the example below, the automatic macro variable &sysdate (the date the SAS session started) is used to make the hardcode expire after a given date.


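data test;
  set test;
  * Hardcode approved by Someone on 12/13/2012;
  * After 13Dec2012 the condition is false, so the hardcode expires;
  if identity = "NEMISIS" and "&sysdate"d <= "13Dec12"d then do;
    * the correction statements go here;
  end;
run;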
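The same expiry idea carries over to R. The sketch below is a hypothetical analogue, not the original SAS logic: the function name `apply_hardcode` and the replacement value "NEMESIS" are made up for illustration. It also makes one design choice the SAS version does not. When the hardcode has expired, it raises a warning instead of silently skipping the fix, so a forgotten hardcode gets noticed:

```r
# Hypothetical R analogue of the SAS expiry check; "NEMESIS" is a made-up
# correction, and the default expiry date mirrors "13Dec12"d.
apply_hardcode <- function(df, expires = as.Date("2012-12-13"), today = Sys.Date()) {
  if (today <= expires) {
    df$identity[df$identity == "NEMISIS"] <- "NEMESIS"
  } else {
    # Flag the stale hardcode instead of silently ignoring it.
    warning("Hardcode expired on ", format(expires), "; fix the data at the source.")
  }
  df
}

raw <- data.frame(identity = c("NEMISIS", "OTHER"), stringsAsFactors = FALSE)
apply_hardcode(raw, today = as.Date("2012-12-01"))$identity  # "NEMESIS" "OTHER"
```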



In R, how do you test a vector to see if it contains a given element?

Here are three ways to test whether a vector contains a given element. Please don't tell me you would rather loop over the vector from the first element to the last instead of using these functions.

1. match(): returns the position of the first match, or NA if there is none

> vt <- c('a', 'b', 'c')
> match('b', vt)
[1] 2
> match('d', vt)
[1] NA


2. %in%: returns a logical value (TRUE or FALSE)

> vt <- c('a', 'b', 'c')
> 'a' %in% vt
[1] TRUE
> 'd' %in% vt
[1] FALSE


3. any(): given a logical vector, tests whether at least one of the values is TRUE

> vt <- c('a', 'b', 'c')
> any(vt=='a')
[1] TRUE
> any(vt=='d')
[1] FALSE


When the vector is large, speed matters. I ran some simulations, and they gave this efficiency ranking for the three approaches (fastest first):

any() > match() > %in%
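If you want to rerun a comparison like this on your own machine, here is a rough sketch using base R's system.time(). The vector size, target value and repetition count are arbitrary choices, and no timings are shown because they depend on hardware and R version. One thing worth knowing: R documents %in% as match(x, table, nomatch = 0) > 0, so those two should perform similarly.

```r
set.seed(1)
vt <- sample(letters, 1e6, replace = TRUE)   # large character vector
target <- "z"

# All three approaches should agree before their speed is compared.
answers <- c(any(vt == target), !is.na(match(target, vt)), target %in% vt)

reps <- 20
timings <- c(
  any    = system.time(for (i in seq_len(reps)) any(vt == target))[["elapsed"]],
  match  = system.time(for (i in seq_len(reps)) !is.na(match(target, vt)))[["elapsed"]],
  "%in%" = system.time(for (i in seq_len(reps)) target %in% vt)[["elapsed"]]
)
sort(timings)   # fastest first on this run
```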



How to get the data set variable list into a macro variable

Sometimes you have a huge SAS data set and want to list or print its variable names. In that case, it is better to store the list of names in a macro variable first. You can then use this macro variable to print the names or to select the specific columns you want.

There are several ways to do this, for example PROC CONTENTS, but the PROC SQL query below is a better way:

proc sql noprint;
  select distinct name
  into :varlist separated by ' '
  from dictionary.columns
  where upcase(libname)='WORK' and
        upcase(memname)='TEST';  /* TEST is a hypothetical data set name */
quit;
%put &varlist;

Magic number 2.220446e-16

If you have seen one of my old posts, Interesting unequal math equation, you know that R has a floating-point accuracy problem. I gave an explanation in that post: “Most floating-point numbers have no exact representation in binary format, just an approximation.” Here I decided to dig a little deeper.

Let’s look at some examples first.

> 1.37+0.12-1.49
[1] 2.220446e-16
> 1.38+0.12-1.5
[1] 0
> 1.39+0.12-1.51
[1] -2.220446e-16

Notice the number there: 2.220446e-16. Do you think it's just a coincidence?
Of course not.

Thanks to Google, I found a detailed explanation of this problem.

Real numbers in R are stored in double precision, which means that 53-bit floating-point arithmetic in base 2 is used. This can be seen from:

> 1 + 2^-52 == 1
[1] FALSE
> 1 + 2^-53 == 1
[1] TRUE

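The number 1 + 2^-52 needs a 53-bit mantissa and is exactly representable, while 1 + 2^-53 would need a 54-bit mantissa and is rounded to 1. So the smallest difference between two consecutive representable numbers in the interval [1, 2) is exactly 2^-52, about 2.220446e-16. That explains the examples above. Both 1.37 + 0.12 and 1.49 lie in [1, 2), where every representable number is a multiple of 2^-52, so their difference is also a multiple of 2^-52. Here the two values are just one step apart.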
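You can check these claims directly in R with the built-in constant .Machine$double.eps, normally 2.220446e-16. The last line shows that the spacing doubles in the next binade, [2, 4):

```r
eps <- .Machine$double.eps
eps == 2^-52        # TRUE: the gap between 1 and the next double
1 + eps == 1        # FALSE: 1 + 2^-52 is the next representable number after 1
1 + eps / 2 == 1    # TRUE: 1 + 2^-53 is a tie and rounds to even, i.e. to 1
2 + eps == 2        # TRUE: in [2, 4) neighbours are 2^-51 apart, so 2^-52 is lost
```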

Double precision is the standard for numerical calculations where speed is required. It cannot exactly represent irrational numbers, or rational numbers whose denominator is not a power of 2. In particular, numbers with a finite number of decimal digits need not have a finite binary expansion. This is the reason for the following:

> 0.1 + 0.2 - 0.3
[1] 5.551115e-17

Similar effects can be demonstrated with decimal numbers. The reason for the result above is similar to the reason why 2/3 - 1/3 - 1/3 is not 0 if 1/3 and 2/3 are rounded to a finite number of decimal digits. With 5 digits, we get 0.66667 - 0.33333 - 0.33333 = 0.00001.

The fact that numbers like 0.1 are not represented exactly does not mean we cannot get correct results, at least in simple cases, if the calculations are done with care. In particular, round() and signif() can be used to correct the errors from adding and subtracting decimal fractions.
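Here is a short sketch of that advice applied to the 0.1 + 0.2 case. The number of digits passed to round() and signif() is an arbitrary choice. It also uses all.equal(), a base R function not mentioned above, which compares numbers up to a tolerance instead of demanding exact equality:

```r
x <- 0.1 + 0.2
x == 0.3                    # FALSE: x carries a tiny representation error
round(x - 0.3, 10) == 0     # TRUE: rounding to 10 decimals removes the error
signif(x, 15) == 0.3        # TRUE: 15 significant digits are enough here
isTRUE(all.equal(x, 0.3))   # TRUE: tolerance is about 1.5e-8 by default
```

Wrapping all.equal() in isTRUE() matters, because when the values differ, all.equal() returns a character description of the difference rather than FALSE.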



Invisible Character Alt-255

Aligning and positioning text in SAS output is really important if you want your report to look good. I usually use spaces to align text in titles, footnotes, columns, etc. However, SAS has its own rules for handling blanks, especially leading and trailing blanks, so sometimes spaces cannot do what you want.

Here I'm introducing a simple and elegant approach: using Alt-255. It looks like a blank space in the program code and in SAS output, but many programs process and print it as a valid text character.

Now, how? First of all, remember that you need to use the numeric keypad to type the magic number 255.

Follow these steps to create an invisible character:

1. Press and hold the “Alt” key and, while holding it, type 255 on the numeric keypad.
2. Release the “Alt” key. The cursor moves to the next position, so you know that an invisible character has been inserted.

Actually, you can use Alt-N to enter any letter and many graphical symbols, and charts listing all Alt-N characters are easy to find. Alt-255 is of special interest only because it is invisible.

See the example below:

data test;
  input fname $;
  * The blank before Alan is Alt-255, before Andy is a regular space;
  datalines;
Joe
 Alan
 Andy
;
proc print data=test;
run;

And the result:

1 Joe
2  Alan
3 Andy

Remove all labels and formats in SAS data set

I occasionally find the labels in a data set annoying, especially when the data set comes from outside, i.e. someone else created it. When you browse the data, the labels are shown instead of the variable names, so you may mistakenly use a label instead of the real variable name in your program. This has often cost me a lot of debugging time.

There is a very easy way to remove all labels in a single step:

* Remove all the labels and formats in data set;
proc datasets lib=work memtype=data;
  modify data_set_name;
  attrib _all_ label='';
  attrib _all_ format=;
run;
quit;

Hope it helps you too!

A smart way to comment chunks of code in SAS

We all know there are two styles of comments in SAS: * ; and /* */. Normally when we want to disable a chunk of code, we will choose /* */.

But I bet you have run into cases where /* */ does not work well, because parts of the code itself contain /* */ style comments. In that case, only the code up to the first */ is commented out. So the best way to disable a chunk of code is to put it inside a macro definition and never call the macro, for example:

%macro comment;
  data test2; set test; run;  /* inner comments no longer end the disabled block */
%mend comment;
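R users can use a similar trick. R has only # line comments, and a common idiom for switching off a whole chunk is to wrap it in `if (FALSE) { ... }`. The sketch below shows it. Note that the disabled code is still parsed, so it must remain syntactically valid R:

```r
x <- 1
if (FALSE) {
  # Nothing in this block runs, no matter how many comments it contains.
  x <- 2
  print("never printed")
}
x   # 1
```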