Best way to add a footnote to a plot created with ggplot2

There are many tools for data visualization, and ggplot2 is always my favorite. It is powerful, elegant and easy to use, except for one minor shortcoming: adding a footnote is difficult. Unlike a title, there is no dedicated function for adding a footnote directly. Let’s use the following plot as an example (based on the mpg data set included in ggplot2).

library(ggplot2)
toyota <- mpg[which(mpg$manufacturer == 'toyota'), ]
p <- ggplot(toyota, aes(displ, hwy)) + facet_wrap(~ class, ncol = 2) + geom_point(aes(size = cyl))
p


This creates a 4-panel scatter plot (one panel per vehicle class) with displ on the x-axis, hwy on the y-axis, and point size mapped to cyl. Let’s see how we can add a footnote to the plot.
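The sketch below shows two common ways to attach a footnote; it is not necessarily the approach the full post takes. Since ggplot2 2.2.0, `labs()` accepts a `caption` argument that is drawn below the panels, which works well as a footnote. Where `caption` is not available, one workaround is to enlarge the bottom plot margin and write the note into that space with `grid.text()` from the base grid package. The footnote text is a made-up example, and the margin and font sizes are arbitrary choices.

```r
library(ggplot2)
library(grid)  # ships with base R; used only for Approach 2

toyota <- mpg[which(mpg$manufacturer == 'toyota'), ]
p <- ggplot(toyota, aes(displ, hwy)) +
  facet_wrap(~ class, ncol = 2) +
  geom_point(aes(size = cyl))

# Hypothetical footnote text
note <- "Note: Toyota models only; data from the mpg data set."

# Approach 1 (ggplot2 >= 2.2.0): use the caption argument of labs().
# hjust = 0 left-aligns the caption so it reads like a footnote.
p_caption <- p +
  labs(caption = note) +
  theme(plot.caption = element_text(hjust = 0, size = 8))
print(p_caption)

# Approach 2 (does not rely on caption): reserve 20pt at the bottom,
# draw the plot, then write the note into that strip with grid.text().
p_margin <- p + theme(plot.margin = margin(5.5, 5.5, 20, 5.5))
print(p_margin)
grid.text(note,
          x = unit(0.01, "npc"), y = unit(4, "pt"),
          just = c("left", "bottom"), gp = gpar(fontsize = 8))
```

The caption route keeps the footnote part of the plot object, so it survives `ggsave()`; the grid route draws on the current device after printing, so the note has to be redrawn every time the plot is printed.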


The correct way of hardcoding

Sometimes, even after clinical data management has made a good attempt at cleaning and coding the data, you may still find undesired values. In that case you may need to hardcode an override until the values can be fixed in the data management system.

However, hardcoding is dangerous and is best avoided whenever possible. One big reason is that data change over time, and a hardcode written today may not be appropriate in the future. A hardcode is easily forgotten, and the leftover code can then lead to unpredictable errors when you analyze the data.

If hardcoding must be done, some programming techniques can help reduce that risk. In the example below, the automatic macro variable &sysdate (the date on which the SAS session started) is used to make the hardcode expire after a given date.


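data test;
  set test;
  * Hardcode approved by Someone on 12/13/2012;
  * The override applies only while the session date is on or before 13Dec12;
  if identity = "NEMISIS" and "&sysdate"d <= "13Dec12"d then do;
    /* override statements go here */
  end;
run;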



In R, how do you test a vector to see if it contains a given element?

Here are three ways to test whether a vector contains a given element. Please don't tell me you would rather skip these functions and loop through the vector from the first element to the last.

1. match(): returns the position of the first match, or NA if the element is not present

> vt <- c('a', 'b', 'c')
> match('b', vt)
[1] 2
> match('d', vt)
[1] NA


2. %in%: returns a logical value (TRUE or FALSE)

> vt <- c('a', 'b', 'c')
> 'a' %in% vt
[1] TRUE
> 'd' %in% vt
[1] FALSE


3. any(): given a logical vector, returns TRUE if at least one of its values is TRUE

> vt <- c('a', 'b', 'c')
> any(vt=='a')
[1] TRUE
> any(vt=='d')
[1] FALSE
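A small addition to the three methods above: `%in%` and `match()` are vectorized over their first argument, so several candidates can be checked in one call. Because `match()` reports only the first hit, `which()` is the usual way to get every position. The vector here is a hypothetical variant of `vt` with a repeated `'b'`.

```r
vt <- c('a', 'b', 'c', 'b')  # hypothetical vector with a duplicate 'b'

# Each candidate is looked up in vt independently
found <- c('b', 'd') %in% vt          # TRUE FALSE
first_pos <- match(c('b', 'd'), vt)   # 2 NA (only the first 'b' is reported)

# which() on a logical vector returns every matching position
all_pos <- which(vt == 'b')           # 2 4

print(found)
print(first_pos)
print(all_pos)
```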


When the vector is large, execution time becomes a consideration. In my simulations, the three approaches ranked as follows (fastest first):

any() > match() > %in%
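A rough way to run this kind of comparison yourself is sketched below with base R's `system.time()`. The vector size, the target element and the number of repetitions are arbitrary assumptions, and the ranking you get may change with vector type, length and where the target sits. The `?match` help page notes that `%in%` is defined as `match(x, table, nomatch = 0) > 0`, so it does at least the work of a `match()` call.

```r
# Timing sketch: size, target and repetitions are arbitrary assumptions.
set.seed(1)
vt <- as.character(sample(1e6))   # one million distinct strings in random order
target <- vt[length(vt)]          # the element stored last in vt

n_rep <- 10
# Total elapsed seconds for n_rep calls of f()
time_it <- function(f) {
  system.time(for (i in seq_len(n_rep)) f())[["elapsed"]]
}

timings <- c(
  any   = time_it(function() any(vt == target)),
  match = time_it(function() !is.na(match(target, vt))),
  in_op = time_it(function() target %in% vt)
)

# Smallest elapsed time first
print(sort(timings))
```

`any(vt == target)` builds a full logical vector before scanning it, while `match()` typically builds a hash table of `vt` on each call, so which one wins depends on the data; for serious comparisons a dedicated benchmarking package gives more stable numbers than a single `system.time()` run.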



How to get the data set variable list into a macro variable

When you have a large SAS data set and want to list or print its variable names, it helps to store the list of names in a macro variable first. You can then use that macro variable to print the names or to select specific columns.

There are multiple ways to do this, for example with PROC CONTENTS, or with the better approach below, which queries DICTIONARY.COLUMNS in PROC SQL:

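%let dsn = TEST;  /* hypothetical data set name: replace with your own, in upper case */

proc sql noprint;
 select distinct name
 into :varlist separated by ' '
 from dictionary.columns
 where upcase(libname)='WORK' and upcase(memname)="&dsn";
quit;

%put &varlist;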
