Use the IN operator in macro

The IN operator is a convenient checking tool in SAS that can replace a long chain of OR conditions. It is quite common in the data step.

data test;
  if 3 in (1, 2, 3, 4, 5, 6) then
    put "Value found within list.";
  else put "Value not in list.";
run;

Sometimes you have to write a macro program in which a macro variable must be checked against more than one value. You may think of using multiple OR operators, as below.

* Prior to SAS 9.2, the following syntax was used;
%macro test(value);
  %if &value=1 or &value=2 or &value=3 or &value=4 or
      &value=5 or &value=6 %then %put Value found within list.;
  %else %put Value not in list.;
%mend test;


Starting with SAS 9.2, the macro language has its own IN operator. It can be used in the %IF statement when the MINOPERATOR option is set, either on the %MACRO statement or as a SAS system option.

%macro test(value)/minoperator;
  %if &value in 1 2 3 4 5 6 %then
    %put Value found within list.;
  %else %put Value not in list.;
%mend test;


Or you can use the # character (a new binary comparison operator) as an alternative to the mnemonic IN operator.

%macro test(value)/minoperator;
  %if &value # 1 2 3 4 5 6 %then
    %put Value found within list.;
  %else %put Value not in list.;
%mend test;


They both work fine. The MINOPERATOR option tells the SAS macro facility to recognize the word ‘IN’ or the # character as a logical operator in expressions.

There is another way of writing the code that is closer to how we use the IN operator in the data step. By default, a space is the delimiter for the list of values, but this can be changed with the MINDELIMITER= option on the %MACRO statement.

options minoperator;

%macro test(value)/mindelimiter=',';
  %if &value in 1,2,3,4,5,6 %then
    %put Value found within list.;
  %else %put Value not in list.;
%mend test;




Use the superassignment operator

One of the most important functional programming principles is that functions do not change non-local variables; that is, generally speaking, the code in a function has only read access to its non-local variables. This is an important feature because it protects higher-level variables from being changed by local functions. See the example below.

> x <- 10
> test <- function(x) {
+   x <- x - 5
+   print(x)
+ }
> test(x)
[1] 5
> x
[1] 10

However, sometimes you may wish to write to a global variable, or to any variable at a higher level than the one where your assignment statement sits. The superassignment operator, <<-, or the assign() function is what you want. Let’s look at the superassignment operator first.
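As a minimal sketch of both approaches (the names counter, bump, bump_with_assign, and top are hypothetical and not from the original post), the following shows a function changing a variable that lives outside it:

```r
# Hypothetical example: a counter that lives outside the functions.
counter <- 0

# The environment this script runs in (the global environment when the
# code is typed at the console).
top <- environment()

bump <- function() {
  # <<- searches the enclosing environments and assigns to the first
  # `counter` it finds, instead of creating a local copy.
  counter <<- counter + 1
  invisible(counter)
}

bump_with_assign <- function() {
  # assign() does the same job by name, with the target environment
  # given explicitly.
  assign("counter", counter + 1, envir = top)
  invisible(counter)
}

bump()
bump_with_assign()
print(counter)
```

An ordinary <- inside either function would only have changed a local copy, leaving the outer counter at 0.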

get() function in R

The get() function might be one of the most useful utilities in R. However, I had never used it before writing this page. Shame on me.

Well, let’s get to the point. The job of the get() function is actually quite simple: given the name of an object, it fetches the object itself. See the example below:

> x <- c(1:3)
> x
[1] 1 2 3
> get("x")
[1] 1 2 3

It’s easy to imagine how useful this function is.
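One place where this may help is looping over object names held as strings. The sketch below assumes two made-up vectors, u and v; mget() is the base R companion that fetches several objects at once.

```r
# Hypothetical data: the vector names u and v are made up for illustration.
u <- c(1, 2, 3)
v <- c(10, 20, 30)

# Loop over object names (strings) and fetch each object with get().
for (name in c("u", "v")) {
  values <- get(name)
  cat(name, "has mean", mean(values), "\n")
}

# mget() fetches several objects by name and returns them as a named list.
both <- mget(c("u", "v"))
```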


Reference: The Art of R Programming by Norman Matloff

Use SAS system options to suppress Log output

On a Windows SAS system, the Output, Log, and Program Editor windows can each display approximately 99,999 lines. Because of this limit, the SAS Log window sometimes fills up and the system shows the message ‘SAS log window is full’. The running program then pauses and waits for your action. It’s quite annoying, especially when you run a program with hundreds or thousands of loop iterations.

I recently ran a simulation program with a loop that went 100,000 times, and it took more than 4 hours to run, even on the server. I wanted to suppress the log output so that the message would not interrupt the running program.

I read the SAS documentation carefully, and there are several options available to prevent the log window from filling up.

Solution 1: System options

Four system options can be used to suppress SAS statements, system messages, and error messages in the log.

The difference between using subset() function and ordinary filtering

At first, I thought there was no difference between these two methods, and I normally used them interchangeably when writing R code.

Actually, there is a small difference in how NA values are handled.

> x <- c(6, 1, NA, 10)
> x
[1]  6  1 NA 10
> x[x > 5]
[1]  6 NA 10
> subset(x, x > 5)
[1]  6 10

So when your data have missing values, as survey data often do, choose between subset() and ordinary filtering carefully. This tiny difference can cause unexpected mistakes that take a lot of time to debug.
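If you prefer bracket filtering but want the NA values dropped, two common ways are sketched below; the variable names are only illustrative.

```r
x <- c(6, 1, NA, 10)

by_index  <- x[x > 5]               # NA > 5 is NA, so an NA is kept
by_subset <- subset(x, x > 5)       # subset() drops the NA
by_which  <- x[which(x > 5)]        # which() returns only the TRUE positions
by_guard  <- x[!is.na(x) & x > 5]   # explicit NA guard

print(by_which)
print(by_guard)
```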


Reference: The Art of R Programming by Norman Matloff

Interesting unequal math equation

Well, I saw an interesting problem this morning. See the code below.

> 1.37+0.12 == 1.49
[1] FALSE
> 1.36+0.12 == 1.48
[1] TRUE

It looks weird, right? I googled this problem and found an explanation like this: “Most float number has no exact representation in binary format, just approximation”. The explanation isn’t very detailed, but at least we know what’s going on: the sum on the left is stored with a tiny rounding error.

> 1.37+0.12-1.49
[1] 2.220446e-16
> 1.36+0.12-1.48
[1] 0

So, if you need this kind of comparison in an if control structure, you may run into trouble. One solution is to write the comparison this way: 1.37+0.12-1.49 > -1e-10 and 1.37+0.12-1.49 < 1e-10. It looks ugly, but it works.

There is also a better way to handle this in R: the all.equal() function, which tests whether two objects are nearly equal.

> if (1.37+0.12 == 1.49) {cat('Match')}
> if (-1e-10 < 1.37+0.12-1.49 && 1.37+0.12-1.49 < 1e-10)
+ {cat('Match')}
Match
> if (all.equal(1.37+0.12, 1.49)) {cat('Match')}
Match
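One caveat worth sketching: when the two values differ, all.equal() returns a character description of the difference rather than FALSE, so using it directly as an if condition can fail. Wrapping it in isTRUE(), as the R documentation suggests, is the usual pattern. The variable names below are only illustrative.

```r
a <- 1.37 + 0.12
b <- 1.49

exact <- a == b                     # exact comparison of doubles
near  <- isTRUE(all.equal(a, b))    # comparison within a small tolerance

# When the values differ, all.equal() returns a character string,
# so isTRUE() is needed to get a plain TRUE/FALSE.
report  <- all.equal(1, 1.1)
is_text <- is.character(report)
far     <- isTRUE(all.equal(1, 1.1))

if (near) cat("Match\n")
```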

Recursive or non-recursive list

In R, lists can be recursive, which means that you can have lists within lists.

> c(list(a=1, b=2, c=list(d=4, e=5)))
$a
[1] 1

$b
[1] 2

$c
$c$d
[1] 4

$c$e
[1] 5


The code above creates a three-component list, with the c component of the main list itself being another list.

However, sometimes you may want a single flat structure instead of a recursive list. You can get one by setting the optional argument recursive in the c() function to TRUE; the result is flattened into a single named vector. (It’s weird that setting recursive to TRUE actually gives you a non-recursive result.)

> c(list(a=1, b=2, c=list(d=4, e=5)), recursive=TRUE)
  a   b c.d c.e
  1   2   4   5
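For comparison, unlist() is another base R function that flattens a nested list. The sketch below (variable names are illustrative) checks that the flattened result is no longer a list and compares the two approaches.

```r
nested <- list(a = 1, b = 2, c = list(d = 4, e = 5))

flat_c <- c(nested, recursive = TRUE)   # flatten with c()
flat_u <- unlist(nested)                # flatten with unlist()

print(is.list(nested))   # the original is a list
print(is.list(flat_c))   # the flattened result is an atomic vector
print(names(flat_c))     # nested names are joined with a dot
print(all.equal(flat_c, flat_u))
```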


Reference: The Art of R Programming by Norman Matloff

Avoiding Unintended Dimension Reduction

It’s a common scenario: you extract one row from a matrix and still want to apply matrix operations to this ‘one-row submatrix’.

> z <- matrix(1:8, nrow=4)
> z
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
> r <- z[3, ]
> r
[1] 3 7
> attributes(z)
$dim
[1] 4 2

> attributes(r)
NULL
> str(z)
 int [1:4, 1:2] 1 2 3 4 5 6 7 8
> str(r)
 int [1:2] 3 7

See, when you extract a row from a four-row matrix, you get a vector, not a one-row matrix. It seems natural, but in many cases it will cause trouble in programs that do a lot of matrix operations.

The good news is that R has a way to suppress this kind of dimension reduction, with the drop argument.

> r <- z[3,, drop=FALSE]
> r
     [,1] [,2]
[1,]    3    7

Or you can always explicitly convert a vector to a matrix with the as.matrix() function. Note that as.matrix() turns a vector into a one-column matrix, so you would need t() to get a one-row matrix back.

Plus: the drop argument works not only for matrices but also for data frames.
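A small sketch of the data frame case, using a made-up data frame df with hypothetical columns id and score:

```r
# Hypothetical data frame for illustration.
df <- data.frame(id = 1:3, score = c(90, 85, 70))

col_vec <- df[, "score"]                 # single column collapses to a vector
col_df  <- df[, "score", drop = FALSE]   # stays a one-column data frame

print(class(col_vec))
print(class(col_df))
print(dim(col_df))
```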


Reference: The Art of R Programming by Norman Matloff

Using seq() function to deal with the empty-vector problem

The for loop might be the most common control structure in R programming. The code normally looks like this:

for (i in 1:length(x)) {}

It works well in most cases. However, when the x vector is empty, 1:length(x) becomes 1:0, that is, (1, 0), so the loop runs twice with invalid indices and the program may fail or give wrong results. A better way to handle this is to use the seq() function.

for (i in seq(x)) {}

Let’s see how the seq() function handles the empty vector.

> x <- c(4, 10)
> seq(x)
[1] 1 2
> x <- NULL
> seq(x)
integer(0)

For a non-empty vector, seq(x) gives the same result as 1:length(x), but when x is empty it correctly evaluates to an empty vector, integer(0), resulting in zero iterations of the loop.
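One more edge case may be worth a sketch: when x is a single number such as 5, seq(x) is treated as 1:5 rather than as a single index. The base R function seq_along() (a swapped-in alternative, not the method from the book) always counts elements. The helper count_iterations below is hypothetical.

```r
# Hypothetical helper: counts how many times a seq_along() loop runs.
count_iterations <- function(x) {
  n <- 0
  for (i in seq_along(x)) n <- n + 1
  n
}

print(count_iterations(NULL))       # empty input: zero iterations
print(count_iterations(c(4, 10)))   # one iteration per element
print(count_iterations(5))          # a single element: one iteration

# By contrast, seq(5) is the same as 1:5.
print(length(seq(5)))
```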


Reference: The Art of R Programming by Norman Matloff

Create a numeric vector in R: using : or c() ?

Did you know in R, : and c() are different when you want to create a numeric vector?

See the example below.

> x <- 1:2
> y <- c(1, 2)
> identical(x, y)
[1] FALSE
> typeof(x)
[1] "integer"
> typeof(y)
[1] "double"

So, : produces integers while c() produces floating-point numbers.
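If you need c() to produce integers, the L suffix on numeric literals does that. The short sketch below (z is an illustrative name) also shows that == compares values, while identical() compares types as well.

```r
x <- 1:2          # integer
y <- c(1, 2)      # double
z <- c(1L, 2L)    # integer, thanks to the L suffix

print(identical(x, z))               # same type and same values
print(all(x == y))                   # == compares values only
print(identical(x, as.integer(y)))   # convert explicitly before identical()
```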


Reference: The Art of R Programming by Norman Matloff