Magic number 2.220446e-16

If you have read one of my old posts, “Interesting unequal math equation”, you know that R has a floating-point accuracy problem. In that post I gave an explanation: “Most floating-point numbers have no exact representation in binary format, just an approximation”. Here I dig a little deeper.

Let’s look at some examples first.

> 1.37+0.12-1.49
[1] 2.220446e-16
> 1.38+0.12-1.5
[1] 0
> 1.39+0.12-1.51
[1] -2.220446e-16

Notice the number 2.220446e-16, which shows up in both the first and the third result (with opposite signs). Is it just a coincidence?
Of course not.

Thanks to Google, I found a detailed explanation of this problem.

Real numbers in R are stored in double precision, which means that binary floating-point arithmetic with a 53-bit mantissa is used. This can be seen from:

> 1 + 2^-52 == 1
[1] FALSE
> 1 + 2^-53 == 1
[1] TRUE

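The number 1 + 2^-52 needs a 53-bit mantissa and is exactly representable, while 1 + 2^-53 would need a 54-bit mantissa and is rounded to 1. So the smallest difference between two consecutive representable numbers in the interval [1, 2) is exactly 2^-52, which is about 2.220446e-16 (R stores this value as .Machine$double.eps). That explains the examples above: the computed sum 1.37 + 0.12 and the stored value of 1.49 both lie in [1, 2), so they can only differ by a whole multiple of 2^-52, and in the first and third examples they end up exactly one step apart.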
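These claims are easy to check in R. This is a small sketch (the names eps and r are my own); it confirms that .Machine$double.eps, R’s built-in machine epsilon, equals 2^-52, and expresses the three results from the start of the post as multiples of it.

```r
# Gap between consecutive doubles in [1, 2); R also stores it as .Machine$double.eps
eps <- 2^-52
print(eps)                          # [1] 2.220446e-16
print(.Machine$double.eps == eps)   # [1] TRUE

# 1 + 2^-52 survives, 1 + 2^-53 is rounded back to 1
print(1 + 2^-52 == 1)               # [1] FALSE
print(1 + 2^-53 == 1)               # [1] TRUE

# The three sums from the start of the post, measured in units of eps
r <- c(1.37 + 0.12 - 1.49, 1.38 + 0.12 - 1.5, 1.39 + 0.12 - 1.51)
print(r / eps)                      # [1]  1  0 -1
```

The zero in the middle only means the rounding errors happened to cancel for 1.38 + 0.12; it does not mean that 1.38 or 0.12 is stored exactly.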

Double precision is the standard for numerical calculations where speed is required. It cannot exactly represent irrational numbers, nor rational numbers whose denominator (in lowest terms) is not a power of 2. In particular, a number with finitely many decimal digits, such as 0.1 = 1/10, need not have a finite binary expansion. This is the reason for the following:

> 0.1 + 0.2 - 0.3
[1] 5.551115e-17
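One way to see this is to ask R for more digits than it normally prints. In this small sketch, sprintf() with a "%.20f" format (20 digits is an arbitrary choice) exposes the binary approximations that are actually stored. The leftover is exactly 2^-54, a quarter of 2^-52, because 0.1 + 0.2 and 0.3 lie in [0.25, 0.5), where consecutive doubles are 2^-54 apart.

```r
# Show 20 digits after the decimal point to expose the stored binary approximations
print(sprintf("%.20f", 0.1))        # [1] "0.10000000000000000555"
print(sprintf("%.20f", 0.1 + 0.2))
print(sprintf("%.20f", 0.3))

# Near 0.3 (in [0.25, 0.5)) doubles are 2^-54 apart; the leftover is exactly one such step
leftover <- 0.1 + 0.2 - 0.3
print(leftover == 2^-54)            # [1] TRUE
```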

Similar effects can be demonstrated with decimal numbers. The reason for the result above is the same reason why 2/3 - 1/3 - 1/3 is not 0 once 1/3 and 2/3 are rounded to a finite number of decimal digits: with 5 digits, we get 0.66667 - 0.33333 - 0.33333 = 0.00001.

The fact that numbers like 0.1 are not represented exactly does not mean that we cannot get correct results, at least in simple cases, if the calculations are done with care. In particular, the functions round() and signif() can be used to correct the errors from adding and subtracting decimal fractions.
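A minimal sketch of both functions, applied to the first example from the top of the post. The number of digits passed to round() and signif() is my own choice for illustration; pick it to match the precision your data actually carries. all.equal(), which compares numbers up to a small tolerance, is a further option not mentioned above.

```r
x <- 1.37 + 0.12 - 1.49
print(x == 0)                                # [1] FALSE

# round() to a sensible number of decimal places removes the representation error
print(round(x, 10) == 0)                     # [1] TRUE
print(round(1.37 + 0.12, 2) == 1.49)         # [1] TRUE

# signif() does the same with significant digits instead of decimal places
print(signif(1.37 + 0.12, 3) == 1.49)        # [1] TRUE

# all.equal() compares with a tolerance (about 1.5e-8 by default); wrap it in isTRUE()
print(isTRUE(all.equal(1.37 + 0.12, 1.49)))  # [1] TRUE
```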



Invisible Character Alt-255

Text alignment and positioning in SAS output matter a lot if you want your report to look good. I usually use spaces to align text in titles, footnotes, columns, etc. However, SAS has its own rules for handling blanks, especially leading and trailing blanks, so sometimes spaces cannot do what you want.

Here I introduce a simple and elegant approach: Alt-255. It looks like a blank space in program code and SAS output, but many programs process and print it as a valid text character (on Windows it typically inserts a non-breaking space rather than an ordinary blank).

So how do you type it? First of all, remember that you need to use the numeric keypad to type the magic number 255.

Follow these steps to insert the invisible character:

1. Press and hold the “Alt” key and, while holding it, type 255 on the numeric keypad.
2. Release the “Alt” key. The cursor moves to the next position, so you know that an invisible character has been inserted.

Actually, Alt-N can be used to enter any letter and many graphical symbols, and there is a nice place online where you can check all the Alt-N characters. Alt-255 is of special interest only because it is invisible.

See the example below:

data test;
  input fname $;
  * The blank before Alan is Alt-255, the blank before Andy is a normal space;
  datalines;
Joe
 Alan
 Andy
;
proc print data=test;
run;

And the result:

1 Joe
2  Alan
3 Andy

Remove all labels and formats in SAS data set

I occasionally find the labels in a data set annoying, especially when the data set comes from outside (that is, someone else created it). When you browse the data, the labels are shown instead of the variable names, so you may mistakenly use a label instead of the true variable name in your program. Debugging that usually wastes a lot of my time.

There is a very easy way to remove all labels and formats in a single step:

* Remove all the labels and formats in the data set;
proc datasets lib=work memtype=data;
  modify data_set_name; /* replace data_set_name with your own data set */
  attrib _all_ label='';
  attrib _all_ format=;
run;
quit;

Hope it helps you too!

ggplot2 plotting over multiple pages

I bet you have done this before: trying to use ggplot to create graphs over multiple pages. My first idea was to wrap all the ggplot code in a for loop placed between the pdf() and dev.off() calls, like this:

for (i in seq){
  ggplot(...) + geom_point(...)
}

However, if you run this code, you will find that the loop doesn’t seem to wait for ggplot to do its thing: it blazes through very quickly and outputs an invalid PDF.

Yet if you run pdf() first, then set i=1 and run the code inside the loop by hand, then set i=2, and so on until the loop is finished, and then turn off the device, the resulting PDF looks great.

So what’s really going on?

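The answer is on page 39 of the ggplot2 book. It tells us that once you create a ggplot2 object, you can “Render it on screen, with print(). This happens automatically when running interactively, but inside a loop or function, you’ll need to print() it yourself”. In other words, at the top level R auto-prints the value of each expression, and printing is what draws a ggplot; inside a for loop that auto-printing never happens. So the code below works.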

for (i in seq){
  p <- ggplot(...) + geom_point(...)
  print(p)  # render explicitly; nothing is auto-printed inside a loop
}
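The auto-printing rule can be seen without ggplot2 at all. In this sketch, toyplot is a hypothetical S3 class whose print() method stands in for the drawing a ggplot object does when it is printed, and capture.output() records what each loop actually emits.

```r
# Hypothetical stand-in for a ggplot object: the "drawing" happens in its print() method
make_plot <- function(i) structure(list(id = i), class = "toyplot")
print.toyplot <- function(x, ...) {
  cat("drawing page ", x$id, "\n", sep = "")
  invisible(x)
}

# Without print(): values created inside a for loop are never auto-printed
silent <- capture.output(for (i in 1:3) make_plot(i))

# With an explicit print(): every page is drawn
drawn <- capture.output(for (i in 1:3) print(make_plot(i)))

print(length(silent))  # [1] 0
print(drawn)           # [1] "drawing page 1" "drawing page 2" "drawing page 3"
```

With ggplot2 the pattern is the same: call print() on each plot object inside the loop, between pdf() and dev.off().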



A smart way to comment chunks of code in SAS

We all know there are two styles of comments in SAS: * ...; and /* ... */. Normally, when we want to disable a chunk of code, we choose /* */.

But I bet you have run into cases where /* */ does not work well because parts of the code itself already contain /* */ style comments. Since these comments do not nest, only the code up to the first */ is commented out. A more robust way to disable a chunk of code is to put it in a macro definition and never call the macro, for example:

%macro comment;
  ...the code you want to disable goes here...
%mend comment;


