Goldbach's Comet

Following this post, there is still room for improvement. Recall the last implementation (goldbach5)

12 goldbach5 <- function(n) {
13     xx <- 1 : n
14     xx <- xx[isprime(xx) > 0][-1]
15     
16     # generates row indices
17     x <- function(N){ 
18         rep.int( 1:N, N:1) 
19     }
20     # generates column indices
21     y <- function(N){ 
22         unlist( lapply( 1:N, seq.int, to = N ) ) 
23     }
24     z <- xx[ x(length(xx)) ] + xx[ y(length(xx)) ]
25     z <- z[z <= n]
26     tz <- tabulate(z, n )
27     tz[ tz != 0 ]
28 }
29

The first thing to notice right away is that when we build xx in the first place, we are building integers from 1 to n, and check if they are prime afterwards. In that case, we are only going to need odd numbers in xx, so we can build them dircetly as:

31     xx <- seq.int( 3, n, by = 2)
32     xx <- xx[isprime(xx) > 0]

The next thing is that, even though the goldbach5 version only builds the upper triangle of the matrix, which saves some memory, we don't really need all the numbers, since eventually we just want to count how many times each of them appears. For that, there is no need to store them all in memory.

The version goldbach6 below takes this idea forward implementing the counting in C using the .C interface. See this section of writing R extensions for details of the .C interface.

30 goldbach6 <- function( n ){
31     xx <- seq.int( 3, n, by = 2)
32     xx <- xx[isprime(xx) > 0]
33     out <- integer( n )
34     tz <- .C( "goldbach", xx =as.integer(xx), nx = length(xx), 
35       out = out, n = as.integer(n), DUP=FALSE )$out
36     tz[ tz != 0 ]
37 }
38

and the C function that goes with it

   1 #include <R.h>
   2  
   3 void goldbach(int * xx, int* nx, int* out, int* n){
   4     int i,j,k;
   5     
   6     for( i=0; i<*nx; i++){
   7         for( j=i; j<*nx; j++){
   8             k = xx[i] + xx[j] ;
   9             if( k > *n){
  10                 break;
  11             }
  12             out[k-1]++ ;
  13         }
  14     }
  15 }
  16

We need to build the shared object

$ R CMD SHLIB goldbach.c
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic  -g -O2 -c goldbach.c -o goldbach.o
gcc -std=gnu99 -shared -L/usr/local/lib -o goldbach.so goldbach.o   -L/usr/local/lib/R/lib -lR

and load it in R:

> dyn.load( "goldbach.so" )

And now, let's see if it was worth the effort

> system.time( out  system.time( out  system.time( out 

We could also have a look at the memoty footprint of each of the three functions using the memory profiler. 

> gc(); Rprof( "goldbach4.out", memory.profiling=T ); out  gc(); Rprof( "goldbach5.out", memory.profiling=T ); out  gc(); Rprof( "goldbach6.out", memory.profiling=T ); out  rbind( summaryRprof( filename="goldbach4.out", memory="both" )$by.total[1,] ,
+        summaryRprof( filename="goldbach5.out", memory="both" )$by.total[1,],
+        summaryRprof( filename="goldbach6.out", memory="both" )$by.total[1,] )
            total.time total.pct mem.total self.time self.pct
"goldbach4"      32.08       100     712.6      1.26      3.9
"goldbach5"       6.66       100     306.7      2.80     42.0
"goldbach6"       0.22       100       0.2      0.00      0.0

Goldbach's Comet - take 2

Romain François