Speed Problems: C vs. Gambit Scheme

My Gambit Scheme code is taking 500 times longer than the equivalent C program, and I don’t know why.

So, there’s a really horrible way to spend processor time to calculate pi that involves taking the unit square and inscribing a circle inside it with radius 1/2.  You then partition the square into N partitions on each axis (the code below uses 12,000).  This gives you a unit square and places it on a grid with 12,000 on each side, for a total number of grid squares of 12,000^2 = 144,000,000.  The algorithm then iterates through the center of each square and checks to see if it is inside of the circle.  It sums the number of points inside the circle, divides it my the total number of points, and multiplies by 4, which produces an approximation of pi.

For those already initiated, this is like a deterministic Monte Carlo approach.

Here’s the C implementation (from the professor of my Architecture of Parallel Computers course):

#include <stdio.h>

int main(int argc, char*argv[])

{
  int numPartitions = 12000;
  int circleCount = 0;
  double interval = 0, pi = 0;

  int i = 0, j = 0;

  interval = 1.0/(double)numPartitions;
  for (i = 0; i < numPartitions; i++) {

    double a = (i + .5)*interval;
    for (j = 0; j < numPartitions; j++) {
      double b = (j + .5)*interval;

      if ((a*a + b*b) <= 1) circleCount++;
    }
  }

  pi = (double)(4*circleCount)/(numPartitions * numPartitions);

  printf("Estimate of pi is: %10.8lf\n", pi);

  return 0;
}

Here’s the (mostly equivalent, I hope) Scheme implementation:

(define (main args)
  (let* ((partitions 12000)
         (increment (expt partitions -1))
         (count 0))

    (do ((y 0 (+ 1 y)))
        ((= partitions y) (* 4 (/ count (* partitions partitions))))
      (let ((b (* increment (+ y .5))))
        (do ((x 0 (+ 1 x)))

            ((= partitions x))
          (let ((a (* increment (+ x .5))))
            (if (< (+ (* a a) (* b b)) 1)
                (set! count (+ 1 count)))))))))

(display (string-append (number->string (exact->inexact (main '()))) "\n"))

OK, so, running these on my three-year-old Intel Core 2 Duo T7200 (under Linux 2.6.31), the C version takes 1.03 seconds, consistently.  The Scheme version, compiled with Gambit Scheme 4.6.0, takes about 8 minutes 19 seconds.  I am using a complied binary of the Scheme code, and Gambit was built for the machine I’m running it on, i.e. –enable-single-host was supplied to the configure script.

So, the point of this post is not really that Scheme is slow — in fact, I vastly prefer it.  Rather, I’m looking for some input from people more knowledgeable in Scheme as to what I’m doing wrong here.  I assume do is well optimized, but I’m I suffering because of the let* or nested lets?  What’s killing my performance?

I’ll try to profile it, but I’m not sure how to do that in a way that traces back to the Scheme source after it’s been compiled and linked.  The is the first time I’ve used Gambit (I used PLT in the past, and I’ve done quite a bit of work in Clojure on the JVM), so I could use advice from anyone that knows the C or Gambit environments.

You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

6 Comments »

 
  • Gambiteer says:

    (declare (standard-bindings)
    (extended-bindings)
    (block)
    (not safe))

    (define (main args)
    (let* ((partitions 12000)
    (increment (fl/ (fixnum->flonum partitions))))
    (let y-loop ((y 0)
    (count 0))
    (if (fx= partitions y)
    (* 4 (/ count (* partitions partitions)))
    (let ((b (fl* increment (fl+ (fixnum->flonum y) 0.5))))
    (let x-loop ((x 0)
    (count count))
    (if (fx= partitions x)
    (y-loop (fx+ y 1)
    count)
    (let ((a (fl* increment (fl+ (fixnum->flonum x) 0.5))))
    (if (flstring (exact->inexact (main ‘()))) “\n”))

    :~> gcc -O3 -Wall -o pic pic.c
    :~> time ./pic
    Estimate of pi is: 3.14159492
    1.230u 0.000s 0:01.23 100.0% 0+0k 0+0io 0pf+0w
    :~> gsc pi
    :~> time gsi pi
    3.1415949166666666
    1.250u 0.000s 0:01.25 100.0% 0+0k 0+0io 0pf+0w

  • Gambiteer says:


    (declare (standard-bindings)
    (extended-bindings)
    (block)
    (not safe))

    (define (main args)
    (let* ((partitions 12000)
    (increment (fl/ (fixnum->flonum partitions))))
    (let y-loop ((y 0)
    (count 0))
    (if (fx= partitions y)
    (* 4 (/ count (* partitions partitions)))
    (let ((b (fl* increment (fl+ (fixnum->flonum y) 0.5))))
    (let x-loop ((x 0)
    (count count))
    (if (fx= partitions x)
    (y-loop (fx+ y 1)
    count)
    (let ((a (fl* increment (fl+ (fixnum->flonum x) 0.5))))
    (if (flstring (exact->inexact (main '()))) "\n"))

  • Axio says:

    /tmp % time gsi-gambit loop2
    3.1415949166666666
    gsi-gambit loop2 101.80s user 1.65s system 97% cpu 1:45.65 total
    [adrien]/tmp % time ./loop-c
    Estimate of pi is: 3.14159492
    ./loop-c 1.11s user 0.00s system 99% cpu 1.124 total

    (declare (block) (standard-bindings) (extended-bindings) (not safe))

    (define (main args)
    (let ((num-partitions 12000.)
    (num-partitions-ex 12000))
    (let ((interval (fl/ 1. num-partitions)))
    (let loop ((i 0) (circle-count 0))
    (if (< i num-partitions-ex)
    (let ((new-circle-count
    (let loop2 ((j 0) (circle-count circle-count))
    (if (< j num-partitions-ex)
    (let ((a2 (flexpt (fl* (+ i .5) interval) 2.))
    (b2 (flexpt (fl* (+ j .5) interval) 2.)))
    (loop2 (+ j 1)
    (if (string (exact->inexact (main ‘()))) “\n”))

    I guess that I’m still far from Gambiteer’s performance, but it’s still better than your original code ^^

  • Axio says:

    Yeah. It’s the same code with more explicit typing and better precalculation.

  • Andrew says:

    Or you could take your original code, convert all the integers to floats and compile it with a different scheme compiler.

    Using bigloo 3.3a I get the following timings…

    bash-3.00$ bigloo -O3 -suffix scheme -native pi.scheme -o pi3
    bash-3.00$ time ./pi3
    3.1415949166667

    real 0m1.913s
    user 0m1.910s
    sys 0m0.000s
    bash-3.00$ time pi-c
    Estimate of pi is: 3.14159492

    real 0m1.570s
    user 0m1.560s
    sys 0m0.010s

    Using the following code…


    (module pi
    (main main2))

    (define (main2 args)
    (display (string-append (number->string (exact->inexact (main '()))) "\n")))

    (define (main args)
    (let* ((partitions 12000.)
    (increment (/ 1. partitions))
    (count 0.))

    (do ((y 0. (+ 1. y)))
    ((= partitions y) (* 4. (/ count (* partitions partitions))))
    (let ((b (* increment (+ y .5))))
    (do ((x 0. (+ 1. x)))

    ((= partitions x))
    (let ((a (* increment (+ x .5))))
    (if (< (+ (* a a) (* b b)) 1.)
    (set! count (+ 1. count)))))))))

 

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>