Etherplex

Rick Dillon's home on the net…

Speed Problems: C vs. Gambit Scheme

with 7 comments

My Gambit Scheme code is taking 500 times longer than the equivalent C program, and I don’t know why.

So, there’s a really horrible way to spend processor time to calculate pi that involves taking the unit square and inscribing a circle inside it with radius 1/2.  You then partition the square into N partitions on each axis (the code below uses 12,000).  This gives you a unit square and places it on a grid with 12,000 on each side, for a total number of grid squares of 12,000^2 = 144,000,000.  The algorithm then iterates through the center of each square and checks to see if it is inside of the circle.  It sums the number of points inside the circle, divides it my the total number of points, and multiplies by 4, which produces an approximation of pi.

For those already initiated, this is like a deterministic Monte Carlo approach.

Here’s the C implementation (from the professor of my Architecture of Parallel Computers course):

#include <stdio.h>

int main(int argc, char*argv[])

{
  int numPartitions = 12000;
  int circleCount = 0;
  double interval = 0, pi = 0;

  int i = 0, j = 0;

  interval = 1.0/(double)numPartitions;
  for (i = 0; i < numPartitions; i++) {

    double a = (i + .5)*interval;
    for (j = 0; j < numPartitions; j++) {
      double b = (j + .5)*interval;

      if ((a*a + b*b) <= 1) circleCount++;
    }
  }

  pi = (double)(4*circleCount)/(numPartitions * numPartitions);

  printf("Estimate of pi is: %10.8lf\n", pi);

  return 0;
}

Here’s the (mostly equivalent, I hope) Scheme implementation:

(define (main args)
  (let* ((partitions 12000)
         (increment (expt partitions -1))
         (count 0))

    (do ((y 0 (+ 1 y)))
        ((= partitions y) (* 4 (/ count (* partitions partitions))))
      (let ((b (* increment (+ y .5))))
        (do ((x 0 (+ 1 x)))

            ((= partitions x))
          (let ((a (* increment (+ x .5))))
            (if (< (+ (* a a) (* b b)) 1)
                (set! count (+ 1 count)))))))))

(display (string-append (number->string (exact->inexact (main '()))) "\n"))

OK, so, running these on my three-year-old Intel Core 2 Duo T7200 (under Linux 2.6.31), the C version takes 1.03 seconds, consistently.  The Scheme version, compiled with Gambit Scheme 4.6.0, takes about 8 minutes 19 seconds.  I am using a complied binary of the Scheme code, and Gambit was built for the machine I’m running it on, i.e. –enable-single-host was supplied to the configure script.

So, the point of this post is not really that Scheme is slow — in fact, I vastly prefer it.  Rather, I’m looking for some input from people more knowledgeable in Scheme as to what I’m doing wrong here.  I assume do is well optimized, but I’m I suffering because of the let* or nested lets?  What’s killing my performance?

I’ll try to profile it, but I’m not sure how to do that in a way that traces back to the Scheme source after it’s been compiled and linked.  The is the first time I’ve used Gambit (I used PLT in the past, and I’ve done quite a bit of work in Clojure on the JVM), so I could use advice from anyone that knows the C or Gambit environments.

Written by Rick

March 14th, 2010 at 8:36 pm

Posted in Programming

7 Responses to 'Speed Problems: C vs. Gambit Scheme'

Subscribe to comments with RSS or TrackBack to 'Speed Problems: C vs. Gambit Scheme'.

  1. (declare (standard-bindings)
    (extended-bindings)
    (block)
    (not safe))

    (define (main args)
    (let* ((partitions 12000)
    (increment (fl/ (fixnum->flonum partitions))))
    (let y-loop ((y 0)
    (count 0))
    (if (fx= partitions y)
    (* 4 (/ count (* partitions partitions)))
    (let ((b (fl* increment (fl+ (fixnum->flonum y) 0.5))))
    (let x-loop ((x 0)
    (count count))
    (if (fx= partitions x)
    (y-loop (fx+ y 1)
    count)
    (let ((a (fl* increment (fl+ (fixnum->flonum x) 0.5))))
    (if (flstring (exact->inexact (main ‘()))) “\n”))

    :~> gcc -O3 -Wall -o pic pic.c
    :~> time ./pic
    Estimate of pi is: 3.14159492
    1.230u 0.000s 0:01.23 100.0% 0+0k 0+0io 0pf+0w
    :~> gsc pi
    :~> time gsi pi
    3.1415949166666666
    1.250u 0.000s 0:01.25 100.0% 0+0k 0+0io 0pf+0w

    Gambiteer

    16 Mar 10 at 18:16


  2. (declare (standard-bindings)
    (extended-bindings)
    (block)
    (not safe))

    (define (main args)
    (let* ((partitions 12000)
    (increment (fl/ (fixnum->flonum partitions))))
    (let y-loop ((y 0)
    (count 0))
    (if (fx= partitions y)
    (* 4 (/ count (* partitions partitions)))
    (let ((b (fl* increment (fl+ (fixnum->flonum y) 0.5))))
    (let x-loop ((x 0)
    (count count))
    (if (fx= partitions x)
    (y-loop (fx+ y 1)
    count)
    (let ((a (fl* increment (fl+ (fixnum->flonum x) 0.5))))
    (if (flstring (exact->inexact (main '()))) "\n"))

    Gambiteer

    16 Mar 10 at 18:17

  3. Gambiteer

    16 Mar 10 at 18:24

  4. /tmp % time gsi-gambit loop2
    3.1415949166666666
    gsi-gambit loop2 101.80s user 1.65s system 97% cpu 1:45.65 total
    [adrien]/tmp % time ./loop-c
    Estimate of pi is: 3.14159492
    ./loop-c 1.11s user 0.00s system 99% cpu 1.124 total

    (declare (block) (standard-bindings) (extended-bindings) (not safe))

    (define (main args)
    (let ((num-partitions 12000.)
    (num-partitions-ex 12000))
    (let ((interval (fl/ 1. num-partitions)))
    (let loop ((i 0) (circle-count 0))
    (if (< i num-partitions-ex)
    (let ((new-circle-count
    (let loop2 ((j 0) (circle-count circle-count))
    (if (< j num-partitions-ex)
    (let ((a2 (flexpt (fl* (+ i .5) interval) 2.))
    (b2 (flexpt (fl* (+ j .5) interval) 2.)))
    (loop2 (+ j 1)
    (if (string (exact->inexact (main ‘()))) “\n”))

    I guess that I’m still far from Gambiteer’s performance, but it’s still better than your original code ^^

    Axio

    22 Mar 10 at 00:25

  5. Yeah. It’s the same code with more explicit typing and better precalculation.

    Axio

    22 Mar 10 at 00:40

  6. Or you could take your original code, convert all the integers to floats and compile it with a different scheme compiler.

    Using bigloo 3.3a I get the following timings…

    bash-3.00$ bigloo -O3 -suffix scheme -native pi.scheme -o pi3
    bash-3.00$ time ./pi3
    3.1415949166667

    real 0m1.913s
    user 0m1.910s
    sys 0m0.000s
    bash-3.00$ time pi-c
    Estimate of pi is: 3.14159492

    real 0m1.570s
    user 0m1.560s
    sys 0m0.010s

    Using the following code…


    (module pi
    (main main2))

    (define (main2 args)
    (display (string-append (number->string (exact->inexact (main '()))) "\n")))

    (define (main args)
    (let* ((partitions 12000.)
    (increment (/ 1. partitions))
    (count 0.))

    (do ((y 0. (+ 1. y)))
    ((= partitions y) (* 4. (/ count (* partitions partitions))))
    (let ((b (* increment (+ y .5))))
    (do ((x 0. (+ 1. x)))

    ((= partitions x))
    (let ((a (* increment (+ x .5))))
    (if (< (+ (* a a) (* b b)) 1.)
    (set! count (+ 1. count)))))))))

    Andrew

    4 May 10 at 15:44

  7. Hi!
    send you post: to my @aqiuuyhw twitter

    effiltUnina

    28 Oct 11 at 05:23

Leave a Reply