Sunday, June 10, 2007

Mandelbrot Set on FPGA

Mandelbrot set is a set of points in the complex plane that forms a fractal. It's popular outside of mathematics because of its aesthetic appeal. It's also a typical parallel computing application implemented on a variety of platforms. My professor suggested me to try it on soft multiprocessor on FPGA.

The reference code for algorithm is from sf.net. The most impressive design is QuickMAN by Paul Gentieu with core algorithm implemented in assembly code. It's incredibly fast. The disadvantage is that it works on Windows only and doesn't support multicore yet.

The reference code for FPGA is from Xilinx. First it's implemented on one-PowerPC architecture on FPGA. However, it's very slow because there is no FPU. Next, I added a Microblaze processor with FPU. It gets much faster. In the third step, I added four Microblaze processors as following shows.


There is still some area on the chip and I can add more processors to accelerate. I updated BlazeCluster to add PowerPC support and I can use it to add processors easily. All source code and additional document are available at http://quickwayne.googlepages.com/

This implementation, however, is still much slower than QuickMAN running on a normal PC. The major reason is probably: 1) low CPU speed. Currently all microblaze runs at 100Mhz; 2) Inefficient floating point libarary. I use the floating library from Xilinx. It's efficient in area but probably not best in performance; 3) slow external memory access. The reason I can't make it more detail is that I can't make profiling on PowerPC work. I did what's in the manual and got a 250MB profiling file which is wrong. Even it works, it's still difficult to look into it because software-intrusive profiling is not accurate enough. It's better to have some on-chip profiling counters as Pentium or Athlon has.

Below is some tricks of the code.

1) Source code on all microblaze processors are identical. The only difference is actually compile options. There is a macro for software to know on which processor it's running. The only difference in binary code is communication dual port memory base address. In other words, the macro decides the address during compiling.

2) I use single-precision floating point because the FPU on microblaze doesn't support double-precision. Meanwhile, in case of double precision, it doesn't use FPU at all. It just emulate it.