20110927

Notes on Matlab's Parallel Tools

A couple days ago I sat through a Matlab information session on Matlab's parallel tools. This session largely convinced me that Matlab's parallel tools are not worth using.

Basically, Matlab's simple parallel tools do not offer much benefit over conventional Matlab optimizations, and Matlab's advanced parallel tools are no easier to use than the tools in, say, Python, or other simple parallelism solutions.

Matlab's simple parallel tools (parfor) offer some advantage over unoptimized code, but not much. Parfor parallelizes loop execution on a single machine. However, the matrix operators that underlie even single-threaded Matlab execution are already parallelized through multithreaded implementations of BLAS. Therefore, most applications gain automatic single-machine, multi-core parallelism simply from vectorizing the code and expressing its operations in terms of matrix operators.
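To make the vectorization point concrete, here is a minimal sketch in Python with NumPy rather than Matlab. The function names are my own, made up for illustration. Whether the matrix operator actually runs on several cores depends on which BLAS library your NumPy build is linked against, so treat the parallelism as an assumption about your installation, not a guarantee.

```python
import numpy as np


def matmul_loop(a, b):
    """Matrix product with explicit loops: every scalar multiply-add
    goes through the interpreter, one at a time."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i, p] * b[p, j]
            out[i, j] = total
    return out


def matmul_vectorized(a, b):
    """Same product as a single matrix operator. For floating point
    arrays NumPy passes this to its BLAS backend, which may be
    multithreaded depending on the build."""
    return a @ b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((30, 20))
    b = rng.standard_normal((20, 10))
    # Both versions compute the same matrix, up to rounding.
    print(np.allclose(matmul_loop(a, b), matmul_vectorized(a, b)))
```

The loop version is the kind of code that a parfor would be used to speed up. The vectorized version needs no parallel toolbox at all.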

At an intermediate level, for embarrassingly parallel tasks, it is not difficult to script job submission and to do some weak communication through the filesystem. If you are even a vaguely competent programmer, this is far more pleasant than forking over $20K of your grant money for the distributed computing toolbox, and then learning to use Matlab's distributed computing API.
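As a rough sketch of what I mean by scripting jobs and communicating through the filesystem, the Python below launches one process per task, has each worker write its result to a file, and then gathers the files. Everything here is hypothetical: the worker just squares its task number, the file naming scheme is made up, and the workers are plain local processes. On a real cluster you would hand the worker script to whatever scheduler you have instead of calling `subprocess.Popen` directly.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical worker: squares its task id and writes the answer to a file.
# It writes to a temporary name first and then renames, so the driver
# never picks up a half-written result.
WORKER_SOURCE = """
import json, os, sys
task_id, out_dir = int(sys.argv[1]), sys.argv[2]
result = {"task": task_id, "value": task_id ** 2}
tmp = os.path.join(out_dir, "result_%d.tmp" % task_id)
with open(tmp, "w") as f:
    json.dump(result, f)
os.replace(tmp, os.path.join(out_dir, "result_%d.json" % task_id))
"""


def run_jobs(n_tasks, out_dir):
    """Start one worker process per task, wait for all of them,
    then collect the results they left on disk."""
    out_dir = Path(out_dir)
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(i), str(out_dir)]
        )
        for i in range(n_tasks)
    ]
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError("a worker exited with an error")
    results = [
        json.loads(path.read_text())
        for path in out_dir.glob("result_*.json")
    ]
    return sorted(results, key=lambda r: r["task"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for record in run_jobs(4, scratch):
            print(record["task"], record["value"])
```

The only "communication" is that the driver and the workers agree on a directory and a file naming scheme, which is usually enough for embarrassingly parallel work.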

Matlab's more advanced tools, such as distributed computing and CUDA interfaces, require additional training beyond typical Matlab use. I would argue that any programmer who has the ability to use open source equivalents would benefit from doing so. For an experienced programmer, using these features in Matlab is no easier than using existing open source solutions. Furthermore, if you are using an open source framework, you never have to worry about the cost of licenses or that people will be unable to use your code because they cannot afford Matlab.

For me, the only benefit of Matlab is that more scientists seem to know it as their only programming language, and so it facilitates collaboration. The major (major major) downside of Matlab is that it is F*cking expensive, and so it is effectively an elitist platform confined to large universities and corporations. Coding publicly funded science in Matlab limits participation, and is in some sense unethical, in my opinion. But, to get back to the point, Matlab's advanced parallel tools are new, and require additional training to learn how to use. This means that parallel code in Matlab can't be as readily shared within the scientific community, and also that existing open source solutions actually have an older and larger body of users. Thus, the main advantages of Matlab, its familiarity and its user base, are not present for its parallel toolboxes.

Finally, having used PyCUDA, I believe that the Python bindings for GPU are actually more advanced than those in Matlab, and the greater expressiveness of the Python language makes working with CUDA kernels much less of a hassle. Python's numerical routines are on par with Matlab's in terms of speed, and the development branch of Matplotlib is in some ways nicer than Matlab's plotting utilities.

I would argue that there is no longer any reason to use Matlab over Python. Of course, I am biased. I learned to use Pylab before I learned to use Matlab. I believe that the Python language is as easily learned as Matlab. Both Matlab and Python have quirks that will allow you to shoot yourself in the foot in terrible ways, and I'm not sure either is strictly better in this regard. Their core numerical and plotting routines are comparable. The only advantage that Matlab has to offer is that more scientific code has already been written in Matlab for historical reasons. We can help to change this by favoring Pylab for developing new analysis code, and publishing our Python code to the wider scientific community. Andreas Klöckner sets an excellent example in this regard.

I should add that this is just my personal opinion, and that I have more experience using Python and CUDA, as well as scripting over clusters and using the filesystem for inter-process communication, than I do with using Matlab's parallel tools. Therefore, I am naturally biased toward familiar, open, free, but perhaps slightly more time-consuming, solutions.
