Bob’s blog.

Two Crocodiles have emerged from the water.


R project map-reduce-filter tidbit

The R-project is a great tool set for statistical computing (this is its speciality) and even just to have around for quick calculations, plots [1], and data manipulations. The community is large and the source is open. It provides a nifty Unix-like environment to work in and is available on three major operating systems. </advertisement>

Because of the large community size and the highly-interpreted nature of the language, there is definitely an “impure” feeling about using it since some packages have procedures that will call their own C or Fortran code while others use the language directly. I personally like to see it as a good platform for exploratory data analysis/prototyping ideas, and like to leave the more heavy-lifting to something..different.

That said, since its internals are kind of Scheme-like, the expressiveness of the language for data manipulation in particular can be quite handy [2]. The introduction describes a function “tapply” which is useful for applying functions to groups of items with the same category. In that neighborhood there is also “lapply/sapply/mapply” [3] which are like the traditional “map” function. “subset” is very much like the traditional “filter” function.

Not-so-advertised are the “Map“, “Reduce“, and “Filter” functions (tricky-little capital letters). The differences between the traditional FP functions above and the two R analogs listed in the last paragraph are mostly conveniences for the way R treats its data.

If you use R or are interested in experimenting with it, keep these functions in mind because they can make just a few lines of code do some pretty awesome things.

[1] Gnuplot and matplotlib are also pretty good open source alternatives to plotting, and there are of course any non-open source options. In my opinion, experimentation with R is definitely worth the time if you’re playing with plots, and willing to side-step a bit from the Python bandwagon, since Python does have some well-developed statistical and scientific computing tools.

[2] See their “Manuals” section for some good introductory documentation and language definition. Many scripting languages these days support higher order functions.

[3] “mapply” is a neat sort-of multimodal map.


Link advertisement: I came across Texify a while back. It is great for inserting TeX in HTML notes/documentation/blog entries without bothering with the more in-depth Latexing better suited for article creation.