Consider the Nimrod Programming Language

Mid-April Update: Thanks for the thoughts everyone! The post quickly grew to >10,000 views and reached #10 on Hacker News for a while. I continue to enjoy the language for a multitude of applications from basic scripting to ‘hard work’ tasks.

I love to play around with new computer programming languages. Even though I spend most of my time in industry-tested standards for their respective applications (e.g. Java, C, C++, Python, JavaScript, …), I think there are a lot of good reasons, especially these days, to learn and experiment with new languages. The impact of modern language development isn’t limited to a cottage industry of computer scientists and programmers. Take the growing Scala language as an example. Twitter transitioned from a framework primarily using Ruby to Scala to scale their service and to maintain a programming model they desired. I also believe we are finally beginning to see languages that are elegant, expressive, and, importantly, fast. For example, these days, a company using two industry standards like Python and C++ might do ‘heavy lifting’ in C++ and write a lot of ‘high level’ wrapper code and scripts in a language like Python. Why not use just one language? Why is Python ‘slow’ for some tasks and C++ ‘unpleasant’ for scripting tasks? A good language should be expressive enough to elegantly express domain-specific tasks while allowing the programmer to make the things that need to be fast, fast.

Why the competition may not quite fit the bill

I could just list out great Nimrod features and say: ‘consider it’, but I don’t think that these features are very useful without some explanation of why they provide an overall better experience than other compelling languages. When it comes to picking a programming language that attempts a speed-elegance unification, there are a lot of choices. The five on the ‘short list’ that I discuss in this post are C++11, Scala, Rust, Julia, and Nimrod.

There are other options that I could put on this list like Haskell or Go, and I have my reasons for picking the five above, but I don’t want to discuss them right now. What I would like to do is convince you that Nimrod is a particularly nice language to consider since the design decisions its developers made, to me, result in an elegant, expressive, and fast language (though I understand people have different syntactic preferences). These are my initial thoughts after nearly three weeks of coding a lot in Nimrod. I am writing this because I think the language needs to get more attention than it has, and it deserves to be taken seriously as a worthy competitor to the other four mentioned above.

Of course this is not a super-detailed comparison, but overall, I hope I provide some reasons you may want to consider Nimrod over these other very nice languages. They all have stuff to offer over Nimrod and vice-versa. And there are also a number of overlapping features. Ideally I would like to have a highly expressive, fast language that is what I call “K&R-memorable”, which basically means that it is approximately as easy to understand as C (all you do is read K&R and you’re good).

C++11 has really brought C++ a long way. Coding with it results in a lot less boiler-plate code and it did a reasonable job of incorporating higher-order functions and handy value-semantic preserving features such as move semantics.  However, there’s still a lot of boiler plate (e.g. it’s 2014 and I’m still writing header files separate from source because compilation time with header-only files is too slow?), and now I need to implement more operators for classes to preserve value semantics (don’t forget to implement the move assignment operator!). So C++11 is nice and incorporates some modern features, esp. because it works with all other C code, but it’s much too complex, and I think, far less elegant than the other alternatives.

Scala and Rust are both very interesting languages (in general, simpler to understand than the totality of C++11). I have had a good deal of experience with Scala and have played with Rust for a couple of minor tasks. Both languages implement traits. To me, traits are a far more elegant way of adding similar functionality to different objects when compared with multiple inheritance. But my experience with Scala has shown me that while it is easy to use libraries, it is harder to design them in the midst of a complex graph of how objects and traits are related to one another. I spent a lot of time engineering the types to be just right, which is great, but it was also frustrating and I felt that the safety I desire at compile time would be more easily achieved without such a complex system. I will discuss some design decisions made by Nimrod below that I think result in less time spent on type nitpicking and more time spent on getting the ‘job done right’ with reasonable safety features. Rust provides more built-in memory safety, which is great, but understanding how it all works and the ‘conversations’ with the compiler after coding can be frustrating. Believe it or not, sometimes hard guarantees with types/traits and memory are not worth the pain when programming (i.e. the programmer effort required, mental model and syntactic complexity). I think this is precisely why a slow, dynamically duck-typed language like Python has been so widely adopted: it is ‘easy’ to program in. I think Nimrod is a happier medium.

Julia’s motivation comes from two places. The language resembles typical scientific programming syntax (à la Matlab and Pylab) that executes fast when compiled, and offers extensive and intuitive metaprogramming capabilities since it is homoiconic like Lisp. (And the scientist in me really likes the IJulia notebook feature that they have apparently worked quickly to develop.) I will show some examples below on how Nimrod offers a powerful and elegant metaprogramming environment without necessarily being homoiconic. My only real concern with Julia is lower-level systems programming. Forced garbage collection can be a game-changer here, and I’m not sure its choice of being ‘largely’ dynamically typed is a good one in this setting either. Providing a library developer some level of type annotation and type class restriction can be useful for engineering purposes and more helpful when dealing with compile-time errors. I work in the area of computational biology and I am left wondering: is Julia the right language to build the fastest read aligners, gene expression estimators, etc.? These tools are often written in C/C++, so Julia code would have to beat that! A similar sentiment applies to Scala: its dependence on the JVM has actually resulted in very poor performance in even a simple multicore application, in my experience.

Quick start with Nimrod

OK, so you should read the tutorial and eventually the manual on the web site to get a quick start and get to know the language better, but I’ll tell you how I started using it: as a scripting language. I know this isn’t the best for ‘performance’ testing, but any language that has this ‘unification’ quality should be equally good at scripting as it is for high-performance applications. Here is a simple example:

import os

proc shell(cmd: string) =
    if os.execShellCmd(cmd) != 0:
       raise newException(EOS, cmd & " returned non-zero error code")

proc fexists(fname: string) : bool =
    try: discard Open(fname)
    except EIO: return false
    return true

const fromScratch = false

shell "clear"

if fromScratch:
    echo "Removing cached files and log"
    shell "rm -rf nimcache log.txt"
    echo "All output in log.txt"
    echo "Compiling ..."
    shell "g++ -fPIC -O3 -shared -std=c++11 -o bla.so bla.cpp >> log.txt 2>&1"

# More of the pipeline, e.g.

if not fexists("blah.txt"): createBlah()
else: useBlah()

This, to me, is a very clean way to do basic shell scripting while having the power of a full programming language.

Nimrod avoids ‘over-objectifying’

OK, so that was relatively straightforward. Here is a simple example of how to create a matrix type (partially inspired from a stackoverflow post):


type Matrix[T] = object
    nrows, ncols: int
    data: seq[T]

proc index(A: Matrix, r,c: int): int {.inline.} =
    if r<0 or r>A.nrows-1 or c<0 or c>A.ncols-1:
        raise newException(EInvalidIndex, "matrix index out of range")
    result = r*A.ncols+c

proc alloc(A: var Matrix, nrows,ncols: int) {.inline.} =
    ## Allocate space for a m x n matrix
    A.nrows = nrows
    A.ncols = ncols
    newSeq(A.data, nrows*ncols)

proc `[]`(A: Matrix, r,c: int): Matrix.T =
    ## Return the element at A[r,c]
    result = A.data[A.index(r,c)]

proc `[]=`(A: var Matrix, r,c: int, val: Matrix.T) =
    ## Sets A[r,c] to val
    A.data[A.index(r,c)] = val

iterator elements(A: Matrix): tuple[i:int, j:int, x:Matrix.T] =
    ## Iterates through matrix elements row-wise
    for i in 0 .. <A.nrows:
        for j in 0 .. <A.ncols:
            yield (i,j,A[i,j])

proc `$`(A: Matrix) : string =
    ## String representation of matrix
    result = ""
    for i in 0 .. <A.nrows:
        for j in 0 .. <A.ncols:
            result.add($A[i,j] & " ")
        result.add("\n")

The first thing to notice is that a matrix is an object type that contains data and its number of rows and columns. All the methods take a matrix as the first argument. This matrix is generic on any type Matrix.T. An alternative syntax where ‘[T]’ comes after a procedure name may also be used. Nimrod uses a uniform call syntax that implies these two calls are equivalent:

A.alloc(nr,nc)
alloc(A,nr,nc)

Notice that ‘elements’ is an iterator. This is a very efficient iterator called an ‘inline’ iterator. You can read more about this in the tutorial and manual. The `$` operator before a variable is the standard ‘to string’ operator. This allows you to do:

echo A

and a matrix will be printed out.

The uniform call syntax is a simple way to support a lot of ‘call-chaining’ like behavior commonly seen in object-functional programming and avoids forcing methods to be in objects. As an example, say I have a BigInt and a little int and I want to be able to support addition of them. In Nimrod, you simply write a procedure that overloads the ‘+’ operator and it works (otherwise you get a compile time error). In a language like Scala, you define what’s called an ‘implicit conversion’ to do this for you. The added idea of an implicit conversion on objects and having to define them so explicitly seems more complex than just overloading the operator. Note that there are other cases where you would like to use implicit conversions and Nimrod provides this capability. Procedure arguments in Nimrod can also be passed by reference, as in the C++ world, by declaring the parameter with var:

proc test(x:var int) =
   x=5

var x = 3
echo x
test(x)
echo x

results in:

3
5

The compiler will choose the appropriate method to call at compile time based on the types in the procedure. Nimrod also supports multiple dispatch.

Nimrod has an intuitive type system

As mentioned above, traits are a nice way of defining components of functionality tied to an object and the compiler will error out if certain traits are required, but missing, for example. I also mentioned that this can lead to complexities in library design and engineering (which may be good or bad depending on your perspective and the outcome).

One feature of Nimrod that’s appealing is that it offers the programmer type classes, the ability to group types into a single type (e.g. define float and int to be of type number), and distinct types, the ability to create two different types corresponding to the same underlying data type (e.g. dollars and euros are both ints in their tutorial example). Similar to type classes, Nimrod also allows constraints on generic types, and support for additional constraints is in the works. So the compiler will provide an error message if a method is not defined for a particular class of types it’s defined on or if a desired method is missing. Traits appear to be a formalism that could be useful, but might result in a lot of added complexity given the capabilities already provided by type classes and distinct types. Nimrod also supports an effects system which allows for additional compile-time safety checks.

You will want to metaprogram in Nimrod

Nimrod makes it easy to extend the language and the abstract syntax tree to generate the code you want. Say I wanted to do an openMP-like parallel for using Nimrod’s threads over a shared sequence of data. The thread code looks a lot like this:

template parallelFor[T](data:openarray[T], i:expr, numProcs:int, body : stmt) : stmt {.immediate.} =
    let numOpsPerThread = (data.len/numProcs).toInt
    proc fe(j:int) {.thread.} =
        for q in 0 .. numOpsPerThread-1:
            let i = j*numOpsPerThread+q
            body # do something with data[i]
    var thr: array[0..numProcs-1, TThread[int]]
    for j in 0 .. numProcs-1:
        createThread(thr[j], fe, j)
    joinThreads(thr)

But using the template as defined above, I can just do:

parallelFor(sequence, i, numProcs):
    # do something with sequence[i]

Note: this is just a toy example showing how to do some handy metaprogramming using a lower level version of threads.  The developers are working on a much better way of handling threads and parallelism at a higher level. The tutorial defines a debug statement (a variation of it which I use a lot) that shows how you can actually modify the abstract syntax tree.
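To make the index arithmetic in the template concrete, here is a rough sketch of the same chunking scheme in Python (Python rather than Nimrod, purely for illustration). The names chunk_indices and parallel_for are hypothetical, and ThreadPoolExecutor stands in for the raw threads used above. The sketch assumes floor division, so leftover elements are skipped when the length is not a multiple of the thread count; it makes no claim about how the Nimrod template behaves in that case.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_indices(n_items, num_procs):
    """Return one range of indices per worker, mirroring i = j*numOpsPerThread + q."""
    ops_per_thread = n_items // num_procs  # floor division: leftovers are skipped
    return [range(j * ops_per_thread, (j + 1) * ops_per_thread)
            for j in range(num_procs)]


def parallel_for(data, num_procs, body):
    """Call body(i) for every covered index, one chunk of indices per worker."""
    def worker(indices):
        for i in indices:
            body(i)

    with ThreadPoolExecutor(max_workers=num_procs) as pool:
        # list() waits for all workers and re-raises any exception they hit
        list(pool.map(worker, chunk_indices(len(data), num_procs)))


sequence = [1, 2, 3, 4, 5, 6, 7, 8]
squares = [0] * len(sequence)


def square_into(i):
    # Each index is written by exactly one worker, so no locking is needed here
    squares[i] = sequence[i] ** 2


parallel_for(sequence, 4, square_into)
print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64]
```

Each worker j handles indices j*ops_per_thread through (j+1)*ops_per_thread - 1, which is the same mapping the template computes with j and q.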

C code and Memory Allocation

The typical compilation scheme in Nimrod I use is to compile to C code. They introduce a nimcache/ directory with all the C and object code for me to inspect to see how efficient it is compared to what I would write in C. It’s fairly straightforward to link into C code if you must.

The C code Nimrod generates often appears indistinguishable in speed when compared to hand-crafted C code I made in certain examples. Nimrod is much more pleasurable to program in than C, and the compile-time and run-time error messages are far better than C.

Also, I’d like to note that Nimrod allows for manually allocated memory and low-level operations to provide the developer ‘C-like’ control.  In most cases the standard libraries using the GC are appropriate, but in some cases you may want to manage your own data on the heap and Nimrod allows for this.

Young language, helpful community

The Nimrod language is young and has a handful of developers working on making it to a 1.0 release. The Nimrod community has been very helpful to me and I think it has a lot of potential.

I’m writing this post based on my experiences so far. I would really appreciate any feedback if I’m wrong or misrepresented a language I discussed. The post will be modified accordingly with acknowledgement.

Thanks to Rob Patro and the Nimrod community for useful discussions.


A couple of recent Python tidbits

Five years ago the two most practical and used computer languages for me were C and Perl.

While I still use C for the nitty-gritty stuff that needs to be fast, I’m finding that a lot of stuff can get done, and get done very fast, in a scripting language.  Over a year ago I learned the wonders of Ruby (which, to me, is basically a superior replacement for Perl from an ‘it’s fun to code in’ perspective and is easy to transition to from Perl).

But overwhelmingly, I’ve found myself enjoying and using Python.  The biggest selling point from my perspective is the high quality scientific computing and plotting support (which in many cases has replaced my use of R-project for these types of things).

Here are three little tidbits that I’ve recently found handy and take virtually no effort to begin using:

(1) First, to speed up the things that need the speed, calling C functions from Python is super-handy.  I really like the ctypes module because in many cases, as long as your C functions take the default types as inputs, you can simply expose your function via a dynamic library with no real extra effort.
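As a minimal sketch of the ctypes workflow, the snippet below calls strlen from the system C library instead of a hand-built one, so that it runs without compiling anything. It assumes a POSIX system such as Linux or OS X; the choice of strlen is only for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C library by name;
# if it returns None, CDLL(None) exposes the symbols already loaded into the
# running process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```

The same pattern should carry over to a shared library you compile yourself, e.g. ctypes.CDLL("./libmine.so"), where libmine.so is a hypothetical file name.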

(2) List comprehensions are a fun and very useful feature in Python.  But many times, you don’t want to create the whole list before iterating through it.  Here’s where generator expressions come in.  Instead of brackets around your comprehension, just put parens, and suddenly you can iterate over these elements without creating the entire list in advance (especially useful with a module like itertools).
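A small sketch of the difference, using an endless stream of numbers that could never be stored as a list:

```python
import itertools

# A list comprehension over itertools.count() would never finish,
# because it tries to build every element up front.
# The generator expression computes each square only when it is asked for.
squares = (n * n for n in itertools.count())

# islice pulls just the first five values out of the endless stream.
first_five = list(itertools.islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# A generator expression can also feed an aggregate directly,
# with no intermediate list.
total = sum(n * n for n in range(10))
print(total)  # 285
```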

(3) Finally, one thing that had bugged me a little in Python, especially since I’m used to C, is that there really is no real scanf equivalent [1].  What I end up doing is parsing out my variables from a string and on separate lines explicitly setting their type.  This just takes too many lines (and I could often do it in fewer lines in C)!   After some thought, a lab mate and I converged on:

>>> types = [float, str, int]
>>> fields = ['3.14159', 'Hello', '10']
>>> pi, hi, ten = [f(x) for f,x in zip(types,fields)]

[1] Note that while there is a suggestion in the Python docs on how to do this, it just suggests how to extract different types with regular expressions, not concisely convert them.
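The zip-based conversion above can be wrapped into a small helper that also does the splitting. This is only a sketch: the name scan and its sep parameter are made up here, and it assumes the fields are separated by whitespace unless told otherwise.

```python
def scan(line, types, sep=None):
    """Split line on sep (whitespace by default) and convert each field."""
    fields = line.split(sep)
    if len(fields) != len(types):
        raise ValueError("expected %d fields, got %d" % (len(types), len(fields)))
    return [convert(field) for convert, field in zip(types, fields)]


pi, hi, ten = scan("3.14159 Hello 10", [float, str, int])
print(pi, hi, ten)  # 3.14159 Hello 10
```

A field that cannot be converted, such as "abc" for int, raises the usual ValueError from the conversion function itself.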

Inkscape and Latex

I’ve been a fan of the open source Inkscape for some time now, especially for lower-level vector graphics drawing. But for diagramming purposes, I’ve tended to use OmniGraffle (made only for the Mac). I’ve found it pretty handy to use OmniGraffle with the program LatexIt (which comes bundled with the MacTex distribution).

Little did I know that Inkscape comes with batteries included Latex rendering support.  And with its “connectors” tool, Inkscape is a very competitive alternative to diagramming that is cross-platform and is open source.  For Latex rendering, it converts your favorite Latex equation into SVG (via Extensions->Render->LaTeX Formula).

As you can see from the image, I rendered an equation and then rotated the summation symbol 90 degrees counter-clockwise since it is just another SVG object to play with in Inkscape.

This is a powerful feature that comes with the Inkscape distribution, but unfortunately you may not see it in your menu.  You can Google around and figure how to get this to work based on various forum posts (though depending on your setup this may take a while).

Because it can be kind of a pain to figure out how to get this “default” feature to work properly, I thought I’d explain it for Ubuntu and OS X in one place so it would be potentially easier for others to get it going.

First of all, by “default” or “batteries included”, I mean that this is a Python extension that is included by default in the Inkscape software distribution.  According to the Python file itself, for the plugin to work properly:

eqtexsvg.py
functions for converting LaTeX equation string into SVG path
This extension need, to work properly:
– a TeX/LaTeX distribution (MiKTeX …)
– pstoedit software: <http://www.pstoedit.net/pstoedit>


I’m assuming you have a Latex package installed (e.g. on Ubuntu, something like ‘texlive-full’ or ‘lyx’ or on OS X, the MacTex distribution).

The plugin basically takes the equation you feed in, runs latex and dvips on it to create a Postscript file.  The real meat comes in the program ‘pstoedit‘: it converts your postscript file to SVG.

But you have to make sure this program is installed properly.  If you try to install it from source with the default settings, it may not work because for SVG output in pstoedit, you need the GNU plotutils library.

The easy solution on both platforms is to install pstoedit from a package repository, such as apt on Ubuntu or MacPorts on OS X, where it should depend on plotutils.  Unfortunately, the install on OS X may take some time because a larger list of dependencies is compiled before pstoedit itself is installed.

But after you install these packages, restart Inkscape and the plugin should show up in the menu and work.
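If the menu entry still fails, a quick sanity check is to see whether the three programs named above can be found at all. The sketch below only looks them up on the PATH of the shell it runs in; it does not verify that pstoedit was built with SVG support, and a GUI-launched Inkscape may see a different PATH than your terminal. The function name missing_tools is made up for this example.

```python
import shutil

# The external programs the extension relies on, per the description above.
REQUIRED_TOOLS = ["latex", "dvips", "pstoedit"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("latex, dvips and pstoedit are all on the PATH")
```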

Quickly Creating Ajax Web Demos

In about 50 lines of Python code, you can create the skeleton for a web demo that uses Ajax. This way you can have an interactive demo with HTML links, access to your executables and scripts, etc.; but you’re now free from the system restrictions of JavaScript (which exist for security reasons).

The reason why the 50 lines is impressive is that in order to free yourself from the system restrictions of JavaScript, your web page needs to communicate data back and forth with a web server, and potentially with a database if you have lots of persistent data. Usually you have to install and configure all sorts of servers and settings to get this working.

Since I want my demo to exist within a self-contained directory, Python’s batteries included philosophy comes to the rescue. With WSGI and SQLite, I can (1) create a minimalistic web server that delivers a JavaScript demo and can (2) select what to display from potentially massive amounts of data.

It’s just one file that runs the server. I can then open my browser to “localhost:8000” and the demo can proceed. Of course, I just placed a skeleton below. You can make things more fancy regarding Ajax and JSON as well as by using more heavy-weight WSGI implementations than wsgiref.

Update: a Vim ‘:TOhtml’ issue was fixed in the code below

from wsgiref.simple_server import make_server

ajax_html = """
<html>

<head>
<title>AJAX + wsgiref Demo</title>
<script language="Javascript">
function ajax_send()
{
    var hr = new XMLHttpRequest();

    hr.open("POST", "/", true);
    hr.setRequestHeader("Content-Type",
        "application/x-www-form-urlencoded");

    hr.onreadystatechange = function()
    {
        if (hr.readyState == 4)
            document.getElementById("result").innerHTML =
                hr.responseText;
    }
    hr.send(document.f.word.value);
}
</script>
</head>

<body>
<center>
<form name="f" onsubmit="ajax_send(); return false;">
    <p>
        <input name="word" type="text">
        <input value="Do It" type="submit">
    </p>
    <div id="result"></div>
</form>
</center>
</body>
</html>
"""

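def intact_app(environ, start_response):
    if environ["REQUEST_METHOD"] == "POST":
        # POST: echo the submitted text back to the page
        start_response("200 OK", [("content-type", "text/html")])
        # CONTENT_LENGTH may be missing or empty, so fall back to 0
        clen = int(environ.get("CONTENT_LENGTH") or 0)
        return [environ["wsgi.input"].read(clen)]

    else:
        # Anything else: serve the demo page
        # (a WSGI response body must be bytes under Python 3)
        start_response("200 OK", [("content-type", "text/html")])
        return [ajax_html.encode("utf-8")]

httpd = make_server("", 8000, intact_app)
httpd.serve_forever()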
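The skeleton above only echoes the input, so here is a hedged sketch of where SQLite could fit in. The table name, its two rows, and the names lookup_app and post are all invented for illustration; the database lives in memory so the example is self-contained, whereas a real demo would presumably open a file in the demo directory. The app is called directly with a fake request instead of starting a server.

```python
import io
import sqlite3
from wsgiref.util import setup_testing_defaults

# Hypothetical schema and data, kept in memory for the sake of the example.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE words (word TEXT PRIMARY KEY, meaning TEXT)")
db.executemany("INSERT INTO words VALUES (?, ?)",
               [("git", "a version control system"),
                ("make", "a build tool")])
db.commit()


def lookup_app(environ, start_response):
    """Answer a POSTed word with the meaning stored for it."""
    clen = int(environ.get("CONTENT_LENGTH") or 0)
    word = environ["wsgi.input"].read(clen).decode("utf-8").strip()
    # The ? placeholder leaves quoting of the user's input to sqlite3.
    row = db.execute("SELECT meaning FROM words WHERE word = ?",
                     (word,)).fetchone()
    body = row[0] if row else "no entry"
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


def post(word):
    """Call the app the way a server would for a POST request."""
    data = word.encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ.update({"REQUEST_METHOD": "POST",
                    "CONTENT_LENGTH": str(len(data)),
                    "wsgi.input": io.BytesIO(data)})
    chunks = lookup_app(environ, lambda status, headers: None)
    return b"".join(chunks).decode("utf-8")


print(post("git"))   # a version control system
print(post("nope"))  # no entry
```

To serve it for real, lookup_app can be passed to make_server in place of intact_app.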

Git on that train

Warning: As the title suggests, I’m on a train.

In the last week and a half or so I’ve been using Git for a project amongst coworkers and most recently for my own code and text files.  I was a bit skeptical.  But after going through their excellent documentation, seeing the videos, and most importantly, a lot of tinkering, I’m realizing that it’s making life better for me.

There are a lot of resources that compare SCMs, so I don’t want to worry here about which is better and why. But I’d like to share two things that I’ve really liked about using Git.

  1. First off, using something like Git doesn’t necessarily have to be for a big collaborative project.  This may be a sort of different take than the usual, but I like to see it as a tool that helps me see different “views” of my files depending on the job.  Anyone who’s written a lot of text or code realizes that it’s actually quite hard to make things as modular as one would like and that sometimes we’re relegated to grabbing snippets of text here and there rather than making black boxes.  While this is not encouraged as an engineering practice, it’s sometimes very useful for play. With Git, I’m less concerned about making all my source code fit in a super-consistent modular framework and more concerned with focusing on doing a task cleanly.  This is possible because I can branch and merge with relative ease (which I understand is a pain in the ass with other SCMs).  In a branch I can move some files around and do whatever I want without affecting the source.  I can then cherry-pick commits I like from that into another branch.  These branches can look wholly different, but the code can still be updated.  This flexibility makes me worry less about organization and directory structure and more about just choosing the right, minimalistic view for the job.  I now see things in terms of diffs and commits and Git provides the machinery to do real work with them.
  2. Git is minimalistic, local, and fast. Which is great.  It’s a small source code base that compiles quickly and gives me handy command line utilities.  Proper use of them = power (though even with good documentation there’s a learning curve involved).  Unlike a lot of SCMs, Git is designed to be local.  While it can do a lot of stuff over the network, its modus operandi is in a local repository (which is just one .git/ directory in your root directory).  I, in fact, don’t even use the SSH/SSL features layered on top.  Git helps me realize what I’ve changed and worked on and I build patches from there.  You can email them to whoever, and applying them is easy.

And if I’m frustrated with some apparent inadequacy, it’s likely I can find some post with Linus himself justifying it with a little intelligence (e.g.).

Handy Makefile

The following is a snippet for a handy and concise Makefile that will make executables for all C files in a directory.  It’s good for “sandboxing”, and it illustrates some of Make’s useful features before you consult a larger resource.  Tip of the hat to Erik for helping me polish it up.

# Flags for gcc
FLAGS = -D_GNU_SOURCE -O3 -g

# All C files as sources, and chop off the .c for targets
SOURCES = $(wildcard *.c)

TARGETS = $(patsubst %.c, %, $(SOURCES))

all: $(TARGETS)

# All targets without an extension depend on their .c files
%: %.c
	@echo "Building $@"
	@gcc $(FLAGS) $< -o $@

# The "@" symbol suppresses Make actually displaying the command. 

clean:
	@echo "Removing hidden files"
	@rm -rf .*.swp *.dSYM ._* 2> /dev/null
	@echo "Removing executables"
	@rm -rf $(TARGETS)        2> /dev/null

The nice thing about Make is that it’s useful not only for things like C code. I’ve even used it (quite some time ago) to piece together tracks of music using ecasound.
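For anyone unsure what the wildcard and patsubst lines in the Makefile above expand to, here is a rough Python equivalent. It is only an approximation of Make’s behavior (for example, the order in which Make lists files can differ), and the function name patsubst_c is made up for this sketch.

```python
import glob


def patsubst_c(sources):
    """Mimic $(patsubst %.c, %, ...): strip a trailing .c, leave other words alone."""
    return [name[:-2] if name.endswith(".c") else name for name in sources]


# Roughly $(wildcard *.c): every C file in the current directory
sources = sorted(glob.glob("*.c"))
# Roughly $(TARGETS): the same names without the extension
targets = patsubst_c(sources)
print(targets)

print(patsubst_c(["hello.c", "sort.c"]))  # ['hello', 'sort']
```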