Network biology note

I’ve been reading a bit more on biological networks lately (specifically those within the cell). There are a lot of interesting open questions and results that draw from molecular biology, network theory, and statistical physics. A few reviews stand out:

Strogatz’s Sync

Evidently, as Feedburner suggests, there are quite a few more than the two or three suspected subscribers to this log’s feed. I’ll be amused to see if that number goes down as I post a bit more consistently, but for the moment it’s nice to know there’re ears out there.

I recently got done reading a book called Sync written in ’03 by Steven Strogatz (doctoral advisor to the professor I’m working with at the moment). Disclaimer: if I write about something I read it’s not really meant to be a “review” so much as “random wonderments” post-reading. My hope is that this style keeps the reading somewhat interesting and digestible while maintaining some semblance of coherence.

As an example, one of the first problems he motivates the entire book with is fathoming how exactly the many many tiny pacemaker cells of the heart fire in, you guessed it, sync to contribute to a heart beat:

…Each pacemaker cell is abstracted as an oscillating electrical circuit … a constant input current causes the voltage across the capacitor to charge up, increasing its voltage steadily. Meanwhile, as the voltage rises, the amount of current leaking through the resistor [wired in parallel] increases, so the increase slows down. [By definition,] when the voltage reaches a threshold, the capacitor discharges and the voltage drops to zero.

Next, Peskin idealized the cardiac pacemaker as an enourmous collection of those mathematical oscillators [all equal and able to communicate to one another]. … Specifically, when one oscillator fires, it instantly kicks the voltages of all the others up by a fixed amount.

The effects of the firing are mixed: Oscillators that were close to the threshold are knocked closer to the firing oscillator but those close to baseline are knocked farther out of phase. In other words, a single firing has synchronizing effects for some oscillators and desynchronizing effects for others. The long-term consequences of all these rearrangements are impossible to fathom by common sense alone.

He goes on, with a colleague, to help extend Peskin’s proof that two of them will indeed synchronize in the long-term to an arbitrary number of them (a bit more of a difficult task). Moreover, he emphasizes that this “so the increase slows down” part of the model described above is crucial for synchrony to occur.

The rest of the book extends on that type of oscillation problem quite a bit more. It undertakes a massive pop-sci sampling of examples of synchrony in a wide variety of areas. Sometimes he’d drift a bit off the topic of synchrony, but eventually some thread brought it right back into play. All of it was entertainingly explained, while he reavealing his own personal connection to the work in an anachronistic way. The text is littered with some of the funniest scientific analogies I’ve seen. For instance, his description of a laser begins with:

Imagine you wake up one morning and find yourself on on alien planet, entirely deserted except for a watermellon and step stool beside it. …

He also made it a point to illustrate where we’ve verified certain types of synchrony in practical science (e.g. Josephson arrays, circadian rhythms) while others are less pragmatic to verify (heart and neural studies, for example).

Something that definitely stuck in my mind afterwards is the way the more traditional research of nonlinear dynamical models and their applicability to practical systems evolved into the more current research of complex systems. Discussing some of his earlier work with Art Winfree:

In 1981, nonlinear dynamics had certainly not advanced to the stage where it could predict the behavior of such rotating waves in three dimensions. There was no hope of calculating their evolution in time, their lashing about, their swirling patterns of electrical turbulence. Even if the calculations were possible (assisted by a supercomputer, perhaps), any such attempt would be premature, since one wouldn’t know how to interpret the findings. … So Winfree felt that the first step would learn how to recognize them, to anticipate their features in his mind’s eye; he would worry about their modus operandi later.

This sentiment appearing back in the ’80s for exploring the dynamics of excitable media (with potential applications to an earnest start at modeling the systems-level chemistry of the intenstines, stomach, and heart) also seems to apply to today’s research in complex networks where the pure mathematics and statistics of dynamical null models for systems needs to be developed in tandem with useful methods to probe and explore a system of interest. I can’t help but feeling that a good balance of the two vantage points where both feed each other is a good way to approach this young field.

Monte carlo sampling the modularity space

Warning: I’m not really doing this entry justice, but it should be good-enough for logging purposes.

For the past few years, the complex networks community has spawned a thread in community detection: very roughly the problem of detecting subgraphs within a network that have a greater density of edges within-community than not.

Though greedy algorithms optimizing a very successful criteria called modularity (obviously there are many alternative methods to optimizing modularity overviewed in the literature) are good for very large networks, one side-issue is properly analyzing the variety of partitions that could be generated that have similar modularities. This has been explored a bit (too many links for the spirit of this post).

One way to tackle this issue is to use a technique called Markov Chain Monte Carlo (MCMC), in which the transition probabilities in your Markov chain are from one possible partitioning of the network to another. Since the number of possible partitions of n nodes in the network is equivalent to the number of possible set partitions with n members, we can’t possibly exhaustively explore the entire space for even moderately sized networks. But we can, if careful, sample from this space with proportional to a bias of our choice because it can be shown that the Markov chain will eventually reach a state of equilibrium.

This technique, among other things, helps us learn more about the properties of the space of partitions as well as analyze whether the partitioning buys us much (i.e. if the nodes really stay in one partition a lot or not for the ensemble of partitions generated at equilibrium). I’ve come across and have been pointed to literature where it’s in statistics, quantum chemistry, and statistical physics, and am most certain it’s used in many more fields of study. Part of the appeal in using the technique is that it allows for a surprising amount of flexibility in choosing the transition probabilities and is conceptually easy to grasp (I wonder how badly it’s abused).

I can now link you to some code I wrote.

It uses the same transition scheme described in Massen-Doye ’06, but samples directly proportional to the modularity rather than via parallel tempering on the Boltzmann distribution (an exploratory suggestion from Aaron Clauset). The results show that this works quite well itself, and the modularity of the partitions at equilibrium usually hover below the maximum determined modularity, as expected.

With this “starting point” (the cleaned up code) I have tried a variety of other transition schemes as well as done some thinking on what would be valuable in terms of the target distribution. This exploration serves as a good basis for continuing to play with modularity, its connection to hierarchical structure, and even dynamical systems.

On a less technical level, I have been slowly gathering notes on what’s actually happening (a.k.a WAH?) in the field of statistical analysis of complex networks (both static and dynamic) as a whole and what certain analysis techniques buy us for particular networks. Right now I’m still engrossed in seminal work, but as they become more refined I’ll post a link to them so that I can perhaps get more feedback on my “worldview” of the area.

And even more tangentially related, those of you who may reproducing research, there is a slight error in the reporting of the modularity of the karate club network for the fast algorithm (the author has been aware of this) and that the second pass of the CNM algorithm should be run at step MAXQ-1 to get the partition with optimal modularity.

Communities of papers in arXiv

So as a first-pass exploration into the community structure aspect of complex networks, in particular arXiv, I read-up, then scripted up an exploration on some test citation networks. End result: some marginal progress as expected, but it bought me some increased maturity and satisfaction of concrete play. Also, who doesn’t like to click on nodes and see the communities?

You can see the web page and a TeX-note if you’re interested in killing a few minutes of your time.

Special thanks post-exploration to Aaron Clauset and Michelle Girvan who gave me helpful guidance and suggestions on continuing with this community analysis research after a particularly insightful talk given today. Now finally on to more interesting reading/work on a topic I’ve meant to learn properly for a while.

What is an energy landscape?

This note simply tries to get across as accurately and conceptually as possible what a potential energy landscape is. It is part of an inquiry into complex networks and physical structure. The field has quite a bit of activity, and the discussion can be seen as a simple introduction to the topic. It is difficult to find introductions at this level, so it may be of use to others, but mostly I just wrote it to log some progress. Thanks to a friend for his thoughts and I welcome other thoughts and better-yet corrections (though I don’t expect any comments at such an early stage of me web logging).

You can download the PDF directly via this link.

Why do we care about the small world properties of atomic clusters?

This summary is to try to keep a bigger picture in mind before getting too specific too quickly: a brief survey of a topic of interest in a text. It is specifically in the context of global optimization. I first wrote this post a few months back. Most of meat is obtained from Wales’s Energy Landscapes and Doye and Massen’s “Characterizing the network topology…atomic clusters”. Finding the global minimum and local minima on an energy landscape (a purely theoretical thing determined from pure quantum-mechanical models or hybrid quantum-classical models) gives us cues as to how a set of molecules will structurally stabalize. Practically, this information is invaluable to molecular structure prediction, which helps us better understand how molecules are actually built.

The issue lies in finding the global minimum. I don’t pretend to deeply know this area or detail it here by any means, but I may soon! Finding the global optimum in general is an NP-hard problem in computational complexity. In other words, “there is no known algorithm that is certain to solve the problem within a time that scales as a power of the system size”. So what do we do? Exploit structural properties of a given atomic cluster to drastically improve the running time of the algorithm. (Wales, “Global optimisation”)

This subfield in itself is very active with many different algorithms suited for all sorts of structures. But as far as small worlds go, it seems a basin hopping global optimization approach that could take advantage of the relatively scale-free properties of the landscape. But as implied (Wales, “Small worlds”), we should take caution in considering the cluster we’re applying these algorithms to, how exactly the global optimization search is improved, and whether the small world results apply generally enough to a broader class of molecules or not:

It is not yet clear how general this result may be for other potential energy surfaces [than Lennard-Jones]. For example, the connectivity of the constrained polymer conformations was found to exhibit a Guassian rather than a power law distribution for a noninteracting lattice polymer. For the Lennard-Jones polymer…Doye has found that the connectivity distribution does not really follow a power law, but the scaling is stronger than exponential…If low energy minima are generally highly connected then they should be easier to locate in global optimisation searches…However, the best way to utilise this information more efficiently is not yet clear.

Though it has been a few years since that was written, it seems to still hold true from my cursory understanding. Most of the work that has been done seems to be applied to Lennard-Jones clusters. Needless to say, I’m motivated to dig a little deeper.