A fun electromagnetic induction demo

A couple of days ago, I had a lunchtime-preoccupation with antennas and electromagnetic induction. We observe a neat phenomenon if you put two wires near each other, and Maxwell stated the phenomenon quite eloquently in his treatise on electromagnetism.  Basically, the instant you change the voltage across one of the wires, the wire next to it (but not touching it) responds with a current that quickly goes away.

By wiggling around the voltage in certain patterns, we can find interesting ways to transmit information back and forth between one another, and through induction we can do this without direct physical contact.  These days it’s easy to take for granted the all the internets sent to us via WiFi and forget about how highly developed an area radio is.

But to get taste of the physical communication, not much is required: basically some power sources and wires. A wire can in fact be practically used as an antenna for certain radio frequencies.

A simple way to check out induction with a laptop is to see if you can transmit a song that usually plays from your speaker through the air and back to yourself.  Since the sound card does both audio input and output as well as the analog-to-digital conversion, a lot of the hairy parts are taken care of for us.

To do this, we need to have the sound output (i.e. from your headphone audio jack) go to a wire.  Next to that wire is another wire that goes into the microphone or “line in” audio jack.  Keep in mind, these are just wires–that’s all.  I just forced myself to part with the headphone that came with my music player and another that I got on an airline flight.

Wires out of headphone and microphone jacks

As you can see, the ends of the two headphones lay prostrate next to the two wires connected into the microphone and headphone jacks:

Close up of decapitated headphones microphone and headphone jacks

So, now all you have to do is start up some music and record the input with some audio recording program (I used GarageBand since I’m on a Mac):

Capturing inducted audio

The green bars towards the middle-right wiggle around while recording indicating that the wire is receiving something.  Doing the same thing without the wires in the line in and out yields no green bars wiggling around.

Because there’s so much power loss, the signal is faint, so I digitally amplified it some in the software.  But horray! I rather shittily transmitted sound back tomyself over air via just two wires.  This, I think, is a neat way to illustrate induction.  Listen for yourself. You’ll need to turn up your volume a bit.

MacBook and 64 bit address spaces

I occasionally like to show off how I can do some pretty sophisticated calculations and experiments right on my Macbook. For these tasks, over-speced machines and special purpose hardware are sometimes assumed to be the right solution, but a judicious use of the resources on a MacBook may get the job done in a similar time.

One thing that would be nice is if the OS actually supported 64 bit address spaces (after all, the Core 2 Duo processor is a 64 bit one).  This comes in handy when one wants to mmap large files (e.g. greater than a few gigabytes).  On 32-bit architectures, the address space is limited by the word size (in this case 2^{32} bits or about 4 GB of space and even less-so with kernel limitations).

Supposedly the next version of OS X will have support for this (and will come on the MacBook Pro installations, but I assume not the MacBook).

Histograms and density estimation

It is a common practice in the statistical analysis of data to group the samples into bins or categories.  The goal is usually to extract some structure that is useful and often hidden from us when looking at direct plots of the sample set.

For instance, a time series plot of the change in stock price may appear erratic and contain no order, but observing the frequency of certain price-changes may help us realize that on average there is no change as well as give us an idea as to the practical bounds to how much the price will change.

The purpose of this note is to speak a little to the role of histograms in statistics and probability.   While a variety of complications arise when using them for the estimation of a probability density function, they serve as a very practical (and efficient) tool in exploratory data analysis and provide a grounding for other forms of density estimation.

Histogram Construction

  • For a given a sample, what bin does it belong to?
  • How many samples are in a given bin?
  • What is the numerical range or volume of a given bin?

From here forward, we consider data samples which can lie in some region of the real number line (e.g. integers, fractions, and roots of integers). We’ll represent a sample of numbers by the vector \mathbf{x}, say \mathbf{x}=(1.2, 3.002, 1, 1, 2).

Basic histogramThe diagram outlines some relevant quantities of a histogram.  The minimum and maximum values of the sample (1 and 3.002 in the example above) help define the range that we will distribute our bins in.  We represent the bin width as h.  It is not required that a histogram have fixed-width bins, but this is one of the most commonly used and useful formulations.

The starting offset (0 \leq b < h) is also a useful quantity.  For instance, if the first bin starts at \min(\mathbf{x}) or b=0, any number falling in the range (\min(\mathbf{x}) \leq x < h) contributes to the quantity of elements in that bin.  For h=0.3, the range of the first bin in the example above is (1 \leq x < 1.3).  The two 1s and 1.2 will fall in the first bin.

Suppose we shift the bins a little to the left (i.e. b=0.1).  The range of the first bin is now (.9 \leq x < 1.2).  The two 1s fall in the first bin while 1.2 falls in the second.  So it is easy to see how both the bin width and offset affect our view of the histogram.

The total number of bins in a histogram can be represented as:

m=\left \lceil \frac{\max{(\mathbf{x})}-\min{(\mathbf{x}})}{h} \right \rceil + 1

where the \lceil x \rceil represents the next integer larger than x, and this follows analogously for \lfloor x \rfloor.  We add 1 to the traditional bin count to accommodate our offset b.

Integer bins in histogram

When constructing a histogram on a computer, we desire a function \text{bin}(x) that maps a sample to an appropriate bin index k=(0,1,\ldots,m-1).  Assuming that we begin the bins at 0, the bin index can be determined by dividing the sample from the bin width: \lfloor x/h \rfloor.  For h=0.3, it’s easy to see that numbers less than 0.3 will fall in bin 0, while a number twice as large will fall in the second bin, and so on.  Accounting for the offset, we have:

\text{bin}(x) = \left \lfloor \frac{x-\min{(\mathbf{x})}-b}{h} \right \rfloor

We also desire a function that will allow us to compute the start of the interval for a given bin index:

\text{interval}(k) = kh + \min{(\mathbf{x})}-b

The formulation above allows us to experiment with the two essential qualities of a histogram (bin width and offset) and it should be straightforward to see how this simple approach generalizes for multi-dimensional histograms of equal bin widths, except we then consider more generally the bin volume (e.g in the three-dimensional case the volume v=h^3).

In the terminology of computer science, these are constant time procedures.  If the bin widths are not fixed and arbitrarily set by the user then a more complicated method is required.  One can use a specialized function or binary search for finding the appropriate bin in this case [1].   Histograms can also accept a  variety of types (e.g. integers, single- or double-precision floating-point numbers) as inputs and if the compiler knows about this information ahead of time, it can optimize its code even further while providing a general interface of accumulators [2].

Histograms as density estimates

We now turn to a more mathematical perspective on histograms.  It is natural to treat a histogram as the estimation of a probability density function.  Once armed with the density function a variety of tools from probability and information theory are at our disposal, but we must take caution that the estimates of the density function hold true to the data sample and question whether we should be estimating the densities in the first place to solve the desired problem.

To get a taste for what this is like, we will derive the maximum likelihood estimator for a mathematical step function that represents the histogram.

Step function view of a histogram

Here, the function has m parameters: \mathbf{c}=(c_0,c_1,\ldots,c_{m-1}).  Provided our data sample and a given bin k, what is the optimal height, c_k^*, of the step function?

To find an estimate of the density function p_H(x; \mathbf{c}) we express the likelihood of our sample of data given the model parameters as:

\mathcal{L}(\mathbf{x}|\mathbf{c}) = \displaystyle \prod_{i=0}^{n-1}p_H(x_i; \mathbf{c}) = \displaystyle \prod_{k=0}^{m-1} c_k^{n_k}

Here, n_k represents the number of observed samples in bin k.  In other words, we’d like find c_k^* that maximizes the probability that we’ll find n_k items in bin k.  This optimization is subject to the density function constraint:

\displaystyle \sum_{k=0}^{m-1} c_k v_k = 1

where v_k is the volume of the given histogram region.

Constrained optimizations of this form can be solved using the method of Lagrange multipliers.  In this case, we have a multidimensional function:

f(\mathbf{c}) = \log \mathcal{L}(\mathbf{x}|\mathbf{c}) = \displaystyle \sum_{k=0}^{m-1} \log c_k^{n_k} = \displaystyle \sum_{k=0}^{m-1} n_k\log c_k

subject to the constraint that:

g(\mathbf{c})= 1- \displaystyle \sum_{k=0}^{m-1} c_k v_k = 0

Note that we took the log of the likelihood function so we can deal with sums rather than products.  This will come in handy when taking derivatives in a moment and is a common practice.

The gradient of a multidimensional function results in a vector field where each point–in our case represented by \mathbf{c}–has an associated vector of steepest ascent or descent on the multidimensional surface [3].  Our goal is to find where the steepest ascent of f(\mathbf{c}) is equal in magnitude and direction to that of g(\mathbf{c}).  This equality is represented as:

\nabla f(\mathbf{c}) = \lambda \nabla g(\mathbf{c})

And we can express this combined optimization with:

\nabla \Lambda(\mathbf{c}, \lambda) = \nabla f(\mathbf{c}) - \lambda \nabla g(\mathbf{c}) = 0

where \Lambda(\mathbf{c}, \lambda) is called the “Lagrangian”.  Computing the derivatives:

\frac{\partial \Lambda(\mathbf{c}, \lambda)}{\partial c_k} = \frac{n_k}{c_k} - \lambda v_k = 0

From this, we can see that

\frac{1}{\lambda}n_k = c_kv_k

Since \sum_{k=0}^{m-1} c_k v_k = 1 and \sum_{k=0}^{m-1} n_k = n, \lambda=n.  Thus our optimal height and estimate of the density is:

c_k^*= \hat{p}_H(x) = \frac{n_k}{nv_k}, \; \text{for all } x \text{ in bin } k

While this estimator is actually quite an intuitive result, it is important to observe that this only in the special case of step functions with varying volumes predefined.  What if, for example, we wanted to see what the best estimator is given a piecewise linear function or a spline between bins?  Or what if we want to find out what the bin volumes v_k should be as well?  This question motivates a general discussion on the estimation of the probability density function given the data sample \mathbf{x}, and these “data-driven” approaches are dubbed non-parametric [4].

Similar issues as those brought up in density estimation also arise in the estimation of mutual information [5].


[1] Given a sample, start at the center-most bin.  If the sample is greater than that value, then start at the center most bin of the right half, otherwise do the same to the left.  Recurse this procedure until the desired bin is found.  The running time of this scales like \log_2(m).  The GNU scientific library provides utilities in this spirit

[2] See the C++ Boost Accumulator library for more information on this.

[3] For a more detailed tour of why this is as well as an intuitive contour plot geometrical interpretation of this, consult an introductory applied math text.  Setting f=c, we can study the function’s “surface contours”.  By definition, a vector \mathbf{x} along this contour does not change the value of f, therefore \nabla \cdot \mathbf{x} = 0.  Because of this, \nabla f must be normal to \mathbf{x}.

[4] Scott DW, Sain SR. “Multidimensional density estimation”. Handbook of Statistics: Data Mining and Computational Statistics 23. 297-306 (2004). Gentle JE. “Nonparametric estimation of probability density functions”. Elements of Computational Statistics. 205-231 (2002).

[5] Paninski L. “Estimation of entropy and mutual information”. Neural Computation 15. 1191-1253 (2003). Fraser AM, Swinney HL. “Independent coordinates for strange attractors from mutual information”. Physical Review A 33. 1334-1140 (1986).

How can I trust you, code?

I’ve been programming a bit lately in C++ Boost for two reasons (1) I find the template metaprogramming aspects intriguing and the generic programming aspects super-convenient. Yet, when I’m really concerned with knowing precisely what’s going on to avoid wasting CPU cycles and memory, I go back to C. But then of course C is doing all sorts of things for me as well (just see the difference in speed in your code with the -O3 optimization flag).

One trend I’ve noticed after reading a lot of code is that bit-twiddlers will often over-optimize. I’m more and more convinced that this not only affects the readability of the code, but it may also be indicative of a lack of trust in the compiler’s optimization. The more control you have, the better you feel, right?

On the other end of things, some people trust objects and functions as black boxes and could care less what’s going on underneath. This post is not for them.

I hate to pull the big CS “language” and “compiler” words out of my pocket again [1], but how can I trust you language and compiler? When do I know I’m over-twiddling and it’s better that the compiler do the work for me? While there are guidelines, I think what it comes down to is a decent understanding of the compiler you’re dealing with and then, basic experimentation.

Most of my questions about whether certain optimizations are actually occuring are typically answered by doing timing tests, creating little “experiment” code bits to test a simple hypothesis, or look directly at the assembly code (e.g. the “-S” flag in gcc).

This did get me thinking, though, with so many different compilers and architectures–even with standards and benchmarks–a lot of these little questions aren’t answered. This process of creating precise tests and getting the answers seems to be the best approach [2].

[1] But I do believe these topics have some of the most interesting research opportunities from a CS perspective. As I foray into scientific computing and its applications to the sciences, I hope I will have the chance to tackle a CSee problem or two along the way.
[2] Wouldn’t it be nice if there was a nice repository of contributed tests for a variety of architectures, compilers, and languages, and that these tests can be tagged with all this information? In the spirit of my last post, I say let’s start a massive conglomerate blog where people can post these tests. And then there could be another one where people just bitch about the goodness and badness of the tests. It’ll be like a somewhat moderated coding newsgroup for today’s age. Oh how I’ve blurred the line between honest desire and sarcasm here.

All my blog pipes are clogged

It seems that there’s been a lot of hype lately about blogs selling out and the more personal touch being lost (so much so, that it’s not even worth linking to articles). It’s as if there’s this big trend and direction that we’re attempting to extract for where things are headed. To me, the partition between a blog with a personal twist and a more professional media one is quite clear (and it certainly doesn’t have to happen when you’re sponsored by a company).

For the obviousness it’s worth, I certainly prefer the random thoughts over the “professional updates”, and there’s of course still room for both.