The Return of the Great Filesystem Hierarchy

The problem: I have data of the same general ‘type’ spread across many web services and the personal files on my computer. For example, I have PDFs of books in a ‘Books’ directory in my file system and books from Amazon on my Kindle. Or I have movies/videos on my computer, on the Play store, on Amazon, etc. There is no one place where I can see/browse my entire movie, music, or book collection across these services. I want to keep using certain apps and services while still maintaining a high-level view of where all my stuff is. For example, when I have a Kindle book, I still want to use the Amazon reader and all of its features, but for a PDF I may prefer to use OS X’s Preview or some other reader. I want a way to reconcile the fact that a ‘book’ is the type of thing I really care about organizing with the fact that applications like Amazon’s cloud reader want your ‘thing’ to be your entire library of books.

The file system is a solution: Remember the days when all of your stuff was on a single file system organized hierarchically? It’s not so convenient anymore to store all of your data in one place, and super awesome and useful cloud apps have contributed significantly to this. However, I’d argue that by using your file system as your primary view, most of your ‘things’ across these services can be organized again. One solution is to use file types that represent links. For example, in OS X these are ‘.webloc’ files (just drag a link from your browser to the Desktop and you will see how this looks). You can double-click on one of these and it will open the link in a browser. Between apps on my computer and cloud apps, using these links covers most of my cases (e.g. you can link to a specific book on Amazon or to an album on Google Play Music). There’s even an app in Google Drive (by default using the .glink extension) that properly handles .webloc files.
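If you would rather script these link files than drag them out of a browser, here is a minimal sketch. It assumes a .webloc file is a property list holding a single "URL" key, which is how the ones I have looked at are laid out. The helper names write_webloc and read_webloc are made up for illustration.

```python
import plistlib
from pathlib import Path


def write_webloc(directory, name, url):
    """Write '<name>.webloc' into directory, pointing at url.

    Assumes the .webloc layout is a plist dictionary with one "URL" key.
    """
    path = Path(directory) / f"{name}.webloc"
    with open(path, "wb") as fh:
        plistlib.dump({"URL": url}, fh)
    return path


def read_webloc(path):
    """Return the URL stored in a .webloc file written as above."""
    with open(path, "rb") as fh:
        return plistlib.load(fh)["URL"]
```

Something like write_webloc("Books", "Some Kindle Book", "https://example.com/some-book") would then drop a link next to the PDFs in the same directory (the URL here is a placeholder).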

A natural extension to this approach is the ability to associate each thing with metadata (e.g., a row in a database), allowing you to view your data in multiple ways.
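To make the "row in a database" idea a bit more concrete, here is a rough sketch using SQLite. The table layout (title, kind, source, location) and the function names are my own assumptions, not an existing tool. The point is only that one table can be sliced by type or by service.

```python
import sqlite3


def build_catalog(items):
    """Load (title, kind, source, location) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE things (title TEXT, kind TEXT, source TEXT, location TEXT)"
    )
    conn.executemany("INSERT INTO things VALUES (?, ?, ?, ?)", items)
    conn.commit()
    return conn


def view_by(conn, column, value):
    """List titles whose given column matches value, sorted by title."""
    # Only allow known column names, since they are spliced into the SQL text.
    if column not in {"kind", "source"}:
        raise ValueError("unsupported column: " + column)
    rows = conn.execute(
        f"SELECT title FROM things WHERE {column} = ? ORDER BY title", (value,)
    )
    return [row[0] for row in rows]
```

With this, view_by(conn, "kind", "book") gives every book regardless of where it lives, and view_by(conn, "source", "kindle") gives everything on one service.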

In any case, using files and links to web apps seems to be a good interim solution to this problem (which first appeared as a G+ post).

Why Isn’t 23andMe’s Disclaimer Good Enough?

[Image: 23andMe spit kit]

The company 23andMe offers a different kind of ‘tech’ than I usually talk about on this blog, but it is meant to be a useful service for you (and a way for the company to collect information across many individuals). Give them $100, and they will mail you a ‘spit kit’ with a tube that you literally spit in and send back to them for analysis. They then use genotyping to determine whether you have certain variations in your DNA that are associated with your ancestry, disease, and other potentially genetically-derived traits.

I personally used their services about a year ago, mostly out of interest in ancestry. Honestly, I wasn’t too interested in their disease associations (though I believe these will become more and more useful as we sequence more individuals and develop better computational/statistical methods to analyze the data). In the scientific community, genetic associations with diseases are still, for good reason, met with skepticism (see, for example, this recent article). This skepticism has worked its way up to politicians, resulting in various efforts to regulate laboratory developed tests. I believe genetic testing and screening have a lot of potential and am excited that so many efforts exist to provide individuals with these kinds of services, but as many scientists would probably agree, we are just beginning to understand how complex traits and diseases are associated with genetic variations across individuals.

23andMe recently received a letter from the FDA to “immediately discontinue marketing the PGS [Personal Genome Service] until such time as it receives FDA marketing authorization for the device”.

What surprised me when reading the letter was this statement:

Some of the uses for which PGS is intended are particularly concerning, such as assessments for BRCA-related genetic risk and drug responses (e.g., warfarin sensitivity, clopidogrel response, and 5-fluorouracil toxicity) because of the potential health consequences that could result from false positive or false negative assessments for high-risk indications such as these. For instance, if the BRCA-related risk assessment for breast or ovarian cancer reports a false positive, it could lead a patient to undergo prophylactic surgery, chemoprevention, intensive screening, or other morbidity-inducing actions, while a false negative could result in a failure to recognize an actual risk that may exist.

Perhaps because I have training in computational biology, I always assumed this information is meant to be more like a ‘hint’ or ‘suggestion’ that you might want to investigate a medical issue further with a physician who can conduct specialized follow-up tests and recommend further courses of action. For instance, I have a minor version of beta-thalassemia that apparently doesn’t require medication or affect my daily activities. 23andMe correctly identified that I have this trait. This disorder can be discovered via blood testing by a physician, so if I hadn’t already had these tests, my 23andMe report might have alerted me to investigate this further. I think this is useful information.

My 23andMe page also displays this disclaimer:

The genotyping services of 23andMe are performed in LabCorp’s CLIA-certified laboratory. The tests have not been cleared or approved by the FDA but have been analytically validated according to CLIA standards. The information on this page is intended for research and educational purposes only, and is not for diagnostic use.

I do think that it could be made more prominent, perhaps with a clause stating that some of these associations are in a stage of ‘early research’, but I am writing this post more to pose a general question to anyone who has some insight into the legal aspects of this recently newsworthy topic: why isn’t the disclaimer above (or a better one) good enough for the FDA? Certainly, I understand that the general public doesn’t have the same knowledge I do about genetic tests, but given a disclaimer like this, why can’t 23andMe sell me their services and let me make follow-up decisions?


Think Before You Tech

First, this blog has been revived! The last post before the Nexus 7 Experience was a long time ago. Since then, I have become a Ph.D. candidate, transferred from the University of Maryland to Carnegie Mellon, and made a web page summarizing some of my work at my recently re-acquired domain. At first I was thinking of moving my blog over to Weebly, which is what I used to host the web page. (I decided not to hand-code my web page this time and it was so much faster to get the design I wanted.) I decided to stick with WordPress because I think they do a really good job of distributing posts to a larger audience and I have had a number of useful interactions on their platform.

To inaugurate this second coming, I wanted to mention something more philosophical that has been on my mind lately, and that’s the role of tech and gadgets in our lives. I occasionally post on various social media, gushing about new gadgets or technologies. Are these just my toys? Short answer: YES! I enjoy playing with new tech toys. That being said, I believe it’s important to emphasize a “think first” approach to new technologies. Social networks like Facebook and Twitter are great, and new devices can make it that much easier to access all the new feeds and information. But these feeds can also be thoughtless diversions from valuable human experiences: sharing a great conversation with someone, helping others out, and, as someone I know pointed out recently in a private G+ conversation, even doing basic chores like vacuuming (this was in reference to an automatic iRobot cleaner)!

For me, the benefit of new technologies is to inspire users to create and share ideas and enhance the overall experience of learning and growing as human beings. Another benefit is to reduce needless suffering in the world (e.g. lack of shelter, hunger). I appreciate when big companies like Apple and Google focus on enhanced experiences as opposed to introducing just another new technology. For example, the recent Google Plus keynote hinted at this a little when Vic Gundotra gave the example of capturing a special moment with his children. I think we are just scratching the surface here. In the future, when I talk about my experiences with technologies and products, I will try to focus even more on the aspects I mentioned at the beginning of the paragraph.

Wikipedia on your coffee table!

So I recently thought I had this awesome idea that you could take the “featured articles” in Wikipedia and turn them into a book or volume of books that people could buy and just keep on their coffee table. You may agree with me that there’s a certain serendipity associated with traditional bound books or journals that is hard to replicate when browsing the web on a desktop, laptop or iPad-like device.

I was not too surprised, a little disappointed, and pretty excited to find out that this has been done. The award for the most comical Wikipedia book has to go to Rob Matthews for actually attempting to bind all the featured articles. On a more serious front, I was happy to see that PediaPress, a wiki-to-print publishing service, is partnered with the Wikimedia Foundation to do exactly this. It’s integrated quite well with Wikipedia, so you can click on the articles you want to include in your own book or you can select books that have already been compiled by PediaPress.

As an alternative to purchasing a book, you can also generate a PDF that you can print at your leisure. My only gripe after trying this right away is that scalable content such as LaTeX equations is embedded in the PDFs as images. It still looks decent, though, and is pretty cool!

PubCasts for Journals

To practice a presentation I recently gave on a paper, I recorded myself speaking (an odd experience).   It seems that a very common way people hear about research is through presentations, so why not take this to the next level and post a nice screencast with an article you publish?

Well, this actually happens quite a bit, and after a recent lunch conversation, I was motivated to look more into the sources of “screencasts for papers” that I have watched in the past.

While I think many students and researchers are familiar with video lectures (such as MIT’s OpenCourseWare), one form of screencast that focuses on research articles stood out to me as having technical enough content to properly motivate a full research paper: SciVee.

This website promotes YouTube-like screencasts of research papers and has coined the term “PubCast”.

I’m definitely all for this, so what’s the problem?

Thankfully, due in large part to the open access movement, I find myself visiting the journal sites themselves more and more for the content and supplementary information.  Especially for open access journals like PLoS, researchers have less of a reason to post a PDF version of their article on their web page.  Rather, why not just link directly to the open access content which makes the article visible in HTML and PDF?

While SciVee is partnered with PLoS (awesome), I noticed that the associated PLoS pages never link to the SciVee video (as far as I can tell).  Moreover, PLoS Bio and Comp Bio videos seem to stop existing on SciVee after around 2008 or so.

This is a shame, because even if there’s a lot of great video content out there associated with these papers, no one knows about it if they only visit the journal’s page. Why not just stick the video link right on the article page?

I think the basic idea is out there and there are some early adopters of this technique (not necessarily limited to SciVee’s services, of course). What I think it needs is better marketing and accessibility from journal websites. Intuitively, it seems like well-done screencasts promoted by the journals themselves (perhaps even made a part of the editorial process) could really be good for getting the journal and its papers more attention.

Quickly Creating Ajax Web Demos

In about 50 lines of Python code, you can create the skeleton for a web demo that uses Ajax. This way you can have an interactive demo with HTML links, access to your executables and scripts, etc.; but you’re now free from the system restrictions of JavaScript (which exist for security reasons).

The reason 50 lines is impressive is that in order to free yourself from the system restrictions of JavaScript, your web page needs to communicate data back and forth with a web server, and potentially with a database if you have lots of persistent data. Usually you have to install and configure all sorts of servers and settings to get this working.

Since I want my demo to exist within a self-contained directory, Python’s batteries-included philosophy comes to the rescue. With WSGI and SQLite, I can (1) create a minimalistic web server that delivers a JavaScript demo and (2) select what to display from potentially massive amounts of data.

It’s just one file that runs the server. I can then open my browser to “localhost:8000” and the demo can proceed. Of course, I just placed a skeleton below. You can make things more fancy regarding Ajax and JSON as well as by using more heavy-weight WSGI implementations than wsgiref.

Update: a Vim ‘:TOhtml’ issue was fixed in the code below

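from wsgiref.simple_server import make_server

# Page served on GET: a form whose submit handler POSTs the typed word via Ajax.
ajax_html = """
<html>
<head>
<title>AJAX + wsgiref Demo</title>
<script language="Javascript">
function ajax_send()
{
    var hr = new XMLHttpRequest();
    hr.open("POST", "/", true);
    hr.onreadystatechange = function()
    {
        if (hr.readyState == 4)
            document.getElementById("result").innerHTML = hr.responseText;
    };
    hr.send(document.f.word.value);
}
</script>
</head>
<body>
<form name="f" onsubmit="ajax_send(); return false;">
    <p>
        <input name="word" type="text">
        <input value="Do It" type="submit">
    </p>
    <div id="result"></div>
</form>
</body>
</html>
"""

def intact_app(environ, start_response):
    if environ["REQUEST_METHOD"] == "POST":
        # Ajax request: echo the posted bytes back to the page.
        start_response("200 OK", [("content-type", "text/html")])
        clen = int(environ.get("CONTENT_LENGTH") or 0)
        return [environ["wsgi.input"].read(clen)]
    else:
        # Any other request: serve the demo page (WSGI bodies are bytes in Python 3).
        start_response("200 OK", [("content-type", "text/html")])
        return [ajax_html.encode("utf-8")]

httpd = make_server("", 8000, intact_app)
httpd.serve_forever()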
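The skeleton only echoes what you type. As a rough sketch of the SQLite half of the idea, the handler below looks the posted word up in a table and answers with JSON. The table name, its columns, the sample rows, and the function names make_db and make_lookup_app are all hypothetical. A real demo would point sqlite3.connect at a file in the demo directory instead of an in-memory database.

```python
import io
import json
import sqlite3


def make_db():
    """Create an in-memory database with a couple of made-up rows."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, note TEXT)")
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [
            ("wsgi", "Python web server gateway interface"),
            ("sqlite", "embedded SQL database"),
        ],
    )
    conn.commit()
    return conn


def make_lookup_app(conn):
    """Return a WSGI app that answers a POSTed word with a JSON record."""

    def lookup_app(environ, start_response):
        # Read the raw POST body and normalize it to a lowercase key.
        clen = int(environ.get("CONTENT_LENGTH") or 0)
        word = environ["wsgi.input"].read(clen).decode("utf-8").strip().lower()
        # Parameterized query, so the posted text is never spliced into SQL.
        row = conn.execute(
            "SELECT note FROM words WHERE word = ?", (word,)
        ).fetchone()
        body = json.dumps({"word": word, "note": row[0] if row else None})
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body.encode("utf-8")]

    return lookup_app
```

Passing make_lookup_app(make_db()) to make_server in place of the echo handler would serve these lookups. The page’s JavaScript would then need to parse the JSON response rather than dropping it straight into the result div.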

Uh oh, it’s begun …

I just had the urge to sit down without my laptop and check out something on the net.  And I thought, “it would be nice to browse in a space larger than a smart phone.”  I didn’t want to code, type much on a keyboard+play with a mouse, or sit upright staring at my laptop.

It does seem like the iPad has gotten an unnecessarily bad reputation given that it’s exploring a rather different space of human-computer interaction than your usual PC or smart phone. I was actually less interested in the MacBook Air, which seemed like a slicker, lighter-weight computer that I would feel less inclined to run my code on because it’s significantly slower than my current 13” MacBook (which I paid much less for).

But the notion of a tablet that I use solely for internet and communication + some additional bonus stuff sounds not-so-bad.  I’m tempted to get one just to see how much I would use it.  I did this with a NetBook. I figured, being a computer-geek, that $250 is worth seeing whether the thing improves my life.  It turns out I didn’t really find a NetBook that handy.  Part of why I feel I didn’t benefit much from the NetBook is similar to the reason I wasn’t interested in the MacBook Air — it’s still a computer that I open up, type on, and use a mouse — so the weight/slickness benefit doesn’t outweigh the sacrifice in performance compared to my current laptop. The iPad seems to have some benefits over an e-Book reader in the sense that it allows a more interactive experience with the display that can show web pages like you would see them on your laptop.   This, in combination with the fact that I lose the keyboard and mouse, might be enough to make it a useful secondary computing device.

Wikipedia-like blog entries?

This is a sort-of half-baked notion that may already have started a flame war somewhere on the net, but I can’t resist.

The combination of a noble call for better Wikipedia articles in a specific subject with a recent example of someone blogging to correct his own entry makes me fantasize: what if individuals wrote blog-like entries that were, in effect, articles like those on Wikipedia?

Why not just create an HTML page or contribute to Wikipedia? As far as contributing to Wikipedia, don’t get me wrong: I think that’s great. But as outlined in the linked “noble call”, there are some legitimate issues, most notably that of authorship. Blogging allows one to retain authorship and control but, unlike a plain HTML page, it facilitates commenting and allows RSS updates to interested readers and contributors.

In the second link, the creator of BitTorrent is correcting his own Wikipedia entry. This makes sense at some level for the sake of clarification and posterity. There’s no guarantee that if he edits his own entry (say anonymously), the changes will persist, and it may not be appealing to constantly keep watch over minor changes to articles either.

In the days of web one point yore (say before Wikipedia and Wolfram’s Math/Physics pages), if I Googled “Maxwell’s equations”, any page put up by some schmuck like me would probably be dicey at best.  But now, I can just feel lucky and get some pretty decent information from Wikipedia.

But due to the number of people editing the entry, there is a certain lack of authorship and potentially an unwillingness to even contribute in the first place.

These days, we see lots of academic bloggers posting their lecture notes on specific topics online (some even making the lecture notes themselves posts rather than linking to PDFs), so part of me feels like we’re already seeing this kind of behavior, though lectures tend to be less encyclopedic.

The thing I like about this approach is that it’s decentralized and personable in the sense that we can use the fact that we trust/value certain people’s discourse more than others to our advantage. Potential downsides: (1) we’re that much more reliant on a YourFavoriteArticleRank algorithm to rank the articles for us (for example, when you link to another article, do you link to your favorite one? the top-ranked one? ahem, Wikipedia?), (2) more people/groups would need to get blogs, and (3) the notion of a “collaborative article” is replaced by a “first author” (the writer of the post) and “contributors” (commenters). But those concerns don’t seem that dire.

Science in the recent stimulus

Prodded by Prof. Tao’s post, I spent a little too much time sifting through official documents on the congressional budget as well as other articles related to NSF funding and the stimulus amendment.

Jake Young at Pure Pedantry has a nice post that discusses the issue and outlines recent attention given to this subject by Science and Nature. Something I thought I’d forward on….