Archive for the ‘Science’ Category

Accidental Pilish: Unintentionally Constrained Writing in English Literature

Posted March 26th, 2010 on Bespoke

Background:

This post is a little late for Pi Day, but it’s never a bad time for discourse related to everyone’s favourite mathematical constant. Twas on Pi Day of this year that I somehow came across this site, which describes the Constrained Writing task of Pilish, in which the length of each word in letters corresponds to the digits of pi:

The first word in this sentence has 3 letters, the next word 1 letter, the next word 4 letters, and so on, following the first fifteen digits of the number π.  A longer example is this poem with ABAB rhyme scheme from Joseph Shipley’s 1960 book Playing With Words:

But a time I spent wandering in gloomy night;

Yon tower, tinkling chimewise, loftily opportune.

Out, up, and together came sudden to Sunday rite,

The one solemnly off to correct plenilune.

Michael Keith, the author of the above website, has created several works in Pilish, including a full-length book covering the first 10,000 digits of pi!

Trying to write under such constraints can feel extremely awkward, but this made me wonder: How often would strings of words adhering to the constraints of Standard Pilish occur unintentionally? After all, with the amount of text out there – the sheer rate at which words are being put together by people all over the world every second of every day – it is to be expected that these things should occur with some frequency p > 0. Such is the Law of Large Numbers.

In order to determine this, I would need a large data set. Luckily, such things are readily available. I settled upon the Project Gutenberg ebook catalog – specifically the union of the July 2006 DVD (17,000 books) and the March 2007 Science Fiction Bookshelf CD (most of PG’s Sci-Fi titles). Altogether, this gave me almost 9GB of text (although I later discovered this contained many duplicates, it’s still a hell of a lot of words!)

Next I hacked together a small Python script which would find, for each file, the longest string of Standard Pilish. Code for this can be checked out from my SVN repository: http://svn.nfitz.net/pilish
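For anyone who doesn't want to dig through the repository, here is a minimal sketch of how such a search could work. It is not the script from the repository, just an illustration under a few assumptions: words are runs of ASCII letters (apostrophes are ignored, hyphens split words), a 10-letter word stands for the digit 0, a longer word stands for the two digits of its length, and a run must start at the first digit of pi. The names `PI_DIGITS`, `word_digits` and `longest_pilish_run` are made up for this example.

```python
import re

# First 51 digits of pi (decimal point dropped); extend for longer runs.
PI_DIGITS = "314159265358979323846264338327950288419716939937510"

# Assumption: a "word" is a run of ASCII letters, optionally with internal
# apostrophes (e.g. horse's). Hyphens and other punctuation split words.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def word_digits(word):
    """Return the pi digit(s) a word encodes, as a string.

    Letters only are counted. Assumed rules: 1-9 letters give that digit,
    10 letters give '0', and 11+ letters give two digits (e.g. 15 -> '15').
    """
    n = sum(ch.isalpha() for ch in word)
    return "0" if n == 10 else str(n)


def longest_pilish_run(text):
    """Find the longest run of consecutive words matching pi from its start.

    Returns (number_of_digits_matched, list_of_words_in_the_run).
    """
    words = WORD_RE.findall(text)
    best_digits, best_words = 0, []
    for start in range(len(words)):
        pos = 0      # how many digits of pi have been matched so far
        end = start  # index one past the last word in the current run
        while end < len(words):
            digits = word_digits(words[end])
            if not PI_DIGITS.startswith(digits, pos):
                break
            pos += len(digits)
            end += 1
        if pos > best_digits:
            best_digits, best_words = pos, words[start:end]
    return best_digits, best_words


if __name__ == "__main__":
    sample = "Please may I have a drink? Certainly, Mr. Holmes; help yourself."
    print(longest_pilish_run(sample))
```

Trying every starting word is quadratic in the worst case, but since almost every run dies within a few words it behaves like a single pass in practice.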

Results:

Somewhat disappointingly, the longest of any Pilish string was 8 digits of pi. The vast majority of books had a longest Pilish string of around 3-5 words. See the histogram below (note the logarithmic scale in the y-axis).

Five books achieved this 8-digit benchmark, listed below, with the section of Pilish text bolded:

Dismounting and throwing the reins over his horse’s head he came to her smiling, sombrero in hand. “Buenas dias, Senorita. Please may I have a drink?”

“Certainly, Mr. Holmes ; help yourself.” She pointed to the olla hanging in the shade of the ramada.

I was weary of the humdrum life of idling on shore or aimless sailing up and down the channel. The admiral’s was a peaceful mission, and no fighting was expected, but I felt a great curiosity to behold new scenes.

And I have a great Objection to firing with powder only amongst People who know not the difference, for by this they would learn to despise fire Arms and think their own Arms superior, and if ever such an Opinion prevailed they would certainly attack you, the Event of which might prove as unfavourable to you as them.

One was part of the empire, the other was enclosed in Poland, and they were separated by Polish territory. They did not help each other, and each was a source of danger for the other. They could only hope to exist by becoming stronger. That has been, for two centuries and a half, a fixed tradition at Berlin with the rulers and the people. They could not help being aggressive, and they worshipped the authority that could make them successful aggressors.

With the most ambitious of the longer poems – “The Four Monarchies” – and one from which her readers of that day probably derived the most satisfaction, we need not feel compelled to linger. To them its charm lay in its usefulness. There were no sinful fancies; no trifling waste of words, but a good, straightforward narrative of things it was well to know, and Tyler’s comment upon it will be echoed by every one who turns the appallingly matter-of-fact pages…

That last one is the only one of the five to contain a word of double-digit length, which covers two digits of pi (‘straightforward’ = 15 letters = ‘15’).

Future Work:

I would like to do a similar analysis of an even larger dataset of more modern language. One possibility is a full archive of Wikipedia. I wonder what is the longest string of unintentional Pilish ever produced?

Another interesting question is how the maximum length of Pilish sections in a document scales with the length of the document, and how well this can be modelled with a simple statistical model such as a Markov Chain.


EVENT: Multidisciplinary Undergraduate Research Conference

Posted March 3rd, 2010 on Terry

As part of UBC’s Celebrate Research week, a great event is happening this Saturday at UBC:

MULTIDISCIPLINARY UNDERGRADUATE RESEARCH CONFERENCE (MURC)
Irving K. Barber Learning Centre, Jubilee Room (4th floor)

Saturday March 6, 2010

MURC celebrates the contributions of undergraduate research at UBC.  The conference provides an opportunity for students in any discipline from across campus to present a research project they have been working on while engaging in scholarly debate amongst each other.  Students have the choice of giving an oral, poster or performing/visual arts presentation of their work.  Presentations are judged by graduate students, and prizes are awarded at the end of the conference day during a celebratory gala.  The conference is held every year in March as the kick-off event to UBC’s Celebrate Research Week.

There is a great variety of presentations spanning the full range of subjects across the Humanities and Sciences, from Literary Criticism to Molecular Biology, all researched and presented by undergraduates from UBC and UBC-O. A full list of all presentations and posters, and a schedule of the day’s proceedings, can be found here: MURC 2010 Program

Shameless Plug! I will be presenting my own research project entitled “ASSESS: Abstractive Summarization System for Evaluative Statement Summarization” at 4pm in room 355!

The Music of the Stars

Posted January 15th, 2010 on Terry

As the latest in our Terry obsession with Science-Inspired Music, check out Jim Bumgardner’s “Wheel of Stars”.

Image: European Space Agency/Hubble

To make this, I downloaded public data from Hipparcos, a satellite launched by the European Space Agency in 1989 that accurately measured over a hundred thousand stars. The data I downloaded contains position, parallax, magnitude, and color information, among other things.

As the stars cross zero and 180 degrees, indicated by the center line, the clock plays an individual note, or chime for each star. The pitch of the chime is based on the star’s BV measurement (which roughly corresponds to color or temperature). The volume is based on the star’s magnitude, or apparent brightness, and the stereo panning is based on the position on the screen (use headphones to hear it better).

Jim has a series of other fascinating projects blending mathematics and geometry with music, including the Whitney Music Box.

NASA to Asplode Moon

Posted June 16th, 2009 on Terry

In what experts are calling “totally sweet”, NASA has announced plans to crash a 2000 kg impactor into the moon, triggering a 28m-diameter crater and a six-mile high plume of debris which will be visible from Earth via telescope. No, this isn’t April 1st; the four-month mission commences next Thursday, with intended impact on October 8. The purpose of the mission is to settle the question of whether there is in fact frozen water in the craters of the moon’s south pole.


Water ice in the ejected dust cloud will sublime (convert from a solid to a gaseous state) under the influence of solar irradiation. The rate of this sublimation depends primarily on the water to ice ratio and particle size, with ~0.1 mm dust rich (1% water ice) particles subliming their water in several minutes. The surface brightness of the cloud increases as the ejecta expands; in 40 seconds, the ejecta cloud will fill a 1 arc second observing aperture as seen from Earth-based telescopes. Subsequently, the water molecules will be dissociated by solar UV radiation and the OH molecule will be observable in emission at 308nm.

(source)

What I find interesting about this is how NASA can do something like this without consulting the international community. Sending rovers or astronauts to make observations is one thing, but creating giant explosions seems rather unilateral. Granted, an explosion of this size will cause no lasting changes, except for adding one more crater to the already pockmarked Luna, and the potential gain to the space program if water is indeed found would be huge, but it is interesting that there are no laws to govern this sort of thing. There was an attempt at a Moon Treaty in 1979, but since no major space power signed the agreement, it is considered a failure. I imagine it is only a matter of time before we start to experience tensions over jurisdiction with regard to celestial bodies, especially with China’s and India’s rapidly expanding space programs.


A Letter to the PM: Regarding Minister Goodyear

Posted March 17th, 2009 on Terry

In response to the shocking revelation of comments by Canada’s Minister of State for Science and Technology on evolution.

(see here for some great responses from the research community)


To the office of the Prime Minister of Canada:

As a student in a scientific field, I wish to express the deep concern I felt reading about Minister Goodyear’s comments on a central fact of scientific knowledge - evolution. I was disappointed to learn that the man in charge of scientific development in this country is so deeply ignorant of his domain. The fact is that evolution is a central pillar of many avenues of scientific research today - from biomedical advances which increase our ability to understand and fight diseases, to even seemingly unrelated fields such as my own - artificial intelligence - where concepts of evolution have been adapted into successful computational techniques. Far from being a controversial issue, as some dishonest partisans imply, evolution is not in dispute amongst scientists; it is a fact, and an important one.

Moreover, it is confusing that the Minister would frame the question as a matter of belief in the first place - evolution is the result of overwhelming evidence and consistent data from a wide array of research avenues. To frame the issue as one of personal belief or even as a matter of religious freedom is to miss the point entirely, and suggests a frightening lack of understanding on the Minister’s part.

To have the Minister of Science be so ignorant of a central fact of scientific knowledge is absurd - as absurd as if the Finance Minister did not “believe” in supply-and-demand, or if the Minister of Defense did not “believe” in the existence of Iraq. How can Canada hope to remain relevant and competitive as a location for research if those in charge are so incompetent? As a student looking towards graduate school, such revelations about our country’s leadership make me seriously question whether I wish to continue my studies in Canada, or go elsewhere.

I sincerely hope that further clarifications will be made on the Minister’s stance on this issue and that, if it is found that he is as ignorant as his previous comments suggest, a more suitable replacement will be found.

Beyond the comments on evolution, I am further concerned that the Minister hinted at an approach to research focusing on commercial applications. Such a focus on research that will sell will harm the research community in Canada; pure research is important and valuable, and it should not be the domain of the government to decide which avenues are likely to be the most profitable.

Sincerely,
Nicholas FitzGerald

DNA-Radio: The Human Bod-Cast

Posted March 10th, 2009 on Terry

Do you get bored with Top-40 radio, or the same 500 songs you’ve been listening to for months on your iPod? Ever feel like what you’d really like to be listening to is a robotic voice reading off all the known base-pairs of the human genome? Well now you can! The same people who produced the fascinating DNA-Rainbow have a new project, DNA-Radio, which streams the sounds of the human genome 24/7. At the rate at which it reads it will take about 23.5 years to complete. Listen to the exciting live action here!

This is one of a number of projects which have sprung up recently to turn the human genome into art. Companies such as DNA 11 offer to sequence your DNA to produce unique personalized wall hangings. There is much artistic potential in the code which creates human life. Perhaps next someone should try and write a short story using only the 4 letters of the genome…

Or not.

Human Computation: Harnessing the Power of Procrastination

Posted February 26th, 2009 on Terry

In order to comment on a Terry post, you have to fill out a form which looks like this:

[Image: example ReCAPTCHA form]

If you’ve spent any time at all on the interwebs, you’re probably fairly familiar with this kind of thing. They’re called CAPTCHAs, which stands for “Completely Automated Public Turing-test to tell Computers and Humans Apart”, a rather awkward acronym which nonetheless admirably describes their function of screening the automated scripts which might otherwise be hawking their manhood-enhancing wares on our poor, unsuspecting readers. The basic idea behind CAPTCHAs is to use a test which is easy for humans, but impossible for current AI systems - such as reading highly distorted text.

But this specific CAPTCHA used by Terry, and many other sites around the web, is of a very special variety. It is called ReCAPTCHA, and is the work of a group at CMU led by the brilliant Luis von Ahn, whose goal is to put the work people do when filling out CAPTCHAs to a useful purpose. Believe it or not, every day an estimated 150,000 man-hours are spent world-wide filling out these infernal boxes! Wouldn’t it be great if that time could be spent doing something useful? That’s the idea behind ReCAPTCHA…

What ReCAPTCHA does is to combine bot-filtering with another useful project - the digitization of hard-copy texts such as old books. Modern OCR is highly accurate (>99%), but there are still cases where OCR software is unable to read a given word accurately - usually the result of some damage or distortion to the text itself. In these cases, the given word is converted into an image for ReCAPTCHA and fed to a human, who can succeed where the computer failed. So every time you fill out a ReCAPTCHA, you are helping to digitize and preserve old books!

This is an incredibly clever example of a new field of development called Human Computation (also called crowdsourcing). The idea is to out-source certain elements of computation to humans, who can perform these tasks better than the computer. The challenge comes in creating the incentive for a human to participate. One technique, as used in ReCAPTCHA, is to harness work which is done by humans anyway, such as filling out CAPTCHAs. Another used by von Ahn’s group is to turn the work into a game - making the incentive fun! To this end they created gwap (Games With A Purpose) - a site devoted to games which accomplish useful work, from image-tagging (useful for improving image web-search and making the web more accessible to the visually-impaired) to text-summary. By their estimates, if their image-tagging game ESP were played as much as popular flash games such as Bejeweled, all images on the internet would be completely tagged in a matter of months. The power of procrastination, properly managed, is truly a wonder to behold!

But the power of the Human Computation paradigm extends beyond those applications for which it is explicitly designed. Many examples of internet social networking sites can be seen as a form of Human Computation. For example, sites such as Digg or StumbleUpon act as a powerful filter for the vast sea of content available online - only the best content bubbles to the top (in theory anyway…). Furthermore, the large data collection of Audioscrobbler and Last.fm acts as a form of music-similarity algorithm, simply by clustering artists based on the people who listen to them. Dave previously wrote about the power of Google Trends to predict flu epidemics. There is exciting potential here.

Human Computation is an incredibly powerful idea which will continue to develop more interesting and useful applications as the techniques are developed further. If anyone can think of ways we could harness the power of procrastination to solve the problems we discuss on Terry, we could really be in business!

The Future of the History of Science

Posted February 19th, 2009 on Terry

This is a re-hash of a topic I previously posted on my own blog, but I’m hoping the larger audience of Terry might provoke a more… lively discussion than the one spam comment it has so far received…

The occasion was my having just read Uncertainty: Einstein, Heisenberg, Bohr, and the Struggle for the Soul of Science by David Lindley, a very enjoyable and instructive look into the history and personalities surrounding the development of early Quantum Mechanics which I would recommend to anyone who, like me, knows less about physics than they would like to. Or if you’re stuck on an airplane with a terrible film selection, as was the case.


Something which struck me was that many of the insights the book provided into the personalities and private arguments surrounding the historical events were gleaned from letters which the major players had sent to each other, detailing their thoughts and perspectives on the issues.

This got me thinking: how will future science historians gain similar insights into modern scientists, when the nature of modern communication is so transitory? In a world where email, IP-telephony and instant messaging are the dominant modes of discourse, what will remain as a public record for the documentation of scientific development “as-it-happened”? Few people keep old emails forever, and once they are deleted they are pretty much gone for good (unless future historians turn out to be both remarkably skilled in forensic data recovery, and remarkably lucky). Heck, even writing in the margins of hard-copy books, which has historically provided insight into the reader’s personality, and maddening enigmas to spur development, may soon be a thing of the past.

Interestingly, unlike paper media like letters and books, which are more likely to survive if jealously guarded by their owners and more liable to entropy with use, digital data gains longevity from heavy trafficking. Newsgroup, forum and blog posts are likely to have long shelf-lives with services like the Wayback Machine and Google caching, whereas private emails, instant-message and Skype conversations will likely be lost. So there’s an interesting conflict between privacy concerns and public interest. Perhaps Google or Amazon or Facebook storing private data might have long-term practical benefits - despite the backlash such occurrences generally produce? Now, I’m not for one second suggesting that I like the idea of multinational corporations trawling my private data in order to subtly sell me things, but I do wonder how future historians will gain insight into the personalities of today’s important developers without some storing of personal information. Perhaps we should be giving greater thought to the preservation of digital data - even if it is private?

10 Random Things About 10 Random Things

Posted February 13th, 2009 on Terry

For my first post on the Terry project, I thought I’d riff on the popular chain-letter currently infecting Facebook profiles everywhere. But rather than write about myself (a subject less interesting than any of the items on the following list) I thought I’d write about 10 cool ideas which are currently rattling around my brain. Feel free to continue this modified version of the meme!

(Modified) Rules: Once you’ve been tagged, you are supposed to write a note with 10 random things, facts, or ideas about something OTHER than yourself. At the end, choose some people to be tagged. You have to tag the person who tagged you. If I tagged you, it’s because I want to know more about your thoughts.

1. Though it is cliché to say, I really do not often pay attention to chain-letters. This one caught my attention for two reasons - first that it was so popular, and secondly that it turned up in different unconnected parts of my social web within a very short period of time. I was intrigued that the meme could spread to so many disparate social groups so quickly. I realized that this was just one more example of the Small World Phenomenon - a property of sufficiently connected graphs such as social networks, wherein the average path length between any two nodes is surprisingly small. This phenomenon underlies popular games such as “Six Degrees of Kevin Bacon” or the more modern “Six Degrees of Wikipedia”.

2. About 20 minutes from the time of writing this (meaning it will have long since passed by the time I post this), UNIX time will read 1234567890. UNIX time, a count of the number of seconds since midnight UTC on January 1, 1970, is the system of time-counting used by UNIX-like computer systems. That means you Mac and GNU/Linux users, and likely the webserver on which the Terry site is hosted. This “cool” event is (as far as I can tell) the last significant number which will occur before UNIX time suffers its own version of the Y2K-problem at 03:14:07 UTC on Tuesday, 19 January 2038. (Update: WOOHOOOOO!)

3. Pangolins are pretty awesome. I first learned about them watching a random nature documentary on the BBC over winter holidays (thanks, David Attenborough!). They are independently evolved anteaters whose fur has fused into a scaly shell, with powerful digging claws for attacking termite colonies. They also have the largest tongue-to-body ratio of any animal - now that’s impressive.

4. I’m annoyed by the chauvinism some philosophers apply to discussions of intelligence and consciousness. The best definitions we seem to be able to come up with for intelligence are based around certain capabilities that are apparent in humans. But as soon as a computer system is designed which has that capability, the definition of intelligence is refined to exclude that capability! Some philosophers (e.g. Searle) go so far as to say that even if a computer system were functionally and empirically indistinguishable from a human, it would not be intelligent or conscious. How much more human-centric can one get? This comic (from Ray Kurzweil) sums it up nicely (click picture for larger version):

[Image: comic from KurzweilAI]

5. If you mess around with statistics for long enough you learn some pretty surprising things. For instance, take the problem of Authorship Attribution, which is an area of Artificial Intelligence research where you try to determine the author of a particular anonymous text by comparing it to a corpus of writing from several different authors. Intuitively, you might think that the best way to do this would be to examine the grammatical and semantic structure of the piece, the words used, etc. But in fact, some of the most successful approaches do something much less literary, much more statistical. What you do is count each bigram - a combination of two adjacent letters (so for example the word “write” would yield the bigram set {wr, ri, it, te}). Measuring the relative frequencies of each bigram used, you get a “fingerprint” for each piece and each author. Comparing how close the fingerprint of the anonymous piece is to the fingerprints of the various authors is one of the best current ways to attribute authorship. Who woulda thunk it! (source1, source2[pdf])

6. Renaissance astronomer (and webcomic namesake) Tycho Brahe was long thought to have died in a pretty hilarious way. The story goes that he was at a banquet and really needed to go to the bathroom, but felt it would be rude to leave before the dinner was finished. The subsequent bladder infection was thought to have been the cause of his death. Recent research, however, has overturned this legend in favour of mercury poisoning and a possible murder plot. History is cool!

7. This is an amazingly clever application of simulated annealing. Here’s me after about 5 hours:

[Image: nick_evol]

8. Most people don’t know this, but Alan Turing, widely considered the founder of modern computer science, was homosexual. He was prosecuted for this under the laws of the time, forced to undergo hormone-therapy, and ultimately committed suicide by ingesting an apple laced with cyanide. The old Apple Computers logo (rainbow apple with a bite missing) is thought to be a reference to this, though this has never been confirmed.

9. No matter how unlikely it seems, I secretly hold out hope that P=NP.

10. It is very important that people hear about the issue of Net Neutrality. What makes the internet such a valuable construct is that anyone can post anything they want and their data will be given equal priority to all else. But recently, ISPs have been threatening to change this situation by - for example - giving faster data rates to websites that pay them for the privilege. The end result of this could be a situation where the internet becomes like cable TV - you pay for a subscription package which gives you access only to a certain set of websites, and have to pay extra to see other sites. For more information on the legal process relating to this in Canada see: http://saveournet.ca/
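Going back to item 5 for a second: the bigram “fingerprint” idea is simple enough to sketch in a few lines. This is only a toy illustration, not the method from the linked sources. In particular, using cosine similarity to measure how “close” two fingerprints are is my own assumption here, and the function names (`bigrams`, `fingerprint`, `cosine_similarity`, `attribute`) are made up for the example.

```python
import math
import re
from collections import Counter


def bigrams(word):
    """Return the adjacent letter pairs of a word, e.g. write -> wr, ri, it, te."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def fingerprint(text):
    """Map each bigram in the text to its relative frequency."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(bigrams(word))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()} if total else {}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(value * b.get(bg, 0.0) for bg, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(anonymous_text, corpus):
    """Pick the author whose fingerprint is closest to the anonymous text.

    corpus maps an author name to a string of that author's known writing.
    """
    target = fingerprint(anonymous_text)
    return max(
        corpus,
        key=lambda author: cosine_similarity(target, fingerprint(corpus[author])),
    )
```

Calling `attribute(mystery_text, {"Austen": austen_text, "Dickens": dickens_text})` returns whichever name scores highest. With real texts you would want a lot of writing per author before trusting the answer.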

Was it good for you, too?

Posted February 13th, 2009 on Bespoke

(follow up to this)

1234567890
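For the record, the number above and the 2038 rollover date are easy to check with Python's standard library. This is just a quick sanity check; the helper name `utc_from_unix` is made up for the example.

```python
from datetime import datetime, timezone


def utc_from_unix(seconds):
    """Convert a UNIX timestamp (seconds since 1970-01-01 00:00 UTC) to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # The "cool" timestamp.
    print(utc_from_unix(1234567890).isoformat())
    # The largest value a signed 32-bit counter can hold.
    print(utc_from_unix(2**31 - 1).strftime("%A, %d %B %Y %H:%M:%S UTC"))
```

The first line comes out as 23:31:30 UTC on 13 February 2009, and the second is the last second a signed 32-bit time counter can represent.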