Containing the Semantic Explosion

Yesterday afternoon, I delivered a talk to the PhiloWeb Workshop at the WWW2012 Conference titled “Containing the Semantic Explosion” with Cameron Buckner and Colin Allen. It is an overview of the InPhO Project architecture, known as dynamic ontology, and a preview of some forthcoming data mining tools. [slides]

The explosion of semantic data on the information web, and within digital philosophy, requires new techniques for organizing and linking these knowledge repositories. These must address concerns about consistency, completeness, maintenance, usability, and pragmatics, while reducing the cost of double experts trained both in ontology design and the target domain. Folksonomy approaches address concerns about usability and personnel at the expense of consistency, completeness, and maintenance. Upper-level formal ontologies address concerns about consistency and completeness, but require double experts for the initial construction and maintenance of the representation. At the Indiana Philosophy Ontology (InPhO) Project, we have developed a general methodology called dynamic ontology, which alleviates the need for double experts, while addressing concerns about consistency, completeness and change through machine learning over a domain corpus, and concerns about usability and pragmatics through human input and semantic web standards. This representation can then be used by other projects in digital philosophy, such as the Stanford Encyclopedia of Philosophy (SEP) and PhilPapers, along with resources outside of digital philosophy enabled by the LinkedHumanities project. [slides]


Grad School: The Right Place

If you like where you live, if you like what you do,
If you like what you’re seein, when you’re lookin at you,
If you like what you’re sayin, when you open your face,
Then you got the right feeling, you’re in the right place.
Monsters of Folk – “The Right Place”

In November, I delivered two lectures to student organizations on campus and realized that I really miss teaching. Despite the amazing flexibility of a career in research and development, I won’t be able to find fulfillment until I am working with students. The only way to realize that goal is to become a professor, and to do that I need a PhD, so I applied to graduate schools in December.

After visiting the available options, I’ve decided to continue my studies at Indiana University, pursuing the Joint PhD in Cognitive Science and Computer Science. All in all, IU just feels like the right place. I’m well-positioned to make a lasting impact, both in my own studies and in the community, and there’s no break for moving to a new city and building a new professional network. Plus, there is a large amount of social and financial stability in Bloomington, which helps maintain my sanity.

As for now, I’m off to Lyon, France, to give a presentation titled “Containing the Semantic Explosion”, covering work with the Indiana Philosophy Ontology Project. An abstract and slides will follow later this week.


2011 in Music

Once again it is time to do a musical year-in-review. I feel some of my scrobble counts are off this year due to the launch of the Amazon Cloud Player, which I’ve been using at work. Of course, my 2009 play counts were also off due to sporadic iPod syncing, but this is still fairly accurate.

’11   Artist                 ’10   Change
 1    Cold War Kids          68    (+67)
 2    The Avett Brothers      1    (-1)
 3    Death Cab for Cutie    21    (+18)
 4    Wilco                   7    (+3)
 5    Radiohead               2    (-3)
 6    Kanye West             78    (+72)
 7    Say Hi                  8    (+1)
 8    Daft Punk              34    (+26)
 9    Kings of Leon           9    (–)
10    Counting Crows         12    (+2)

Right below this list of top artists is a significant number of new discoveries. In the folk scene, I’ve been listening to Ryan Adams, The Head and the Heart, and The Goat Rodeo Sessions. In the indie scene, I’ve been listening to Death Cab for Cutie’s newest album, Cold War Kids, and Florence + the Machine. Sonic Youth has been an awesome discovery: Goo is making weekly appearances in my playlists.

TV on the Radio is the coolest band I’ve discovered this year. Their arrangements are superb, and I really like their use of horns. The first song I heard (and subsequently fell in love with) is “Things You Can Do”. The new album, Nine Types of Light, has an accompanying movie that is essential viewing for fans of Waking Life. Also, the movie has some amazing quotes: “It’s an unspeakable name. You don’t say it, you just look at it.”

The biggest musical change this year may not be reflected in play counts, but rather in consumption practice. I’ve been going to way more concerts in the past few months, including Paul Simon, Punch Brothers, Gillian Welch & David Rawlings, Taj Mahal, Cold War Kids, They Might Be Giants, Main Squeeze, End Times Spasm Band, Joe Pug, and Say Hi. Bloomington has an astonishing number of bands come through, and because it’s a smaller town, we get to see them in smaller venues.

I’ve also continued switching to Amazon MP3, which has gotten even better with the advent of the Amazon Cloud Player, with clients for Windows, Mac OS X, Linux, and Android. It’s nice having easy, instant access to my music anywhere. My only complaint is that the Amazon MP3 Downloader doesn’t have a 64-bit Linux client.



Last week I wrote and then gave two lectures on “Categorization” and “Practical Parallelism”. It was a ton of fun to prepare them, and actually giving them made me realize how much I miss teaching. Abstracts and slides follow.


Categorization

Student Organization for Cognitive Science (SOCS)
November 15, 2011 @ 5:30pm

Abstract: Categorization is a fundamental problem in cognitive science that goes by a multitude of names: in artificial intelligence, categorization is known as clustering; in mathematics, the problem is partitioning. There are many applications in linguistics, vision, and memory research. In this talk, I will provide a brief overview of exemplar vs. prototype models in the cognitive sciences (Goldstone & Kersten 2003), followed by an introduction to three different general-purpose clustering algorithms: k-means (MacQueen 1967), qt-clust (Heyer et al. 1999), and information-theoretic clustering (Gokcay & Principe 2002). Open-source Python implementations of each algorithm will be provided.
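
To give a flavor of the first algorithm named in the abstract, here is a minimal k-means sketch in plain Python. It is not the implementation distributed with the talk; the function name, the fixed random seed, and the choice to keep a centroid in place when its cluster empties are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for k-means over a list of equal-length tuples."""
    rng = random.Random(seed)
    # Illustrative choice: start from k distinct points picked at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment
```

Calling `kmeans(points, 2)` on two well-separated blobs returns the two centroids and a list with one cluster label per point. The result depends on the starting centroids, which is one reason to compare it against algorithms like qt-clust that do not need k up front.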


Practical Parallelism

CS Club Tech Talk
November 17, 2011 @ 7pm

Abstract: In this talk, I will give a brief overview of several key parallelism concepts and practical tools for several languages. After this talk, attendees should have the resources to recognize and solve “painfully parallel problems”. Topics will include: threads vs. processes, Amdahl’s Law, shared vs. distributed memory, synchronization, locks, pipes, queues, process pools, futures, OpenMP, MapReduce, Hadoop, and GPU programming.
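
Two of the topics above, Amdahl’s Law and process pools, fit in a short Python sketch. This is an illustration written for this post rather than material from the slides; the prime-counting workload and all function names are hypothetical stand-ins for any job that splits into independent chunks.

```python
from concurrent.futures import ProcessPoolExecutor


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's Law: best-case speedup when only part of a program parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def count_primes(bounds):
    """Count primes in [start, stop) by trial division (deliberately CPU-bound)."""
    start, stop = bounds
    return sum(
        1
        for n in range(max(start, 2), stop)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def split_range(limit, chunks):
    """Split [0, limit) into roughly equal, independent pieces (limit > 0)."""
    step = -(-limit // chunks)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=4):
    """Fan the chunks out to a process pool and sum the partial counts."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))
```

`parallel_count_primes` should be called from under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes re-import the module. `amdahl_speedup(0.9, 10)` comes out a little above 5, a reminder that ten workers rarely buy a tenfold speedup.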



A New Chapter

In July, the Indiana Philosophy Ontology (InPhO) Project was awarded a new NEH-DFG Bi-lateral Digital Humanities Program grant with the University of Mannheim for linking and populating digital humanities databases. Our current grant ends in December, so this brought tons of relief, injecting $172,215 into the project. The DFG’s contribution of €126,400 allows InPhO co-founder Mathias Niepert to return to the project, along with his team at the University of Mannheim. All in all, the project will be able to continue for another two years.

As a result of the grant, I was offered a full-time, salaried faculty position as a Visiting Research Associate with the IU Cognitive Science Program, continuing work on the InPhO Project. During this time, I will be working on new methods of knowledge representation and machine learning with applications in document classification, ontology evaluation, and taxonomy alignment, bringing the digital humanities into the Linked Open Data initiative. I’ll also be working on a new bibliography management system for the Stanford Encyclopedia of Philosophy, using a tool developed for Cognitive Science Program faculty publication records.

I started the new position on August 16th. The new full-time job, plus the move to my own 1-bedroom apartment, along with joining the band, have me falling more and more in love with Bloomington. For the first time in a long, long time, I’m satisfied with where I am. Looking forward to this new chapter of post-college life.


Summer 2011

Figured it’s been another 4 months, so it’s time for another life update. This was an incredibly productive summer with the open-sourcing of the InPhO Project, an extremely successful refactoring, and two publications hitting press. It was also fun, as I started gigging with Afro-Hoosier International and took a road trip up California 1 with my brothers. All in all, a great bookend on this past chapter of life.


All of the InPhO code has been open-sourced and uploaded to GitHub in two repos. The inpho repo contains our data mining code, while the inphosite repo contains our API and website. Most of the code in the inpho repo was newly ported from Java so that we could use NLTK and integrate with the ORM. We hired a new undergraduate, Evan Boggs, to help refactor the code, and after a long summer, we were able to cut 10,000 lines from the code base.

In July, I quit the Syriac Reference Portal (SRP), after several months of work deploying Semantic MediaWiki and the new COGS Bibliography Engine. I learned a lot about generalizability of the InPhO code, and what the humanities side of digital humanities needs, but ultimately the data provenance goals of the historical community are still an open question for semantic web research and standardization, and I want to focus my research efforts elsewhere. I hope the project finds success and will continue to support it through work on the COGS Bibliography Engine.

Publications-wise, the work on speciation and clustering was accepted as a full paper at the European Conference on Artificial Life (ECAL). I’m really pleased with the biological narrative we were able to weave, and am doing further work with Larry Yaeger and Sean Dougherty on adapting the clustering tool to larger datasets. Also, my paper with Colin on the InPhO API from last year’s Chicago Colloquium on Digital Humanities and Computer Science was finally published.


In May, I joined Afro-Hoosier International, a local afropop and world music dance band. Five gigs in, it’s been crazy fun to play sax with other people again. We’re an 11-piece band, with three horns, three vocalists, keyboard, guitar, bass, kit, and auxiliary, and we groove. We’ll be hitting the studio sometime soon to put together an album, and I’m really pumped. This is a recording from my second gig with the band in Bryan Park:

At the end of July, I finally got to take a little vacation from the grind. For the first time ever, both of my brothers and I headed out to California at the same time to visit my Dad. While we were there, we took a road trip up the North Coast on California 1 to the Avenue of the Giants, the Black Sands, and Arcata in Humboldt County. We managed to make no plans at all, and took things at a completely leisurely pace, stopping and going as we pleased. I kept my cell phone and e-mail turned off for a record 5 days.


InPhO for All: Why APIs Matter

This month Colin Allen and I published “InPhO for All: Why APIs Matter” in the Journal of the Chicago Colloquium on Digital Humanities and Computer Science (JDHCS). It’s a short piece setting up the API development narrative for digital humanists. Abstract, citation, and paper link follow.

The unique convergence of humanities scholars, computer scientists, librarians, and information scientists in digital humanities projects highlights the collaborative opportunities such research entails. Unfortunately, the relatively limited human resources committed to many digital humanities projects have led to unwieldy initial implementations and underutilization of semantic web technology, creating a sea of isolated projects without integratable data. Furthermore, the use of standards for one particular purpose may not suit other kinds of scholarly activities, impeding collaboration in the digital humanities. By designing and utilizing an Application Platform Interface (API), projects can reduce these barriers, while simultaneously reducing internal support costs and easing the transition to new development teams. Our experience developing an API for the Indiana Philosophy Ontology (InPhO) Project highlights these benefits.

Jaimie Murdock and Colin Allen. InPhO for All: Why APIs Matter. In Journal of the Chicago Colloquium on Digital Humanities and Computer Science (JDHCS). Evanston, Illinois, 2011. [paper]


Reflections on Privacy

For many people, the primary privacy concern is the "no parents" concept – we don’t care who sees things as long as our "parents" don’t see them (where parents can be anyone we don’t want to see things – professional contacts, straight-edge acquaintances, terrorists, Julian Assange, etc.). This is what I term the exclusive privacy model: start with the public and begin cutting people out. However, this "public minus parents" idea doesn’t make sense. Online, you just have to log out to see this information. Offline, all someone has to do is talk. Facebook was originally marketed this way: here is a place to post information where only Harvard/Ivy League/college students can see it.

This exclusive model is the most common privacy misperception. Information spreads, and once we consciously recognize this, privacy becomes synonymous with trust. For example, you send an e-mail, confide in a friend, or upload a photo. This is private information, but it is capable of being shared or forwarded in any number of ways, both online and offline (e.g., gossip). Its reach is mitigated by social convention and our own discretion.

Google+ gets this inclusive privacy model right. First, it always explicitly states who an item is being shared with, not who is being excluded. When resharing an item that was shared with a limited circle, it notifies you of the original intent, highlighting the privilege and trust placed in you. Just like an e-mail program’s forward button, each piece of content has a share button and the API will allow for all data to be federated outside of Google+. However, you also can disable the reshare for each posting. Someone else can always copy-paste your content, but it won’t be computationally linked to you.

Privacy isn’t just about information, it’s about image as well. Google+ enables full control over your profile. Instead of posting to your wall or tagging you in a photo, people communicate with you directly through limited shares which do not appear on your public profile. Photo tags don’t appear in your albums until they are approved. A box in the upper right corner of your profile allows you to view it as any other user. Voyeurism is all but eliminated, as you do not see a constant stream of external interactions. Facebook has some of these settings, but they are not as pervasive in the profile.

The Next Step

Google+ seems to have figured out a better way to handle privacy – both in terms of information and image – but the next social networking revolution is targeting: I don’t care who sees what I post, but I am self-conscious about overloading people with irrelevant information. My ideal publishing model wouldn’t be about circles of people, but streams of tagged content. If there existed a service where you could follow a person, but mute certain content streams (such as local events, politics, etc.), we’d have perfection. For example, friends in Kentucky don’t care about tornadoes around Bloomington. Professional contacts may be extremely interested in my philosophy and technology content, but don’t care about what concerts I’m going to. People who aren’t in the same circles (hometown friends, college friends, professional contacts, etc.) may share interests in internet humor or politics, while others consider unfollowing me because of it. None of this information is private, but I don’t want to inundate the world with extraneous chatter. If a social network can figure this out, that’s where I’ll plant my flag.


Speciation and Information Theory

For the past two semesters, I’ve been doing some exploratory work marrying speciation with information theory in the framework of the Polyworld artificial life simulator. The simulation gives us a nice framework for mathematically “pure” evolutionary theory and exploration of neural complexity. We’ve applied clustering algorithms to the genetic information, revealing evidence of both sympatric and allopatric speciation events. The key algorithmic intuition is that genes which are highly selected for will be conserved, while those which are not will descend to a random distribution (and thus high entropy), so each dimension (gene) can be weighted by its information certainty to alleviate the curse of dimensionality.
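
The weighting intuition can be sketched in a few lines of Python. This is an illustration only, not the code in the Polyworld trunk: it assumes each gene takes one of `num_states` discrete values (256 is a guess at a byte-valued gene), defines "information certainty" as one minus normalized Shannon entropy, and uses hypothetical function names throughout.

```python
import math
from collections import Counter


def gene_certainty(values, num_states=256):
    """1 minus the normalized Shannon entropy of one gene across the population."""
    counts = Counter(values)
    total = len(values)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(num_states)


def certainty_weights(genomes, num_states=256):
    """One weight per gene: near 1 if conserved, near 0 if close to uniform."""
    return [gene_certainty(column, num_states) for column in zip(*genomes)]


def weighted_distance(a, b, weights):
    """Euclidean distance with each gene scaled by its certainty weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

A gene that has drifted to a uniform distribution gets a weight near zero and effectively drops out of the distance, so the clustering only "sees" the dimensions that selection is holding in place.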

The work was accepted as a poster and extended abstract for the Genetic and Evolutionary Computation Conference (GECCO), and was accepted as a full paper for the European Conference on Artificial Life (ECAL). The full paper is substantially revised from the initial GECCO submission, and provides an introduction to several problems of biological, computational, and information theoretic importance. The visualizations, including several videos showing the cluster data, were especially fun to create, and I’m proud of the finished product.

There are still several more research directions from this work: the allopatric and sympatric effects have not been differentiated, only one environment was analyzed (consistent with past work on evolution of complexity), the clustering algorithm’s thresholds were not explored for hierarchical effects, alternate clustering algorithms were not explored (future open-source project for me: clusterlib), … Still, the present work is encapsulated, the source is in the Polyworld trunk, and it was accepted for publication.

Abstract, citation, and paper follow.

Complex artificial life simulations can yield substantially distinct populations of agents corresponding to different adaptations to a common environment or specialized adaptations to different environments. Here we show how a standard clustering algorithm applied to the artificial genomes of such agents can be used to discover and characterize these subpopulations. As gene changes propagate throughout the population, new subpopulations are produced, which show up as new clusters. Cluster centroids allow us to characterize these different subpopulations and identify their distinct adaptation mechanisms. We suggest these subpopulations may reasonably be thought of as species, even if the simulation software allows interbreeding between members of the different subpopulations, and provide evidence of both sympatric and allopatric speciation in the Polyworld artificial life system. Analyzing intra- and inter-cluster fecundity differences and offspring production rates suggests that speciation is being promoted by a combination of post-zygotic selection (lower fitness of hybrid offspring) and pre-zygotic selection (assortative mating), which may be fostered by reinforcement (the Wallace effect).

Jaimie Murdock and Larry Yaeger. Identifying Species by Genetic Clustering. In Proceedings of the 2011 European Conference on Artificial Life. Paris, France, 2011. [paper]


Spring 2011

With the passing of another semester comes another life update post. Even though I am no longer a student, being embedded in academia means progress is still measured by semesters.

Recently, I was awarded the Provost’s Award for Undergraduate Research and Creative Activity, which was a really nice capstone on my undergraduate experience. Since I did not walk at graduation, the Honors Convocation was a good opportunity to give my family closure on this chapter of my life.

Throughout these few months, I’ve been busy writing up a storm – one week in April saw 30 pages of manuscripts submitted. My previous post details the accepted poster summary on "Genetic Clustering for Species Identification" and the accepted book chapter on "Evaluating Dynamic Ontologies". There are two more papers in review and preparation right now. One is an expansion of the speciation work for a (hopeful) full-paper presentation. The other details work on taxonomy alignment carried out this semester.

I’ve still been travelling a ton. In December, I headed to Berkeley for my first California Christmas with Dad and Justin and my first non-business trip in 4 months. Three weeks later, I went back to California for a site visit at the Stanford Encyclopedia of Philosophy, Big Data Camp, and the O’Reilly Strata Conference. Strata was amazing – learned a ton, and met some really great people. Definitely planning to go again next year. I was scheduled to go to the Digital Humanities API Workshop but snow delays forced me to cancel, and last-minute logistics changes made PyCon and ThatCamp SE impossible to attend. These three were certainly disappointments, but after being in an airport every month for 8 months, it was kind of nice to stay rooted for a while. Earlier this week, I visited Princeton University and Beth Mardutho: The Syriac Institute, as part of my work with the Syriac Reference Portal.

On a more personal note, the diaspora of friends has been steadily widening since graduation, including my roommate of 3 years. This has been offset, however, by just as many friends changing their plans to either stay in Bloomington or move back. While we will no longer have a single house to hang out in all the time, I’m excited about the social continuity next year.
