Author: Jaimie Murdock

  • A New Chapter

    In July, the Indiana Philosophy Ontology (InPhO) Project was awarded a new NEH/DFG Bilateral Digital Humanities Program grant with the University of Mannheim for linking and populating digital humanities databases. Our current grant ends in December, so this brought tons of relief, injecting $172,215 into the project. The DFG’s contribution of €126,400 allows InPhO co-founder Mathias Niepert to return to the project, along with his team at the University of Mannheim. All in all, the project will be able to continue for another two years.

    As a result of the grant, I was offered a full-time, salaried faculty position as a Visiting Research Associate with the IU Cognitive Science Program, continuing work on the InPhO Project. During this time, I will be working on new methods of knowledge representation and machine learning with applications in document classification, ontology evaluation, and taxonomy alignment, bringing the digital humanities into the Linked Open Data initiative. I’ll also be working on a new bibliography management system for the Stanford Encyclopedia of Philosophy, using a tool developed for Cognitive Science Program faculty publication records.

    I started the new position on August 16th. The new full-time job, plus the move to my own 1-bedroom apartment and joining the band, have me falling more and more in love with Bloomington. For the first time in a long, long time, I’m satisfied with where I am. Looking forward to this new chapter of post-college life.

  • Summer 2011

    Figured it’s been another 4 months, so it’s time for another life update. This was an incredibly productive summer with the open-sourcing of the InPhO Project, an extremely successful refactoring, and two publications hitting press. It was also fun, as I started gigging with Afro-Hoosier International and took a road trip up California 1 with my brothers. All in all, a great bookend on this past chapter of life.

    Work

    All of the InPhO code has been open-sourced and uploaded to GitHub in two repos. The inpho repo contains our data mining code, while the inphosite repo contains our API and website. Most of the code in the inpho repo was newly ported from Java so that we could use NLTK and integrate with the ORM. We hired a new undergraduate, Evan Boggs, to help refactor the code, and after a long summer we were able to cut 10,000 lines from the code base.

    In July, I quit the Syriac Reference Portal (SRP), after several months of work deploying Semantic MediaWiki and the new COGS Bibliography Engine. I learned a lot about generalizability of the InPhO code, and what the humanities side of digital humanities needs, but ultimately the data provenance goals of the historical community are still an open question for semantic web research and standardization, and I want to focus my research efforts elsewhere. I hope the project finds success and will continue to support it through work on the COGS Bibliography Engine.

    Publications-wise, the work on speciation and clustering was accepted as a full paper at the European Conference on Artificial Life (ECAL). I’m really pleased with the biological narrative we were able to weave, and am working with Larry Yaeger and Sean Dougherty on adapting the clustering tool to larger datasets. Also, the paper Colin and I wrote on the InPhO API from last year’s Chicago Colloquium on Digital Humanities and Computer Science was finally published.

    Play

    In May, I joined Afro-Hoosier International, a local afropop and world music dance band. Five gigs in, it’s been crazy fun to play sax with other people again. We’re an 11-piece band, with three horns, three vocalists, keyboard, guitar, bass, kit, and auxiliary percussion, and we groove. We’ll be hitting the studio sometime soon to put together an album — I’m really pumped. This is a recording from my second gig with the band in Bryan Park:

    At the end of July, I finally got to take a little vacation from the grind. For the first time ever, both of my brothers and I headed out to California at the same time to visit my Dad. While we were there, we took a road trip up the North Coast on California 1 to the Avenue of the Giants, the Black Sands, and Arcata in Humboldt County. We managed to make no plans at all, and took things at a completely leisurely pace, stopping and going as we pleased. I kept my cell phone and e-mail turned off for a record 5 days.

  • InPhO for All: Why APIs Matter

    This month Colin Allen and I published “InPhO for All: Why APIs Matter” in the Journal of the Chicago Colloquium on Digital Humanities and Computer Science (JDHCS). It’s a short piece setting up the API development narrative for digital humanists. Abstract, citation, and paper link follow.

    The unique convergence of humanities scholars, computer scientists, librarians, and information scientists in digital humanities projects highlights the collaborative opportunities such research entails. Unfortunately, the relatively limited human resources committed to many digital humanities projects have led to unwieldy initial implementations and underutilization of semantic web technology, creating a sea of isolated projects without integratable data. Furthermore, the use of standards for one particular purpose may not suit other kinds of scholarly activities, impeding collaboration in the digital humanities. By designing and utilizing an Application Programming Interface (API), projects can reduce these barriers, while simultaneously reducing internal support costs and easing the transition to new development teams. Our experience developing an API for the Indiana Philosophy Ontology (InPhO) Project highlights these benefits.

    Jaimie Murdock and Colin Allen. InPhO for All: Why APIs Matter. In Journal of the Chicago Colloquium on Digital Humanities and Computer Science (JDHCS). Evanston, Illinois, 2011. [paper]

  • Reflections on Privacy

    For many people, the primary privacy concern is the "no parents" concept – we don’t care who sees things as long as our "parents" don’t see it (where parents can be anyone we don’t want to see things – professional contacts, straight-edge acquaintances, terrorists, Julian Assange, etc.). This is what I term the exclusive privacy model: start with the public and begin cutting people out. However, this "public minus parents" idea doesn’t make sense. Online, someone just has to log out to see this information. Offline, all someone has to do is talk. Facebook was originally marketed this way: here is a place to post information where only Harvard/Ivy League/college students can see it.

    This exclusive model is the most common privacy misperception. Information spreads, and once you consciously recognize this, privacy becomes synonymous with trust. For example, you send an e-mail, confide in a friend, or upload a photo. This is private information, but it can be shared or forwarded in any number of ways, both online and offline (e.g., gossip). Its reach is mitigated by social convention and our own discretion.

    Google+ gets this inclusive privacy model right. First, it always explicitly states who an item is being shared with, not who is being excluded. When resharing an item that was shared with a limited circle, it notifies you of the original intent, highlighting the privilege and trust placed in you. Just like an e-mail program’s forward button, each piece of content has a share button, and the API will allow for all data to be federated outside of Google+. However, you can also disable resharing for each posting. Someone else can always copy-paste your content, but it won’t be computationally linked to you.

    Privacy isn’t just about information, it’s about image as well. Google+ enables full control over your profile. Instead of posting to your wall or tagging you in a photo, people communicate with you directly through limited shares which do not appear on your public profile. Photo tags don’t appear in your albums until they are approved. A box in the upper right corner of your profile allows you to view it as any other user. Voyeurism is all but eliminated, as you do not see a constant stream of external interactions. Facebook has some of these settings, but they are not as pervasive in the profile.

    The Next Step

    Google+ seems to have figured out a better way to handle privacy – both in terms of information and image – but the next social networking revolution is targeting: I don’t care who sees what I post, but I am self-conscious about overloading people with irrelevant information. My ideal publishing model wouldn’t be about circles of people, but streams of tagged content. If there existed a service where you could follow a person, but mute certain content streams (such as local events, politics, etc.), we’d have perfection. For example, friends in Kentucky don’t care about tornadoes around Bloomington. Professional contacts may be extremely interested in my philosophy and technology content, but don’t care about what concerts I’m going to. People who aren’t in the same circles (hometown friends, college friends, professional contacts, etc.) may share interests in internet humor or politics, while others consider unfollowing me because of it. None of this information is private, but I don’t want to inundate the world with extraneous chatter. If a social network can figure this out, that’s where I’ll plant my flag.
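    To make the idea concrete, here is a minimal sketch of that tag-muting publishing model: everything is public, but a follower can mute whole content streams. All class and function names here are hypothetical, invented purely to illustrate the data model – no existing network offers this.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Post:
        text: str
        tags: set  # the content streams this post belongs to

    @dataclass
    class Follower:
        name: str
        muted: set = field(default_factory=set)  # streams this follower has muted

    def feed(posts, follower):
        """Nothing is private: a post is simply hidden from any
        follower who has muted one of its streams."""
        return [p for p in posts if not (p.tags & follower.muted)]

    posts = [Post("Tornado warning", {"local", "weather"}),
             Post("New InPhO API release", {"technology"}),
             Post("Show tonight", {"music", "local"})]

    # A professional contact mutes the "local" stream but still sees
    # every technology post, with no privacy machinery involved.
    contact = Follower("professional contact", muted={"local"})
    ```

    Filtering happens at the reader’s end, not the author’s: the author tags once, and each follower prunes their own feed.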

  • Speciation and Information Theory

    For the past two semesters, I’ve been doing some exploratory work marrying speciation with information theory in the framework of the Polyworld artificial life simulator. The simulation gives us a nice framework for mathematically “pure” evolutionary theory and exploration of neural complexity. We’ve applied clustering algorithms to the genetic information, revealing evidence of both sympatric and allopatric speciation events. The key algorithmic intuition is that genes which are highly selected for will be conserved, while those which are not will drift toward a random distribution (and thus high entropy), so each dimension (gene) can be weighted by its information certainty to alleviate the curse of dimensionality.
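    That weighting intuition can be sketched in a few lines. This is only an illustration using NumPy and SciPy rather than the actual Polyworld tooling; the function names, the histogram bin count, the linkage method, and the distance threshold are all my assumptions here, not the parameters used in the paper. Genomes are assumed normalized to [0, 1].

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def gene_entropy(column, bins=10):
        """Shannon entropy (bits) of one gene's values across the population."""
        counts, _ = np.histogram(column, bins=bins, range=(0.0, 1.0))
        p = counts / counts.sum()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def entropy_weighted_clusters(genomes, threshold=1.0, bins=10):
        """Cluster genomes (one per row), weighting each gene by certainty.

        Conserved (highly selected) genes have low entropy, so their
        certainty weight is near 1; drifting genes approach uniform noise
        and contribute almost nothing to the distance computation.
        """
        max_h = np.log2(bins)
        certainty = np.array([1.0 - gene_entropy(genomes[:, j], bins) / max_h
                              for j in range(genomes.shape[1])])
        # sqrt so the weight scales the squared term of Euclidean distance
        weighted = genomes * np.sqrt(certainty)
        Z = linkage(weighted, method='average')
        return fcluster(Z, t=threshold, criterion='distance')
    ```

    Because noisy dimensions are shrunk before distances are computed, clusters that differ only in their conserved genes remain separable even in high-dimensional genomes.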

    The work was accepted as a poster and extended abstract for the Genetic and Evolutionary Computing Conference (GECCO), and was accepted as a full paper for the European Conference on Artificial Life (ECAL). The full paper is substantially revised from the initial GECCO submission, and provides an introduction to several problems of biological, computational, and information theoretic importance. The visualizations, including several videos showing the cluster data, were especially fun to create, and I’m proud of the finished product.

    There are still several research directions left open by this work: the allopatric and sympatric effects have not been differentiated; only one environment was analyzed (consistent with past work on evolution of complexity); the clustering algorithm’s thresholds were not explored for hierarchical effects; and alternate clustering algorithms were not explored (a future open-source project for me: clusterlib). Still, the present work is self-contained, the source is in the Polyworld trunk, and it was accepted for publication.

    Abstract, citation, and paper follow.

    Complex artificial life simulations can yield substantially distinct populations of agents corresponding to different adaptations to a common environment or specialized adaptations to different environments. Here we show how a standard clustering algorithm applied to the artificial genomes of such agents can be used to discover and characterize these subpopulations. As gene changes propagate throughout the population, new subpopulations are produced, which show up as new clusters. Cluster centroids allow us to characterize these different subpopulations and identify their distinct adaptation mechanisms. We suggest these subpopulations may reasonably be thought of as species, even if the simulation software allows interbreeding between members of the different subpopulations, and provide evidence of both sympatric and allopatric speciation in the Polyworld artificial life system. Analyzing intra- and inter-cluster fecundity differences and offspring production rates suggests that speciation is being promoted by a combination of post-zygotic selection (lower fitness of hybrid offspring) and pre-zygotic selection (assortative mating), which may be fostered by reinforcement (the Wallace effect).

    Jaimie Murdock and Larry Yaeger. Identifying Species by Genetic Clustering. In Proceedings of the 2011 European Conference on Artificial Life. Paris, France, 2011. [paper]

  • Spring 2011

    With the passing of another semester comes another life update post. Even though I am no longer a student, being embedded in academia means progress is still measured by semesters.

    Recently, I was awarded the Provost’s Award for Undergraduate Research and Creative Activity, which was a really nice capstone on my undergraduate experience. Since I did not walk at graduation, the Honors Convocation was a good opportunity to give my family closure on this chapter of my life.

    Throughout these few months, I’ve been busy writing up a storm – one week in April saw 30 pages of manuscripts submitted. My previous post details the accepted poster summary on "Genetic Clustering for Species Identification" and the accepted book chapter on "Evaluating Dynamic Ontologies". There are two more papers in review and preparation right now. One is an expansion of the speciation work for a (hopeful) full-paper presentation. The other details work on taxonomy alignment carried out this semester.

    I’ve still been travelling a ton. In December, I headed to Berkeley for my first California Christmas with Dad and Justin and my first non-business trip in 4 months. Three weeks later, I went back to California for a site visit at the Stanford Encyclopedia of Philosophy, Big Data Camp, and the O’Reilly Strata Conference. Strata was amazing – learned a ton, and met some really great people. Definitely planning to go again next year. I was scheduled to go to the Digital Humanities API Workshop but snow delays forced me to cancel, and last-minute logistics changes made PyCon and ThatCamp SE impossible to attend. These three were certainly disappointments, but after being in an airport every month for 8 months, it was kind of nice to stay rooted for a while. Earlier this week, I visited Princeton University and Beth Mardutho: The Syriac Institute, as part of my work with the Syriac Reference Portal.

    On a more personal note, the diaspora of friends has been steadily widening since graduation, now including my roommate of 3 years. It has been offset, however, by just as many friends changing their plans to either stay in Bloomington or move back. While we will no longer have a single house to hang out in all the time, I’m excited about the social continuity next year.

  • Two New Publications

    This past week brought two publication deadlines, a conference submission deadline, and preparation for a software demo at Harvard. Needless to say, I am exhausted, but it was well worth the effort.

    The first publication is a 2-page summary of work I’ve been doing with Prof. Larry Yaeger looking at speciation mechanisms in artificial life simulations. This was a condensation of a paper submission for the Genetic and Evolutionary Computing Conference, and I’m really pleased with how much we were able to squeeze in. Abstract, citation, and link follow:

    Artificial life simulations can yield distinct populations of agents representing different adaptations to a common environment or specialized adaptations to different environments. Here we apply a standard clustering algorithm to the genomes of such agents to discover and characterize these subpopulations. As evolution proceeds new subpopulations are produced, which show up as new clusters. Cluster centroids allow us to characterize these different subpopulations and identify their distinct adaptation mechanisms. We suggest these subpopulations may reasonably be thought of as species, even if the simulation software allows interbreeding between members of the different subpopulations. Our results indicate both sympatric and allopatric speciation are present in the Polyworld artificial life system. Our analysis suggests that intra- and inter-cluster fecundity differences may be sufficient to foster sympatric speciation in artificial and biological ecosystems.

    Jaimie Murdock and Larry Yaeger. Genetic Clustering for Species Identification. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) 2011. Dublin, Ireland, 2011. [paper]

    The second publication is an expansion of the work on ontology evaluation presented last year at the 2010 International Conference on Knowledge Engineering and Ontology Development (KEOD) in Valencia, Spain. We’ve completely rewritten the section on our volatility score, and tightened up the language throughout. The 20-page behemoth will be published as a chapter in an upcoming volume of Springer-Verlag’s Communications in Computer and Information Science (CCIS) series. Abstract, citation, and link follow:

    Ontology evaluation poses a number of difficult challenges requiring different evaluation methodologies, particularly for a "dynamic ontology" generated by a combination of automatic and semi-automatic methods. We review evaluation methods that focus solely on syntactic (formal) correctness, on the preservation of semantic structure, or on pragmatic utility. We propose two novel methods for dynamic ontology evaluation and describe the use of these methods for evaluating the different taxonomic representations that are generated at different times or with different amounts of expert feedback. These methods are then applied to the Indiana Philosophy Ontology (InPhO), and used to guide the ontology enrichment process.

    Jaimie Murdock, Cameron Buckner and Colin Allen. Evaluating Dynamic Ontologies. Communications in Computer and Information Science (Lecture Notes). Springer-Verlag. 2011. [chapter]

  • Farewell to a Friend

    Today is the memorial service for Helga Keller, a dear friend who changed so many lives. Since my first weekend in Bloomington, Helga has been a surrogate grandmother to me. We met after church the first weekend I was here and had an instant bond: a German immigrant, her first home in America was my hometown – Murray, Kentucky – and she knew many friends from home. She was also the administrative assistant for Douglas Hofstadter, one of the major inspirations for my coming to Indiana. These common bonds of faith, people, and place brought us together throughout the years.

    There are so many things for which I am extremely grateful to her. One day I sent her an e-mail inquiring about the CopyCat program and where I could find articles about it. She responded with a three-page e-mail, with the article attached, links to all subsequent research, contact information for all of the authors, an offer to introduce me to them, and an invitation to the CRCC lab meetings. As if that weren’t enough, the next time I encountered her she gave me an autographed copy of the book the study appeared in, along with photocopies of the articles mentioned in the e-mail.

    This is but one of many stories of her overwhelming kindness and dedication. May she rest in peace.

  • 2010 in Music

    A year ago, I reflected upon my musical evolution during 2009. As another year passes us by, it’s time to dive into my last.fm data once again for an empirical reality-check on what I’m listening to:

    ’10  Artist              ’09  Change
     1   The Avett Brothers   3   (+2)
     2   John Mayer          29   (+27)
     3   Radiohead            2   (-1)
     4   The Black Keys      40   (+36)
     5   The Beatles          5
     6   Laura Veirs         52   (+46)
     7   Wilco                1   (-6)
     8   Say Hi               6   (-2)
     9   Kings of Leon       22   (+13)
    10   Nada Surf           10

    All in all, this year was a transition year for music, and I feel it’s continuing to move. Whereas 2009 was largely a solidification of 2008’s favorites, 2010 saw dramatic shifts, with 4 bands making double-digit jumps and 5 bands falling off the top 10 entirely. The only artist from last year’s top 10 to climb was The Avett Brothers, who leaped to a commanding first place lead on the strength of I and Love and You and a belated discovery of Four Thieves Gone.

    As a whole, this year was a move towards a stronger rock influence, as opposed to a folk influence. John Mayer, The Black Keys, Kings of Leon, and the Gaslight Anthem have all landed in my favorite artist lists. In particular, three albums have stood out as essential listening: The Black Keys’ Brothers, The Gaslight Anthem’s The ’59 Sound, and Kings of Leon’s Because of the Times.

    This year I turned 21, which opened up some new opportunities for concerts. Electric Six, Regina Spektor, Guster, The Avett Brothers, John Mayer, The Tallest Man on Earth, Joe Pug, and Sufjan Stevens were all particularly memorable nights. Sufjan’s concert was mind-blowingly amazing, both in terms of on-stage production and overall musicianship. I had not heard The Age of Adz until that night, but the epic 25-minute “Impossible Soul” is one of the coolest experiments ever. (Sufjan ranked 12th in 2010.)

  • Gentoo: Subversion not permanently accepting SSL certs

    Today I ran into a rather frustrating issue: svn would not allow me to permanently accept an SSL cert under Gentoo, offering only the options to reject it or accept it temporarily.

    Error validating server certificate for 'xxxxxxxx':
     - The certificate is not issued by a trusted authority. Use the
       fingerprint to validate the certificate manually!
     - The certificate has an unknown error.
    Certificate information:
     - Hostname: xxxxxxxx
     - Valid: from xxxxxxxx until xxxxxxxx
     - Issuer: xxxxxxxx
     - Fingerprint: xxxxxxxx
    (R)eject or accept (t)emporarily?
    

    After some Googling, I found Bug 295617: subversion won’t save bad certificates permanently with Neon 0.29. By this point Neon 0.28 had left the portage tree, so downgrading was not an easy option. However, a comment on Bug 238529 hinted at a workaround: build Neon without GnuTLS.

    Fortunately, the fix is easy:

    echo 'net-libs/neon -gnutls' >> /etc/portage/package.use
    emerge -DN subversion
    

    Neon should rebuild, and svn will now offer the option to accept the certificate permanently:

    Error validating server certificate for 'xxxxxxxx':
     - The certificate is not issued by a trusted authority. Use the
       fingerprint to validate the certificate manually!
     - The certificate has an unknown error.
    Certificate information:
     - Hostname: xxxxxxxx
     - Valid: from xxxxxxxx until xxxxxxxx
     - Issuer: xxxxxxxx
     - Fingerprint: xxxxxxxx
    (R)eject, accept (t)emporarily or accept (p)ermanently?