Author: Jaimie Murdock

  • The InPhO Topic Explorer

    This week, I launched The InPhO Topic Explorer. Through an interactive visualization, The InPhO Topic Explorer exposes one way search engine results are generated and allows more focused exploration than just a list of related documents. Using the LDA machine learning algorithm, the explorer infers topics from arbitrary text corpora. The current demo is trained on the Stanford Encyclopedia of Philosophy, but I will be expanding this to other collections in the next few weeks.

    Click for interactive topic explorer

    The color bands within each article’s row show the topic distribution within that article, and the relative size of each band indicates the weight of that topic in the article. The full width of each row indicates the similarity to the focus article. Topic labels and colors are arbitrarily assigned, but each topic keeps the same label and color across all articles in the browser.

    Display options include topic normalization, alphabetical sort and topic sort. Normalizing topics expands every bar to full width, so that topic weights can be compared across documents. Clicking a topic reorders the documents according to that topic’s weight, and reorders the topic bars according to the topic weights in the highest-weighted document.
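
    The layout described above can be sketched in a few lines of Python. This is only an illustration, not the explorer's code: the topic weights are made up, the function names are hypothetical, and the similarity measure (one minus the base-2 Jensen-Shannon divergence) is an assumption, since this post does not say which measure the explorer uses.

```python
import math

def jensen_shannon(p, q):
    """Base-2 Jensen-Shannon divergence between two topic distributions (0 = identical)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return (kl(p, m) + kl(q, m)) / 2

def band_widths(doc, focus, normalize=False):
    """Width of each topic band in one row.

    Unnormalized rows are scaled by similarity to the focus article
    (assumed measure); normalized rows all have full width 1.0.
    """
    row_width = 1.0 if normalize else 1.0 - jensen_shannon(doc, focus)
    return [row_width * weight for weight in doc]

def sort_by_topic(docs, topic):
    """Order document names by the weight of one topic, heaviest first."""
    return sorted(docs, key=lambda name: docs[name][topic], reverse=True)

# Hypothetical 3-topic distributions; each sums to 1.
docs = {
    "focus": [0.7, 0.2, 0.1],
    "near":  [0.6, 0.3, 0.1],
    "far":   [0.1, 0.1, 0.8],
}
focus = docs["focus"]
for name, dist in docs.items():
    print(name, [round(w, 3) for w in band_widths(dist, focus)])
print(sort_by_topic(docs, topic=2))
```

    Passing normalize=True drops the similarity scaling, which corresponds to the topic normalization display option: every row fills the full width and only the topic proportions remain.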

    By varying the number of topics, one can get a finer or coarser-grained analysis of the areas discussed in the articles. The visualization currently has 20, 40, 60, 80, 100, and 120 topic models for the Stanford Encyclopedia of Philosophy.

    In contrast to a search engine, which displays articles based on a similarity measure, the topic explorer allows you to reorder results based on what you’re interested in. For example, if you’re looking at animal consciousness (80 topics), you can click on topic 46 to see those that are closest in the “animals” category, while 46 shows “consciousness” and 42 shows “perception” (arbitrary labels chosen). Some topics have a lot of words like “theory”, “case”, “would”, and “even”. These general argumentative topics can be indicative of areas where debate is still ongoing.

    In early explorations, the visualization already highlights some interesting phenomena:

    • For central articles, such as kant (40 topics), one finds that a single topic (topic 30) comprises much of the article. By increasing the number of topics, such as to kant (120 topics), topic 77 now captures the “kant”-ness of the article, but several other components can now be explored. This shows the value of having multiple topic models.
    • For creationism (120 topics), one can see that the particular blend of topics generating that article is truly an outlier, with the probability only just over 0.5 of generating the next closest document; compare this to the distribution of top articles related to animal-consciousness (120 topics) or kant (120 topics). Can you find other outliers in the SEP?

    The underlying dataset was generated using the InPhO VSM module’s LDA implementation. See Wikipedia: Latent Dirichlet Allocation for more on the LDA topic modeling approach or “Probabilistic Topic Models” (Blei, 2012) for a recent review.
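
    For readers who want a feel for what LDA inference does, here is a toy collapsed Gibbs sampler in pure Python. It is a teaching sketch and not the InPhO VSM module's implementation: the function name, the hyperparameter defaults (alpha, beta), and the tiny corpus are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs is a list of token lists; returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic distributions; each sums to 1.
    return [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

# Hypothetical three-document corpus.
corpus = [
    "mind consciousness perception mind".split(),
    "animal behavior animal cognition".split(),
    "consciousness animal mind perception".split(),
]
for theta in lda_gibbs(corpus, n_topics=2):
    print([round(p, 2) for p in theta])
```

    Each printed row is a per-document topic distribution, the same kind of quantity the color bands visualize. A real corpus needs stop-word handling, many more iterations, and a tuned implementation.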

    Source code and issue tracking are available at GitHub.

    Please share any notes in the comments below!

  • 2013 in Review

    As 2013 comes to an end, I’ve found myself in Indonesia again. With Granddad turning 86 and deciding to take an extended 2 month trip, it seemed like an important time to go and events in my own life lined up well — no finals, no school until January 13th, and no particular attachments in Bloomington. I’m spending 10 days with family, then off to Bali for 5 days, the beach for 4 days, 2 more days in Bogor, and then back to America. As in 1990 and 2007, I will leave on January 8th, 2014, which is apparently my Indonesian expiration date.

    The opportunity to explore Bogor and just unplug from my normal life has given me time for reflection and pause on what has been an eventful and fantastic year. I summarized much of the first half of the year earlier, but since then I’ve been moving swiftly.

    In July, I returned to DC to give a talk at the International Association for Computing and Philosophy, came back to Indy to give a poster at the Joint Conference on Digital Libraries, and then left for a 2 week vacation in the Bay. The vacation was amazing: I saw The Postal Service reunion, went on a road trip down California 1, checked out a music festival in Santa Cruz, then headed to Outside Lands in SF. When I got back, I ran off to Illinois to give a presentation and then moved down the hall to a new 2-bedroom apartment with a loft and 2-story ceilings. September and October were a blur of shows, homework, and settling into my new place.

    Perhaps November is the most emblematic of all the ways I’ve grown: I ran my first half-marathon (2:03!), organized my first retreat, hosted Friendsgiving, played with The 123s at The Bishop, gave presentations for all my classes, and hosted Mom’s Thanksgiving. None of these things would’ve been possible at the start of the year.

    For the first time in years, I feel caught up on life and comfortable in my own skin. While I still get overwhelmed, I’m starting to recognize that it’s going to work out. 2013 was a rediscovery of my values, and it feels like 2014 was the destination. I can’t wait to see what’s next.

  • One down, N to go

    This has been a very intense year, but the end has been worth it. In August, I started graduate school at Indiana University in the Computer Science Program. By October, I started having my first round of grad school anxieties – was a PhD worth it? Was I just doing more of the same by staying at IU? Was I going to grow? Several job offers and much discernment later, I realized that I truly wanted my doctorate, but that I had not positioned myself in the right programs — my interests are intensely interdisciplinary and more cognitive than computational. So, after some negotiations, I transferred from Computer Science to the Complex Systems Group in Informatics, which is a much better fit for my research goals.

    After this academic identity crisis, I came down with mono in December. Since I was the AI for the 75-student Data Structures course, I had to take incompletes in my coursework to focus my much-diminished energy on teaching. Despite the setback, mono was a very positive catalyst for me. I finally got to a doctor, which woke me up to the reality of what I had done to my body over the past 6 years: I was 23 and my blood pressure was in the hypertension range. For some reason the nurses weren’t freaked out about this, and the doctor just said to check it out in a few months, but I knew something was wrong. So while recovering from mono, I decided to change things. I quit drinking to focus on my incompletes, started hitting the gym 5 times a week, picked up running, and have lost 40 pounds since January. I have collarbones, wristbones, and an Adam’s apple. It’s fucking awesome. Plus, I finished my first year of graduate school with a 3.83 GPA! 😀

    Research-wise, I’ve been distilling a new research area and imagining what my committee will look like. Right now, I’m diving into a literature review on what Colin is calling “biographically-plausible corpora”. The general intuition is that while “big data” approaches can create excellent recommendations, humans gain expertise from much smaller datasets. Thus, instead of training semantic models on 50 million books, what happens if you train them only on 50 or 500 books? I’ll be presenting this work at a symposium at IACAP 2013 in July.

    I’ve also had two side projects. The first is a return to Polyworld to examine correlations between TSE complexity and social behavior — an ALife approach to the social brain hypothesis. The second is an examination of the information flow between science and the humanities using the PhilPapers index and the UCSD Map of Science. Preliminary results are being presented as a poster at the Joint Conference on Digital Libraries (JCDL) and we’re aiming for a journal article by the end of the summer.

    Outside of school, I’ve been really enjoying myself musically. In January, I joined The 123s, playing alto sax on early rock, blues, and soul covers (stuff like Ray Charles, Aretha Franklin, Little Richard, Smokey Robinson, and Chuck Berry). This month Afro-Hoosier got a new trombone player, which has allowed me to switch to bari full-time. I’m playing gigs every other week, and on May 17th I’ll be playing my first gig in another town – a fundraiser out in Lafayette. On May 23rd, I’ll be headlining at the Bishop with The 123s. In the next 3 months I’ll be seeing Of Monsters and Men, Cold War Kids, Portugal. the Man, Todd Snider, The Wailers, The Postal Service, and all the bands at Outside Lands. Life has been good to my ears.

    So, all in all, I feel pretty great about where I’ve come this year. It took a bit of soul-searching to realize how much I wanted my PhD, and a lot of work to get my body ready for it, but I’m ready now and extremely satisfied with my position.

  • We are The 123s!

    This semester I’ve started playing sax with The 123s, a local blues and rock’n’roll band. I’ve really been enjoying our setlist, and we just uploaded two songs to YouTube.

    Our next gig is Friday, March 29th, at The Back Door, playing at the Blues on Blues benefit for the Trained Eye Arts Center from ~5:45-7pm ($5). I’m most excited about this summer though: we’re headlining at The Bishop on Thursday, May 23rd!

  • 2012 in Music

    This year, music has again become more than a consumptive activity. Through Afro-Hoosier, Canterbury, and my own noodling, I feel like I’m actively listening for arrangements and harmonies, and it feels wonderful to make that transition as a musician.

    So what have I been listening to? The top 10 are pretty indicative:

    ’12  Artist               ’11  Change
    1    The Avett Brothers   2    (+1)
    2    Wilco                4    (+2)
    3    Radiohead            5    (+2)
    4    Paul Simon           48   (+44)
    5    John Mayer           24   (+19)
    6    The xx                    (–)
    7    Kings of Leon        9    (+2)
    8    The Black Keys       12   (+4)
    9    Bright Eyes          11   (+2)
    10   TV on the Radio      20   (+10)

    The Avett Brothers are riding on the strength of The Carpenter, which is a stellar album. John Mayer also rests on the strength of Born and Raised, which is easily my family’s favorite album of 2012. My experience with Paul Simon reflects that of seeing Sufjan Stevens – concert in November, followed by “woah this is really interesting” for the rest of time. The diversity reflected in his songwriting is amazing. The xx were the coolest “new” sound: very minimalist and sparse, with hip-hop beats and interesting guitar interplay. Their eponymous debut album is a must have.

    New Discoveries (YouTube playlist): Alabama Shakes (blues), The Lonely Forest (alt rock), Cage the Elephant (rock), Passion Pit & Handsome Furs (80s revival synth-pop), Portugal. the Man (psychedelic/rock), Of Monsters and Men (folk), Joshua Radin (singer/songwriter), Ben Howard (contemporary), BADBADNOTGOOD (jazz fusion, heavy electronic/hip-hop influences), Morphine (bass/bari sax/drum trio), Kid Cudi (hip-hop), Nero (dubstep), Above & Beyond (trance), and Shpongle (psychedelic/trance).

    Concerts: Above & Beyond, Radiohead, The Black Keys, Shpongle, Todd Snider, Outside Lands (Beck, Andrew Bird, Justice, Thee Oh Sees, Antibalas, Alabama Shakes, Portugal. the Man, Sigur Ros, Die Antwoord, Explosions in the Sky), The Avett Brothers, Victor Wooten.

  • Embracing Open Technologies

    As a computer scientist, my software and hardware environment is the most critical part of my professional life. Furthermore, as a digital native, this landscape is the stratum upon which many of my interactions are built. Just as in our physical life, our digital life should inhabit healthy surroundings. Thus, I’ve entered a period of deep contemplation about the services I use, and have started embracing the ethos of the GNU Project: the tech we use should reflect the values we hold. To this end, there are three gradual shifts to my computing environment: adopting Linux, migrating to GitHub, and deactivating my Facebook account.

    Ownership, Context, Responsibilities

    The first notion is one of ownership, and there are two aspects: licensing and data. Open-source licensing solves many distribution problems, allowing system-wide update managers that upgrade all my software at once, rather than being bombarded with popup windows for each application. However, not all software works this way, and so we must confront the ambiguous reality of digital rights management (DRM). Last month, I had to replace my motherboard, which triggered Windows to inform me that I may have been a victim of software piracy. This is because the license is tied to the physical installation of the software, rather than the intellectual property of the ability to use the software. App stores, such as the Steam Platform, solve this problem by tying the software to the user, rather than the installation. So long as DRM does not interfere with the portability of my intellectual property, I am comfortable with it.

    The cloud is a double-edged sword when it comes to ownership and portability. On the one hand, by distributing data across multiple servers, we gain reliability and ubiquitous access, at the expense of security. However, many cloud storage implementations (e.g., Dropbox) do not follow file transfer standards in place since the 80s, locking you into their proprietary service and software. In contrast, services like GitHub offer remote hosting, but do not lock you into their system – your data is always portable. Amazon MP3 also offers portability through un-encrypted, unlimited download MP3s. By adhering to standards, applications guarantee openness of data, so long as the standards are published and APIs are available.

    However, standards, even when published, require compliance and ubiquity, and it is here that Facebook fails. While championing the Open Graph protocol for data, Facebook follows the old Microsoft approach to standards: “Embrace, extend, and extinguish.” Messages are the clearest example of this. Every user on Facebook automatically has an e-mail address @facebook.com. This address, though, is not accessible via the standard IMAP or POP protocols, yet it can receive messages from any address, locking those messages into the Facebook ecosystem. We are digital sharecroppers, handing over content with false promises of ownership, constantly undermined by forced changes to benefit corporate interests.

    The context of these messages has also rapidly changed. While they were once analogous to e-mail, they are now analogous to chat, a widely different medium (with the Jabber/XMPP open standard giving a facade of openness). Wall posts have undergone similar context shifting, from the early days of wall-to-wall conversations, to status comments, to the timeline, all the while without easily accessible search. Control over context is a critical right for digital interactions, a point argued best by danah boyd. With nearly one billion users, Facebook is a self-described “social utility”, which gives it a social responsibility to its users. Given Facebook’s rejection of this responsibility, I have deactivated my Facebook account, in favor of controlling my own context at my personal web page. It is my hope that future social networks will maintain a balance between the free-for-all of MySpace pages and the rigor of Facebook profiles.

    We also must have the right to be forgotten. Facebook maintains negative-space data, and based on network structure alone it is possible to infer unreported profile data and unregistered users. Klout auto-computes their metric for all Twitter users, regardless of whether they have registered for the service, driving thousands of registrations just to opt out, forcing people to hand over their personal data regardless of their participation. This is a major problem for all social applications. The power of social applications is mighty, and maintaining user control is critical, lest we unintentionally surrender our identity to others.

    Dimensions of Services

    While I’ve sketched out some specific considerations, there are a few general principles to extract. It’s important to note that the above arguments have little to do with the notion of privacy, highlighting that the principle of openness is very different from the principle of publicity. It is possible to have an open system which is private. For example, private GitHub repositories are inherently open: the fundamental data, the code, is all accessible to the user, while private repositories may keep them from the public. Privacy and openness are also separate from commercial interests and cost. GMail is a private, open, free, commercial system, adhering to the very same IMAP protocol as all other mail servers, but it is monetized for the company, despite storing private information and being a free service. When it comes to privacy, we must first start with openness, because privacy is built on trust. If you are not trusted with access to your own data, how can you trust that system with it?

    Contemplating services within this framework still has issues: how do I deal with Steam, which is a closed, private, commercial service? The last aspect is portability. While my software is locked to the Steam service, it is not locked to a particular computer. Richard Stallman even makes a well-tempered argument that Steam can be beneficial for the Linux ecosystem by offering certain freedoms of choice, and the company itself has made a huge commitment to open-source development – rapidly improving Linux graphics drivers.

  • Containing the Semantic Explosion

    Yesterday afternoon, I delivered a talk to the PhiloWeb Workshop at the WWW2012 Conference titled “Containing the Semantic Explosion” with Cameron Buckner and Colin Allen. It is an overview of the InPhO Project architecture, known as dynamic ontology, and a preview of some forthcoming data mining tools. [slides]

    The explosion of semantic data on the information web, and within digital philosophy, requires new techniques for organizing and linking these knowledge repositories. These must address concerns about consistency, completeness, maintenance, usability, and pragmatics, while reducing the cost of double experts trained both in ontology design and the target domain. Folksonomy approaches address concerns about usability and personnel at the expense of consistency, completeness, and maintenance. Upper-level formal ontologies address concerns about consistency and completeness, but require double experts for the initial construction and maintenance of the representation. At the Indiana Philosophy Ontology (InPhO) Project, we have developed a general methodology called dynamic ontology, which alleviates the need for double experts, while addressing concerns about consistency, completeness and change through machine learning over a domain corpus, and concerns about usability and pragmatics through human input and semantic web standards. This representation can then be used by other projects in digital philosophy, such as the Stanford Encyclopedia of Philosophy (SEP) and PhilPapers, along with resources outside of digital philosophy enabled by the LinkedHumanities project. [slides]

  • Grad School: The Right Place

    If you like where you live, if you like what you do,
    If you like what you’re seein, when you’re lookin at you,
    If you like what you’re sayin, when you open your face,
    Then you got the right feeling, you’re in the right place.
    Monsters of Folk – “The Right Place”

    In November, I delivered two lectures to student organizations on campus and realized that I really miss teaching. Despite the amazing flexibility of a career in research and development, I won’t be able to find fulfillment until I am working with students. The only way to realize that goal is to become a professor, and in order to realize that I need a PhD, so I applied to graduate schools in December.

    After visiting the available options, I’ve decided to continue my studies at Indiana University, pursuing the Joint PhD In Cognitive Science and Computer Science. All in all, IU just feels like the right place. I’m well-positioned to make a lasting impact, both in my own studies and in the community, and there’s no break for moving to a new city and building a new professional network. Plus, there is a large amount of social and financial stability in Bloomington, which helps maintain my sanity.

    As for now, I’m off to Lyon, France to give a presentation titled “Containing the Semantic Explosion”, covering work with the Indiana Philosophy Ontology Project. An abstract and slides will follow later this week.

  • 2011 in Music

    Once again it is time to do a musical year-in-review. I feel some of my scrobble counts are off this year due to the launch of the Amazon Cloud Player, which I’ve been using at work. Of course, my 2009 play counts were also off due to sporadic iPod syncing, but this is still fairly accurate.

    ’11  Artist                ’10  Change
    1    Cold War Kids         68   (+67)
    2    The Avett Brothers    1    (-1)
    3    Death Cab for Cutie   21   (+18)
    4    Wilco                 7    (+3)
    5    Radiohead             2    (-3)
    6    Kanye West            78   (+72)
    7    Say Hi                8    (+1)
    8    Daft Punk             34   (+26)
    9    Kings of Leon         9    (–)
    10   Counting Crows        12   (+2)

    Right below this list of top artists is a significant number of new discoveries. In the folk scene, I’ve been listening to Ryan Adams, The Head and the Heart, and The Goat Rodeo Sessions. In the indie scene, I’ve been listening to Death Cab for Cutie’s newest album, Cold War Kids, and Florence + the Machine. Sonic Youth has been an awesome discovery: Goo is making weekly appearances in my playlists.

    TV on the Radio is the coolest band I’ve discovered this year. Their arrangements are superb, and I really like their use of horns. The first song I heard (and subsequently fell in love with) is “Things You Can Do”. The new album, Nine Types of Light, has an accompanying movie that is essential viewing for fans of Waking Life. Also, the movie has some amazing quotes: “It’s an unspeakable name. You don’t say it, you just look at it.”

    The biggest musical change this year may not be reflected in play counts, but rather in consumption practice. I’ve been going to way more concerts in the past few months, including Paul Simon, Punch Brothers, Gillian Welch & David Rawlings, Taj Mahal, Cold War Kids, They Might Be Giants, Main Squeeze, End Times Spasm Band, Joe Pug, and Say Hi. Bloomington has an astonishing number of bands come through, and because it’s a smaller town, we get to see them in smaller venues.

    I’ve also continued switching to Amazon MP3, which has gotten even better with the advent of the Amazon Cloud Player, with clients for Windows, Mac OS X, Linux, and Android. It’s nice having easy, instant access to my music anywhere. My only complaint is that the Amazon MP3 Downloader doesn’t have a 64-bit Linux client.

  • Talks

    Last week I wrote and then gave two lectures on “Categorization” and “Practical Parallelism”. It was a ton of fun to prepare them, and actually giving them made me realize how much I miss teaching. Abstracts and slides follow.

    Categorization

    Student Organization for Cognitive Science (SOCS)
    November 15, 2011 @ 5:30pm

    Abstract: Categorization is a fundamental problem in cognitive science that goes by a multitude of names: in artificial intelligence, categorization is known as clustering; in mathematics, the problem is partitioning. There are many applications in linguistics, vision, and memory research. In this talk, I will provide a brief overview of exemplar vs. prototype models in the cognitive sciences (Goldstone & Kersten 2003), followed by an introduction to three different general-purpose clustering algorithms: k-means (MacQueen 1967), qt-clust (Heyer et al. 1999), and information-theoretic clustering (Gokcay & Principe 2002). Open-source Python implementations of each algorithm will be provided.
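
    As a taste of the first of those algorithms, here is a minimal k-means sketch using Lloyd's iteration in pure Python. It is not the implementation distributed with the talk; the function name, defaults, and sample points are assumptions made for illustration.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

    The result depends on the random starting centroids, which is one reason k-means is usually run several times with different seeds.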

    Slides

    Practical Parallelism

    CS Club Tech Talk
    November 17, 2011 @ 7pm

    Abstract: In this talk, I will give a brief overview of several key parallelism concepts and practical tools for several languages. After this talk, attendees should have the resources to recognize and solve “painfully parallel problems”. Topics will include: threads vs. processes, Amdahl’s Law, shared vs. distributed memory, synchronization, locks, pipes, queues, process pools, futures, OpenMP, MapReduce, Hadoop, and GPU programming.
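
    Amdahl's Law is the easiest of these topics to show in code. If a fraction p of a program can be parallelized across n workers, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below is a direct transcription of that formula; the function name and the example fraction of 0.9 are chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)

# Hypothetical program in which 90% of the work can be parallelized.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

    However many workers are added, the speedup for this example stays below 1 / (1 - 0.9) = 10, because the serial 10% always runs at its original speed.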

    Slides