Blog

  • Wisdom of the Few?

    Wisdom of the Few? “Supertaggers” in Collaborative Tagging Systems

    Jared Lorince, Sam Zorowitz, Jaimie Murdock, Peter M. Todd

    A folksonomy is ostensibly an information structure built up by the “wisdom of the crowd”, but is the “crowd” really doing the work? Tagging is in fact a sharply skewed process in which a small minority of “supertagger” users generate an overwhelming majority of the annotations. Using data from three large-scale social tagging platforms, we explore (a) how to best quantify the imbalance in tagging behavior and formally define a supertagger, (b) how supertaggers differ from other users in their tagging patterns, and (c) if effects of motivation and expertise inform our understanding of what makes a supertagger. Our results indicate that such prolific users not only tag more than their counterparts, but in quantifiably different ways. These findings suggest that we should question the extent to which folksonomies achieve crowdsourced classification via the “wisdom of the crowd”, especially for broad folksonomies like Last.fm as opposed to narrow folksonomies like Flickr.

    Preprint of article in review available at arXiv:1502.02777 [cs.SI]
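
    To make the idea of "quantifying the imbalance" concrete, here is a minimal sketch of two common concentration measures: the Gini coefficient and the share of annotations produced by the top fraction of users. These are illustrative choices, not necessarily the measures used in the paper, and the annotation log below is made up for the example.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts (0 = equal, near 1 = concentrated)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def top_share(counts, fraction):
    """Share of all annotations produced by the top `fraction` of users."""
    xs = sorted(counts, reverse=True)
    k = max(1, round(len(xs) * fraction))
    return sum(xs[:k]) / sum(xs)


# Hypothetical (user, tag) annotation log; NOT data from the paper.
annotations = (
    [("u1", "rock")] * 90
    + [("u2", "jazz")] * 6
    + [("u3", "pop")] * 2
    + [("u4", "folk")]
    + [("u5", "indie")]
)

per_user = Counter(user for user, _ in annotations)
counts = list(per_user.values())

print("Gini coefficient:", gini(counts))
print("Share from top 20% of users:", top_share(counts, 0.2))
```

    In this toy log a single user out of five produces 90 of the 100 annotations, so both measures come out high. Where to draw the line for a "supertagger" is a separate question that the sketch does not address.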

  • Topic Explorer at AAAI

    Next week, I’ll be headed to Austin, TX for AAAI-15 to present a demo of the Topic Explorer. Accompanying the presentation is a short paper:

    Topic models remain a black box both for modelers and for end users in many respects. From the modelers’ perspective, many decisions must be made which lack clear rationales and whose interactions are unclear – for example, how many topics the algorithms should find (K), which words to ignore (aka the “stop list”), and whether it is adequate to run the modeling process once or multiple times, producing different results due to the algorithms that approximate the Bayesian priors. Furthermore, the results of different parameter settings are hard to analyze, summarize, and visualize, making model comparison difficult. From the end users’ perspective, it is hard to understand why the models perform as they do, and information-theoretic similarity measures do not fully align with humanistic interpretation of the topics. We present the Topic Explorer, which advances the state-of-the-art in topic model visualization for document-document and topic-document relations. It brings topic models to life in a way that fosters deep understanding of both corpus and models, allowing users to generate interpretive hypotheses and to suggest further experiments. Such tools are an essential step toward assessing whether topic modeling is a suitable technique for AI and cognitive modeling applications.

    Jaimie Murdock and Colin Allen. (2015) Visualization Techniques for Topic Model Checking. [demo track] in Proceedings of the 29th AAAI Conference (AAAI-15). Austin, Texas, USA, January 25-29, 2015.

  • Granddad’s Indonesian Career

    The Granddad had two tenures in Indonesia at the Bogor Agricultural School (Institut Pertanian Bogor – IPB) from 1968-1970 and 1980-1985. IPB became independent from the University of Indonesia in 1963, and Granddad’s work was instrumental in its reorganization as the first degree-granting agricultural school in Indonesia. In his first term, he created the 4-year undergraduate curriculum and set general education requirements, helping the university exceed its goal of 20,000 graduates by the year 2000. In his second term, he served as Director of the Graduate Education Project and began issuing doctoral degrees.

    IPB was founded on a “tri-darma” of teaching, research, and extension, which matched the educational philosophy of the American land grant universities that trained Granddad. His design for the flagship Darmaga Campus was located on a Dutch rubber plantation and recognized that a university hosted not only research and faculty, but also students and their families. Therefore, it included classroom buildings, research and teaching fields, extension offices, residence halls and chapels. On a tour of the campus on Christmas Eve 2013, he was especially proud that the church and the mosque were located on the same courtyard sharing the same playground, that actual rubber, banana, rice, and corn fields for the students had been preserved, and that the library had been vastly expanded. He visited his IPB colleagues every year from his retirement at UW-Madison through his death in 2014 (pictured laughing in 2006, below).

    Granddad talking about his career on our way to the IPB Darmaga Campus (December 2013)
    Gathering of The Granddad and IPB colleagues at Aunt Cindy’s house (December 2006)
  • Debut Album: “We are the 123s!”

    On November 21st, The 123s will release our debut album “We are the 123s!” Recorded on June 10, 2014 at Russian Recording in Bloomington, IN, the album features 7 tracks and is available for streaming at http://wearethe123s.com/

    Additionally, we’ll be having a FREE release show:

    “We are the 123s!” Release Show
    November 21, 2014 10 pm
    Max’s on the Square
    106 W 6th St, Bloomington, IN

    Finally, we’ve released a full set of music videos from the live recording:

    A lot of hard work went into this album, and I’m very excited to share it with everyone! Physical copies on a “vinyl” CD are available as well; message me for more details.

  • Granddad

    On August 29, Granddad passed away suddenly at 86 in his home on Terrapin Creek. As the public obituary shows, Granddad was a legendary man: a Professor of Soil Science for 39 years at University of Wisconsin – Madison, he led the green revolution in Indonesia and Brazil (for which he received doctorates in 1985 and 2014, respectively). As President of the Midwest Universities Consortium on International Activities (MUCIA), he helped many other institutions and countries coordinate humanitarian aid. After retirement, Granddad still traveled to Indonesia every year and worked on the Ponderosa through his last day.

    At his funeral, all his grandchildren were given the opportunity to speak and my eulogy is below.

    After Grandma passed away, Granddad started a new tradition of writing his grandchildren a Christmas letter every year. In them, he told us his life’s story – from childhood on Terrapin Creek to finding the love of his life to moving away for school and then his first job. Throughout everything Granddad’s letters were filled with love and his profound sense of finding home, wherever he was.

    In the past two years, Granddad and I recognized that I was following in part of his footsteps by becoming a PhD student. This Christmas, I made plans to see him in Indonesia. Granddad and I arrived in Jakarta within an hour of each other. He had just come back from the mission field in Sulawesi, and was undeniably sick. Granddad’s health was never a complaint, it was just a statement. When his lung stopped working almost a decade ago, he didn’t. When he visited the doctor in Indonesia, the doctor asked to take a picture of him. Granddad asked why and the doctor said “My dad is 84 and giving up on life, you’re 86 and your life is just beginning!”

    Granddad’s life always was just beginning – he started every day in gratitude and as his letters showed us, even recollections of the past started with thankfulness for the day he was given and the future he had created for his family and the world.

    Granddad and I at IPB on Christmas Eve 2013

    As he started feeling better, he began whistling around the house again. One morning I asked him to take me to IPB – the Bogor Agricultural School – where he had worked for 7 years. Twenty minutes later, in a moment that was very Granddad, he said “car’s out front, let’s go.” Now, I was expecting him to take a few days to make arrangements, so I hurried off to get shoes. When we got in the car, he started telling me all about the work he had done there restructuring the curriculum, and I hadn’t realized he literally designed the university – from the library to offices to fields to chapels. They had set a goal of 20,000 graduates by the year 2000, which they met early!

    Granddad made an immediate impact on so many lives, but the life and work he created was built to last. We each saw that first-hand as he mentored so many of us. Now, as his letters stop, we are left to find our own path, but the lessons he gave us of love and dedication will live on forever.

  • The InPhO Topic Explorer

    This week, I launched The InPhO Topic Explorer. Through an interactive visualization, The InPhO Topic Explorer exposes one way search engine results are generated and allows more focused exploration than just a list of related documents. Using the LDA machine learning algorithm, the explorer infers topics from arbitrary text corpora. The current demo is trained on the Stanford Encyclopedia of Philosophy, but I will be expanding this to other collections in the next few weeks.

    Click for interactive topic explorer

    The color bands within each article’s row show the topic distribution within that article, and the relative size of each band indicates the weight of that topic in the article. The full width of each row indicates the similarity to the focus article. Each topic’s label and color are arbitrarily assigned, but are consistent across articles in the browser per topic.
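
    As a rough illustration of how a row could be computed, the sketch below scores a document against the focus article and splits the row into topic bands. It assumes similarity is one minus the Jensen-Shannon distance between the two topic distributions; that choice, and the function names, are assumptions made for this example rather than a description of the explorer's actual code.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, divergence))


def row_bands(focus, doc):
    """Return (row width, band widths) for one document row.

    Row width is the assumed similarity to the focus article; each band is
    that width scaled by the topic's weight in the document.
    """
    similarity = 1.0 - js_distance(focus, doc)
    return similarity, [similarity * w for w in doc]


# Hypothetical 3-topic distributions for a focus article and one other article.
focus = [0.7, 0.2, 0.1]
other = [0.5, 0.3, 0.2]
width, bands = row_bands(focus, other)
print("row width:", width)
print("band widths:", bands)
```

    Because the band widths are the document's topic weights scaled by the row width, they always sum to the row width, which matches the picture of colored bands filling a bar whose length reflects similarity.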

    Display options include topic normalization, alphabetical sort and topic sort. Normalizing topics expands each bar to full width so that topic weights can be compared across documents. Clicking a topic reorders the documents according to that topic’s weight and reorders the topic bars according to the topic weights in the highest-weighted document.
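
    The two display options above can be sketched in a few lines. The data layout (a dict from article name to a list of topic weights), the function names, and the sample weights are all hypothetical; the sketch also assumes that topic sort uses the raw topic weight in each document.

```python
def band_widths(similarity, weights, normalized=False):
    """Band widths for one row.

    Unnormalized rows are scaled by similarity to the focus article;
    normalized rows fill the full width so weights compare across rows.
    """
    scale = 1.0 if normalized else similarity
    return [scale * w for w in weights]


def sort_by_topic(docs, topic):
    """Reorder documents by one topic's weight, then reorder the topics.

    Documents are sorted by descending weight of `topic`; topics are sorted
    by descending weight within the highest-weighted document.
    """
    doc_order = sorted(docs, key=lambda name: docs[name][topic], reverse=True)
    top_weights = docs[doc_order[0]]
    topic_order = sorted(
        range(len(top_weights)), key=lambda k: top_weights[k], reverse=True
    )
    return doc_order, topic_order


# Hypothetical topic weights for three articles over three topics.
docs = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.1, 0.3, 0.6],
    "c": [0.2, 0.5, 0.3],
}

doc_order, topic_order = sort_by_topic(docs, topic=2)
print(doc_order)    # articles ordered by weight of topic 2
print(topic_order)  # topics ordered by weight in the top article
print(band_widths(0.5, docs["b"], normalized=True))
```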

    By varying the number of topics, one can get a finer or coarser-grained analysis of the areas discussed in the articles. The visualization currently has 20, 40, 60, 80, 100, and 120 topic models for the Stanford Encyclopedia of Philosophy.

    In contrast to a search engine, which displays articles based on a similarity measure, the topic explorer allows you to reorder results based on what you’re interested in. For example, if you’re looking at animal consciousness (80 topics), you can click on topic 46 to see those that are closest in the “animals” category, while 46 shows “consciousness” and 42 shows “perception” (arbitrary labels chosen). Some topics have a lot of words like “theory”, “case”, “would”, and “even”. These general argumentative topics can be indicative of areas where debate is still ongoing.

    In early explorations, the visualization already highlights some interesting phenomena:

    • For central articles, such as kant (40 topics), one finds that a single topic (topic 30) comprises much of the article. By increasing the number of topics, such as to kant (120 topics), topic 77 now captures the “kant”-ness of the article, but several other components can now be explored. This shows the value of having multiple topic models.
    • For creationism (120 topics), one can see that the particular blend of topics generating that article is truly an outlier, with the probability only just over .5 of generating the next closest document; compare this to the distribution of top articles related to animal-consciousness (120 topics) or kant (120 topics).  Can you find other outliers in the SEP?

    The underlying dataset was generated using the InPhO VSM module’s LDA implementation. See Wikipedia: Latent Dirichlet Allocation for more on the LDA topic modeling approach or “Probabilistic Topic Models” (Blei, 2012) for a recent review.

    Source code and issue tracking are available at GitHub.

    Please share any notes in the comments below!

  • 2013 in Review

    As 2013 comes to an end, I’ve found myself in Indonesia again. With Granddad turning 86 and deciding to take an extended 2 month trip, it seemed like an important time to go and events in my own life lined up well — no finals, no school until January 13th, and no particular attachments in Bloomington. I’m spending 10 days with family, then off to Bali for 5 days, the beach for 4 days, 2 more days in Bogor, and then back to America. As in 1990 and 2007, I will leave on January 8th, 2014, which is apparently my Indonesian expiration date.

    The opportunity to explore Bogor and just unplug from my normal life has given me time for reflection and pause on what has been an eventful and fantastic year. I summarized much of the first half of the year earlier, but since then I’ve been moving swiftly.

    In July, I returned to DC to give a talk at the International Association for Computing and Philosophy, came back to Indy to give a poster at the Joint Conference on Digital Libraries, and then left for a 2 week vacation in the Bay. The vacation was amazing: I saw The Postal Service reunion, went on a road trip down California 1, checked out a music festival in Santa Cruz, then headed to Outside Lands in SF. When I got back, I ran off to Illinois to give a presentation and then moved down the hall to a new 2-bedroom apartment with a loft and 2-story ceilings. September and October were a blur of shows, homework, and settling into my new place.

    Perhaps November is the most emblematic of all the ways I’ve grown: I ran my first half-marathon (2:03!), organized my first retreat, hosted Friendsgiving, played with The 123s at The Bishop, gave presentations for all my classes, and hosted Mom’s Thanksgiving. None of these things would’ve been possible at the start of the year.

    For the first time in years, I feel caught up on life and comfortable in my own skin. While I still get overwhelmed, I’m starting to recognize that it’s going to work out. 2013 was a rediscovery of my values, and it feels like 2014 was the destination. I can’t wait to see what’s next.

  • One down, N to go

    This has been a very intense year, but the end has been worth it. In August, I started graduate school at Indiana University in the Computer Science Program. By October, I started having my first round of grad school anxieties – was a PhD worth it? Was I just doing more of the same by staying at IU? Was I going to grow? Several job offers and much discernment later, I realized that I truly wanted my doctorate, but that I had not positioned myself in the right programs — my interests are intensely interdisciplinary and more cognitive than computational. So, after some negotiations, I transferred from Computer Science to the Complex Systems Group in Informatics, which is a much better fit for my research goals.

    After this academic identity crisis, I came down with mono in December. Since I was the AI for the 75-student Data Structures course, I had to take incompletes in my coursework to focus my much-diminished energy on teaching. Despite the setback, mono was a very positive catalyst for me. I finally got to a doctor, which woke me up to the reality of what I had done to my body over the past 6 years: I was 23 and my blood pressure was in the hypertension range. For some reason the nurses weren’t freaked out about this, and the doctor just said to check it out in a few months, but I knew something was wrong. So while recovering from mono, I decided to change things. I quit drinking to focus on my incompletes, started hitting the gym 5 times a week, picked up running, and have lost 40 pounds since January. I have collarbones, wristbones, and an Adam’s apple. It’s fucking awesome. Plus, I finished my first year of graduate school with a 3.83 GPA! 😀

    Research-wise, I’ve been distilling a new research area and imagining what my committee will look like. Right now, I’m diving into a literature review on what Colin is calling “biographically-plausible corpora”. The general intuition is that while “big data” approaches can create excellent recommendations, humans gain expertise from much smaller datasets. Thus, instead of training semantic models on 50 million books, what happens if you train them only on 50 or 500 books? I’ll be presenting this work at a symposium at IACAP 2013 in July.

    I’ve also had two side projects. The first is a return to Polyworld to examine correlations between TSE complexity and social behavior — an ALife approach to the social brain hypothesis. The second is an examination of the information flow between science and the humanities using the PhilPapers index and the UCSD Map of Science. Preliminary results are being presented as a poster at the Joint Conference on Digital Libraries (JCDL) and we’re aiming for a journal article by the end of the summer.

    Outside of school, I’ve been really enjoying myself musically. In January, I joined The 123s, playing alto sax on early rock, blues, and soul covers (stuff like Ray Charles, Aretha Franklin, Little Richard, Smokey Robinson, and Chuck Berry). This month Afro-Hoosier got a new trombone player, which has allowed me to switch to bari full-time. I’m playing gigs every other week, and on May 17th I’ll be playing my first gig in another town – a fundraiser out in Lafayette. On May 23rd, I’ll be headlining at the Bishop with The 123s. In the next 3 months I’ll be seeing Of Monsters and Men, Cold War Kids, Portugal. the Man, Todd Snider, The Wailers, The Postal Service, and all the bands at Outside Lands. Life has been good to my ears.

    So, all in all, I feel pretty great about where I’ve come this year. It took a bit of soul-searching to realize how much I wanted my PhD, and a lot of work to get my body ready for it, but I’m ready now and extremely satisfied with my position.

  • We are The 123s!

    This semester I’ve started playing sax with The 123s, a local blues and rock’n’roll band. I’ve really been enjoying our setlist, and we just uploaded two songs to YouTube.

    Our next gig is Friday, March 29th, at The Back Door, playing at the Blues on Blues benefit for the Trained Eye Arts Center from ~5:45-7pm ($5). I’m most excited about this summer though: we’re headlining at The Bishop on Thursday, May 23rd!

  • 2012 in Music

    This year, music has again become more than a consumptive activity. Through Afro-Hoosier, Canterbury, and my own noodling, I feel like I’m actively listening for arrangements and harmonies, and it feels wonderful to make that transition as a musician.

    So what have I been listening to? The top 10 are pretty indicative:

    ’12  Artist              ’11  Change
    1    The Avett Brothers  2    (+1)
    2    Wilco               4    (+2)
    3    Radiohead           5    (+2)
    4    Paul Simon          48   (+44)
    5    John Mayer          24   (+19)
    6    The xx                   (–)
    7    Kings of Leon       9    (+2)
    8    The Black Keys      12   (+4)
    9    Bright Eyes         11   (+2)
    10   TV on the Radio     20   (+10)

    The Avett Brothers are riding on the strength of The Carpenter, which is a stellar album. John Mayer also rests on the strength of Born and Raised, which is easily my family’s favorite album of 2012. My experience with Paul Simon reflects that of seeing Sufjan Stevens – concert in November, followed by “woah this is really interesting” for the rest of time. The diversity reflected in his songwriting is amazing. The xx were the coolest “new” sound: very minimalist and sparse, with hip-hop beats and interesting guitar interplay. Their eponymous debut album is a must-have.

    New Discoveries (YouTube playlist): Alabama Shakes (blues), The Lonely Forest (alt rock), Cage the Elephant (rock), Passion Pit & Handsome Furs (80s revival synth-pop), Portugal. the Man (psychedelic/rock), Of Monsters and Men (folk), Joshua Radin (singer/songwriter), Ben Howard (contemporary), BADBADNOTGOOD (jazz fusion, heavy electronic/hip-hop influences), Morphine (bass/bari sax/drum trio), Kid Cudi (hip-hop), Nero (dubstep), Above & Beyond (trance), and Shpongle (psychedelic/trance).

    Concerts: Above & Beyond, Radiohead, The Black Keys, Shpongle, Todd Snider, Outside Lands (Beck, Andrew Bird, Justice, Thee Oh Sees, Antibalas, Alabama Shakes, Portugal. the Man, Sigur Ros, Die Antwoord, Explosions in the Sky), The Avett Brothers, Victor Wooten.