
The InPhO Topic Explorer

This week, I launched The InPhO Topic Explorer. Through an interactive visualization, The InPhO Topic Explorer exposes one way search engine results are generated and allows more focused exploration than just a list of related documents. Using the LDA machine learning algorithm, the explorer infers topics from arbitrary text corpora. The current demo is trained on the Stanford Encyclopedia of Philosophy, but I will be expanding this to other collections in the next few weeks.

Click for interactive topic explorer

The color bands within each article’s row show the topic distribution within that article, and the relative size of each band indicates the weight of that topic in the article. The full width of each row indicates the similarity to the focus article. Each topic’s label and color are arbitrarily assigned, but are consistent for a given topic across all articles in the browser.
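The layout rule described above can be sketched in a few lines. This is an illustrative reconstruction, not the explorer’s actual source: the function name, data shapes, and pixel width are all assumptions.

```python
# Sketch of the row layout described above (hypothetical, not the
# actual InPhO Topic Explorer code): a row's total width is scaled by
# its similarity to the focus article, and each color band within it
# by that topic's weight in the article.

def band_widths(topic_weights, similarity, full_width=400):
    """Return pixel widths for each topic band in one article's row.

    topic_weights -- mapping of topic id to weight (sums to ~1.0)
    similarity    -- this article's similarity to the focus article
    full_width    -- width in pixels of a row at similarity 1.0
    """
    row_width = similarity * full_width
    return {topic: weight * row_width
            for topic, weight in topic_weights.items()}

widths = band_widths({30: 0.6, 77: 0.3, 42: 0.1}, similarity=0.5)
# The bands together span half the full row width.
```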

Display options include topic normalization, alphabetical sort, and topic sort. When topics are normalized, each bar expands to the full width, so topic weights can be compared across documents. Clicking a topic reorders the documents according to that topic’s weight, and reorders the topic bars according to the topic weights in the highest-weighted document.
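The topic-sort interaction amounts to a simple reordering of the document list. Here is a minimal sketch of that operation; the document names, topic ids, and weights are made up for illustration and are not taken from the actual model.

```python
# Hypothetical sketch of the "click a topic to reorder" behavior
# described above. Each document carries a topic distribution
# (topic id -> weight); the weights here are invented examples.

docs = {
    "kant":       {30: 0.70, 46: 0.20, 42: 0.10},
    "descartes":  {30: 0.40, 46: 0.50, 42: 0.10},
    "perception": {30: 0.05, 46: 0.25, 42: 0.70},
}

def sort_by_topic(docs, topic):
    """Reorder documents by the clicked topic's weight, descending."""
    return sorted(docs, key=lambda d: docs[d].get(topic, 0.0), reverse=True)

order = sort_by_topic(docs, 46)
# "descartes" comes first, since topic 46 carries the most weight there.
```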

By varying the number of topics, one can get a finer or coarser-grained analysis of the areas discussed in the articles. The visualization currently has 20, 40, 60, 80, 100, and 120 topic models for the Stanford Encyclopedia of Philosophy.

In contrast to a search engine, which displays articles based on a single similarity measure, the topic explorer allows you to reorder results based on what you’re interested in. For example, if you’re looking at animal consciousness (80 topics), you can click on the “animals” topic to see the articles closest in that category, while topic 46 shows “consciousness” and topic 42 shows “perception” (all labels arbitrarily chosen). Some topics are dominated by words like “theory”, “case”, “would”, and “even”. These general argumentative topics can be indicative of areas where debate is still ongoing.

In early explorations, the visualization already highlights some interesting phenomena:

  • For central articles, such as kant (40 topics), one finds that a single topic (topic 30) comprises much of the article. By increasing the number of topics, such as to kant (120 topics), topic 77 now captures the “kant”-ness of the article, but several other components can now be explored. This shows the value of having multiple topic models.
  • For creationism (120 topics), one can see that the particular blend of topics generating that article is truly an outlier, with the probability only just over .5 of generating the next closest document; compare this to the distribution of top articles related to animal-consciousness (120 topics) or kant (120 topics).  Can you find other outliers in the SEP?
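The “outlier” observation above comes down to comparing topic distributions: an outlier document is one whose nearest neighbor is still far away. One plausible way to measure this (the explorer’s actual similarity measure may differ) is Jensen–Shannon divergence between distributions; the distributions below are toy examples.

```python
# A sketch of outlier detection over topic distributions, using
# 1 minus the Jensen-Shannon divergence as a similarity measure.
# This is one plausible choice, not necessarily the measure the
# Topic Explorer actually uses; the distributions are invented.

from math import log2

def js_similarity(p, q):
    """Similarity in [0, 1] between two topic distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)
    return 1 - (kl(p, m) + kl(q, m)) / 2

# An outlier's blend of topics is unlike every other document:
focus  = [0.9, 0.05, 0.05]                    # dominated by a rare topic
others = [[0.1, 0.5, 0.4], [0.2, 0.4, 0.4]]   # a typical cluster
nearest = max(js_similarity(focus, o) for o in others)
# nearest is much lower than the similarity between the two
# typical documents, marking the focus document as an outlier.
```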

The underlying dataset was generated using the InPhO VSM module’s LDA implementation. See Wikipedia: Latent Dirichlet Allocation for more on the LDA topic modeling approach or “Probabilistic Topic Models” (Blei, 2012) for a recent review.

Source code and issue tracking are available at GitHub.

Please share any notes in the comments below!


Computer Studies

The latest issue of Communications of the ACM, the premier computer science journal, contains an interesting article by IU Professor Dennis Groth — Why an Informatics Degree? The article has much to say about the necessity of application and applied computing as a measure of computer science success.

However, there are some questions left unanswered. First, I address two questions in philosophy of science: “What is Computer Science?” and “Why Informatics?” I then address the pedagogical implications of these questions in a section on “Computer Studies”.

What is Computer Science?

Any new discipline needs to consider its philosophy in order to establish a methodology and range of study. Prof. Groth’s definitions of Computer Science and Informatics do not quite capture these considerations:

Computer science is focused on the design of hardware and software technology that provides computation. Informatics, in general, studies the intersection of people, information, and technology systems.

In explicitly linking the science to its implementation, this definition of Computer Science fumbles away its essence. Yes, the technology is important and provides a crucial instrument on which to study computation, but at its core computer science studies computation — information processing. Computer science empirically examines this question by studying algorithms (or procedures) in the context of a well-defined model (or system).

This conflation of implementation and quantum is extremely pervasive. For example, Biology is “the study of life”, but in a (typical) biology class one never addresses the basic question: “What is life?” The phenomena of life can be studied independently of the specific carbon-based implementation we have encountered. This doesn’t deny the practical utility of modern biology, but it does raise the question of how useful our study of the applied life is to our understanding of life itself. (If you’re interested in this line of questioning, I highly recommend Larry Yaeger’s course INFO-I486 Artificial Life.)

Similarly, Computer Science can study procedures independently of the hardware and software implementations. Consider the sorting problem. (If you are unfamiliar with sorting, see the Appendix: Sorting Example.) One would not start by looking at processor architecture or software design, but would instead focus on the algorithm. Pure Computer Science has nothing to do with hardware or software; they are just an extremely practical medium on which we experiment.

Why Informatics?

Informatics seems to be ahead of itself here in asking “Why an Informatics degree?” before asking the more fundamental “Why Informatics?” There are two primary definitions implied in the article. The more popular answer is that “Informatics solves interdisciplinary problems through computation”. The second, emerging answer is that “Informatics studies the interaction of people and technology”.

The first definition defines a methodology but does not define a subject. It should be obvious that we live in a collaborative, interdisciplinary world. Fields should inform one another but there is still a distinction between fields: Biology studies life; Computer Science studies computation; Cognitive Science studies cognition; Chemistry studies chemicals; etc. One can approach any problem with any number of techniques – computing is one part of this problem-solving toolkit, along with algebra, calculus, logic and rhetoric. However, each of the particular sciences should answer some natural question – whether that be a better explanation of life, computation, mathematics or cognition. Positing a discipline as the use of one field to address problems in another field is not a new field. It’s applied [field] or [field] engineering.

The other definition, that informatics studies the interaction of people and technology, hints at a new discipline studying a quantum of “interaction”. This area has tons of exciting research, especially in human-computer interaction (HCI) and network science. Further emphasizing this would go a long way toward creating a new discipline and would set a clear distinction between the informaticist and the computer scientist. Computer scientists study computation; informaticists study interaction; both should be encouraged. As it stands, both study “computers” and both step on each other’s toes.

Computer Studies

This discussion of philosophies has important implications for how we structure computer-related education (formalized as Computer Studies). Despite major differences in our approaches, it does seem clear that Computer Science and Informatics should work together, especially in applications.

However, as currently implemented at IU, the Informatics curriculum is a liberal arts degree in technology. Formal education should teach a vocation, a discipline, or (ideally) both. Informatics currently does neither, emphasizing how informaticists “solve problems with computers” without diving into programming or modeling. If it aims to teach a vocation, then more application is necessary to build expertise; if it aims to teach a discipline, it is fine to do that through application, but we must recognize that application is only useful insofar as it benefits theory (and vice versa). Additionally, if the field does indeed have a quantum of interaction, then interaction should be at the forefront of the curriculum.

IU’s Computer Science ex-department is a valiant effort to teach a discipline – in the span of 4 years we cover at least 3 distinct programming paradigms (functional, object-oriented and logic) spread over 4 distinct languages, along with a thorough exploration of algorithms. That being said, I would be surprised if more than 25% of the graduating class could explain a Turing Machine.
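To make the point concrete: a Turing Machine is nothing more than a tape, a head, and a table of rules. The toy simulator below is my own minimal sketch of the concept (the encoding and rule format are my choices, not any standard curriculum’s); this particular machine just flips every bit on its tape.

```python
# A toy Turing machine simulator -- a minimal sketch of the concept,
# not a full formal treatment. rules maps (state, symbol) to
# (new_state, symbol_to_write, head_move).

def run_tm(tape, rules, state="start", blank="_"):
    """Run the machine until it reaches the 'halt' state."""
    tape = list(tape)
    pos = 0
    while state != "halt":
        if pos >= len(tape):
            tape.append(blank)           # extend the tape on demand
        symbol = tape[pos]
        state, write, move = rules[(state, symbol)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return "".join(tape).rstrip(blank)

# A machine that inverts every bit, halting at the first blank:
flip = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt",  "_", "R"),
}

result = run_tm("1011", flip)  # -> "0100"
```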

Not everyone is into theory – most people really just want to “solve problems with computers” and have a good job. Where do these programmers go? Informatics does not address this challenge, and shouldn’t attempt to. The answer is software engineering – just as applied physics finds a home in classical engineering. By establishing a third program for those clearly interested in application, IU would have a very solid “computer studies” program (as distinguished from computation or technology), and this works as a general model of Computer Studies pedagogy. [A friend has pointed out that IU cannot legally offer an engineering degree, so we’d have to get creative on the name or tell people to go to Purdue.]

As another example of how to split “computer studies”, Georgia Tech recently moved to a three-prong approach with the School of Computer Science (CS), School of Interactive Computing (IC), and Computational Science and Engineering Division (CSE). My view of Informatics roughly corresponds to that of IC; the Computer Science programs are equivalent, but Georgia Tech’s includes software engineering. The CSE division is a novel concept, presently captured by IU’s School of Informatics; it seems a workable grouping, but I feel it is best served by adjunct faculty and interdisciplinary programs rather than a whole new field.

Appendix: Sorting Example

Let’s say we have a list of numbers and want to sort them from smallest to largest. One naive way is to compare each term to the next one, swap them if they are in the wrong order, and repeat the pass until you can make it to the end without swapping:

1: *4 3* 2 1 -> 3 *4 2* 1 -> 3 2 *4 1* -> 3 2 1 4
2: *3 2* 1 4 -> 2 *3 1* 4 -> 2 1 *3 4* -> 2 1 3 4
3: *2 1* 3 4 -> 1 *2 3* 4 -> 1 2 *3 4* -> 1 2 3 4
4: *1 2* 3 4 -> 1 *2 3* 4 -> 1 2 *3 4* -> 1 2 3 4

This is called bubble sort, and it solves the problem of sorting. However, consider what you’d have to do to sort a bigger list: after any pass that makes a swap, you have to rescan the whole list! A smarter way to sort this list would be to divide the list into two smaller lists, sort the smaller lists, and then merge them together:

1a: *4 3* -> 3 4
1b: *2 1* -> 1 2

Now merge:
2a: *3* 4 -> *3* 4 -> 1 2 3 4
2b: *1* 2 -> 1 *2* -^

This only takes 4 comparisons, compared to 12! We just did a classic problem in Computer Science without even once mentioning computer hardware or writing a single line of code!
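The two strategies above can be written out with comparison counters to check the counts claimed in the text (12 versus 4 on the list 4 3 2 1). This is a sketch of the classic algorithms, not code from any particular textbook:

```python
# Bubble sort and merge sort, each returning (sorted_list, comparisons)
# so we can verify the 12-vs-4 comparison counts from the example.

def bubble_sort(xs):
    """Repeat full passes until one finishes without a swap."""
    xs, comparisons = list(xs), 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(xs) - 1):
            comparisons += 1
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swapped = True
    return xs, comparisons

def merge_sort(xs):
    """Split in half, sort each half, then merge the sorted halves."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, cl = merge_sort(xs[:mid])
    right, cr = merge_sort(xs[mid:])
    merged, comparisons = [], cl + cr
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]     # one side is already empty
    return merged, comparisons

print(bubble_sort([4, 3, 2, 1]))  # ([1, 2, 3, 4], 12)
print(merge_sort([4, 3, 2, 1]))   # ([1, 2, 3, 4], 4)
```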
