This post is an excerpt from an email exchange with one of the cool guys I met at the Ruby on Rails unconference (http://www.rubyonrailscamp.com) that I attended last Thursday.
Over dinner after the conference we touched on the topic of text mining, so I brought up a Matlab application that made the news a while back. It’s the Matlab Topic Modeling Toolbox. You can get more info on it here: http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm Here’s a great article about the tool: http://arstechnica.com/news.ars/post/20060802-7408.html
In response to an email about the Topic Modeling Toolbox, I received the great question: "Interesting articles you sent – how do you think they are useful to the consumer?"
Following is part of my response email.
-----
It’s an interesting question that you pose. I think there are really two questions about usefulness: how is this technology useful to society, and how is it useful to consumers (i.e., how can it be monetized)? Current text processing technologies are clearly revolutionizing our ability to share knowledge through search, but I think that topic modeling will open up new ways to visualize text-based knowledge.
Have you seen the liveplasma interface for music and movies (http://www.liveplasma.com)? It’s a wonderful data visualization that I think could be powered by a good topic modeling interface. Instead of being limited to music or movies, the user could search for any arbitrary text element and find topically related elements, even if the relationship is one that people wouldn’t typically think of. Think of a celebrity who produces music, is in the news for fashion, is getting married and was found in a plushie suit with her priest. All of these aspects could be brought to light in an easily viewed fashion. This is just one quick idea of how the data generated through topic modeling could be exposed to users.
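To make the "find topically related elements" idea a bit more concrete, here is a minimal Python sketch. It assumes a topic model has already assigned each item a mixture of topic weights; the item names, the four topics and all of the numbers are invented for illustration, and cosine similarity is just one reasonable way to compare mixtures, not something the Topic Modeling Toolbox prescribes.

```python
import math

# Hypothetical topic mixtures, as a topic model might produce them.
# Each vector holds weights for (music, fashion, weddings, news).
# All names and numbers are made up for illustration only.
TOPIC_MIXTURES = {
    "celebrity":       [0.4, 0.3, 0.2, 0.1],
    "record label":    [0.9, 0.0, 0.0, 0.1],
    "fashion week":    [0.0, 0.9, 0.0, 0.1],
    "wedding planner": [0.0, 0.1, 0.9, 0.0],
    "stock market":    [0.0, 0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is related to nothing
    return dot / (norm_a * norm_b)


def related(query, mixtures, top_n=3):
    """Return the top_n items most topically similar to `query`."""
    target = mixtures[query]
    scores = [
        (name, cosine(target, vec))
        for name, vec in mixtures.items()
        if name != query
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for name, score in related("celebrity", TOPIC_MIXTURES):
        print(f"{name}: {score:.2f}")
```

A liveplasma-style map could then draw the query item in the middle and place the related items around it, closer for higher scores. The celebrity in this toy data touches several topics at once, which is exactly the kind of multi-faceted relationship a plain keyword search would miss.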
Regarding usefulness: I’m a big believer in information as the ultimate tool for solving the world’s problems. I think many problems result from ignorance or from what economists call “information asymmetry.” Topic modeling could provide whole new ways to systematically display relationships between texts (relationships that are often already present in how real people conceptualize information), which would make it easier for people to discover relevant information.
An interesting approach for monetizing this technology for consumers is to provide a better information set than other information providers. There’s a new company called Monitor110 (http://www.monitor110.com) that is focused on culling data from many varied sources and providing the most relevant information to institutional investors. I think that topic modeling could provide a fantastic way to develop a more holistic view of market happenings. In a sense, monetizing this technology would actually create more information asymmetry because only those who pay for the service will have that information.
-----
During the course of dinner, we also stumbled upon the topic of plushophiles (plushies). So I included this little notion:
-----
Regarding plushies, if you’re interested in a rather fascinating map of deviant sexual desires, check this out: http://www.deviantdesires.com/map/map06b.jpg There are many, many things that are astounding to me but are part of the human experience. Through topic modeling, a map like this could be generated by crawling the internet, and it would be constantly updated to expose the latest in deviant lingo to the lay person…
-----
My question for you (my blog reader): what would you do if you could apply topic modeling to a large corpus of text?