WorkHabit Blogs

WORKHABIT LABS

Suggestion: NLP-produced keywords for Twitter to enable more sophisticated taxonomy mapping in large-scale Twitter applications

Twitter won't work with my social network, because my social network relies on keyword taxonomies. Right now it's easy to associate Twitter with a profile, because that's a 1-1 relationship. And I can build a Twitter channel (#workhabitinc, for example) that pulls from multiple Twitter feeds. Channels are fantastic because they're folksonomic: the meta-data is generated by people at the time they post, as opposed to being drawn from pre-defined categories. This means you can essentially build "group support" into Twitter with a simple # symbol. Bam, you have a channel tag.
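To make the channel idea concrete, here is a minimal sketch of how a site might build channels from hashtags pulled across several feeds. It assumes tweets arrive as simple (author, text) pairs; `HASHTAG_RE`, `extract_hashtags`, and `build_channels` are hypothetical names, not part of any Twitter API.

```python
import re
from collections import defaultdict

# Matches a hashtag such as "#workhabitinc" and captures the word after "#".
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the distinct hashtags in a tweet, lowercased, in order of appearance."""
    return list(dict.fromkeys(tag.lower() for tag in HASHTAG_RE.findall(text)))


def build_channels(tweets):
    """Group (author, text) pairs from many feeds into channels keyed by hashtag."""
    channels = defaultdict(list)
    for author, text in tweets:
        for tag in extract_hashtags(text):
            channels[tag].append((author, text))
    return dict(channels)


# Usage with hypothetical tweets (not a real Twitter API payload):
tweets = [
    ("alice", "Shipping the new release today #workhabitinc"),
    ("bob", "Great support call #WorkHabitInc #support"),
    ("carol", "Lunch!"),
]
channels = build_channels(tweets)
```

Lowercasing the tag means "#WorkHabitInc" and "#workhabitinc" land in the same channel, and untagged tweets simply never show up in any channel, which is exactly the explicit-only limitation discussed next.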

But if you're dealing with more complex meta-data, you run into a weird problem. Let's say I wanted to float a Twitter feed into multiple categories in a taxonomy structure that supports more complex, multi-placement categorization (I have no idea if that's a real word yet). For example, I tweet, and I want that tweet picked up by my social network, which then distributes it to multiple categories. A business user doing a product release might want the tweet to show up both in the support pages and in the product section. They can do that, but only if the taxonomy term is a 1-1 match. To post across categories, I would have to tag the tweet #term1+term2 and let my system blow that out into separate terms. That's useful for capturing data when the people tweeting are explicitly pushing data to the system, and most current use cases support that, though there are exceptions. But for tweets that aren't explicitly tagged, it's not possible to capture the data reliably, because you're relying on a machine to make judgements (hello Alexa!).
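Here is one way the "blow out" step for a compound tag like #term1+term2 could look: split the compound into its terms, then map each term to every category it belongs in. This is a sketch under assumptions; the `TAXONOMY` dict, its category paths, and the helper names are hypothetical stand-ins for whatever taxonomy the social network actually stores.

```python
import re

# Hypothetical taxonomy: term -> every category path the term should place a tweet in.
TAXONOMY = {
    "release": ["products/announcements"],
    "support": ["support/news"],
    "widget": ["products/widget", "support/widget"],
}

# Matches a plain "#term" as well as the compound "#term1+term2" form.
COMPOUND_TAG_RE = re.compile(r"#(\w+(?:\+\w+)*)")


def split_compound_tags(text):
    """Return the distinct lowercased terms from all (possibly compound) hashtags."""
    terms = []
    for compound in COMPOUND_TAG_RE.findall(text):
        for term in compound.lower().split("+"):
            if term not in terms:
                terms.append(term)
    return terms


def categorize(text, taxonomy=TAXONOMY):
    """Return every category the tweet should be placed in, without duplicates."""
    categories = []
    for term in split_compound_tags(text):
        for category in taxonomy.get(term, []):
            if category not in categories:
                categories.append(category)
    return categories


# A product-release tweet placed in both the product section and the support pages.
placements = categorize("Widget 2.0 is out! #release+support")
```

Note that a single term can also fan out to several categories (the hypothetical "widget" entry), so multi-placement can live either in the tag syntax or in the taxonomy itself. Either way, it only works when the author tags the tweet explicitly.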

(Other interesting startups use the friends-of-friends algorithm (crazybob), and more models are coming out all the time. It's amazing what you can build with such a simple tool, eh?)

It's great, and it's useful. However, it requires a LOT of thought, and the tagging is explicit rather than implicit in the way it gets posted to the site. What Twitter needs in order to be more useful is an auto-tagging keyword service that provides useful, accurate keywords, so that I can associate tweets from across the network that aren't necessarily aimed at my service. As mentioned, some services are trying to do this with natural language processing, and of course Gnip has been awesome at giving people access to raw data for data mining. But Twitter really needs to build auto-tagging into the interface and publish it, to allow implicit discovery of data in the system.
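The difference between explicit and implicit placement can be sketched by skipping hashtags entirely and matching an untagged tweet's words against a keyword vocabulary for each category. In this sketch `CATEGORY_KEYWORDS` and `implicit_categories` are hypothetical, and the hand-written vocabulary stands in for the keywords an auto-tagging service would ideally supply.

```python
import re

# Hypothetical category vocabularies; an auto-tagging service would supply these.
CATEGORY_KEYWORDS = {
    "support": {"bug", "crash", "help", "error"},
    "products": {"release", "launch", "feature", "version"},
}

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Return the set of lowercased words in a tweet."""
    return set(WORD_RE.findall(text.lower()))


def implicit_categories(text, vocab=CATEGORY_KEYWORDS, min_hits=1):
    """Return categories sharing at least min_hits keywords with the tweet,
    ordered by number of matching keywords (most first), then by name."""
    words = tokenize(text)
    scored = []
    for category, keywords in vocab.items():
        hits = len(words & keywords)
        if hits >= min_hits:
            scored.append((-hits, category))
    return [category for _, category in sorted(scored)]


# An untagged tweet still lands in both sections.
matches = implicit_categories("Found a crash bug in the new release")
```

Plain word overlap like this is exactly the kind of unreliable machine judgement described earlier: it misses synonyms and misfires on ambiguous words, which is why better keywords from the service itself would matter.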

The reason for this? Twitter's main value lies in the database underneath it: the realtime social database. That is by far the chief component of Twitter's valuation, much more so than most of the monetization mechanisms that have been suggested. A realtime database of trends is pure gold to marketing and sales organizations around the world. Right now, if you want to build sophisticated Twitter applications, you have to perform your own realtime semantic overlay and extract your own keywords on a case-by-case basis. While that's possible, it's expensive and requires high-level expertise to build (you have to do keyword extraction on a part-of-speech hierarchy, or use lexical chain analysis, both Natural Language Processing (NLP) techniques).
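Part-of-speech or lexical-chain extraction won't fit in a blog post, but the overall shape of "extract your own keywords" over a stream can be sketched with a much simpler technique: stopword filtering plus counting how many tweets in a batch mention each term. To be clear, this is a plain frequency heuristic, not POS-based extraction or lexical chain analysis; `STOPWORDS` and `extract_keywords` are hypothetical, and the stopword list is deliberately tiny.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a much fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "at", "for", "in", "is", "it", "of", "on",
    "the", "to", "with", "we", "our", "this", "just", "new",
}

# Words starting with a letter, optionally prefixed by "#" so hashtags count too.
TOKEN_RE = re.compile(r"#?[a-z][a-z0-9']*")


def extract_keywords(tweets, top_k=5):
    """Rank terms by how many tweets in the batch mention them (ties alphabetical)."""
    counts = Counter()
    for text in tweets:
        terms = set()
        for token in TOKEN_RE.findall(text.lower()):
            token = token.lstrip("#")
            # Skip very short tokens and stopwords.
            if len(token) > 2 and token not in STOPWORDS:
                terms.add(token)
        counts.update(terms)  # count each term at most once per tweet
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


batch = [
    "Widget 2.0 release is live",
    "The widget release fixed my crash",
    "Loving the new widget",
]
top_terms = extract_keywords(batch, top_k=3)
```

Even this toy version has to be tuned per application (stopwords, tokenization, ranking), which is the case-by-case cost the paragraph above describes.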

I would really like to see a richer set of meta-data available as part of the core Twitter service, ideally without having to do extra work to get at it.
