Term Extraction API and TagCloud.com

One of the most inspiring backend pieces of the Event Browser for me was the innovative use of the Content Analysis Term Extraction API, which Ed describes in his post about the Event Browser:

One of the problems we had were that there were no images in our event feed. We knew we wanted to get images from either Flickr or Yahoo Image Search but it wasn’t immediately obvious how we would get an image from a phrase like “Highlights of the Textiles Permanent Collection at the MH De Young Memorial Museum”. It was another Yahoo, Toby Elliott, who suggested that we use the Content Analysis API and then Image Search. Honestly, I didn’t even know Yahoo had term extraction as a public API. To get the images you see in the demo I concatenate the title and venue to get the most important terms extracted and then use that as the image query. The first thumbnail gets used as the photo for each event. It’s really simple code on my end and all the real work is done by the term extraction service. My favorite example for images is the De Young Museum’s list of events.
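The pipeline Ed describes is short enough to sketch. This is only an illustration of the flow, not Ed's code: `pick_event_image`, `extract_terms`, and `search_images` are hypothetical names, and the two callables stand in for whatever client you use to reach a term extraction service and an image search service.

```python
from typing import Callable, Optional, Sequence


def pick_event_image(
    title: str,
    venue: str,
    extract_terms: Callable[[str], Sequence[str]],
    search_images: Callable[[str], Sequence[str]],
) -> Optional[str]:
    """Return the first thumbnail URL for an event, or None if nothing matches.

    extract_terms and search_images are assumed wrappers around a term
    extraction service and an image search service (hypothetical here).
    """
    # Concatenate title and venue so the extractor sees both.
    text = f"{title} {venue}".strip()

    # Let the extraction service pick the most important terms.
    terms = extract_terms(text)
    if not terms:
        return None

    # Use the extracted terms as the image query.
    thumbnails = search_images(" ".join(terms))

    # The first thumbnail becomes the photo for the event.
    return thumbnails[0] if thumbnails else None
```

Passing the two services in as functions keeps the sketch independent of any particular API and makes it easy to try with stubs.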

I’m a total news and politics junkie (when I worked at CNN.com several years ago, it was like giving an alcoholic a job in a liquor store), so last week I started thinking about using a tag cloud to represent breaking news from a set of RSS feeds that I choose. Last week being a big news week, I envisioned a dynamic tag cloud where words like “Scooter,” “Miers,” and “indictment” would get huge in breaking news situations to tip me off that something big was going on at that moment.

I started digging around and was getting ready to sit down and write the code, but then I found TagCloud.com, which uses (you guessed it) the Yahoo! Term Extraction API to produce a tag cloud built from RSS feeds you specify in an interface that is nicer than anything I could build quickly myself. No need to re-invent the wheel, so I signed up and had my tag cloud in a few minutes, using these feeds: CNN.com Top News, Fox News (to be fair and balanced), MSNBC, NY Times home page, Washington Post top news, and Yahoo! Top News.

Here is the resulting tag cloud from those sources. TagCloud.com offers a nice “stop words” feature so you can remove common (but useless) words from the tag cloud display. I specified “full story” and “story” as stop words, for example, but I also specified “white” and “supreme” because they mostly showed up as part of the phrases “white house” and “supreme court” (phrases that were preserved in the tag cloud even though I specified words within them as stop words). And check out their implementation guide for simple instructions on how to put your TagClouds on your site in badge form.
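For the curious, here is a minimal sketch of how a stop-word-aware tag cloud could be computed once you have a list of extracted terms. It is my own guess at the behavior described above, not TagCloud.com's implementation: `build_tag_cloud`, the linear font scaling, and the size range are all assumptions. The one detail taken from what I observed is that a stop word removes only a term that matches it exactly.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def build_tag_cloud(
    terms: Iterable[str],
    stop_words: Set[str],
    min_size: int = 10,
    max_size: int = 36,
) -> Dict[str, int]:
    """Map each surviving term to a font size scaled by how often it appears."""
    stops = {w.lower() for w in stop_words}

    # A stop word drops only an exact match on the whole term, so "white"
    # is removed while the phrase "white house" survives.
    counts = Counter(t.lower() for t in terms if t.lower() not in stops)
    if not counts:
        return {}

    low, high = min(counts.values()), max(counts.values())
    span = high - low

    sizes = {}
    for term, n in counts.items():
        # Linear scaling between min_size and max_size; if every term has
        # the same count, everything is drawn at min_size.
        scale = (n - low) / span if span else 0.0
        sizes[term] = round(min_size + scale * (max_size - min_size))
    return sizes
```

Feed it the terms from every headline in your feeds and the words that dominate a breaking story end up with the biggest font sizes.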

Displaying this tag cloud more dynamically in a Konfabulator widget would be cool. . . . TagCloud.com has already done the difficult work using the Term Extraction API, perhaps the most underappreciated API in the Yahoo! API arsenal. Sounds like a fun project.

Super-mashup with Yahoo! APIs: event browser

In my twelfth week at Yahoo! I’m really happy to be able to finally point to something I have been working on with a small but incredibly talented team of engineers and UI designers (a couple of them even newer than me to Yahoo!). Check out the Event Browser, a super-demo of a bunch of Yahoo! APIs, with the exciting new Maps AJAX API we just announced as the foundation (and if you’re not into AJAX there are many more Maps APIs to play with).

Frankly, I’ve been getting a little bored with maps mashups in general. Most are just a new set of points on a map from some newly liberated set of data, which is cool but not as exciting as it used to be. This mashup is different, though. Instead of a standard query interface, the map becomes the center of the experience, and the way you browse around the map determines the events you see in a very dynamic way. As you move around on the map, events taking place within your map space appear to the right of the map. All this goodness is happening completely client-side, i.e. JavaScript making REST calls. There’s also a dynamic tag cloud of event categories that redraws itself as you move around. Very cool.
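The core idea, showing only the events that fall inside the visible map area, boils down to a bounding-box filter. The demo does this in the browser; the sketch below uses Python purely for illustration, and `Event`, `Bounds`, and `events_in_view` are hypothetical names rather than anything from the Maps API. It also assumes the visible area does not cross the 180° meridian.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    name: str
    lat: float
    lon: float


@dataclass
class Bounds:
    """The visible map area, as the latitude/longitude of its edges."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple box test; assumes the view does not wrap around 180 degrees.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def events_in_view(events: List[Event], bounds: Bounds) -> List[Event]:
    """Return the events whose location falls inside the current map bounds."""
    return [e for e in events if bounds.contains(e.lat, e.lon)]
```

Each time the map is panned or zoomed, you would recompute the bounds and call `events_in_view` again to refresh the list beside the map.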

One more cool thing (hey, did someone use the word “cool”?): the images displayed for particular events come from taking Yahoo! events output and piping it through our Term Extraction API, then through the Image Search API, to produce amazingly appropriate images for each event. Ravi Dronamraju, who put together the team that built this demo, provides his thoughts on this demo. A big thanks to Ed, Jonathan, Mirek, Karon, Sam, Nate, and Toby.

It’s a great team, and working with them on this reminded me of the concept of the “jelled team” from the truly excellent must-read software engineering management book Peopleware, which I wrote about at InfoWorld:

The jelled team is so tightly knit that the whole is greater than the sum of its parts. There is low turnover, a strong sense of identity, a sense of elitism, joint ownership of products, and enjoyment derived from participation.

Can’t wait to do more.

Update: Ed Ho writes in a little more detail about how the demo works, and kindly gives credit to his teammates.

Berkeley-area doctors map mashup

I was sorting through some old papers and found one of those thick health care provider directories that you used to get when you started a new job with new health insurance. While most providers disseminate that information online now, the display of the information is often close to useless: you run a search and get dozens of providers back, and even if you can drill down by specialty, you’re still looking at a bunch of addresses with no sense of where they are relative to where you live. And who wants that kind of aggravation when you’re already sick?

To get ahead of the game (while I’m not sick), I created a Berkeley-area doctors map mashup using screen-scraped data from my health care provider. I’m not a great interface designer, so it’s Web 1.0-certified, complete with frames. What the interface lacks in pizzazz, it hopefully earns back in simplicity: there’s a list of medical specialties on the left, and when you click on one, the providers that match that specialty display on the map in the window on the right.

Getting the data in shape was the hardest part, and required quite a bit of Perl elbow-grease with a little MySQL database design thrown in. From there, a little PHP hacking leveraging the Yahoo! Maps API and voila! That pediatric gastroenterologist that I hope you will never need is just one click away.
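Once the data is cleaned up, the lookup behind each click is simple. Here is a rough sketch of that step, with loud caveats: the real thing was Perl, MySQL, and PHP, while this uses Python with an in-memory SQLite database as a stand-in, and the `providers` table, its columns, and both function names are hypothetical. It also assumes each provider has already been geocoded to a latitude and longitude.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def load_providers(rows: Iterable[Tuple[str, str, float, float]]) -> sqlite3.Connection:
    """Create an in-memory table of (name, specialty, lat, lon) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE providers (name TEXT, specialty TEXT, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO providers VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def markers_for_specialty(conn: sqlite3.Connection, specialty: str) -> List[Dict]:
    """Return one map marker per provider in the chosen specialty."""
    cur = conn.execute(
        "SELECT name, lat, lon FROM providers WHERE specialty = ? ORDER BY name",
        (specialty,),  # parameterized to keep scraped text out of the SQL
    )
    return [{"label": name, "lat": lat, "lon": lon} for name, lat, lon in cur]
```

Clicking a specialty in the left frame would amount to calling `markers_for_specialty` and handing the resulting list to the map in the right frame.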

While the data part of this equation was difficult (it would have been WAY easier if this information were available via RSS), I think the utility of such an application made the data parsing worth it.

Bill Gates: The Udell Interview

Dan Farber offers high praise for Jon Udell’s recent podcast interview with Bill Gates, saying that “it really shows the geeky Gates, and is one of the better interviews I have read/heard in covering Gates for more than two decades.” I agree (though I haven’t been following Gates for two decades yet myself). I listened to the podcast and enjoyed exchanges like this one (a big thanks to Jon for putting up a transcript — but you should listen anyway because a transcript doesn’t do justice to the palpable geeky excitement in Gates’ voice):

JU: Yeah, somebody had a nice quote that RSS is the human face on Web services. I kind of like that a lot and related to that is something that I’ve said a few times, which is that human beings are the exception handlers in all workflows. And so…

BG: Absolutely. That’s a really good way of capturing something I was saying about the boundary between structured and unstructured. Eventually you’ve got to know who in what role and how to communicate to them, because if software could just talk to software, we could get rid of all the humans. Everything that’s real, eventually there’s a human involved in. And there is a little bit of tension between very interpretive, simple-to-create stuff, like REST or POX, and very structured, tight stuff like Web services. And if the industry is smart, we can get the best of both worlds, where things that are not very complex, you just want to go get a stock quote, a weather thing, fine. Use REST. Even, you know, go to Wordpad and type in the ugly URL.

If this interview were a book, it would be much closer in spirit to an O’Reilly title than the relative fluff we got from Bill Gates in The Road Ahead, a book that surely helped thousands of businessmen achieve deep sleep on airplanes back in the day.

Site to watch: ProgrammableWeb.com

I just subscribed to the ProgrammableWeb.com blog, described by the site’s creator (John Musser) here:

So what’s the point of this site? Although still euphemistically ‘in beta’, the goal is to create a home page for Web 2.0 developers. Content to include news, reviews, comparisons, and examples. Formal APIs, unofficial APIs, and accidental APIs are all fair game. Anything ‘programmatic’ that’s publicly accessible online from sources including Amazon, Google, eBay, Microsoft, del.icio.us, Feedster, UPS, EVDB, WeatherBug, indeed, Blogger and others. [Hey John, don’t forget Yahoo! Actually, John does list Yahoo’s APIs here. – CD]

Why? Because going From Web Page to Web Platform is a big deal. It’s immature and a bit ill-defined but full of potential. To participate as developers requires understanding, and to do that means to know what the parts are and how they work.

Another way to look at this site is from its genesis: frustration. I wanted to get the ‘big picture’ view of web apis. So I picked-up what books I could find (like Iverson). Pretty good start. But not enough. Then where? Everywhere. Despite what seems like an infinite number of social/web2.0 blogs, sites and businesses, I still couldn’t find the ‘go-to’ place I wanted.

Although it’s in the early stages, the site looks promising and I agree with John that the web-as-platform is a HUGE deal (why else would I leave my CTO gig to take a job at Yahoo! with the word “platform” in the title?) I actually met John when I was at InfoWorld since he used to be involved in the NY CTO Club that I wrote about and visited regularly. I’ll definitely be keeping an eye on his site.

Frederick Brooks / Ruby on Rails smackdown

Over at the 37Signals blog, there’s a post praising Frederick Brooks’ absolutely timeless Mythical Man-Month book (Wikipedia entry here), following up on a prior post espousing a “three people for version 1” philosophy, described as follows:

If you can’t build your version 1 with three people, then 1. you need different people, or 2. you need to slim down your version 1. Now, before I get yelled at, this doesn’t apply to every project, but I do believe it applies to the majority. And sure, if you are building a weapons system, a nuclear control plant, a banking system for millions of customers, or some other life/finance-critical system, then you may need a fourth.

But keep it in mind: three for version 1. Remember, it’s better to make version 1 half a product than a half-assed product. Three people will keep you closer to half a product and a cleaner, tighter, simpler base on which you can grow later.

The mysteries of the art of software development have always intrigued me, and I wrote about Frederick Brooks’ amazing “No Silver Bullet” essay a couple of times in my blog and weekly column when I was at InfoWorld (man, that column about web services seems DATED now!). It’s one of those essays that’s worth re-reading every couple of years.

One of the comments to the post at 37Signals makes reference to the “No Silver Bullet” essay and puts it into the context of the Ruby on Rails phenomenon:

His essay called “No Silver Bullet” predicted that there would be no single significant development in programming that would increase productivity by an order of magnitude (whatever that means).

I’d be interested to hear JF/DHH’s opinion on that re: Ruby on Rails. Does RoR constitute an advancement of this type? Or, does RoR incorporate multiple technologies which collectively increase productivity (thus validating Brooks’ assertion).

Interesting question — I’ll stay tuned with my newsreader to see the response (and this isn’t really a Frederick Brooks / Ruby on Rails “smackdown,” I just liked the way that sounded as a title to this post!)

Incidentally, my friend and former Salon.com colleague Scott Rosenberg is currently deep in the process of writing a book that touches on some of the issues he wrote about in one of his Salon columns — can’t wait to read it.

PHP/Perl programming language mashup!

(Via O’Reilly Radar) My friend and former Salon.com colleague David Wheeler announced on Saturday that the new version of open source CMS Bricolage supports PHP templating via a new Perl module developed by George Schlossnagle. From the README:

This module provides a way to load a PHP interpreter into your Perl programs. The PHP interpreter then automagically has access to all of the modules and variables loaded into Perl. So PHP executed from Perl can use any Perl modules.

This is so cool and generally amazing (and it gave me the opportunity to debut the word “mashup” on this blog). When you’re remixing at the language level, you’re enabling all sorts of potential. Go Team Bricolage!