Free Data Now!
July 2nd, 2007 by ddelmoli
So this post is only tangentially related to the current online effort to get access to Oracle 10g’s AWR/ASH data. It’s actually more about data mashups.
When building an internal corporate application, you're usually familiar with the internal sources of data and how to get at them. Even so, you can have lots of "fun" doing the detective work: uncovering hidden internal data sources, fighting the data guardians, and mastering the API incantations needed to pry loose the secrets.
By basing your application only on internal data, you may be missing interesting opportunities to leverage freely available public data and feeds. Even there, though, you need to be wary of usage restrictions and odd APIs. The restrictions themselves can be strange: unlimited access for "non-profit use" or "public use", but heavy costs and fines for "private" use. Google Maps is an interesting example; here's the quote from the Google Maps API page:
To use the Maps API on an intranet or in a non-publicly accessible application, please check out Google Maps for Enterprise.
Following the links shows that using Google Maps in an internal application will set you back $10,000 per year.
The question of usage rights for data has been a huge source of problems in professional sports leagues. In 1997, the NBA sued Motorola and STATS, Inc. over the real-time dissemination of scores and game information, and lost. More recently, however, the NCAA clamped down on live-blogging at one of its events. Even today, you'll have a hard time finding real-time RSS feeds for MLB scores. On the other hand, those of you who love data analysis may enjoy poring over the MLB Enhanced GameCast data (latest example here and initial research here).
So, clearly you'll need to be selective when working with external data, and there are also questions of data reliability and validity. Still, the sheer number of public RSS feeds makes them a new and interesting data source worth considering when building data analysis applications.
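As a minimal sketch of treating an RSS feed as a data source, the Python snippet below parses an RSS 2.0 document into simple records using only the standard library. The `rss_items` function and the sample feed (including its example.com URLs and scores) are hypothetical illustrations, and it assumes only the `title`, `link` and `pubDate` fields of each item are of interest.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Turn an RSS 2.0 document into a list of dicts, one per <item>.

    Hypothetical helper: only title, link and pubDate are extracted;
    missing fields come back as empty strings.
    """
    root = ET.fromstring(xml_text)
    records = []
    # RSS 2.0 elements carry no XML namespace, so a plain tag name matches.
    for item in root.iter("item"):
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return records


# Made-up sample feed for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example scores</title>
<item><title>Team A 3, Team B 2</title><link>http://example.com/1</link><pubDate>Mon, 02 Jul 2007 20:00:00 GMT</pubDate></item>
<item><title>Team C 5, Team D 1</title><link>http://example.com/2</link></item>
</channel></rss>"""

for record in rss_items(SAMPLE_FEED):
    print(record["title"], record["link"])
```

A live feed could be fetched with `urllib.request.urlopen(url).read()` and the resulting bytes passed straight to `rss_items`, since `ET.fromstring` accepts bytes as well as strings. Checking the feed's usage terms first still applies.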
I was working on this post last week when the 10g AWR/ASH petition came online here. I'm of two minds about the issue. In the end, I think limiting access to this data isn't going to achieve what I believe is Oracle's goal: encouraging sales of the Enterprise Manager Diagnostic Pack. I've tried to evaluate the Diagnostic Pack, and it's neither easy to install nor easy to navigate. Go ahead, try to find the links on oracle.com to its download and documentation. (I tried this for the Change Management Pack, and they were just about impossible to find.) Compare that with how easy it is to find and download the database, the client tools, or SQL Developer. Ultimately, I think you'll see people trying to re-create STATSPACK for 10g in a "public" fashion rather than running out and installing Enterprise Manager. So a lot of time and effort will go into lobbying Oracle for access to this data, or into re-creating the data capture and storage in a public way, instead of into building useful analysis tools on the data that's already there.