More Mass Market Oracle

Almost exactly one year ago, I wrote about the lack of hosted Oracle database packages.  Since then I haven’t seen much movement on that front, until last week, when STRATO AG, a German hosting company started offering a STRATO Oracle Server hosting package based on Parallels Virtuozzo Containers and Oracle Express Edition.  Here’s the press release.  I’m having a bit of trouble finding the package on the STRATO site, but if it’s true it’s another valuable option out there for ways to increase Oracle usage at the low end (where a lot of innovation takes place).  I’m curious how STRATO is handling the Oracle licensing issues…

Injection Nation

I’m somewhat surprised to see a lack of Oracle blogging reaction to the recent post on The Daily WTF which goes into great detail on a case of SQL injection.  Maybe we’ve either become tired of it or we assume that “my systems don’t do that!”.

So, how do you audit or track if your system is being hit by injection?  How would you detect it?  Assume you’re “just a DBA” — and no one tells you about applications being deployed that talk to the database.  Is there a way you could tell just by looking from within the database?  What kind of assumptions would you make?

The Value of Information

There has been an interesting and somewhat heated discussion going on about a recent blog post by Dominic Brooks and referenced by Doug Burns about the relative value of data vs. applications.  Actually, most of the heat seems to be directed at a comment made by Tim Gorman on several mailing lists in which he states that:

Data, not programs, is the only thing that matters — applications are transient and have no value except to acquire, manipulate, and display data. Data is the only thing with value.

I’ve deliberately taken the quote out of context — for that is how it’s being reacted to, fairly or unfairly on Doug Burns’ blog entry.

I’m not actually going to add any fuel to that fire, only offer up some observations.  I think I agree with many who are stating that data that lies about, unexploited by any application, is a pretty useless waste of storage.  That the true value of data comes from an ability to use it through an application which allows one to analyze, manipulate and visualize information synthesized from the data soup.  One reason I’m excited about the new company I’m with is its focus on helping people increase their ability to exploit their data.

To that end, one of my burning interests is in the ease of which the average employee has access to data and the means to create value out of it.  This includes data accessibility combined with compliance controls as well as tools and applications which allow the employee to tease ideas out of the data.  I wish Excel was a better data manipulation and analysis tool, since it’s so ubiquitous.  But my real concern is my perception that the language of data access has been kicked into a corner, shunned by end users and application programmers alike.  I find the lack of SQL knowledge and use appalling in most of the technologists I’ve encountered.  And that’s a real shame — for SQL’s ability to make data accessible I find second to none.  I have an idea about why SQL ability is failing, and I think it goes back to its original development.  The following is from a fascinating interview at McJones titled: The 1995 SQL Reunion: People, Projects, and Politics

Don Chamberlin: So what this language group wanted to do when we first got organized: we had started from this background of SQUARE, but we weren’t very satisfied with it for several reasons. First of all, you couldn’t type it on a keyboard because it had a lot of funny subscripts in it. So we began saying we’ll adapt the SQUARE ideas to a more English keyword approach which is easier to type, because it was based on English structures. We called it Structured English Query Language and used the acronym SEQUEL for it. And we got to working on building a SEQUEL prototype on top of Raymond Lorie’s access method called XRM.

At the time, we wanted to find out if this syntax was good for anything or not, so we had a linguist on our staff, for reasons that are kind of obscure. Her name was Phyllis Reisner, and what she liked to do was human-factors experiments. So she went down to San Jose State and recruited a bunch of San Jose State students to teach them the SEQUEL language and see if they could learn it. She did this for several months and wrote a paper about it, and gained recognition in the human-factors community for her work.[30], 31 I’m not sure if the results were very conclusive; it turned out that sure enough if you worked hard enough, you could teach SEQUEL to college students. [laughter] Most of the mistakes they made didn’t really have anything to do with syntax. They made lots of mistakes – they wouldn’t capitalize correctly, and things like that.

Looking back on it, I don’t think the problem we thought we were solving was where we had the most impact. What we thought we were doing was making it possible for non-programmers to interact with databases. We thought that this was going to open up access to data to a whole new class of people who could do things that were never possible before because they didn’t know how to program. This was before the days of graphical user interfaces which ultimately did make that sort of a revolution, and we didn’t know anything about that, and so I don’t think we impacted the world as much as we hoped we were going to in terms of making data accessible to non-programmers. It kind of took Apple to do that. The problem that we didn’t think we were working on at all – at least, we didn’t pay any attention to it – was how to embed query languages into host languages, or how to make a language that would serve as an interchange medium between different systems – those are the ways in which SQL ultimately turned out to be very successful, rather than as an end-user language for ad hoc users. So I think the problem that we solved wasn’t really the problem that we thought we were solving at the time.

Anyway, we were working on this language, and we adapted it from SQUARE and turned it into English and then we started adding a bunch of things to it like GROUP BY that didn’t really come out of the SQUARE heritage at all. So you couldn’t really say it had much to do with SQUARE before we were done. Ray and I wrote some papers about this language in 1974. We wrote two papers: one on SEQUEL/DML[32] and one on SEQUEL/DDL[33]. We were cooperating very closely on this. The DML paper’s authors were Chamberlin and Boyce; the DDL paper’s authors were Boyce and Chamberlin, for no special reason; we just sort of split it up. We wanted to go to Stockholm that year because it was the year of the IFIP Congress in Stockholm. I had a ticket to Stockholm because of some work I’d done in Yorktown, so Ray submitted the DDL paper to the IFIP Congress in Stockholm, and the DML paper we submitted to SIGMOD. This is the cover page of the SEQUEL/DML paper. It was 24 pages long. These were twin papers in our original estimation. We wrote them together and thought they were of comparable value and impact. But what happened to them was quite different. The DDL paper got rejected by the IFIP Congress; Ray didn’t get to go to Stockholm. I still have that paper in my drawer; it’s never been published. The DML paper did get accepted at SIGMOD. Several years later I got a call from a guy named Larry Ellison who’d read that paper; he basically used some of the ideas from that paper to good advantage. [laughter] The latest incarnation of these ideas is longer than 24 pages long; it’s the ISO standard for the SQL language, which was just described last week at SIGMOD by Nelson Mattos[34]. It’s now about 1600 pages.

It’s from this quote that I believe SQL gained its second-class status — it’s not for programmers, but it’s “too complicated” for end-users who became used to graphically interacting with applications.

Do you have someone on staff who really knows SQL?  Who can make the data super easily accessible to application programmers and end-users alike?  Who removes the barrier and lowers the hurdle in the way of turning data into value?  You’re probably gathering more and more relational data every day — and probably shredding your XML and storing your BLOBs there too.  I’m not saying that SQL is more important than data or the means to analyze it — I am saying that experts at SQL can make your databases perform better AND make it easier for your application people to focus on delivering that data to the people who want to use it.  Don’t put it in the limbo land of being not for programmers and not for end-users.

Update:  I wanted to give credit to the source of my quote:

Copyright (c) 1995, 1997 by Paul McJones, Roger Bamford, Mike Blasgen, Don Chamberlin, Josephine Cheng, Jean-Jacques Daudenarde, Shel Finkelstein, Jim Gray, Bob Jolls, Bruce Lindsay, Raymond Lorie, Jim Mehl, Roger Miller, C. Mohan, John Nauman, Mike Pong, Tom Price, Franco Putzolu, Mario Schkolnick, Bob Selinger, Pat Selinger, Don Slutz, Irv Traiger, Brad Wade, and Bob Yost. You may copy this document in whole or in part without payment of fee provided that you acknowledge the authors and include this notice.

Free Data Now!

So this post is only tangentially related to the current online effort to get access to Oracle 10g’s AWR/ASH data.  It’s actually more about data mashups.

When building an internal corporate application, most of the time you’re familiar with the internal sources of data and how to get at them — although, you can often have lots of “fun” doing the detective work to uncover hidden sources of internal data, fighting the data guardians and mastering the necessary API incantations to pry loose the secrets. :-)

By basing your application on only internal data you may be missing out on interesting opportunities to leverage public and freely available data and feeds.  Although, even there you need to be wary of usage restrictions and odd APIs.  Usage restrictions can be odd — unlimited access for “non-profit use”, or “public use”, but heavy costs and fines for “private” use.  An interesting example is Google Maps — here’s the quote from the Google Maps API page:

The Maps API is a free beta service, available for any web site that is free to consumers. Please see the terms of use for more information.

To use the Maps API on an intranet or in a non-publicly accessible application, please check out Google Maps for Enterprise.

Following the links shows you that to use Google Maps on an internal application will set you back $10,000 per year.

The question of usage rights for data has been a huge source of problems in professional sports leagues.  In 1997, the NBA sued Motorola and STATS, Inc. regarding the real-time dissemination of scores and game information and lost.  However, recently the NCAA clamped down on live-blogging at one of its events.  Even today, you’ll have a hard time finding real-time RSS feeds for MLB sports scores.  Although, those of you who love data analysis may enjoy poring over the MLB Enhanced GameCast data — latest example here and initial research here.

So, clearly you’ll need to be selective when working with external data — there’s also the questions of data reliability and validity.  However, the sheer plethora of public RSS feeds should be a new and interesting source of data considered when building data analysis applications.

I was working on this post last week when the 10g AWR/ASH petition came online here.  I’m of two minds about the issue.  In the end, I think limiting access to this data isn’t going to achieve what I believe to be Oracle’s goal of encouraging sales of Enterprise Manager Diagnostic Pack.  I’ve tried to evaluate the Diagnostic Pack — it’s not easy to install, and not easy to navigate.  Go ahead, try and find the links on oracle.com to the download and/or documentation.  (I tried this for the Change Management pack, and it was just about impossible to find).  Compare that with how easy it is to find and download the database, client tools and/or SQL Developer.  In the end, I think you’ll see people trying to re-create STATSPACK for 10g in a “public” fashion rather than running out and installing Enterprise Manager.  So, you’ll have a lot of time and effort spent in lobbying Oracle for access to this data, and/or effort spent on re-creating the data capture and storage in a public way — instead of inventing useful analysis tools based on the data that’s already there.

Ah well…

More interview questions

This weekend, Robert Vollman posted some of his ideas for SQL Interview questions.  I thought I’d share mine as well.  Mine are very simple, and really don’t call for single word answers.  When giving them, I used to give people a “grade” for their answer.  One of A, B, C and D (sometimes F).  “A” meant they fully understood the concept behind the question.  “F” meant they totally didn’t know and guessed wrong.  Sometimes I gave extra credit when they said they didn’t know the answer, and then asked me — I thought that demonstrated an innate curiosity.

Oh well, without further fuss, here are my questions:

SQL

What’s an inline view?
Can you have a SELECT clause in a SELECT column list?
If yes, what does it do when it retrieves no rows, more than one row?
What are analytic functions and give an example?
What does the NO_MERGE hint do?
What does the CARDINALITY hint do?
What’s your rule of thumb for index vs full-table scan?
What are the benefits and problems of parallel hints?
In implicit type conversions does Oracle convert numbers to chars or chars to numbers?

PL/SQL

How can you pass an array to / from a PL/SQL proc?
How can you join a table to a PL/SQL array?  Any issues to be aware of?
Can you use analytic functions in PL/SQL?
What do you use to write PL/SQL code?

Performance

What view shows what sessions are waiting on?
What’s an enqueue wait?
How can you tell which tables are being locked?
How can you tell which tables are having high row-level locking contention?
What’s a logical I/O?
How do you identify “bad” statements?  What do you consider bad?
How do you measure TEMP space usage?
How can you tell how long a statement will take before it finishes?
What is a “fetch across commit”?
How can you tell how much undo / rollback size a statement is using?

Physical Layout / Design

What’s the difference between a session and a process?
What is MTS and why would you use it?
What’s a function based-index?
What’s an IOT?
Can you create an index on an IOT?  What’s in it?
What’s a single-table hash-cluster?
Explain the difference between a global index and local index.
What’s a queue table?
What’s the difference between an LMT and a DMT?
What does AUTOALLOCATE mean?
What does DROP COLUMN do?
What does ALTER TABLE ADD COLUMN DO?  What happens if you add a DEFAULT VALUE?
Do you need to index columns which are foreign keys?  Why or why not?
What’s Advanced Queuing?
What’s XMLTYPE?
What’s online table redefinition?  How does it work?

Logical Layout / Design

What’s the CACHE parameter on sequence creation for?