Sequence Smackdown

A short post where I kick myself for forgetting something basic…

In a recent engagement, I came across a "smelly" construct (database smells) that looks like this:

Select max(error_id)+1 into new_error_id from error_log;

"Why aren’t they using a sequence?", I wondered.

The reason, of course, is that the PL/SQL developers need to request the creation of each and every object from the production support DBAs. Since such requests require review by the central data architects for correctness before being approved for creation in development, the process can take 4-5 days. As a result, they took this "shortcut". (Reason #392 why I don’t think production support DBAs should have any place in the development process, but that’s another story.)

The good news is that they recognized this was bad after I pointed it out, and they went ahead and requested the sequence.
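
For the record, the smell is more than cosmetic: two sessions can both read the same max(error_id) before either one commits, and so compute the same "new" id. A minimal sketch of the sequence-based version, assuming a non-cycling sequence named error_id_seq already exists and that new_error_id is declared to match the error_id column:

```sql
declare
  -- Assumption: the variable is typed after the table column.
  new_error_id error_log.error_id%type;
begin
  -- A non-cycling sequence never hands the same value to two callers,
  -- so concurrent sessions cannot end up with the same id.
  select error_id_seq.nextval into new_error_id from dual;
end;
/
```

The price is that sequence values can have gaps (rollbacks, cached values that are never used), which is normally fine for a surrogate key like an error id.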

One week later, we get the sequence, correct the code and promote it to the integration environment.

Where we promptly get uniqueness violations when attempting to insert rows into the table, because the next value from the sequence was lower than the max(error_id) already in the table.

"No problem!", I said. I didn’t want to re-create the sequence with a larger "start with" (due to the turnaround time), so I took a lazy shortcut:

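declare
  i number;  -- latest value drawn from the sequence
  j number;  -- highest error_id already in the table
begin
  select error_id_seq.nextval into i from dual;
  select max(error_id) into j from error_log;
  -- Burn sequence values until the last one drawn is past the table's max.
  while i <= j loop
    select error_id_seq.nextval into i from dual;
  end loop;
end;
/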

Yes – I know this is all kinds of horrible, but I was in a hurry and didn’t think.

And the worst part is that it didn’t even work.

They still got uniqueness violations and came to me later saying that there were problems with the sequence – that when they selected maxval from the sequence in TOAD they got one value (1000), and when they selected maxval from the sequence via SQL Developer, they got another value (300).

What did I forget / do wrong?  What should I have done?

I eventually figured it out and "fixed" it.

There’s a coda to this: after I smacked the palm of my hand to my forehead and explained the problem to the PL/SQL developers, I thought they understood it. But later in the day they came to me and said they were having the same problem with a different sequence (getting different, and much smaller, values when selecting maxval from different tools)…

I should have done a better job of explaining it. :-)

Arbitrary

Do you hate arbitrary requirements?  You know the ones — like: the customer account number must be a 10-digit number without any leading zeros and no more than 3 repeated digits?  Don’t you always try to argue the user back into letting you use a simple sequence generator — maybe giving in on the leading zero requirement, but arguing against trying to make sure there aren’t 3 of the same digits in a row? :-)

Maybe if you thought there was a good reason, or authority or research on why that requirement was a good idea, then you’d see it as an interesting challenge rather than a burden?  Maybe if you read a post by Seth Godin about it?

Wonder if the user thinks that some of the database limitations are arbitrary?

So, today’s challenge — implement a stored procedure to generate such a serial number…
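
One possible starting point, offered as a sketch rather than the answer. The names account_no_seq and next_account_no are made up for illustration, and I’m reading the rule as "the same digit may not appear four or more times in a row"; for the stricter "no 3 in a row" reading, change \1{3} to \1{2} and pick a start value that passes that rule. It is written as a function rather than a procedure so it can be called from SQL:

```sql
-- Hypothetical names throughout: account_no_seq and next_account_no
-- do not come from any real system.
create sequence account_no_seq
  start with 1000100010   -- a 10-digit value that already passes the rule below
  maxvalue   9999999999   -- largest 10-digit number
  nocycle;

create or replace function next_account_no return number is
  candidate number;
begin
  loop
    select account_no_seq.nextval into candidate from dual;
    -- Assumed rule: reject any value with the same digit 4+ times in a row.
    exit when not regexp_like(to_char(candidate), '([0-9])\1{3}');
  end loop;
  return candidate;
end next_account_no;
/
```

Starting the sequence at a 10-digit value takes care of the leading zero. The weakness of the skip-and-retry loop is that rejected values come in long runs (everything starting with 1111, for example), so an occasional call will burn through a great many sequence values before it returns.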

Injection Nation

I’m somewhat surprised to see a lack of Oracle blogging reaction to the recent post on The Daily WTF which goes into great detail on a case of SQL injection.  Maybe we’ve either become tired of it or we assume that “my systems don’t do that!”.

So, how do you audit or track if your system is being hit by injection?  How would you detect it?  Assume you’re “just a DBA” — and no one tells you about applications being deployed that talk to the database.  Is there a way you could tell just by looking from within the database?  What kind of assumptions would you make?
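
To get the discussion going, here is one crude way to look from inside the database. It is a heuristic sketch under loud assumptions: you can query v$sql, the statements of interest are still in the shared pool, cursor_sharing is left at EXACT (otherwise literals are replaced before you see them), and the two patterns are only my guesses at what injected text tends to look like. Plenty of legitimate SQL uses UNION, so expect false positives:

```sql
-- Heuristic only: the patterns below are illustrative guesses,
-- not a complete or reliable injection detector.
select sql_id,
       parsing_schema_name,
       module,
       executions,
       sql_text
from   v$sql
where  parsing_schema_name not in ('SYS', 'SYSTEM')
and    (   -- a second query bolted on to the first
           regexp_like(sql_text,
                       'union[[:space:]]+(all[[:space:]]+)?select',
                       'i')
           -- an always-true comparison such as or 1=1 or or '1'='1'
        or regexp_like(sql_text,
                       '[[:space:]]or[[:space:]]+''?[0-9]+''?[[:space:]]*=[[:space:]]*''?[0-9]+',
                       'i') )
order  by last_active_time desc;
```

The module and parsing_schema_name columns are there to help answer the "which application is this?" question when nobody told you it was deployed.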

The Value of Information

There has been an interesting and somewhat heated discussion going on about a recent blog post by Dominic Brooks and referenced by Doug Burns about the relative value of data vs. applications.  Actually, most of the heat seems to be directed at a comment made by Tim Gorman on several mailing lists in which he states that:

Data, not programs, is the only thing that matters — applications are transient and have no value except to acquire, manipulate, and display data. Data is the only thing with value.

I’ve deliberately taken the quote out of context, for that is how it’s being reacted to, fairly or unfairly, on Doug Burns’ blog entry.

I’m not actually going to add any fuel to that fire, only offer up some observations.  I think I agree with many who are stating that data that lies about, unexploited by any application, is a pretty useless waste of storage.  That the true value of data comes from an ability to use it through an application which allows one to analyze, manipulate and visualize information synthesized from the data soup.  One reason I’m excited about the new company I’m with is its focus on helping people increase their ability to exploit their data.

To that end, one of my burning interests is the ease with which the average employee can access data and create value out of it.  This includes data accessibility combined with compliance controls, as well as tools and applications which allow the employee to tease ideas out of the data.  I wish Excel were a better data manipulation and analysis tool, since it’s so ubiquitous.  But my real concern is my perception that the language of data access has been kicked into a corner, shunned by end users and application programmers alike.  I find the lack of SQL knowledge and use appalling in most of the technologists I’ve encountered.  And that’s a real shame, for I find SQL’s ability to make data accessible second to none.  I have an idea about why SQL ability is failing, and I think it goes back to its original development.  The following is from a fascinating interview at McJones titled: The 1995 SQL Reunion: People, Projects, and Politics

Don Chamberlin: So what this language group wanted to do when we first got organized: we had started from this background of SQUARE, but we weren’t very satisfied with it for several reasons. First of all, you couldn’t type it on a keyboard because it had a lot of funny subscripts in it. So we began saying we’ll adapt the SQUARE ideas to a more English keyword approach which is easier to type, because it was based on English structures. We called it Structured English Query Language and used the acronym SEQUEL for it. And we got to working on building a SEQUEL prototype on top of Raymond Lorie’s access method called XRM.

At the time, we wanted to find out if this syntax was good for anything or not, so we had a linguist on our staff, for reasons that are kind of obscure. Her name was Phyllis Reisner, and what she liked to do was human-factors experiments. So she went down to San Jose State and recruited a bunch of San Jose State students to teach them the SEQUEL language and see if they could learn it. She did this for several months and wrote a paper about it, and gained recognition in the human-factors community for her work.[30, 31] I’m not sure if the results were very conclusive; it turned out that sure enough if you worked hard enough, you could teach SEQUEL to college students. [laughter] Most of the mistakes they made didn’t really have anything to do with syntax. They made lots of mistakes – they wouldn’t capitalize correctly, and things like that.

Looking back on it, I don’t think the problem we thought we were solving was where we had the most impact. What we thought we were doing was making it possible for non-programmers to interact with databases. We thought that this was going to open up access to data to a whole new class of people who could do things that were never possible before because they didn’t know how to program. This was before the days of graphical user interfaces which ultimately did make that sort of a revolution, and we didn’t know anything about that, and so I don’t think we impacted the world as much as we hoped we were going to in terms of making data accessible to non-programmers. It kind of took Apple to do that. The problem that we didn’t think we were working on at all – at least, we didn’t pay any attention to it – was how to embed query languages into host languages, or how to make a language that would serve as an interchange medium between different systems – those are the ways in which SQL ultimately turned out to be very successful, rather than as an end-user language for ad hoc users. So I think the problem that we solved wasn’t really the problem that we thought we were solving at the time.

Anyway, we were working on this language, and we adapted it from SQUARE and turned it into English and then we started adding a bunch of things to it like GROUP BY that didn’t really come out of the SQUARE heritage at all. So you couldn’t really say it had much to do with SQUARE before we were done. Ray and I wrote some papers about this language in 1974. We wrote two papers: one on SEQUEL/DML[32] and one on SEQUEL/DDL[33]. We were cooperating very closely on this. The DML paper’s authors were Chamberlin and Boyce; the DDL paper’s authors were Boyce and Chamberlin, for no special reason; we just sort of split it up. We wanted to go to Stockholm that year because it was the year of the IFIP Congress in Stockholm. I had a ticket to Stockholm because of some work I’d done in Yorktown, so Ray submitted the DDL paper to the IFIP Congress in Stockholm, and the DML paper we submitted to SIGMOD. This is the cover page of the SEQUEL/DML paper. It was 24 pages long. These were twin papers in our original estimation. We wrote them together and thought they were of comparable value and impact. But what happened to them was quite different. The DDL paper got rejected by the IFIP Congress; Ray didn’t get to go to Stockholm. I still have that paper in my drawer; it’s never been published. The DML paper did get accepted at SIGMOD. Several years later I got a call from a guy named Larry Ellison who’d read that paper; he basically used some of the ideas from that paper to good advantage. [laughter] The latest incarnation of these ideas is longer than 24 pages long; it’s the ISO standard for the SQL language, which was just described last week at SIGMOD by Nelson Mattos[34]. It’s now about 1600 pages.

It’s from the history in this quote, I believe, that SQL gained its second-class status: it’s not for programmers, but it’s “too complicated” for end-users who became used to graphically interacting with applications.

Do you have someone on staff who really knows SQL?  Who can make the data super easily accessible to application programmers and end-users alike?  Who removes the barrier and lowers the hurdle in the way of turning data into value?  You’re probably gathering more and more relational data every day — and probably shredding your XML and storing your BLOBs there too.  I’m not saying that SQL is more important than data or the means to analyze it — I am saying that experts at SQL can make your databases perform better AND make it easier for your application people to focus on delivering that data to the people who want to use it.  Don’t put it in the limbo land of being not for programmers and not for end-users.

Update:  I wanted to give credit to the source of my quote:

Copyright (c) 1995, 1997 by Paul McJones, Roger Bamford, Mike Blasgen, Don Chamberlin, Josephine Cheng, Jean-Jacques Daudenarde, Shel Finkelstein, Jim Gray, Bob Jolls, Bruce Lindsay, Raymond Lorie, Jim Mehl, Roger Miller, C. Mohan, John Nauman, Mike Pong, Tom Price, Franco Putzolu, Mario Schkolnick, Bob Selinger, Pat Selinger, Don Slutz, Irv Traiger, Brad Wade, and Bob Yost. You may copy this document in whole or in part without payment of fee provided that you acknowledge the authors and include this notice.

Friends…

WHEN OTHERS THEN NULL; has a new friend…