Talkin’ ‘Bout My Generation…

With apologies to The Who.

I’ve written before about my obsession with sql.bsq :-)

The genesis of which was a stint at a customer site where I was simply providing on-call support without any specified tasks.  This site made heavy use of Pro*C, and a common problem was making sure that local variables were declared with the same precision as their database column counterparts.

I couldn’t understand why the developers were declaring their variables by hand — didn’t the data dictionary have all of the information to generate this stuff?  And didn’t SQL*Plus have the ability to spool output to a flat file?  How hard could it be to just ask for the table definition and turn it into a .h file?

And so, my first code generation script:

rem
rem SCCS
rem File Name : %W%
rem Check-in Date :
rem SCCS Version : %I%
rem Path : %F%
rem NAME
rem procvars.sql
rem DESCRIPTION
rem Select script that will create Pro*C variable declarations for
rem a given table. All variables except NUMBER are declared as
rem VARCHARs (NUMBERS are declared as long or double, depending on
rem their data scale).
rem
rem This script spools its output to a file with the name
rem [table_name]_vars.h
rem HISTORY
rem Who   When      Reason
rem ---   -------   ------------------------------
rem Dom   1/10/94   Creation
rem
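
That header is all I’ll reproduce here, but the body was little more than a spooled select with a DECODE.  A sketch reconstructed in the same spirit (the DECODE details and the &table_name handling are illustrative, not the original text):

set pagesize 0 feedback off verify off
accept table_name prompt 'Table name: '
spool &table_name._vars.h

rem NUMBERs become long or double based on scale; everything else a VARCHAR
select decode(data_type,
              'NUMBER', decode(nvl(data_scale, 0),
                               0, 'long   ' || lower(column_name) || ';',
                               'double ' || lower(column_name) || ';'),
              'VARCHAR ' || lower(column_name)
                         || '[' || to_char(data_length + 1) || '];')
  from all_tab_columns
 where table_name = upper('&table_name')
 order by column_id;

spool off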

The interesting thing about this script (I liked source control, even back then) was that it started me down the path of exploring the data dictionary.  I was fascinated by the data dictionary in Oracle 6 — it had EVERYTHING in it.  I’d worked with other databases before (dBase III & IV, SQL/DS), but none of them had anything as detailed as the Oracle data dictionary.  I loved the fact that it was also self-describing — selecting from dba_views gave the names of every view.  To this day, I don’t understand people who have those huge posters of the data dictionary — why would you put up something which would become obsolete so quickly when you could just query the system for the info?
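
For instance, the dictionary will happily list its own views:

select view_name
  from dba_views
 where owner = 'SYS'
 order by view_name;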

Anyway, the script referenced all_tab_columns, and I wondered why it was so slow — and that’s when I started digging into tab$, col$ and sql.bsq.  I liked going after the base tables instead of the views — and it sped things up considerably.
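
For the curious, the base-table version of the column lookup looked roughly like this (a sketch; you need select privileges on the SYS-owned tables, and these internals are undocumented and subject to change):

rem type# is the internal datatype code (1 = VARCHAR2, 2 = NUMBER, 12 = DATE, ...)
select c.name, c.type#, c.length, c.scale
  from sys.obj$ o, sys.col$ c
 where o.name = upper('&table_name')
   and c.obj# = o.obj#
 order by c.col#;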

After that, I began writing scripts (using the same framework) to generate DDL for all kinds of things — since at the time I wasn’t familiar enough with the Unix strings command to pull DDL from Export dump files.

I still have those scripts on my hard drive — they’ve followed me everywhere I go.  Scripts with ALL CAPS names limited to 8 characters:

MKINDEX.SQL
MKSEQUEN.SQL
MKSYNONY.SQL
MKTABLE.SQL
MKVIEW.SQL
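
They were all variations on the same concatenation trick over the dictionary.  MKSYNONY.SQL, for example, would have looked something like this (a sketch in the same spirit, not the original file; the spooled file name is made up):

set pagesize 0 linesize 132 feedback off verify off
spool cresynon.sql

rem Turn each synonym definition back into executable DDL
select 'create synonym ' || synonym_name ||
       ' for ' || table_owner || '.' || table_name || ';'
  from user_synonyms
 order by synonym_name;

spool off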

When I was first confronted with the need to maintain source-code-controlled database scripts, these old scripts became the basis for how I would attack the problem…

Why code stuff by hand when you could generate it?  And not out of thin air either, but based on the actual definition in the system.

Change Management

I know, I’m staring at a box with the word Pandora written all over it.

Over the years, I’ve worked on many ways to let database schema evolution and software change management co-exist peacefully.

During the early years, I was happy if the init.ora file was under SCCS control. I re-read the original OFA document and couldn’t find any reference to pfile standards, but I recall that we thought it would be neat if changes to the init.ora file were tracked and kept under version control with history.

James Morle has an excellent section in his book, Scaling Oracle8i, which discusses the init.ora file and mentions SCCS.

This is one of the reasons I don’t use binary spfiles nowadays…
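
(If you must run with an spfile, you can at least keep a text copy under version control.  Something like this, with an illustrative path, dumps the current settings to a plain pfile you can check in:)

create pfile='/u01/app/oracle/admin/PROD/pfile/initPROD.ora' from spfile;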

Anyway, back then we didn’t worry so much about schema evolution and change management. Who would’ve thought that these systems would last long enough to have such a requirement? :-)

I was introduced more formally to database source code control by John Beresniewicz, whom I met during my time as CTO at Savant Corporation. John was the primary developer of an application (Savant Q) which had a repository and several robust PL/SQL packages. It was one of the first retail applications I encountered which made heavy use of PL/SQL. As such, John had everything under source code control using MS SourceSafe. Since the company was really small, I ended up chipping in on the installation code.

As with many retail applications, the focus of the installation code was on “clean” installations instead of upgrades. So, our source code control reflected that with heavy control on the creation commands, and specialized files for delta upgrades. Packaging usually meant making an image of the entire source tree for burning to CDs.

Source code control for packaged software differs somewhat significantly from that for custom internal corporate applications. Heck, source code control at many corporations can be a hit-and-miss affair for all kinds of code, let alone database schema evolution.

My current company has a very strong source code control ethic, which has led me to work on ways to manage schema evolution in line with that sense of responsibility.

The goals differ somewhat from the ones we used on packaged software. “Releases” occur very frequently, often every 2 months. The emphasis is on “upgrades”, not installations from scratch. (Every once in a while, I get asked about whether or not we could create the corporate databases using only the scripts from source code control. While I’d like to support that, the real need for such a capability is low. The idea that I’d take 5-7 year old scripts to create an empty database and re-import 2-3 TB of data seems silly to me when it would be much, much more practical to practice doing backups and restorations).

I intended this topic to be the subject of my paper at Hotsos 2007, but I ended up not being able to attend the conference, and with that the paper never got written. :-)

Two recent blog posts (here and here) by Dominic Brooks got me thinking about this topic again, so I’ll be posting on it over the next few days.  Perhaps with your feedback, I’ll eventually coalesce these posts into a decent paper on the subject.

Mean Mr. Modeler

With apologies to the Beatles.  I read a post by Tom Kyte the other day about a data model that stored passwords in clear text, used them as unique keys on a table, and even used them as foreign keys!  After reading it, I was reminded of a conversation I had with my wife, Wendy, about how she’s seeing a revival of interest in enterprise data modeling due to the SOA / BPEL fad surge ;-).

Despite some people’s opinion to the contrary, I’m less of a purist than I sound. I love data modeling, but I recognize that it can be an endless quest for non-achievable perfection. I once worked on a project to fully model all of the financial reporting attributes for a Fortune 500 company — the project took a full year to model, at which point the model was obsolete :-).

After that debacle, I’ve come to realize that data modeling (and the eventual goal of a data model, which is its representation in the physical implementation world) has a heavy dose of artistry along with the science of analysis and normalization.  And I’m not sure how to teach the art — so much of how to model something includes the requirement to understand how the data will be accessed and processed.

A common problem I encounter is a data “model” produced by programmers who have no experience in the art.  These are usually physical models which mimic flat files or maybe, just maybe, an advanced structure like a linked list or tree.  Most of the time they suffer greatly from poor attribute definition, lack of any keys, and no appreciation for scalability or schema evolution.

In the end, those last 2 items appear to be the biggest problems.  I usually handle them with 3 questions every time I’m presented with a physical model:

  1. How will this handle concurrency?
  2. What will it look like with 1 million entries? 10 million? 1 billion?
  3. How hard will it be to add attributes? Add dimensions (versions, history, etc.)?

Development Standards, Part 1 – SQL

First up in our work on development standards is SQL.  A reminder that these are just my opinions, based on what I’ve seen work — I’m perfectly willing to hear counter-arguments, proposals and even full-blown critiques of what’s wrong, what’s better or otherwise :-)

I like to keep the overriding goal in place here — these standards should not place undue burdens on developers.  They should be simple enough that even novice developers can abide by them without requiring a ton of knowledge, but also be appreciated by savvy developers.

(It does help if the developer has some knowledge of how Oracle works in order to appreciate the standard — which is why I continue to believe that you want people with some DBA experience to be your SQL developers instead of trying to teach the Java / PHP and Perl people this stuff).

Anyway, I’ve got very few standards for SQL statements:

  1. Whenever there are 2 or more tables in the statement, always alias each table and every referenced column.  Be consistent — don’t use schema.tablename.columnname in one place and then alias.columnname in another.  You don’t need to “standardize” on table aliases across all systems, just within your statement.
  2. I prefer the old-style syntax to the new, fancy ANSI-style join syntax, but that’s just me.  But whatever you do, pick one and stick with it or at least state that all new code will adhere to it and all rewrites will bake it in.
  3. Use bind variables unless you have an absolute reason not to — one you can actually articulate.
  4. Learn to use SQL*Plus AUTOTRACE (see the sketch after this list) and run the statement against a representative data set (we’ll talk about that later) using representative bind variables.  Your goal is 5-10 consistent gets per table join per row retrieved or processed.  (I don’t know how to get this kind of information in other databases — another reason I like Oracle is that it gives you this kind of statistical feedback “for free”).
  5. Use as few hints as possible — let the optimizer earn its keep (and justify part of the price you’re paying for a robust database).
  6. Try to look at things from the database’s perspective — what information does it have about the data it’s trying to work with?  Understand that it usually does the best it can with the information it has on hand.
  7. Minor nit — don’t rely on GROUP BY for data ordering.  If you want something sorted, use the ORDER BY clause — that’s what it’s there for.
  8. Read the Functions section in the SQL Reference Guide.  You’d be surprised how often you can do something without having to write 3GL code.
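
As promised in item 4, here’s a minimal AUTOTRACE session.  The emp table and the :dept_id bind are made up for illustration:

variable dept_id number
exec :dept_id := 10

rem traceonly suppresses the result rows; statistics reports consistent gets
set autotrace traceonly statistics

select *
  from emp
 where deptno = :dept_id;

set autotrace off

Compare the “consistent gets” figure against the rows processed; if the ratio drifts well past 5-10 per table join, the statement deserves another look.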

A lot of people like to quibble about the AUTOTRACE item.  “Why 5-10, why not 1-2…” — the truth is that I don’t care all that much about the particular number.  What I do care about is that the system uses as few resources as possible relative to the size of the result set it’s asked to produce.  The ultimate goal is to produce the largest possible result set using the fewest possible “work units” (ultimately, time).  A simple standard and a process to measure against it are all that is needed.

This simple set of rules usually produces pretty good SQL, even if your developers are new to the game.

Database Development Standards, Part 0 – Principle

I’ve been thinking lately about what topics to post on — and while I’m attracted to the religious arguments about developer DBAs and DBA developers and who develops the SQL and who designs the tables and who writes the PL/SQL and whether or not you should use stored procedures… (deep breath)… I think we’ve all heard variations of that song ad nauseam. :-)

So, given that, I think I’d like to talk about the development process itself.

I’m going to try and describe a possible process for developing internal applications with reasonably high uptime requirements and fairly frequent feature introduction and/or bug fixing.

I’m NOT going to focus on coding and naming standards.  Honestly, I don’t care too much about those — only that they are consistent.  And heck, anybody who specifies spacing and alignment needs to chill out and look into pretty-printers and code formatters.

This will be a multi-part series and I’d love to hear feedback — even specific questions and scenarios.

BTW, the reason I find this topic interesting has to do with some of my notions around the construction and formation of development teams.  In general, I believe that as a manager my job is to remove as many non-coding obstacles as possible from my staff in order to allow them to be as productive as possible.  (I read a fair amount of Mills and Yourdon a long time ago — I know, today I’m supposed to speak about “agile” development and “extreme programming”, but sometimes I just like to avoid the tags and speak about the underlying methods and results).

I’ll close this post with a guiding principle: A development process should promote efficiency — not add overhead or administrivia.