Outing my source

I know, it’s been a while.  Here’s an older topic that I’ve been thinking about lately — and that’s dealing with outsourced code.

Outsourcing development is attractive for its context-based lower development costs and faster turnaround (I say “context-based”, because often times if you relieved an in-house group of its non-productive responsibilities and allowed it to compete on an equal footing, you’d get the same level of cost and quality).  Unfortunately, too many outsourced projects don’t take post-delivery activities into consideration.  Often the contracts are written with the focus on functionality and time-to-completion without corresponding rigor being applied to code maintainability and compatibility with existing maintenance processes.

It’s a real shame when the cost savings associated with the development and delivery are obliterated by cost increases in maintenance and operation.  The challenge is to make sure that the contract stipulates adherence to internal standards which will support internal takeover of the developed code.

Here are some questions to ask when you get a chance to provide input to an outsourced development contract:

  1. Will the code follow our development standards, naming conventions, packaging rules?  If not, why not?  If not, will we have to re-work the code upon delivery?
  2. Can the code be checked into our source-code control system when we get it?  If not, why not?
  3. Can the code be compiled, built and made installable by our local build system?
  4. Will the code use third party software that we’re not familiar with?  Will it require separate licensing?  Will we be able to get training on it?
  5. Who will do integration, system and/or scalability testing?
  6. Who generates the test data and who owns that testing data?
  7. Is the software built to support regression and/or scenario testing?  Will it run in our local testing environment?

In the end, it becomes your code and your responsibility.  It should be held to the same standards as your internal code development — and that shouldn’t be a problem.

Column Naming

Laurent’s post on column qualification reminded me of a conversation I had with one of my staff about column naming standards.  But before I get to that conversation, I’d like to share my favorite example about using table aliases.

I’m a big fan of always using table aliases and always qualifying column names.  My favorite example of why it’s a good idea to do it is as follows:

select * from v$session where saddr in (select saddr from v$transaction)

Can you spot the problem here?  There isn’t an saddr column in v$transaction.  But if you run that query, you don’t get an error.  In fact, you get all of the sessions from v$session — when you probably only wanted the sessions which had active transactions.  If you were in the habit of always qualifying the table names like so:

select s.* from v$session s where s.saddr in (select t.saddr from v$transaction t)

You’d get ORA-00904: “T”.”SADDR”: invalid identifier.

You can argue about whether or not the original statement should throw an error, but it doesn’t matter.  If you used the good practice of always aliasing tables and qualifying columns, you wouldn’t need to waste your breath.

One “argument” I hear against qualifying column names comes from people who have a policy of making sure every column name in the system is unique, usually by including some kind of table name abbreviation prefix on every column in a table.  My staff member asked me my opinion of this.  Bleh.  I’ve got nothing really against it, other than I think it’s not feasible to enforce and it hampers readability.  I also don’t like how that policy treats foreign key columns — usually inconsistently.  And in the end, it seems to just be an excuse for not having to qualify columns in queries anyway :-)

Superscript

…back to Change Management.

In my earlier post, I talked about the strong change management ethic at my current company.  It was interesting to see how my predecessor here adapted database schema evolution to that ethic.

I need to add here that I find “ancient” software development organizational ideas like “chief” or “surgeon” programming teams to be fascinating.  In particular because each model describes “super” programmers who end up being responsible for 80-90% of the code anyway due to shear talent and dominant personality.  Almost everywhere I’ve gone has had a local expert who seemed to qualify for that kind of title.  The relevance of ideas from Mills and Brooks is commented on here.

The biggest thing I’ve taken away from those ideas of “super” programmers is that, as a manager, I need to help every member of my staff become as close to “super” as possible.  That means removing as many non-productive, non-programming, non-learning tasks as possible.

Another way to put this is to find out ways to reduce the effort and work required to do non-programming tasks.

This is where the DDL generation scripts came into the picture.

When I arrived, every programmer hand-coded the scripts which were checked into the source code control system.  Everyone had to remember to include the source code control tags, and script names weren’t obvious.  Also, it wasn’t always clear that every changed object and/or procedure was being captured with every release.

To address those issues, I:

  1. Required that all code checked into the source code control system had to be generated via the DDL generation scripts.  We modified the scripts to include the source code control tags automatically.  We also did things like include primary key indexes, grants and initial stats analysis in table creation scripts.  We also injected source code control tags into the stored procedure and package code so that they could be seen when browsing the database.
  2. Modified the scripts to include the object type, schema name and object name in the generated file name.  So, a script that created the SCOTT.EMP table ended up being called mk-tbl-scott-emp.sql  (We used mk for created, rm for drop and ch for alter).
  3. Turned on DDL auditing for all objects in the development database.  This allowed us to double-check and make sure we captured and verified which objects were changed leading up to a release.

A note on DDL auditing — I like to try and have this turned on in every database I work with.  Development, Test, QA, Production, you name it.  I think it’s always useful to have a log of structure changes to the database — and why code up a special trigger when the database will track it automatically?  The only objection I’ve heard to this is space — and the space required is so ridiculously small.  Sometimes people complain about performance — and I say that if you have a system that relies on 1000’s of DDLs per day, you’ve got a system that needs work.  Who does DDL as part of an OLTP transaction?

Our biggest remaining challenge with this approach was how to create the “master” installation “super” scripts.  I’ll cover that in a later post…

(One other concern is the shear number of files this approach can generate.  If Oracle did this for the SYS schema, you’d have over 4,000 individual files.  On the other hand, this works well when you only need to re-create a single object without searching through 4-5 cat scripts to find the object definition.  Other systems use object definition files which are then read by specialized programs to generate installation scripts — that seems like a lot of work for an internal corporate application).

Talkin’ ‘Bout My Generation…

With apologies to The Who.

I’ve written before about my obsession with sql.bsq :-)

The genesis of which began during a stint at a customer site where I simply providing on-call support without any specified tasks.  This site made heavy use of Pro*C, with a common problem being making sure that local variables were declared with the same precision as their database column counterparts.

I couldn’t understand why the developers were declaring their variables by hand — didn’t the data dictionary have all of the information to generate this stuff?  And didn’t SQL*Plus have an ability to spool output to a flat file?  How hard could it be to just ask for the table definition and turn it into a .h file?

And so, my first code generation script:

rem
rem SCCS
rem File Name : %W%
rem Check-in Date :
rem SCCS Version : %I%
rem Path : %F%
rem NAME
rem procvars.sql
rem DESCRIPTION
rem Select script that will create Pro*C variable declarations for
rem a given table. All variables except NUMBER are declared as
rem VARCHARs (NUMBERS are declared as long or double, depending on
rem their data scale).
rem
rem This script spools its output to a file with the name
rem [table_name]_vars.h
rem HISTORY
rem Who When Reason
rem —— —— —————————————–
rem Dom 1/10/94 Creation
rem

The interesting thing about this script (I liked source control, even back then), was that it started me down the path of exploring the data dictionary.  I was fascinated by the data dictionary in Oracle 6 — it had EVERYTHING in it.  I’d worked with other databases before (dBase III & IV, SQL/DS), but none of them had anything as detailed as the Oracle data dictionary.  I loved the fact that it was also self-describing — selecting from dba_views gave the names of every view.  To this day, I don’t understand people who have those huge posters of the data dictionary — why would you put up something which would become obsolete so quickly when you could just query the system for the info?

Anyway, the script referenced all_tab_columns, and I wondered why it was so slow — and that’s when I started digging into tab$, col$ and sql.bsq.  I liked going after the base tables instead of the views — and it sped up things considerably.

After that, I began writing scripts (using the same framework) to generate DDL for all kinds of things — since at the time I wasn’t familiar enough with the Unix strings command to pull DDL from Export dump files.

I still have those scripts on my hard drive — they’ve followed me everywhere I go.  Scripts with ALL CAPS names limited to 8 characters long.

MKINDEX.SQL
MKSEQUEN.SQL
MKSYNONY.SQL
MKTABLE.SQL
MKVIEW.SQL

When I first was confronted with the need to maintain source code controlled database scripts, these old scripts became the basis for how I would attack the problem…

Why code stuff by hand when you could generate it?  And not out of thin air either, but based on the actual definition in the system.

Change Management

I know, I’m staring at a box with the word Pandora written all over it.

Over the years, I’ve worked on many ways to let database schema evolution and software change management co-exist peacefully.

During the early years, I was happy if the init.ora file was under SCCS control.  I re-read the original OFA document and couldn’t find any reference to pfile standards, but I recall that we thought it would be neat if changes to the init.ora file were tracked and kept under version control with history. 

James Morle has an excellent section in his book, Scaling Oracle8i, which discusses the init.ora file and mentions SCCS here.

This is one of the reasons I don’t use binary spfiles nowadays….

Anyway, back then we didn’t worry so much about schema evolution and change management.  Who would’ve thought that these systems would last long enough to have such a requirement? :-)

I was introduced more formally to database source code control by John Beresniewicz, whom I met during my time as CTO at Savant Corporation.  John was the primary developer of an application (Savant Q) which had a repository and several robust PL/SQL packages.  It was one of the first retail applications I encountered which made heavy use of PL/SQL.  As such, John had everything under source code control using MS SourceSafe.  Since the company was really small, I ended up chipping in on the installation code.

As with many retail applications, the focus of the installation code was on “clean” installations instead of upgrades.  So, our source code control reflected that with heavy control on the creation commands, and specialized files for delta upgrades.  Packaging usually meant making an image of the entire source tree for burning to CDs.

Source code control for packaged software differs somewhat significantly from that for custom internal corporate applications.  Heck, source code control at many corporations can be a hit-and-miss affair for all kinds of code, let alone database schema evolution.

My current company has a very strong source code control ethic, which has led me to work on ways to manage schema evolution in line with that sense of responsibility.

The goals differ somewhat from the ones we used on packaged software.  “Releases” occur very frequently, often every 2 months.  The emphasis is on “upgrades”, not installations from scratch.  (Every once in a while, I get asked about whether or not we could create the corporate databases using only the scripts from source code control.  While I’d like to support that, the real need for such a capability is low.  The idea that I’d take 5-7 year old scripts to create an empty database and re-import 2-3 TB of data seems silly to me when it would be much, much more practical to practice doing backups and restorations).

I intended this topic to be the subject of my paper at Hotsos 2007, but I ended up not being able to attend the conference, and with that the paper never got written. :-)

2 recent blog posts *here and here) by Dominic Brooks got me to thinking about this topic again, so I’ll be posting on it over the next few days.  Perhaps with your feedback, I’ll eventually coalesce these posts into a decent paper on the subject.