Hilarious Diff

No relation to Hilary Duff.

In my last post, I talked about how I like to remove non-technical work from my staff by giving them tools to do “mundane” things like check code into source code control automatically and how our DDL generation scripts do most of the work.

The scripts work well — they started life as select statements from the data dictionary views, then changed into selects from the sys.xxx$ tables for performance and then somewhere late in 8i I started to have trouble keeping up with all of the various object parameters and options (stuff like IOTs, partitioning, etc).

After we migrated to 9i I started playing with DBMS_METADATA in order to capture all of the options, mixing it in with the older select statements.  It helps that I’ve spent years playing with SQL*Plus and know a lot about things like the noprint option and using new_value.  (When I do presentations to developers about things to know about Oracle, I always spend time on SQL*Plus — heck it’s always there, just like vi).

It seems to work really well for us — we don’t use it for everything (for example, we don’t use it to get PL/SQL code — why bother when XXX_SOURCE is so easy to get at?).  But the table creation script works really well to get things like the DDL and grants/comments on the table (through get_dependent_ddl).  I mentioned before how I like to “inject” stuff into the DDL generation scripts, so for example, our table scripts do the following:

  1. Create REM lines with the source code control tags
  2. Create the table creation command
  3. Create a public synonym for the table if it exists
  4. Create the grant commands on the table
  5. Capture the comments on the table and its columns
  6. Add monitoring to the table
  7. Generate an initial command to analyze the table when its created (to enable STALE monitoring)

Our PL/SQL generation scripts insert the source code control comment directly into the generated code.

I know some people have wondered why DBMS_METADATA appears to be so slow — I don’t really notice it.  I mean, it’s not blazing fast, but it does just fine.  I recently generated commands for approx. 400 tables in 8 minutes.  Usually we only generate commands one or two at a time, so that’s fine for me.

Problems with Storage Options

One interesting problem with using these kinds of scripts for deployments between environments has to do with storage differences (different tablespaces and sizes, partitioning strategies, etc).  Many of the commercial tools I’ve seen that do schema diffs have options to “ignore storage options”.  I don’t like doing that.

I’m a big fan of making QA, Test and Dev databases EXACTLY the same as the production database — to the point of making sure we build out those databases by copying production into the other environments.  The benefits of this approach are HUGE:

  1. No differences in statistics to throw off execution plans
  2. No differences in tablespace layout, so DDL commands don’t need to change between environments
  3. No differences in init.ora settings to change executions plans or profiles

The objections I’ve heard to having full copies of production in development just don’t overweigh the benefits in my opinion:

  1. Storage costs.  You don’t need super-fast, huge-cached disk frames — you just need the same amount of space.  And with 1TB drives starting to come in under $300, “costs” here shouldn’t be an issue.  Yes I know the disks will be slower — and for the record, I like disks to be slower in development than in production.  I think that’s much better than having it the other way around :-)
  2. Memory costs.  Similar to storage costs, I think your development machines should have the same quantity of memory — at least enough to handle the db_cache.
  3. Data localization and security issues.  I take this issue more seriously than the previous ones.  You may have credit cards or production URLs in your system and you’ll need a way to localize or remove them in the development environments.  However, I think the effort here is well worth it.  And, you should write scripts to do this once and save them for each time you refresh production back into the other environments.
  4. Performance.  I’ve heard developers complain that having full copies of the database make their work go slower — too much data to deal with.  I wonder out loud how fast their code will be in production when it has to deal with “too much data” :-)

Source Code Control Filesystem Layout

One other problem we’ve encountered is dealing with the vast quantity of files my approach of single file per object generates and how to map that into a file system structure that can be checked in and out of source code control.  I also think the problem here is different between a packaged application and a corporate application.  During deployment of an incremental release to a corporate application, you only want files relevant to the incremental change.  A packaged application probably has the current, full schema definition and incremental change files from each preceding version to support upgrades.

I don’t have a good answer here yet.  Our corporate applications lack the initial schema definition scripts from December, 2000, so we don’t even have a placeholder for the “original” or “full” schema definition.  We’ve ended up creating separate directories for each incremental set of changes to ease packaging and referring to them during deployment — but I’m not satisfied with that approach.  I’ll need to think on that some more.

Next post…

In an upcoming post I’ll summarize my experience in trying to evaluate Oracle Change Manager / Change Management Pack for this use — in comparison to our “manual” scripts :-)

5 Responses to “Hilarious Diff”

  1. Graham Oakes Says:

    Hi Dominic,

    another informative post with some great ideas. I did have a query about a couple of the points you made though

    >> No differences in statistics to throw off execution plans

    >> You donít need super-fast, huge-cached disk frames ó you just need the same amount of space

    Won’t slower disks on the dev/test/QA system result in different system statistics and therefore potentially different plans? Do you have a method to get around this (e.g. don’t collect system stats or load some values exported from production), or does it not make a difference in your experience?


  2. ddelmoli Says:

    You know, I haven’t thought too much about that. The flip answer is that if differences don’t change the execution plans, then it doesn’t matter. I’d also say that if they stats are proportionally the same (i.e., disk and cpu are BOTH 4x slower in Dev), then it shouldn’t matter. In the end, I guess I’d make sure to set them the same as production if possible.

  3. John Hurley Says:

    I too ( although much more recently than you apparently ) have done some hacking with dbms_metadata.

    Any chance you want to share the scripts your team has put together?

  4. ddelmoli Says:

    Sure, drop me a line and I’ll send ‘em.

  5. Feliz Says:

    I am new to the dbms_metadata world and I hope to learn about it as well as not have to reinvent the wheel. Can I also share in the scripts you have written? Much obliged!

Leave a Reply

Posting code can be a pain. To make sure your code doesn't get eaten, you may want to pre-format it first by using HTML Encoder