Encyclopedia Spine Problem, one way…

From Perl, TMTOWTDI (There’s more than one way to do it).

I really owe a lot to the paper from Cornell by Todd Olson.  This has been a minor problem that I’ve been playing with for a while, and I finally got an hour or two to explore it.  I google’d hard for solutions to the commonality problem and Todd’s paper came right up when I looked for common leading substrings, which is really the “hard” part of this problem.  However, once I read the paper, I really needed to brush up on my regex skills — I was excited to try out the regex stuff in 10g, but I needed a refresher.

I shouldn’t have been surprised that the Oracle documentation is excellent on this, though.  As always, the Application Developer’s Guide – Fundamentals was a winner here.  Chapter 4 covers this in detail.  It’s funny how many of us don’t go back and read the fundamental and concept guides — sometimes I think we feel that we’re “experienced” and therefore don’t need to.  I always find new stuff in there, every time.

I loved all of the answers I received — VERY inventive.  Kudos to Jeremy Schneider for being the only one to tackle the paper and regex-style solution.  I’ve got to admit that many of the solutions seemed kinda convoluted to me.  One thing I strive for is KISS — Keep It Simple and Sane :-)   And so, my solution is as small as possible, using a small cheat in that I’m using a “magic” character “@”, but you can replace that with CHR(0) if it makes you feel better.

select nvl(nullif(regexp_replace(min(cn)||'@'||min(pn), '^([^@]*)(.)[^@]*@\1[^@]*$', '\1\2')||'...', min(cn)||'...'), min(cn)) bfn,
       nvl(nullif(regexp_replace(max(cn)||'@'||max(nn), '^([^@]*)(.)[^@]*@\1[^@]*$', '\1\2')||'...', max(cn)||'...'), max(cn)) btn
  from (
        select column_name cn,
               ntile(15) over (order by column_name) bk#,
               lag(column_name)  over (order by column_name) pn,
               lead(column_name) over (order by column_name) nn
          from ac1
       )
 group by bk#
 order by bk#
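
The heavy lifting is in the regexp_replace() call.  With two neighboring names glued together by the '@' separator, the pattern '^([^@]*)(.)[^@]*@\1[^@]*$' captures the longest leading substring the two names share (\1) plus one more character (\2), which is exactly what a spine needs to show.  Here is a minimal illustration against dual, using two neighboring names from the quiz data (the spine_prefix alias is just mine):

select regexp_replace('BLEVEL@BITS_SET',
                      '^([^@]*)(.)[^@]*@\1[^@]*$',
                      '\1\2') spine_prefix
  from dual;

-- returns BL, the shortest prefix of BLEVEL that distinguishes it from BITS_SET

The nullif()/nvl() wrapping handles the edge case where the distinguishing prefix turns out to be the entire name; in that case the trailing '...' is dropped and the full column name is shown as-is.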

Encyclopedia Hint

…Just back from vacation, and I’m too wiped out to post my solution.  I’ve read some of the solutions — pretty inventive stuff.

Here are 2 hints to my solution:

This paper from Cornell.  (Go Big Red!)

And, I used 10g.

Encyclopedia Spine

With apologies to Donald J. Sobol.

Quiz time again.

This one comes from trying to build a directory-style index into a result set using tabs or links, mimicking what you see on a set of encyclopedias.  You know, A-B, C-D, etc.  But what I want to do is give a bit more information about what’s in each book, and mimic what usually happens with the letter S (Sa-Sm and Sn-Sz).

So, create a table like this:

create table ac1 as select distinct column_name from all_tab_columns;

And write a SQL statement to list a set of 15 book spines, each spanning the same number of column names, with each spine displaying only as many leading characters as it takes to tell it apart from its neighbors.

For example, if the first book goes from A to BITS_SET and the next book starts with BLEVEL, the first book spine should say A… thru BI… and the second book spine should say BL… thru (whatever).

I have a solution, but I’m curious to see what you all come up with. 

As always, NO PL/SQL functions allowed (hey, that’s too easy).  You can use any version of Oracle you like — 8i, 9i, 10g, 11x, etc.

Here’s the output of my solution from my little XE database:

 

SPINE                                             SN                      EN
------------------------------------------------  ----------------------  ----------------------
A... thru BI...                                   A                       BITS_SET
BL... thru COMMENT$...                            BLEVEL                  COMMENT$
COMMENT$T... thru DEFAULT#...                     COMMENT$TEXT            DEFAULT#
DEFAULT$... thru ENDPOINT_N...                    DEFAULT$                ENDPOINT_NUMBER
ENDPOINT_V... thru FRESHNESS_D...                 ENDPOINT_VALUE          FRESHNESS_DATE
FRESHNESS_S... thru INUSE...                      FRESHNESS_SCN           INUSE_FEATURES
INUSR... thru LOB_P...                            INUSR                   LOB_PROPERTY
LOB_R... thru MVL_OLDEST_PK...                    LOB_READS               MVL_OLDEST_PK
MVL_OLDEST_PK_... thru OS...                      MVL_OLDEST_PK_TIME      OS_USER_NAME
OT... thru PRECC...                               OTHER                   PRECCNT
PRECE... thru RESERVED...                         PRECEDENCE              RESERVED
RESERVED1... thru SES...                          RESERVED1               SES_ADDR
SET... thru SUBSTITUTION_STRING_02...             SETNUM                  SUBSTITUTION_STRING_02
SUBSTITUTION_STRING_03... thru TRANSLATE_TO_I...  SUBSTITUTION_STRING_03  TRANSLATE_TO_ID
TRANSLATE_TO_L... thru Z                          TRANSLATE_TO_LANG_CODE  ZIP

Hilarious Diff

No relation to Hilary Duff.

In my last post, I talked about how I like to remove non-technical work from my staff by giving them tools to do “mundane” things like check code into source code control automatically and how our DDL generation scripts do most of the work.

The scripts work well — they started life as select statements from the data dictionary views, then changed into selects from the sys.xxx$ tables for performance and then somewhere late in 8i I started to have trouble keeping up with all of the various object parameters and options (stuff like IOTs, partitioning, etc).

After we migrated to 9i I started playing with DBMS_METADATA in order to capture all of the options, mixing it in with the older select statements.  It helps that I’ve spent years playing with SQL*Plus and know a lot about things like the noprint option and using new_value.  (When I do presentations to developers about things to know about Oracle, I always spend time on SQL*Plus — heck it’s always there, just like vi).
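
For anyone who hasn’t spent time with that combination, here’s the kind of thing I mean: noprint suppresses the column in the query output, while new_value copies its value into a substitution variable you can reuse later in the script.  (The variable and file names below are invented for illustration.)

column spool_name new_value l_spool_name noprint
select 'ddl_'||lower(name)||'.sql' spool_name from v$database;

spool &l_spool_name
-- ...generation statements go here...
spool off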

It seems to work really well for us — we don’t use it for everything (for example, we don’t use it to get PL/SQL code — why bother when XXX_SOURCE is so easy to get at?).  But the table creation script works really well to get things like the DDL and grants/comments on the table (through get_dependent_ddl).  I mentioned before how I like to “inject” stuff into the DDL generation scripts, so for example, our table scripts do the following:

  1. Create REM lines with the source code control tags
  2. Create the table creation command
  3. Create a public synonym for the table, if one exists
  4. Create the grant commands on the table
  5. Capture the comments on the table and its columns
  6. Add monitoring to the table
  7. Generate an initial command to analyze the table when it’s created (to enable STALE monitoring)
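
To give a flavor of what sits behind steps 2, 4 and 5, here is a rough sketch of the DBMS_METADATA calls involved.  The SET options and substitution variables are illustrative rather than our exact script, and note that get_dependent_ddl raises an error if the table has no grants or comments at all:

set long 100000 longchunksize 100000 pagesize 0 linesize 200 trimspool on
exec dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SQLTERMINATOR', true)

select dbms_metadata.get_ddl('TABLE', upper('&&table_name'), upper('&&owner')) from dual;
select dbms_metadata.get_dependent_ddl('OBJECT_GRANT', upper('&&table_name'), upper('&&owner')) from dual;
select dbms_metadata.get_dependent_ddl('COMMENT', upper('&&table_name'), upper('&&owner')) from dual;

The REM tags, synonym, monitoring and initial analyze commands are just prompted or selected around those calls.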

Our PL/SQL generation scripts insert the source code control comment directly into the generated code.

I know some people have wondered why DBMS_METADATA appears to be so slow — I don’t really notice it.  I mean, it’s not blazing fast, but it does just fine.  I recently generated commands for approx. 400 tables in 8 minutes.  Usually we only generate commands one or two at a time, so that’s fine for me.

Problems with Storage Options

One interesting problem with using these kinds of scripts for deployments between environments has to do with storage differences (different tablespaces and sizes, partitioning strategies, etc).  Many of the commercial tools I’ve seen that do schema diffs have options to “ignore storage options”.  I don’t like doing that.

I’m a big fan of making QA, Test and Dev databases EXACTLY the same as the production database — to the point of making sure we build out those databases by copying production into the other environments.  The benefits of this approach are HUGE:

  1. No differences in statistics to throw off execution plans
  2. No differences in tablespace layout, so DDL commands don’t need to change between environments
  3. No differences in init.ora settings to change execution plans or profiles

The objections I’ve heard to having full copies of production in development just don’t outweigh the benefits, in my opinion:

  1. Storage costs.  You don’t need super-fast, huge-cached disk frames — you just need the same amount of space.  And with 1TB drives starting to come in under $300, “costs” here shouldn’t be an issue.  Yes I know the disks will be slower — and for the record, I like disks to be slower in development than in production.  I think that’s much better than having it the other way around :-)
  2. Memory costs.  Similar to storage costs, I think your development machines should have the same quantity of memory — at least enough to handle the db_cache.
  3. Data localization and security issues.  I take this issue more seriously than the previous ones.  You may have credit cards or production URLs in your system and you’ll need a way to localize or remove them in the development environments.  However, I think the effort here is well worth it.  And, you should write scripts to do this once and save them for each time you refresh production back into the other environments.
  4. Performance.  I’ve heard developers complain that having full copies of the database make their work go slower — too much data to deal with.  I wonder out loud how fast their code will be in production when it has to deal with “too much data” :-)

Source Code Control Filesystem Layout

One other problem we’ve encountered is dealing with the vast quantity of files my single-file-per-object approach generates, and how to map them into a file system structure that can be checked in and out of source code control.  I also think the problem is different for a packaged application than for a corporate application.  During deployment of an incremental release to a corporate application, you only want the files relevant to the incremental change.  A packaged application probably carries the current, full schema definition plus incremental change files from each preceding version to support upgrades.

I don’t have a good answer here yet.  Our corporate applications lack the initial schema definition scripts from December, 2000, so we don’t even have a placeholder for the “original” or “full” schema definition.  We’ve ended up creating separate directories for each incremental set of changes to ease packaging and referring to them during deployment — but I’m not satisfied with that approach.  I’ll need to think on that some more.
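
To make that concrete, the interim layout looks roughly like this, one directory per incremental change set, using the mk/ch/rm file naming from our generation scripts (all of the names below are invented for illustration):

releases/
  2007-03-inventory-changes/
    mk-tbl-scott-emp_history.sql
    ch-tbl-scott-emp.sql
    mk-pkg-scott-emp_api.sql
  2007-05-payroll-fixes/
    ch-tbl-scott-dept.sql
    rm-syn-scott-old_emp.sql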

Next post…

In an upcoming post I’ll summarize my experience in trying to evaluate Oracle Change Manager / Change Management Pack for this use — in comparison to our “manual” scripts :-)

Superscript

…back to Change Management.

In my earlier post, I talked about the strong change management ethic at my current company.  It was interesting to see how my predecessor here adapted database schema evolution to that ethic.

I need to add here that I find “ancient” software development organizational ideas like the “chief programmer” or “surgical” team to be fascinating, in particular because each model describes “super” programmers who end up being responsible for 80-90% of the code anyway through sheer talent and dominant personality.  Almost everywhere I’ve gone has had a local expert who seemed to qualify for that kind of title.  The relevance of ideas from Mills and Brooks is commented on here.

The biggest thing I’ve taken away from those ideas of “super” programmers is that, as a manager, I need to help every member of my staff become as close to “super” as possible.  That means removing as many non-productive, non-programming, non-learning tasks as possible.

Another way to put this is to find ways to reduce the effort and work required to do non-programming tasks.

This is where the DDL generation scripts came into the picture.

When I arrived, every programmer hand-coded the scripts which were checked into the source code control system.  Everyone had to remember to include the source code control tags, and script names weren’t obvious.  Also, it wasn’t always clear that every changed object and/or procedure was being captured with every release.

To address those issues, I:

  1. Required that all code checked into the source code control system had to be generated via the DDL generation scripts.  We modified the scripts to include the source code control tags automatically.  We also did things like include primary key indexes, grants and initial stats analysis in table creation scripts.  We also injected source code control tags into the stored procedure and package code so that they could be seen when browsing the database.
  2. Modified the scripts to include the object type, schema name and object name in the generated file name.  So, a script that created the SCOTT.EMP table ended up being called mk-tbl-scott-emp.sql.  (We used mk for create, rm for drop and ch for alter.)
  3. Turned on DDL auditing for all objects in the development database.  This allowed us to double-check and make sure we captured and verified which objects were changed leading up to a release.
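
As an example of the tag injection mentioned in item 1, the generated package code ends up with something like this near the top (the object name and tag contents here are made up; the point is that the expanded tag is visible straight from the data dictionary):

create or replace package scott.emp_api as
   -- injected by the generation script; expanded by source code control:
   -- $Header: mk-pkg-scott-emp_api.sql,v 1.7 2007/06/01 12:00:00 dba Exp $
   procedure hire_emp(p_empno in number, p_ename in varchar2);
end emp_api;
/

-- so anyone browsing the database can see exactly which revision is installed
select text
  from dba_source
 where owner = 'SCOTT' and name = 'EMP_API' and type = 'PACKAGE'
 order by line;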

A note on DDL auditing — I like to try to have this turned on in every database I work with.  Development, Test, QA, Production, you name it.  I think it’s always useful to have a log of structure changes to the database — and why code up a special trigger when the database will track it automatically?  The only objection I’ve heard to this is space — and the space required is ridiculously small.  Sometimes people complain about performance — and I say that if you have a system that relies on thousands of DDL statements per day, you’ve got a system that needs work.  Who does DDL as part of an OLTP transaction?
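
If you want to try it, something along these lines does the trick.  Statement auditing options vary a bit by version, and audit_trail needs to be set (for example, audit_trail=db) for the records to land in the data dictionary, so treat this as a sketch rather than a checklist:

-- statement auditing for the DDL I care about
audit table;          -- create / drop / truncate table
audit alter table;
audit index;
audit procedure;      -- procedures, functions, packages, libraries
audit trigger;
audit sequence;
audit synonym;
audit view;

-- the log of structure changes is then just a query away
select timestamp, username, owner, obj_name, action_name
  from dba_audit_trail
 where timestamp > sysdate - 7
 order by timestamp;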

Our biggest remaining challenge with this approach was how to create the “master” installation “super” scripts.  I’ll cover that in a later post…

(One other concern is the sheer number of files this approach can generate.  If Oracle did this for the SYS schema, you’d have over 4,000 individual files.  On the other hand, this works well when you only need to re-create a single object without searching through 4-5 cat scripts to find the object definition.  Other systems use object definition files which are then read by specialized programs to generate installation scripts — that seems like a lot of work for an internal corporate application.)