Dressed to the nines

My good friend Moans (who posts to his blog about as often as I do) recently wrote up his current opinion on high availability here.  His post resonated with me, as I’ve had a fair amount of experience with system availability over the years.  While ruminating on his post, I dug up some excellent work on the subject from HP (one of the biggest proponents of 5nines during the First Internet Gilded Age).  This presentation by William Sawyer is particularly good, as is this paper by Tzvi Chumash.

I’m particularly proud of the reliability of the system I’m currently a part of.  From a good design, to talented engineers, to very good operational folks, we have a system that consistently achieves high availability.  One thing that I like about the system is that it isn’t very complex.  And it’s especially simple in its redundancy features.

I have to agree with most of the authors above.  Designing, building, and maintaining high-availability features can be very costly and error-prone in and of themselves.  A lot of ideas look good on paper, and sound good coming from the vendor, but in the end, your own people have to own and maintain them.  And if they can’t, for whatever reason, you’ve just wasted a ton of time and money.

This doesn’t mean that you don’t need higher levels of availability — it just means that you need to make sure you capture all of the costs.  That includes training and drills for the people responsible for the system.

High availability is achievable, but a realistic assessment of how to get there and stay there makes it much easier.

(I also found this article interesting)

Well I’m takin’ my time, I’m just movin’ along

The old song by Boston just keeps playing in my head today, so I thought it would make a good post title.  It’s been a long time since my last post — crazy busy at work leading up to a week-long vacation in Disney World.  I like vacationing at Disney, but it’s not the laid-back, sit-by-the-pool-with-a-good-book kind of vacation.  Lots of clean air and exercise running around the theme parks.  With my kids getting older, we get to try more aggressive rides.  The Expedition Everest roller coaster in Animal Kingdom is especially well done.

Anyway, I’ve got a bunch of post topics ready to roll, so on to today’s:

Have you noticed that some of the things you want to do in Oracle aren’t as well documented as you would like?  Heck, syntax diagrams are fine and all, but they don’t provide much context or advice or even information on conflicting parameters.  I know the options keep getting more and more advanced and complicated, and I’m not surprised that there just isn’t enough space or time to create exhaustive examples for every option.  And we only have so much time to experiment with them.

My latest foray into a new “feature” had to do with online table redefinition.  The relevant documentation (for 9i) is the Administrator’s Guide, Chapter 15 (Managing Tables), under the Redefining Tables Online heading.  Actually, the documentation here is pretty good, but a little light on the technical details.  I needed to use this feature to “compress” a table after a set of archiving activities removed 50% of its rows.  (Yes, I know I don’t really need to do that, but I opened my big mouth again when someone said the delete would make things better — “Only if you reclaim the space to make full table scans faster”, I said — “When can you do that?” I got asked :-( )

Anyway, I started digging into online table redefinition because (of course) we needed to do this activity without any outage whatsoever.  Besides, I’d been wanting to learn about OTR (online table redefinition) anyway.

So, basically OTR creates a snapshot (oh, sorry — materialized view — MV) on the table you’re about to redefine — using a target table that you pre-create to receive the rows from the snapshot.  When you’re done, it syncs the delta rows using the standard snapshot refresh mechanism and then does the ole switcheroo on the table names so the two tables trade places.  I figured I’d use it to move the tables from one tablespace to another, thereby reclaiming the space from the deleted rows.
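Under the covers this is the DBMS_REDEFINITION package.  As a minimal sketch (the APP_OWNER schema and BIG_TABLE names here are hypothetical stand-ins, not our real objects), the very first call just asks Oracle whether the table qualifies at all:

    -- Ask whether BIG_TABLE can be redefined with the primary-key method;
    -- raises an error (e.g. ORA-12089) if the table doesn't qualify
    BEGIN
      DBMS_REDEFINITION.CAN_REDEF_TABLE(
        uname        => 'APP_OWNER',
        tname        => 'BIG_TABLE',
        options_flag => DBMS_REDEFINITION.CONS_USE_PK);
    END;
    /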

There’s a lot of work involved — you need to keep track of indexes, grants, constraints — everything that was on the base table.  And the docco isn’t very clear about what to put in place at which step.  In any event, here’s what we did:

We wanted to use the primary key method — that way we didn’t need the extra M_ROW$$ column hanging around on the target table.  But our source table didn’t have a primary key constraint (BTW, did I mention these were tables in a COTS 3rd-party application?).  Fortunately, it had a unique index on a non-null column — which was quickly declared as a primary key constraint :-)
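If you’re curious, the promotion looked something like this (names are hypothetical); Oracle will happily hang a primary key constraint on an existing unique index:

    -- Promote the existing unique index to back a primary key constraint;
    -- the NOT NULL column plus unique index make this a cheap change
    ALTER TABLE big_table
      ADD CONSTRAINT big_table_pk PRIMARY KEY (id)
      USING INDEX big_table_ux;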

So.  We created the target table as A_[table_name], created the necessary grants, altered the source table to add a primary key constraint, and got started.  One thing we learned is that you can’t have any MVs on the original source table, so we had to drop those and save their creation scripts to re-run when we were done.
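The target (interim) table itself is just an empty clone, which is also the natural place to point at the new tablespace.  A sketch with hypothetical names:

    -- Empty copy of the source table, pre-created in the new tablespace
    CREATE TABLE a_big_table
      TABLESPACE new_data_ts
      AS SELECT * FROM big_table WHERE 1 = 0;

    -- Re-issue whatever grants exist on the source table
    GRANT SELECT, INSERT, UPDATE, DELETE ON a_big_table TO app_role;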

We started the process, which copied all of the rows — that was the “slow” step.  Once that was done, we added the unique index to the target table and did a re-sync.  We figured that the re-sync was primary key based (duh!) and so we wanted the index in place — good call.
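In DBMS_REDEFINITION terms, that sequence looks roughly like this (hypothetical names again; in 9i the options_flag already defaults to CONS_USE_PK):

    -- The long step: instantiates the snapshot and copies every row
    BEGIN
      DBMS_REDEFINITION.START_REDEF_TABLE(
        uname      => 'APP_OWNER',
        orig_table => 'BIG_TABLE',
        int_table  => 'A_BIG_TABLE');
    END;
    /

    -- Build the unique index the PK-based refresh will lean on...
    CREATE UNIQUE INDEX a_big_table_ux ON a_big_table (id);

    -- ...then apply the rows that changed while the copy was running
    BEGIN
      DBMS_REDEFINITION.SYNC_INTERIM_TABLE('APP_OWNER', 'BIG_TABLE', 'A_BIG_TABLE');
    END;
    /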

Once that was done we created all of the other indexes (some 30-40 per table — yes, it’s a COTS application that begins with the letter S).  One problem — the index names needed to be different from the ones on the source table — otherwise we’d have had namespace collisions — yuck.  So, all indexes got created with A_[index_name] as their new name.
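So each of those secondary indexes got recreated along these lines (hypothetical name and columns):

    -- Same definition as the source index, new A_-prefixed name to avoid
    -- colliding with the original that still exists on the source table
    CREATE INDEX a_big_table_i1
      ON a_big_table (status_code, last_update_date)
      TABLESPACE new_index_ts;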

We did another sync after all of the indexes were in place, and then finished the redefinition.  But we’re not done yet!
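The wrap-up is just two more calls (hypothetical names as before); FINISH_REDEF_TABLE is the one that swaps the names in the data dictionary:

    BEGIN
      -- Catch up on changes made while the secondary indexes were building
      DBMS_REDEFINITION.SYNC_INTERIM_TABLE('APP_OWNER', 'BIG_TABLE', 'A_BIG_TABLE');
      -- The switcheroo: BIG_TABLE and A_BIG_TABLE trade identities
      DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP_OWNER', 'BIG_TABLE', 'A_BIG_TABLE');
    END;
    /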

What if there are hints referring to specific index names?  We need to rename all of the indexes — dropping or renaming the original ones first!  Reminder to self — you can’t drop an index if it’s being used to support a primary key constraint :-)
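The cleanup, sketched with the same hypothetical names (after the swap, the old segment and its indexes live under the A_ table name):

    -- Free the index backing the old primary key before dropping it
    ALTER TABLE a_big_table DROP CONSTRAINT big_table_pk;
    DROP INDEX big_table_ux;

    -- Now the original names are free, so strip the A_ prefix
    ALTER INDEX a_big_table_ux RENAME TO big_table_ux;
    ALTER INDEX a_big_table_i1 RENAME TO big_table_i1;

    -- (the PK constraint has to be re-declared on the new table as well)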

All in all, a good learning experience — especially the bit about the indexes and namespace collision problems.

I’ll have some more posts up later on, but I thought this was a good story…

Old indexes never die…

Recently I was asked to describe the “life” of an index where I work.  Here’s what I had to say:

Primary key and foreign key columns are indexed at table creation.  We always try to make sure all foreign key columns are indexed.
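Something like this (hypothetical names); the FK index matters in Oracle because an unindexed foreign key column can turn parent-table deletes into full child-table locks:

    CREATE TABLE order_lines (
      line_id  NUMBER        CONSTRAINT order_lines_pk PRIMARY KEY,
      order_id NUMBER        NOT NULL
               CONSTRAINT order_lines_fk1 REFERENCES orders (order_id),
      item     VARCHAR2(30)
    );

    -- Index the FK column right away, as part of table creation
    CREATE INDEX order_lines_fk1_i ON order_lines (order_id);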

Subsequent indexes are created in response to functional requests for queries where we determine that an index would be the best approach.  These are often function-based indexes, usually for space reasons (to keep the index as small as possible).  Sometimes we use normal indexes; sometimes we use parallel full table scans instead, when the data distribution and physical distribution make an index impractical.  We’ll also see at this point whether the column needs a histogram.  We try to avoid hinting queries to use the new index.  Occasionally we compress an index.  Sometimes we determine that we should rebuild the table as an IOT.
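The space angle comes from the fact that Oracle doesn’t store fully-NULL keys; one common pattern (hypothetical names here) indexes only the rows anyone actually queries:

    -- Only 'OPEN' rows get index entries; everything else maps to NULL
    -- and is simply absent from the index, keeping it tiny
    CREATE INDEX orders_open_i
      ON orders (CASE status WHEN 'OPEN' THEN status END);

    -- Queries must repeat the expression verbatim to pick up the index
    SELECT order_id
      FROM orders
     WHERE CASE status WHEN 'OPEN' THEN status END = 'OPEN';

Depending on version and settings, creating and using function-based indexes may also require the QUERY REWRITE privilege and QUERY_REWRITE_ENABLED set to TRUE.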

Indexes are usually destroyed during space analysis, where we do periodic reviews of space to see what we can drop and/or reorganize/move.  For example, our tablespaces have gone from being DMTs (dictionary-managed tablespaces), to uniform-extent LMTs (locally managed tablespaces), to auto-allocate LMTs.  During each transition we evaluate whether or not we still need the index (and/or some tables).  Usually we drop an index if we can prove that, even when it was in use, it was non-useful.  That is, selects via equality aren’t the best way to get at the data.
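To gather evidence before dropping, index usage monitoring (available since 9i) is one cheap approach; the index name below is hypothetical, and note that V$OBJECT_USAGE only shows indexes owned by the connected schema:

    -- Flag the index, let a representative workload run, then check
    ALTER INDEX big_table_i1 MONITORING USAGE;

    -- USED = 'YES' means some execution plan touched the index
    SELECT index_name, monitoring, used
      FROM v$object_usage
     WHERE index_name = 'BIG_TABLE_I1';

    -- Turn the bookkeeping off when you're done
    ALTER INDEX big_table_i1 NOMONITORING USAGE;

Keep in mind this won’t flag an index that the optimizer never picks but that still exists to police a constraint.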

Finally, we’ve been slowly partitioning our largest tables, which are basically historical, time-based INSERT tables.  Here all indexes are going local, so it’s another opportunity for index review.
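A sketch of the pattern (hypothetical table): range-partition by the time column and make every index LOCAL, so each index is equipartitioned with the table and old partitions can be aged out cheaply:

    CREATE TABLE event_history (
      event_time DATE          NOT NULL,
      event_type VARCHAR2(30),
      payload    VARCHAR2(4000)
    )
    PARTITION BY RANGE (event_time) (
      PARTITION p2006q4 VALUES LESS THAN (TO_DATE('2007-01-01', 'YYYY-MM-DD')),
      PARTITION p2007q1 VALUES LESS THAN (TO_DATE('2007-04-01', 'YYYY-MM-DD'))
    );

    -- LOCAL index: one index partition per table partition, dropped
    -- automatically when its table partition is dropped
    CREATE INDEX event_history_time_i ON event_history (event_time) LOCAL;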