Vincent Barbarino and the Book of Database Refactoring

Welcome back.  I realize this blog has been dead for a long time, but like Gabe Kotter, I’ve decided to return to my roots a bit and fire this blog back up.

Those of you who have spoken to me lately or have been following me on Twitter know that I’ve been pretty passionate about database design and development processes.  I’ve almost never seen a database whose physical infrastructure is perfect, so ongoing changes to database structures and schemas are just a fact of life.

It’s managing that process of changing the database layout that’s been getting me all worked up lately – even when folks know what to change and how they’d like to change it, they don’t have tools or processes to introduce change into the database system in a traceable manner.

In one of my prior jobs, we were adamant about change-tracking all of our database changes – we used a rigorous change control and configuration management (CM) process built on individual script files, one per object change.  Little did I know at the time that what we were practicing was a form of database refactoring…

I had thought that almost everyone understood the importance of maintaining traceability for database changes, but I’ve recently encountered situations where the lack of standards and tools means that changes are applied to databases in a haphazard fashion.  While searching for some good arguments to use when introducing this concept, I came across a book by Scott Ambler and Pramod Sadalage entitled “Refactoring Databases: Evolutionary Database Design”.

Immediately I liked the concept: a whole textbook on how and why you need to manage structural change in a database.  In the remainder of this post, I’ll be reviewing and commenting on this book.

Before I begin, I think it’s interesting to look at the review quotes in the front of the book.  In some ways I wonder if folks know who this book is for – most of the quotes seem to patronize “data-professionals,” saying it’s high time they joined the modern world by embracing agile development techniques.  References to “strong-armed DBAs” holding back projects seem a little harsh.

And yet.

I continue to believe that the jack-of-all-trades DBA moniker is mostly to blame for the sad state of database physical design today.  Ask folks what the primary task of a DBA group is, and chances are you’ll be told that it’s backup and recovery, not physical database design and construction.  I even have a hard time with the term “database development,” as I don’t really feel like I’m writing code when I’m doing physical database layout.  I’ve been advocating splitting the DBA title into three roles: Database Operator (traditional DBA), Database Engineer (physical database designer and constructor), and Database Developer (stored procedure and SQL writer).

Using my terms, this book is best targeted at the Database Engineer and Developer.

What’s funny to me about the opprobrium heaped upon DBAs by agile developers is that I don’t think it’s a criticism of database technology in and of itself – but rather frustration with being forced to work with database professionals who lack the temperament, skills and experience to do database engineering and development.  Let’s face it, a conservative operations DBA is (rightly) concerned primarily with system availability and reliability through ensuring proper backups and minimizing potential workload on the server.  These are the DBAs who prefer to have hundreds of small independent databases in production all running at 2% utilization because it plays to their strengths.

It’s far harder to manage a large, multi-application tenant database running at 30-40% utilization experiencing continual structure changes – and that’s where this book starts to help.

The Preface has a nice section on “Why Evolutionary Database Development?” which explains why it’s necessary to resist the desire to have a full and complete set of logical and physical models before performing database development.  Early in my career I participated in efforts to create so-called Enterprise Data Models – which, being constructed by ivory-tower oversight and governance groups, lacked any sort of applicability to business and mission requirements.  Sadly, they were also out of date by the time they were finally completed.  The book’s authors do a nice job of highlighting the benefits of the incremental approach, and also caution folks about the potential barriers to its adoption.  In particular, they point out the severe lack of tools supporting database SCM (keep in mind the book was published in 2006).

They also mention the need for database sandbox environments – they suggest individual developers get their own databases to experiment with.  I’m not a big fan of this approach – I prefer a single development database that allows me to host a lot of data, with each developer getting their own schema to play around in.  I also ALWAYS enable DDL auditing in ALL of my databases – that way I can track potential changes that might need to be promoted to the next environment (I also get to validate that my changes were applied to the higher environment – and, as icing on the cake, I can trap dumb ideas like embedding DDL statements inside transactional operations).
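Here’s a minimal sketch of the DDL-auditing idea, assuming Python’s built-in `sqlite3` module as a stand-in for a real database server. SQLite has no DDL triggers or built-in audit facility, so the hypothetical `AuditedConnection` wrapper below simply inspects the SQL text: it records every CREATE/ALTER/DROP in a hypothetical `ddl_audit` table and flags DDL issued while a transaction is open. That last check matters on Oracle in particular, where DDL implicitly commits whatever work is in progress. On a production server you’d lean on the database’s own auditing rather than a client-side wrapper like this.

```python
import sqlite3
from datetime import datetime, timezone

# Statement prefixes treated as DDL (a simplification: real auditing
# would hook the database engine rather than inspect SQL text).
DDL_PREFIXES = ("CREATE", "ALTER", "DROP")


class AuditedConnection:
    """Hypothetical wrapper that logs DDL and flags DDL inside transactions."""

    def __init__(self, path=":memory:"):
        # isolation_level=None: autocommit, so a transaction only exists
        # when the caller issues BEGIN explicitly.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS ddl_audit ("
            " id INTEGER PRIMARY KEY,"
            " executed_at TEXT NOT NULL,"
            " statement TEXT NOT NULL,"
            " inside_transaction INTEGER NOT NULL)"
        )
        self.warnings = []

    def execute(self, sql, params=()):
        is_ddl = sql.lstrip().upper().startswith(DDL_PREFIXES)
        inside_txn = self.conn.in_transaction
        if is_ddl and inside_txn:
            self.warnings.append(f"DDL inside an open transaction: {sql.strip()}")
        cursor = self.conn.execute(sql, params)
        if is_ddl:
            # If the surrounding transaction is rolled back, this audit
            # row is rolled back with it.
            self.conn.execute(
                "INSERT INTO ddl_audit (executed_at, statement, inside_transaction)"
                " VALUES (?, ?, ?)",
                (datetime.now(timezone.utc).isoformat(), sql.strip(), int(inside_txn)),
            )
        return cursor


db = AuditedConnection()
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("BEGIN")
db.execute("INSERT INTO customer (name) VALUES ('Ann')")
db.execute("ALTER TABLE customer ADD COLUMN email TEXT")  # flagged
db.execute("COMMIT")
print(db.warnings)
```

The same audit table can be compared across environments to confirm that a change promoted from development actually landed in the higher environment.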

Chapter 2 introduces the concept of Database Refactoring, starting with a quick introduction to refactoring in general (“a disciplined way to restructure code in small steps”).  The authors do a nice job of pointing out that database refactoring is conceptually more difficult than code refactoring: code refactoring only needs to maintain behavioral semantics, while database refactorings must also maintain informational semantics (pg. 15).  The emphasis here is on introducing change through a transition period, so that multiple applications, and multiple versions of applications, can continue to run against the same database.  A simple example of moving a column from a parent table to a child table is also included.
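To make the transition-period idea concrete, here’s a minimal sketch of moving a column from a parent table to a child table. It uses Python’s `sqlite3` rather than the Oracle dialect the book uses, and the table and column names (`customer`, `account`, `balance`) are hypothetical. The new column is added and backfilled, and a trigger keeps it in sync while older applications still write to the old column. Dropping the old column and the trigger is deferred until the transition period ends.

```python
import sqlite3


def start_move_column(conn):
    """Begin moving customer.balance to account.balance (hypothetical schema)."""
    # 1. Add the column to the child table; older applications are unaffected.
    conn.execute("ALTER TABLE account ADD COLUMN balance NUMERIC")
    # 2. Backfill it from the parent row (every account of a customer gets a copy).
    conn.execute(
        "UPDATE account SET balance = "
        "(SELECT c.balance FROM customer c WHERE c.id = account.customer_id)"
    )
    # 3. While old applications still update customer.balance, propagate
    #    their writes to the new location.
    conn.execute(
        """
        CREATE TRIGGER sync_customer_balance
        AFTER UPDATE OF balance ON customer
        BEGIN
            UPDATE account SET balance = NEW.balance WHERE customer_id = NEW.id;
        END
        """
    )
    conn.commit()
    # After the transition period: drop the trigger, then customer.balance.


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, balance NUMERIC);
    CREATE TABLE account (id INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(id));
    INSERT INTO customer VALUES (1, 100);
    INSERT INTO account VALUES (10, 1), (11, 1);
    """
)
start_move_column(conn)
conn.execute("UPDATE customer SET balance = 150 WHERE id = 1")  # an "old" application write
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(10, 150), (11, 150)]
```

This sketch only syncs in one direction (old column to new). If newer applications also write `account.balance`, a reverse mechanism would be needed, which is awkward when one parent row maps to many child rows.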

In section 2.3, the authors categorize database refactorings into 6 broad categories: Structural (modifying table definitions), Data Quality (think check constraints), Referential Integrity (capturing rules that might currently be maintained by application code), Architectural (transferring common logic from applications into database procedures to increase their usefulness), Method (stored procedure refactorings), and Non-Refactoring Transformations (evolving the schema to handle new data concepts).

They also introduce the idea of indicators that your database may require refactoring – they call them “Database Smells” :-)

These include common problems like multipurpose columns, multipurpose tables, redundant storage of data items, overloaded columns, and fear of altering the database because it is too complex.

In section 2.6, the authors explain that it’s easier to refactor your database schema when you decrease the coupling between applications and the database, for example through a persistence layer.

Chapter 3 walks you through the basics of a database refactoring process, including a list of process steps.  It also includes some good checks for determining whether a change is necessary and worth the effort.  Finally, the authors talk about version control and visibility.
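One common way to put database changes under version control is a schema-version table plus an ordered set of change scripts. Here’s a minimal sketch of that pattern, assuming Python’s `sqlite3`; the `schema_version` table and the `MIGRATIONS` list are hypothetical stand-ins for individual, version-controlled script files, and this isn’t presented as the book’s own tooling.

```python
import sqlite3

# Hypothetical change scripts, one per object change, in apply order.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX customer_email_ix ON customer (email)"),
]


def current_version(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version ("
        " version INTEGER PRIMARY KEY,"
        " applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    (version,) = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return version or 0  # MAX() is NULL on an empty table


def migrate(conn, migrations=MIGRATIONS):
    """Apply every script newer than the recorded version; return what ran."""
    applied = []
    start = current_version(conn)
    for version, sql in sorted(migrations):
        if version <= start:
            continue
        # Expects isolation_level=None so BEGIN/COMMIT are under our control.
        # SQLite DDL is transactional, so the script and its version row
        # commit or roll back together; Oracle DDL commits implicitly, so
        # the same guarantee would not hold there.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:", isolation_level=None)
print(migrate(conn))  # [1, 2, 3]
print(migrate(conn))  # [] (already up to date)
```

Because each environment records which versions it has applied, the same runner gives you the visibility the chapter asks for: you can query any database and see exactly where it stands.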

Chapter 4 is pretty short, and deals with deploying or migrating changes from environment to environment.  This includes the concept of bundling changes together, scheduling and documenting deployments.  Finally, they discuss the actual deployment process, including defining and possibly testing your backout procedures.

In my environments, we’d break up these deployment items into 3 main groups: items that are pre-deployable (i.e., can be deployed ahead of time without affecting current applications), items that require application outages, and items that can be deployed “post-deployment” (perhaps cleanup activities that require the structure change, but aren’t required by the applications).
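The three-way split can be captured by tagging each change item with its phase. Here’s a minimal sketch with a hypothetical `Phase` enum, `ChangeItem` record, and example items; deciding which phase an item belongs in remains a human judgment, and the code only groups what you’ve already classified.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    PRE_DEPLOY = 1   # can go out ahead of time; current applications unaffected
    OUTAGE = 2       # requires the applications to be down
    POST_DEPLOY = 3  # cleanup that depends on the change but isn't needed by the apps


@dataclass(frozen=True)
class ChangeItem:
    name: str
    sql: str
    phase: Phase


def deployment_plan(items):
    """Group change items by phase, preserving their order within each phase."""
    plan = {phase: [] for phase in Phase}
    for item in items:
        plan[item.phase].append(item)
    return plan


# Hypothetical release containing one item of each kind.
release = [
    ChangeItem("drop old index", "DROP INDEX customer_name_old_ix", Phase.POST_DEPLOY),
    ChangeItem("add nullable column", "ALTER TABLE account ADD COLUMN balance NUMERIC", Phase.PRE_DEPLOY),
    ChangeItem("rename column", "ALTER TABLE customer RENAME COLUMN cust_nm TO name", Phase.OUTAGE),
]

for phase, items in deployment_plan(release).items():
    print(phase.name, [item.name for item in items])
```

The plan comes out in phase order (pre-deploy, outage, post-deploy) regardless of the order items were listed, which keeps the outage window limited to the items that genuinely need it.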

Chapter 5 discusses strategies (actually lessons learned) for successfully moving database refactorings through your development process, including implementing traceability for database changes, simplifying database change review processes, and hunting down and eliminating duplicate SQL.

The rest of the book, Chapters 6 through 11, goes through specific kinds of refactorings (e.g., Introduce Calculated Column), along with the basic pros and cons of each one and example SQL scripts (using the Oracle dialect).  It serves as a reference catalog of database change concepts and is useful for naming and distinguishing the different kinds of change.  I wish there were more meat in the pros and cons for each transformation, but in the end it’s a useful list.
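As an illustration of one catalog entry, here’s a sketch of Introduce Calculated Column using trigger-maintained values in SQLite (via Python’s `sqlite3`), not the book’s Oracle scripts. The `order_line` table and its columns are hypothetical. Oracle virtual columns and SQLite generated columns (3.31+) are alternatives that avoid triggers altogether.

```python
import sqlite3


def introduce_calculated_column(conn):
    """Add order_line.line_total and keep it equal to quantity * unit_price."""
    conn.executescript(
        """
        ALTER TABLE order_line ADD COLUMN line_total NUMERIC;
        -- Populate existing rows.
        UPDATE order_line SET line_total = quantity * unit_price;
        -- Keep the value current for new and changed rows.
        CREATE TRIGGER order_line_total_ins AFTER INSERT ON order_line
        BEGIN
            UPDATE order_line SET line_total = NEW.quantity * NEW.unit_price
            WHERE id = NEW.id;
        END;
        CREATE TRIGGER order_line_total_upd
        AFTER UPDATE OF quantity, unit_price ON order_line
        BEGIN
            UPDATE order_line SET line_total = NEW.quantity * NEW.unit_price
            WHERE id = NEW.id;
        END;
        """
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_line (id INTEGER PRIMARY KEY, quantity INTEGER, unit_price NUMERIC);
    INSERT INTO order_line VALUES (1, 2, 5);
    """
)
introduce_calculated_column(conn)
conn.execute("INSERT INTO order_line (id, quantity, unit_price) VALUES (2, 3, 4)")
conn.execute("UPDATE order_line SET quantity = 4 WHERE id = 1")
print(conn.execute("SELECT id, line_total FROM order_line ORDER BY id").fetchall())
# [(1, 20), (2, 12)]
```

The update trigger fires only when `quantity` or `unit_price` change, so its own write to `line_total` doesn’t re-trigger it.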

Overall I thoroughly enjoyed the book and would recommend it for many development teams – project managers and developers should read at least the first 50 pages so as to understand how to integrate database development into the overall project plan.  Traditional DBAs supporting development teams absolutely must read this – if only to enhance their ability to interact and fully support development activities.

That’s all I have for now – look for shorter, more incisive posts in the future!

- Dom.

6 Responses to “Vincent Barbarino and the Book of Database Refactoring”

  1. Dominic Brooks Says:

    Good to see you back in the blogging saddle.

    I’ve got this book – it’s not bad, not bad at all.

    I like what you say about a good way of splitting up the traditional DBA role.

    Trouble is I’ve been doing “agile” for some years and I’ve not seen agile database development done properly anywhere.

    All too often you see either the non-database developers doing the database design and frequently getting it wrong or paying no attention to scalable database design, and/or traditional dbas who are not brought into the agile process and won’t empower database engineers with the appropriate privilege.

    And the other problem is that good database design is absolutely dependent, I believe, on having an idea of the big picture which many implementations of agile encourage you to ignore.

  2. Kevin Fries Says:

    Good book, but see my last comment at the bottom.

    I work on a COTS application which gets modified with dozens, if not hundreds, of custom tables and a ridiculous number of indexes. Most of it is from not following the data model set up by the vendor. So even when you explain how the data should be organized and queried, I’ve found developers creating indexes that actually make performance worse.

    But most of the issues are with the code being written and perversely using the code to determine the custom tables and indexes or ignoring the data model.

    I’ve found that one of the reasons for bad design is a refusal to plan it properly, combined with a desire to put something behind them. The results generally don’t really show until the data volume grows to a sufficient extent, and that takes time. So when the scalability issues arise, the issue is too often pushed off to someone who is reluctant to make the necessary changes. The “start coding and I’ll get the specs later” approach is too often the case.

    What’s needed is a book to educate the layman. So that’s the book I’d love to see pushed into the hands of the business people who can enforce the changes.

  3. Ellis R. Miller Says:

    All interesting comments/insights with regard to database design and refactoring as part of the overall software development life cycle, and I’m going to watch for the book on Safari online.

    Comments on Kevin’s last couple of insights: capacity planning is almost a lost art these years (odd, too, as just 8 to 10 years ago it was all the rage;) That typed, couldn’t agree more with regard to lack of planning for RDBMS lifecycle, itself, as it is integral to life cycle of any software application.

    In terms of a book for the lay business person…like the optimism, yet as I’ve watched tech become a critical part of core business (over 30 contracts and more than a dozen industries/sectors), I haven’t witnessed the business side evolving in their understanding or appreciation of software development.

    Started with a BS in Finance and Accounting then migrated to CS in graduate school. Having started as a somewhat more enlightened version of the modern Businessman with some initial insight into successful IT project management (small team of us delivered Oracle data warehouse $800K under budget and 3 months ahead of schedule) would offer this personal opinion: business will fully appreciate tech as (1) IT professionals continue to develop their business/communication skills and (2) the more savvy, interested (in the career path) technical professionals migrate into management and similar to the Google model are “allowed” to spend 25% of their time coding, etc.

    Fact is, typical Fortune 500 Business Managers STILL lack any real insight into managing IT professionals, and lacking at least a boot camp where they have the opportunity to program until 3am on Sunday only to return to work on Monday and be asked “is it done yet?”, they will always lack both the expertise and the empathy, and the latter is just as critical in many respects.

    I’m betting on IT professionals eventually displacing many/most Fortune 500 business managers=) Until then, at least based on my personal experience, not even sure middle (business) managers read;)

  4. Log Buffer #202, A Carnival of The Vanities for DBAs | The Pythian Blog Says:

    [...] Bloggers across the globe are busy in posting their innovations and rants. Dominic Delmolino reviews Database Refactoring book and muses about design and the different roles of DBAs on his weblog. [...]

  5. Dominic Brooks Says:

    There’s another aspect about modern software development with respect to the database – that is commodity developers.

    I just don’t think you can necessarily plug and play developers in any non-trivial code or business-heavy code (and the data model is business-heavy as are other bits of course).

    It’s pretty easy to swap in any developer with negligible business knowledge to code some sort of GUI action – a button, etc. The same does not apply to all code, however, and most managers do not get this.

  6. Prod DBA 2.0 says no** « OraStory Says:

    [...] DBA 2.0 says no** Dominic Delmolino has made a welcome return to regular blogging and it’s good to see that I’m not the only one forgetting the [...]
