Vincent Barbarino and the Book of Database Refactoring
September 8th, 2010, ddelmoli
Welcome back. I realize this blog has been dead for a long time, but like Gabe Kotter, I’ve decided to return to my roots a bit and fire this blog back up.
Those of you who have spoken to me lately or have been following me on Twitter know that I’ve been pretty passionate about database design and development processes. I’ve almost never seen a database whose physical design is perfect, so ongoing changes to database structures and schemas are just a fact of life.
It’s managing that process of changing the database layout that’s been getting me all worked up lately – even when folks know what to change and how they’d like to change it, they don’t have tools or processes to introduce change into the database system in a traceable manner.
In one of my prior jobs, we were adamant about change-tracking all of our database changes – we used a rigorous change control and CM process based on individual script files, one per object change. Little did I know at the time that what we were practicing was a form of database refactoring…
I had thought that almost everyone understood the importance of maintaining traceability for database changes, but I’ve recently encountered situations where the lack of standards and tools means that changes are applied to databases in a haphazard fashion. While searching for some good arguments to use when introducing this concept, I came across a book by Scott Ambler and Pramod Sadalage entitled “Refactoring Databases: Evolutionary Database Design”.
Immediately I was happy with the concept: a whole “text-book” on how and why you need to manage structural change in a database. In the remainder of this post, I’ll be reviewing and commenting on this book.
Before I begin, I think it’s interesting to look at the review quotes in the front of the book. In some ways I wonder if folks know who this book is for – most of the quotes seem to patronize “data-professionals” saying it’s high time that they joined the modern world in embracing agile development techniques. References to “strong-armed DBAs” holding back projects seem a little harsh.
I continue to believe that the jack-of-all-trades DBA moniker is mostly to blame for the sad state of database physical design today. Ask folks what the primary task of a DBA group is, and chances are you’ll be told that it’s backup and recovery, not physical database design and construction. I even have a hard time with the term database development as I don’t really feel like I’m writing code when I’m doing physical database layout. I’ve been advocating the split of the DBA term into Database Operator (traditional DBA), Database Engineer (physical database designer and constructor) and Database Developer (stored procedure and SQL writer).
Using my terms, this book is best targeted at the Database Engineer and Developer.
What’s funny to me about the opprobrium heaped upon DBAs by agile developers is that I don’t think it’s a criticism of database technology in and of itself – but rather frustration with being forced to work with database professionals who lack the temperament, skills and experience to do database engineering and development. Let’s face it, a conservative operations DBA is (rightly) concerned primarily with system availability and reliability through ensuring proper backups and minimizing potential workload on the server. These are the DBAs who prefer to have hundreds of small independent databases in production all running at 2% utilization because it plays to their strengths.
It’s far harder to manage a large, multi-tenant database that hosts several applications, runs at 30-40% utilization and experiences continual structure changes – and that’s where this book starts to help.
The Preface has a nice section on “Why Evolutionary Database Development?” which starts us off into understanding why it’s necessary to resist the desire to have a full and complete set of logical and physical models before performing database development. Early in my career I participated in efforts to create so-called Enterprise Data Models – which, being constructed by ivory-tower oversight and governance groups, lacked any sort of applicability to business and mission requirements and, sadly, were out-of-date by the time they were eventually completed. The book authors do a nice job of highlighting the benefits of the incremental approach, and also caution folks about the potential barriers to its adoption. In particular they point out the severe lack of tools supporting database SCM (the book was written in 2006).
They also mention the need for database sandbox environments – they suggest individual developers get their own databases to experiment with. I’m not a big fan of this approach – I prefer a single development database that allows me to host a lot of data, with each developer getting their own schema to play around in. I also ALWAYS enable DDL auditing in ALL of my databases – that way I can track potential changes that might need to be promoted to the next environment (I also get to validate that my changes were applied to the higher environment – and, as icing on the cake, I can trap dumb ideas like embedding DDL statements inside transactional operations).
Chapter 2 introduces the concept of Database Refactoring, with a quick introduction on refactoring in general (“a disciplined way to restructure code in small steps”). The authors do a nice job of pointing out that database refactoring is conceptually more difficult than code refactoring – that code refactoring only needs to maintain behavioral semantics, while database refactorings must also maintain informational semantics (pg. 15). The emphasis here includes the ability to introduce change in a transitional way that allows for multiple applications and multiple versions of applications to continue to run against the same database. A simple example of moving a column from a parent table to a child table is also included.
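To make the transition idea concrete, here is a minimal sketch of moving a column while old and new application versions both keep working. It is not the book's script: it uses Python's built-in sqlite3 module rather than Oracle, the customer/account tables and trigger names are hypothetical, and it assumes one account per customer so the copy is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical starting schema: balance lives on the parent table.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        balance     NUMERIC NOT NULL DEFAULT 0
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Kotter', 100);
    INSERT INTO account  VALUES (10, 1);
""")

# Step 1: add the column in its new home and backfill it.
conn.executescript("""
    ALTER TABLE account ADD COLUMN balance NUMERIC NOT NULL DEFAULT 0;
    UPDATE account
       SET balance = (SELECT c.balance FROM customer c
                       WHERE c.customer_id = account.customer_id);
""")

# Step 2: during the transition period, keep both copies in sync so that
# applications using either column see the same data. The IS NOT guard
# stops the two triggers from ping-ponging.
conn.executescript("""
    CREATE TRIGGER sync_customer_to_account
    AFTER UPDATE OF balance ON customer
    BEGIN
        UPDATE account SET balance = NEW.balance
         WHERE customer_id = NEW.customer_id
           AND balance IS NOT NEW.balance;
    END;
    CREATE TRIGGER sync_account_to_customer
    AFTER UPDATE OF balance ON account
    BEGIN
        UPDATE customer SET balance = NEW.balance
         WHERE customer_id = NEW.customer_id
           AND balance IS NOT NEW.balance;
    END;
""")

# An "old" application still writes to customer.balance ...
conn.execute("UPDATE customer SET balance = 150 WHERE customer_id = 1")
seen_by_new_app = conn.execute(
    "SELECT balance FROM account WHERE account_id = 10").fetchone()[0]

# ... while a "new" application writes to account.balance.
conn.execute("UPDATE account SET balance = 75 WHERE account_id = 10")
seen_by_old_app = conn.execute(
    "SELECT balance FROM customer WHERE customer_id = 1").fetchone()[0]

# Step 3 (not run here): once every application has migrated, drop the
# triggers and the old customer.balance column.
```

The point of the sketch is the middle step: the old column and the sync triggers stay in place for as long as any application still depends on them, and only then get removed.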
In section 2.3, the authors categorize database refactorings into 6 broad categories: Structural (modifying table definitions), Data Quality (think check constraints), Referential Integrity (capturing rules that might currently be maintained by application code), Architectural (transferring common logic from applications into database procedures to increase their usefulness), Method (stored procedure refactorings), and Non-Refactoring Transformations (evolving the schema to handle new data concepts).
They also introduce the idea of indicators that your database may require refactoring – they call them “Database Smells”.
These include common problems like multipurpose columns, multipurpose tables, redundant storage of data items, overloaded columns, and fear of altering the database because it is too complex.
In section 2.6, the authors explain how it is easier to refactor your database schema when you decrease the coupling between applications and the database – through concepts like persistence layers.
Chapter 3 walks you through the basics of a database refactoring process – including giving you a list of process steps. It also includes some good checks on determining whether or not the change is necessary and worth the effort. Finally, they talk about version control and visibility.
Chapter 4 is pretty short, and deals with deploying or migrating changes from environment to environment. This includes the concept of bundling changes together, scheduling and documenting deployments. Finally, they discuss the actual deployment process, including defining and possibly testing your backout procedures.
In my environments, we’d break up these deployment items into 3 main groups: items that are pre-deployable (i.e., can be deployed ahead of time without affecting current applications), items that require application outages, and items that can be deployed “post-deployment” (perhaps cleanup activities that require the structure change, but aren’t required by the applications).
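As a rough illustration of that three-way split, the sketch below tags each change script with a phase and groups a bundle into a deployment schedule. The `Phase` and `Change` names and the script file names are made up for the example; this is one possible way to record the grouping, not a description of any particular tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    PRE_DEPLOY = 1   # safe to run ahead of time, invisible to current apps
    OUTAGE = 2       # requires the applications to be down
    POST_DEPLOY = 3  # cleanup that can wait until after the release


@dataclass(frozen=True)
class Change:
    script: str
    phase: Phase


def plan(changes):
    """Group scripts by phase, keeping the bundle's order within each phase."""
    grouped = {phase: [] for phase in Phase}
    for change in changes:
        grouped[change.phase].append(change.script)
    return grouped


# Hypothetical bundle of change scripts for one release.
bundle = [
    Change("drop_old_balance_column.sql", Phase.POST_DEPLOY),
    Change("add_account_balance_column.sql", Phase.PRE_DEPLOY),
    Change("switch_balance_sync_triggers.sql", Phase.OUTAGE),
    Change("create_audit_index.sql", Phase.PRE_DEPLOY),
]

schedule = plan(bundle)
for phase, scripts in schedule.items():
    print(phase.name, scripts)
```

Recording the phase alongside each script means the outage window only has to cover the scripts that genuinely need it.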
Chapter 5 discusses strategies (actually lessons learned) for successfully moving database refactorings through your development process, including implementing traceability for database changes, simplifying database change review processes, and hunting down and eliminating duplicate SQL.
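One common way to get that traceability is to have the database itself record which changes have been applied. The sketch below is a minimal version of that idea using Python's sqlite3 module; the `schema_change_log` table, the `apply_changes` helper and the two change scripts are all assumptions made for the example, not something taken from the book.

```python
import sqlite3

# Hypothetical change scripts: (change id, description, DDL statement).
CHANGES = [
    ("001", "create customer table",
     "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    ("002", "add customer email",
     "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def apply_changes(conn, changes):
    """Apply not-yet-applied changes in id order; return the ids applied now."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS schema_change_log (
            change_id   TEXT PRIMARY KEY,
            description TEXT NOT NULL,
            applied_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )""")
    done = {row[0] for row in
            conn.execute("SELECT change_id FROM schema_change_log")}
    applied = []
    for change_id, description, ddl in sorted(changes):
        if change_id in done:
            continue  # already recorded in this database, skip it
        conn.execute(ddl)
        conn.execute(
            "INSERT INTO schema_change_log (change_id, description) "
            "VALUES (?, ?)",
            (change_id, description))
        conn.commit()
        applied.append(change_id)
    return applied


conn = sqlite3.connect(":memory:")
first_run = apply_changes(conn, CHANGES)   # applies both changes
second_run = apply_changes(conn, CHANGES)  # nothing left to apply
```

Because the log lives in each database, comparing two environments comes down to comparing the contents of their log tables.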
The rest of the book, Chapters 6 through 11, goes through specific kinds of refactorings (e.g., Introduce Calculated Column) along with basic pros/cons of each one and example SQL scripts (using the Oracle dialect). It serves as a reference catalog of database change concepts and is useful from a delineation perspective. I wish there were more meat in the pro and con section for each transformation, but in the end it’s a useful list.
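For a flavor of what a catalog entry covers, here is a small sketch of Introduce Calculated Column. The book's scripts are in Oracle SQL; this version uses Python's sqlite3 module instead, with a hypothetical `total_balance` column on a made-up customer/account schema, and it assumes an account never moves between customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema for the example.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        balance     NUMERIC NOT NULL DEFAULT 0
    );
    INSERT INTO customer VALUES (1, 'Kotter');
    INSERT INTO account  VALUES (10, 1, 100), (11, 1, 50);

    -- Introduce the calculated column and backfill it from existing rows.
    ALTER TABLE customer ADD COLUMN total_balance NUMERIC NOT NULL DEFAULT 0;
    UPDATE customer
       SET total_balance = (SELECT COALESCE(SUM(a.balance), 0)
                              FROM account a
                             WHERE a.customer_id = customer.customer_id);

    -- Keep the calculated value current as accounts change.
    CREATE TRIGGER account_balance_ins AFTER INSERT ON account
    BEGIN
        UPDATE customer SET total_balance = total_balance + NEW.balance
         WHERE customer_id = NEW.customer_id;
    END;
    CREATE TRIGGER account_balance_upd AFTER UPDATE OF balance ON account
    BEGIN
        UPDATE customer
           SET total_balance = total_balance - OLD.balance + NEW.balance
         WHERE customer_id = NEW.customer_id;
    END;
    CREATE TRIGGER account_balance_del AFTER DELETE ON account
    BEGIN
        UPDATE customer SET total_balance = total_balance - OLD.balance
         WHERE customer_id = OLD.customer_id;
    END;
""")


def total_balance(customer_id):
    """Read the calculated column for one customer."""
    return conn.execute(
        "SELECT total_balance FROM customer WHERE customer_id = ?",
        (customer_id,)).fetchone()[0]
```

The trade-off is the usual one for this kind of refactoring: reads of the total get cheap, while every write to account now pays for an extra update and the triggers become one more thing to maintain.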
Overall I thoroughly enjoyed the book and would recommend it for many development teams – project managers and developers should read at least the first 50 pages so as to understand how to integrate database development into the overall project plan. Traditional DBAs supporting development teams absolutely must read this – if only to enhance their ability to interact and fully support development activities.
That’s all I have for now – look for shorter, more incisive posts in the future!