Of Wine and Fish
July 16th, 2012 — ddelmoli
“Data matures like wine, applications like fish”
As near as I can tell, the quote came from the 2009 Open Source Developers’ Conference in Brisbane, at which Andy talked about, of all things, “Change in Database Schemas and Source Code”.
I’ve dropped Andy an email to see if his presentation is online anywhere, since it touches on a topic that is near and dear to my heart.
In this post, though, I’d like to address some of the humor behind the quote — the implication that data gets better as it ages, while applications get worse (and start to smell like stinky fish).
Having worked with old data and old applications, I’m not sure I agree with the sentiment. Imagine the following:
“Mr. Hawkins, I need you to do some analysis on this music file I have. I want to know how many times C immediately follows A in this song.”
“No problem, Mr. Silver, I’ll get right on it — where’s the data and how do I read it?”
“Here’s the data file, Mr. Hawkins.” Mr. Silver hands Jim a grooved red plastic disk with a small hole in its center, the faded words “78 RPM” printed on the attached paper label. “And here’s the application code that reads the data file.” Mr. Silver bends over and grunts as he lifts an oddly shaped box with a huge bell and a crank attached to it.
“Good luck, Mr. Hawkins! Let me know when you’ve finished that analysis!”
In my little story, both the data and application code have become ancient and almost unusable, leading me to another quote, this time from Kurt Bollacker:
“Data that is loved tends to survive”
It’s that aspect of loving your data (a nod to my friends at Pythian and the estimable @datachick Karen Lopez) that keeps me interested in efforts to make data transformation and evolution easier and more agile.
There’s a balance to be struck in keeping data fresh and usable: it doesn’t simply get better with age, but needs to be continually assessed against use cases to stay useful. Applications need the same attention, lest they start to smell like last week’s catch.
The trick is to minimize the effort of keeping both fresh; some of you may recognize this as minimizing technical debt. Really good software and data engineers (and perhaps this should be the responsibility of the software and data architects) constantly assess data and code against current and future use cases. If they’re smart, they invest in changes that reduce current and probable future technical debt on an ongoing, even agile, basis. The extra challenge for the data engineer is not only to balance this need across individual applications and use cases, but also to discern ways to leverage data as is without placing too much burden on the applications.