Of Wine and Fish

In response to my last post, which touched on my case for Data Engineers, my friend Greg Rahn shared a humorous quote about data from Andy Todd:

“Data matures like wine, applications like fish”

As near as I can tell, the quote comes from an Open Source Developers' Conference held in Brisbane in 2009, where Andy spoke on, of all things, “Change in Database Schemas and Source Code”.

I’ve dropped Andy an email to see if his presentation is online anywhere, since it touches on a topic that is near and dear to my heart.

In this post, though, I’d like to address the implication behind the humor: that data gets better as it ages, while applications get worse (and start to smell like stinky fish).

Having worked with old data and old applications, I’m not sure I agree with the sentiment. Imagine the following:

“Mr. Hawkins, I need you to do some analysis on this music file I have. I want to know how many times C immediately follows A in this song.”

“No problem, Mr. Silver, I’ll get right on it — where’s the data and how do I read it?”

“Here’s the data file, Mr. Hawkins.” Mr. Silver hands Jim a grooved red plastic disk with a small hole in the middle, the faded words “78 RPM” written on its paper label. “And here’s the application code that reads the data file,” he adds, bending over with a grunt to lift an oddly shaped box with a huge bell and a crank attached to it.

“Good luck, Mr. Hawkins! Let me know when you’ve finished that analysis!”

In my little story, both the data and application code have become ancient and almost unusable, leading me to another quote, this time from Kurt Bollacker:

“Data that is loved tends to survive”

It’s that aspect of loving your data (a nod to my friends at Pythian and the estimable @datachick Karen Lopez) that keeps me interested in efforts to make data transformation and evolution easier and more agile.

There’s a balance to be struck in keeping data fresh and usable: it doesn’t simply get better with age, but needs to be continually assessed against use cases in order to stay useful. Applications need the same attention, lest they start to smell like last week’s catch.

The trick is to minimize the effort of keeping both fresh; some of you may recognize this as minimizing technical debt. Really good software and data engineers (and perhaps this should be the responsibility of the software and data architects) constantly assess data and code against current and future use cases. If they’re smart, they invest in changes that reduce current and probable future technical debt on an ongoing, even agile, basis. The extra challenge for the data engineer is to strike this balance not only for individual applications and use cases, but also to discern ways to leverage the data as is without placing too much burden on the applications.

3 Responses to “Of Wine and Fish”

  1. chet Says:

    I like the analogy of the record and record player. Had never thought of it that way.

    I guess it’s about context, as you say: does X still apply 10 years later?

    Can you provide an example, a use case, of data that may have aged out? How would you compare this with the Big Data movement’s approach of capturing everything and sorting it out later?

  2. Karen Lopez Says:

    Thanks for the shout out. Love your data, indeed.

    I also made this:
    http://pinterest.com/pin/12807180160202332/

  3. ddelmoli Says:

    Chet,

    I’ve seen data suffer from a lack of love, including fairly large data sets in old mainframe formats that I had to access despite a lack of tools for reading them.

    In the Big Data world, I’m seeing an emphasis on simpler data structures that are, ideally, self-describing as well as splittable (the key point here), so that it’s easy to write data adapters or processors to access the data. Keeping the data simple, while also enabling parallel transformation, supports the Big Data usefulness paradigm. Even in the high-volume OLTP Big Data world, I’m seeing an emphasis on very simple, “shardable” data structures (e.g., key/value pairs).

    In theory, these simple data structures make it easier to put the data to a multitude of uses through code.
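A minimal sketch of what “self-describing, splittable, and shardable” can look like in practice, assuming newline-delimited JSON as the record format and a CRC32 hash for shard assignment (both are illustrative choices, not something the comment above specifies). Because every line is a complete record carrying its own field names, a file can be cut at any line boundary and each piece parsed independently; a stable hash of each record’s key then decides which shard it belongs to. The sample records and the helper names `split_records`, `parse_chunk`, and `shard_for` are hypothetical.

```python
import json
import zlib

# Hypothetical sample data: one self-describing JSON record per line
# (newline-delimited JSON). Field names and values are illustrative only.
SAMPLE = "\n".join([
    '{"key": "user:1", "name": "Jim"}',
    '{"key": "user:2", "name": "Long John"}',
    '{"key": "user:3", "name": "Ben"}',
    '{"key": "user:4", "name": "Flint"}',
])


def split_records(text, n_chunks):
    """Split at line boundaries into at most n_chunks pieces of whole records."""
    lines = [line for line in text.splitlines() if line.strip()]
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parse_chunk(chunk):
    """Each line parses on its own; no external schema is needed."""
    return [json.loads(line) for line in chunk]


def shard_for(key, n_shards):
    """Stable shard assignment: crc32 gives the same result in every process."""
    return zlib.crc32(key.encode("utf-8")) % n_shards


if __name__ == "__main__":
    # Each chunk could be handed to a separate worker and parsed in parallel.
    chunks = split_records(SAMPLE, n_chunks=2)
    records = [rec for chunk in chunks for rec in parse_chunk(chunk)]
    for rec in records:
        print(rec["key"], "-> shard", shard_for(rec["key"], n_shards=3))
```

Using `zlib.crc32` rather than Python’s built-in `hash()` is deliberate: string hashing is salted per process by default, so `hash()` could route the same key to different shards on different workers.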
