I know why it’s always the database

I got to this post via Elisa Gabbert and Jeff Hunter after doing my daily reading which started with Jeremy Schneider’s excellent post on a recent healthcheck.

Jeff sums it up best in wondering why, when an application is a complex combination of system components, is the database always blamed for poor performance?

I’ll tell you why.

People ask for data. And they don’t like the first set of data they get, so they ask again, and then they browse back and forth through the data.

And where is the data? In the database.

People don’t ask the network, or the disk, or the CPU or the memory for their data. And despite the amount of Javascript, PHP and Java code in their browsers and middle-tiers, they don’t ask their browser or application server either.

The database is responsible for serving data quickly. And since you’re the DBA, when the data doesn’t come back quickly enough, you’re asked to “look at the database”.

BTW, this is one of the reasons I’m a huge fan of constant monitoring — I like to see if the database is acting differently from when “everything was ok”. This requires some kind of historical view or “baseline”. Some way to look back in time to yesterday, or last week at this time, or last month during the monthly close, etc. If the profile during the “slow response” is the same as it was during the “fast response”, generally there’s nothing wrong with the database overall — there may be something going on with that particular request, but you know how to fix those.

And, unfortunately, the database is ridiculously well instrumented, and you’ve convinced everybody that you can quickly see everything going on in the database. Oracle’s event model has really no counterpart on the other parts of the application stack.

Check out Jeremy’s post again — look at how “easy” it was to see that AIX was hopelessly mis-configured and was using a ton of swap space.

I’ve never really liked the idea of swap space — Why would you ever want anything at all swapped out of memory on a database server? Do you monitor memory usage on your PC? What happens when you start using swap space? You go to the Kingston web site and purchase more memory, that’s what. I used to get annoyed at H/W O/S vendors when I’d ask why we needed so much swap. My favorite answers:

  1. Just in case
  2. We’ll crash / panic otherwise
  3. We “pre-swap” code text images to speed up swapping
  4. The system becomes more efficient during non-peak usage as it can handle more processes then as long as they’re idle

This is nuts.

Another fun excuse I’ve heard is when there is significant data caching at the middle tier and web tier — “it takes a long time for the database to give us data to fully populate the cache, so queries are slow”.

This is why I advocate SIMPLE architectures for data-intensive applications. Fewer non-database places where things can go wrong. Fewer times that the DBA gets asked about “why isn’t my data coming back fast enough” and has to spend time finding out which part of the application stack isn’t pulling its weight.

3 Responses to “I know why it’s always the database”

  1. Tyler Muth Says:

    You’ve just described one of the main reasons Application Express is so successful.

  2. Chen Shapira Says:

    Unless, of course, you are a DBA and then it is “always” either the network, the storage or the OS configuration :-)

    My application developers are very nice and they share the blame between the database and the load balancers.

  3. Jeff Hunter Says:

    Well, I guess I didn’t make it clear that I was talking about application developers and not end users. In a properly designed system, the end user only sees the presentation layer. The user doesn’t know the difference, he just knows he can’t get his work done.

    Application developers, on the other hand, have the ability to go in and actually do some investigation and see the root cause of the problem. If their code has proper instrumentation, then they will be able to point out that they aren’t getting data returned quick enough. Just because they aren’t getting data fast enough doesn’t mean it’s a database problem. Maybe the middle-tier is swapping. Maybe the http server is out of CPU. Maybe somebody is copying a 20G .mp3 file over the network. Doesn’t matter, but it’s not the database.

    I 100% agree with you, though, that you have to know your systems and what “normal” is. When you know “normal”, you can point out “slow”.

Leave a Reply

Posting code can be a pain. To make sure your code doesn't get eaten, you may want to pre-format it first by using HTML Encoder