RMOUG TD 2013

Last week I attended the Rocky Mountain Oracle User’s Group Training Days Conference in Denver, Colorado. RMOUG TD is one of the largest independent Oracle User’s Groups for Oracle database technology and draws a large number of Oracle Aces and Ace Directors, Oracle OakTable members and senior Oracle corporate technology leaders.

On Monday I attended a session by Cary Millsap from Method-R Corporation. In this full day session, Cary talked about how to use Oracle’s trace files to accurately diagnose performance problems related to specific business transactions by profiling the execution of database activity at the detailed call level. Using a variety of tools, Cary sliced and diced Oracle trace files to pinpoint problems related to disk, network and cpu performance for a variety of business transactions. Cary’s company, Method R, develops tools that make it easy to analyze trace files both from the command line as well as from within Oracle SQL Developer.

Monday evening I attended a VIP reception for Oracle Aces (having received by Oracle Ace designation in January), where I spent time talking to Kyle Hailey, a fellow OakTable member and Cornell alum, who has spent time as a Director at Oracle, Director of Performance Products at Embarcadero and is currently a Performance Architect and Technical Evangelist at Delphix. Delphix is a company that specializes in creating virtual copies of databases in order to provide massive storage savings and full-copy databases within organizations. It’s estimated that organizations make between 5 and 8 copies of their production databases, and Delphix provides software that reduces that number to 2, while enabling many “virtual” copies of the database to be used. A single physical copy is used as the basis for many virtual databases, all of which maintain only their changes against the base copy. Delphix’ technical team is top-notch, with former members of Oracle’s ASM storage team, RAC team and the Sun ZFS file-system development team. Delphix also provides a nice “timeflow-like” user interface influenced by Apple.

I believe the Delphix technology has significant benefits in providing full-copies of production data into QA, Test and Dev environments without requiring huge physical storage.

One concept I’ve been playing with is an 8-rack Exadata solution in which all of the storage cells are allocated to production servers within the 8-rack (for example, assume 5 racks are used for production, 2 racks for QA, and 1 rack for Development). Delphix provides a virtual dNFS-like mount point that could be used by the QA and Dev database servers to have full-copy of production data, while all of the expensive storage cells are fully utilized by Production.

Kyle told me their biggest barrier to adoption is the status-quo behavior of places that don’t think you can have full-copies of production in Dev and QA because it’s too hard today – Delphix makes that easy.

On Tuesday, I attended Karen Morton’s presentation on basic tuning for developers. I like to attend some of these introductory sessions to see what questions are asked and what information is being conveyed to new developers. As always, Karen does a nice job simplifying complex topics and making them clear in an introductory manner.

Tuesday lunch was “Lunch with an Oracle Ace” day, where conference attendees were given an opportunity to sit at tables with specific Oracle Aces. I had a table with my name on it full of database developers and had an engaging lunch answering questions.

After lunch I was invited to be a panelist on the Oracle Women in Technology panel (along with Karen Morton from Enkitec, Maria Colgan from Oracle, and Debra Lilley from Fujitsu. I was the male representative on the panel and was asked to talk about the value of diversity and my experiences in the Oracle field (where I’ve worked for over 23 years now).

Wednesday morning I attended 2 sessions – one by Tyler Muth from Oracle Corporation on Exadata tuning. Tyler did an excellent presentation on graphing Exadata performance results using R. After Tyler I attended a presentation on using GoldenGate with Exadata from the folks at Enkitec.

In the afternoon I gave my 2 presentations – one on Fuzzy String Matching Algorithms (based on phonetic matching) and the other on Recursive SQL Constructs – both were well received.

Pascal matrix SQL puzzle solution

I’ve been impressed with the solutions to my little problem of generating a symmetric Pascal matrix using SQL. Charles Hooper in particular has provided some very nice commentary on the problem, complete with diagrams and 2 alternative solutions.

I thought I’d walk through my solution in order to explain my thought process and see if it resonates with anyone.

Usually when I think about generating rows with SQL I think about the Oracle “trick” of using the CONNECT BY clause to generate rows:

select level from dual connect by level <= k

You’ll see a lot of constructs like this in the solutions provided by commenters.

However, lately I’ve been tending more toward the ANSI SQL Recursive construct instead, which works in Oracle and many other SQL databases:

with f (n) as (
select 1 from dual
union all
select n+1 from f where n < k
)
select n from f;

I particularly like the way this statement works, even if it’s technically not recursive, as it defines rows in terms of prior rows, which leads to all kinds of unique possibilities of ways to construct new rows.

I think the symmetric Pascal matrix lends itself to this kind of technique especially because it involves factorials.
Charles (and others) have used the Wikipedia definition of the Pascal matrix for each cell entry which is Aij = (i+j-2)! / ((i-1)!(j-1)!). And most of the solutions have used a clever way to calculate the necessary factorials:

select exp(sum(ln(level))) from dual connect by level <=n 

However, the neat thing about factorials is that they are easily defined in recursive terms:

f(0) = 1
f(n+1) = (n+1) * f(n)

And this definition maps really nicely to the ANSI SQL Recursive construct:

with f(n,nf) as (
select 0,1 from dual
union all
select n+1, n+1 * nf from f where n < k
)
select n,nf from from f;

Given this idea of using the Recursive construct, I set about seeing if I could use it to solve my challenge.

The first thing I did was to construct the row (i) and column (j) indices, doing all of the rows for each column — to do that I started with a simple row generator:

with m(i) as (
select 1 from dual
union all
select i+1 from m where i < s
)
select i from m;

With s = 4 this generates a simple list from 1 to 4:

1
2
3
4

Next, I wanted to generate the columns, repeating the rows for each column as I moved from column to column. I did this by “re-setting” the row counter and incrementing the column counter after finishing each column:

with m(s,i,j) as (
select 4,1,1 from dual
union all
select 
s,
case when i < s then i+1 else 1 end,
case when i < s then j else j+1 end
from m where i < s or j < s
)
select i,j from m;

In this statement I included the size (s) in the initial query and “passed it down” each row in order to easily refer to it. This query produces:

1,1
2,1
3,1
4,1
1,2
2,2

4,4

It stops when both i and j reach 4.

Now that I’ve got the cell indicies I started to think about how to calculate the information necessary for the values. Based on the cell definition from Wikipedia, I need 3 values:

(i+j-2)!, (i-1)!, and (j-1)!

What’s interesting about the required values is that they look like they could be defined via information from the prior item since they refer to things like i-1 and factorial — both of which should be pretty easy to generate. Let’s try to do (i-1)! (which I’ll refer to as im1f or “i minus 1 factorial”)

with m(s,i,j,im1f) as (
select 4,1,1,1 from dual /* This is cell 1,1. (i-1)! is (1-1)! is 0! which equals 1 */
union all
select 
s,
case when i < s then i+1 else 1 end,
case when i < s then j else j+1 end,
case when i < s then im1f*i else 1 end
from m where i < s or j < s
)
select i,j,im1f from m;

Basically, this just multiplies the prior (i-1)! times i to get the new (i-1)!, and “resets” to 0! whenever we move to the next column. This produces:

1,1,1
2,1,1
3,1,2
4,1,6
1,2,1
2,2,1
3,2,2
4,2,6

4,4,6

We can get (j-1)! in a similar fashion:

with m(s,i,j,im1f,jm1f) as (
select 4,1,1,1,1 from dual
union all
select 
s,
case when i < s then i+1 else 1 end,
case when i < s then j else j+1 end,
case when i < s then im1f*i else 1 end,
case when i < s then jm1f else jm1f*j end
from m where i < s or j < s
)
select i,j,im1f,jm1f from m;

So far, so good — I’m liking this approach as each cell takes advantage of the prior cell’s information to simply get the necessary values. Now, how do I get (i+j-2)!

I started out seeing if I could simply get (i+j-2) or ipjm2:

with m(s,i,j,im1f,jm1f,ipjm2) as (
select 4,1,1,1,1,0 from dual
union all
select 
s,
case when i < s then i+1 else 1 end,
case when i < s then j else j+1 end,
case when i < s then im1f*i else 1 end,
case when i < s then jm1f else jm1f*j end,
case when i < s then ipjm2+1 else j end
from m where i < s or j < s
)
select i,j,im1f,jm1f,ipjm2 from m;

What struck me about this was then when i=1 then i+j-2=j-1, and I’m already calculating (j-1)!, so it looks like I can use it when I “reset” back to i=1 when switching columns. Let’s see what the pattern looks like for the factorial as we go “down” the column row by row:

(j-1)! for the first row
previous row value * (i+j-1), for subsequent rows

Which is easy for us to write into our query now as ipjm2f:

with m(s,i,j,im1f,jm1f,ipjm2f) as (
select 4,1,1,1,1,1 from dual
union all
select 
s,
case when i < s then i+1 else 1 end,
case when i < s then j else j+1 end,
case when i < s then im1f*i else 1 end,
case when i < s then jm1f else jm1f*j end,
case when i < s then ipjm2f*(i+j-1) else j*jm1f end
from m where i < s or j < s
)
select i,j,im1f,jm1f,ipjm2f from m;

Which now produces:

1,1,1,1,1
2,1,1,1,1
3,1,2,1,2
4,1,6,1,6
1,2,1,1,1
2,2,1,1,2
3,2,2,1,6
4,2,6,1,24
1,3,1,2,2
2,3,1,2,6
3,3,2,2,24
4,3,6,2,120
1,4,1,6,6
2,4,1,6,24
3,4,2,6,120
4,4,6,6,720

At this point, it’s simple to just put together the whole formula and display it as the value for each cell, noting that the initial value for every column (row 1) is just 1.

with m(s,i,j,im1f,jm1f,ipjm2f,v) as (
select 4,1,1,1,1,1,1 from dual
union all
select
s,
case when i<s then i+1 else 1 end,
case when i<s then j else j+1 end,
case when i<s then im1f*i else 1 end,
case when i<s then jm1f else jm1f*j end,
case when i<s then ipjm2f*(i+j-1) else j*jm1f end,
case when i<s then (ipjm2f*(i+j-1)) / (i*im1f*jm1f) else 1 end
from m where i<s or j<s
)
select i,j,v
from m;

Which works in Oracle 11g without any special Oracle functions or syntax to produce:

1,1,1
2,1,1
3,1,1
4,1,1
1,2,1
2,2,2
3,2,3
4,2,4
1,3,1
2,3,3
3,3,6
4,3,10
1,4,1
2,4,4
3,4,10
4,4,20

And also works perfectly well in PostgreSQL 9.1 with the addition of the “recursive” keyword:

with recursive m(s,i,j,im1f,jm1f,ipjm2f,v) as (
select 4,1,1,1,1,1,1 from dual
union all
select
s,
case when i<s then i+1 else 1 end,
case when i<s then j else j+1 end,
case when i<s then im1f*i else 1 end,
case when i<s then jm1f else jm1f*j end,
case when i<s then ipjm2f*(i+j-1) else j*jm1f end,
case when i<s then (ipjm2f*(i+j-1)) / (i*im1f*jm1f) else 1 end
from m where i<s or j<s
)
select i,j,v
from m;

And there you have it — an ANSI SQL version.

Now, while I think the code is pretty elegant, it does have a drawback — each cell value requires the prior cell value in order to be calculated. The cell calculations are NOT independent, which ends up making this a serial algorithm. Many of the other solutions in the comments compute the cell values independently of each other, which facilitates parallelizing the production if that’s necessary and should be a consideration when looking at the various techniques.

I hope you’ve enjoyed this rather long post describing my process for creating the query, and that such a process might be useful to you as you construct similar queries.

New SQL Puzzle

Sorry for the drought — to keep everyone’s mind fresh, how about a little puzzle?

Use SQL to create a symmetric Pascal matrix, with the output being (i,j,v). So that a Pascal matrix of size 4 would end up as:

1,1,1
2,1,1
3,1,1
4,1,1
1,2,1
2,2,2
3,2,3
4,2,4
1,3,1
2,3,3
3,3,6
4,3,10
1,4,1
2,4,4
3,4,10
4,4,20

Try to stay with ANSI SQL 2008 if possible — bailing out to other languages and functions is discouraged :-)

Efficiency is an interesting idea here, as it’s pretty easy to do this by brute force calculations of factorials for every entry — but where’s the fun in that?

Extra credit for solutions which can take in the matrix size as some sort of parameter.

There’s convincing and then there’s curiosity

This past week, Cary Millsap posted one of the ways he’s teaching his children to understand concepts and to prove to him that they understand the methods for solving problems by presenting them with a classic puzzle.

The puzzle talks about the length of a piece of string wrapped around a circle (in this case, the whole Earth), and how much longer it would need to be if it was raised 4 more inches off the ground? As his kids worked through the problem, Cary provided us with the neat answer in which you didn’t need to know the circumference of the Earth, or any other circle for that matter — the increase in length is a simple function applied to the additional height above the ground.

Near the end of Cary’s post, he talks about how the understanding of basic relationships and methods gives the ability to convince someone that you actually know what you are talking about and goes on to challenge you to demand the same information from people trying to sell you something.

It’s a wonderful post, and I encourage you to go read it — but I also want you to go beyond the original post a bit.

While Cary’s formula holds for circles, what about other kinds of shapes? What about simple regular polygons (where all sides are the same length)? Do they have the same kind of simple formula? Do you have the curiosity to try and see if you can determine methods for generalizing what you know, applying it, and demonstrating to others?

Convincing me makes me trust you with a particular problem, showing me you won’t stop there makes me trust you even further…

I spent the afternoon playing with the idea and I think I’ve found it — I came up with the following answers:

  • Equilateral triangle (3 sides) = 41.5692 inches
  • Square (4 sides) = 32 inches
  • Pentagon (5 sides) = 29.0617 inches
  • Hexagon (6 sides) = 27.7128 inches
  • Octagon (8 sides) = 26.5097 inches

Looks like it’s tending toward’s Cary’s answer for a circle — perhaps there is a formula after all?  :-)

Good Forever

Oracle 10.2.0.1 (I know, version out-of-date) on Windows 2003:

The other day one of my stored procedures error’ed out with:

ORA-01801: date format is too long for internal buffer

somewhere near a line in which I was doing the following assignment:

d := to_char(c.date_field,'YYYYMMDD');

Where d is of type varchar2, and c is a row from a cursor for loop, and date_field is a date column in the cursor

selecting the date field from the database didn’t look odd:

select date_field from table_name t where t.id = <id of failing row>;

’04-AUG-17′

However, selecting to_char did yield something odd:

select to_char(date_field,'YYYYMMDD') from table_name t where t.id = <id of failing row>;

’00000000′

Odd, no 04, no AUG and no 17.

At this point I could tell something was up with the data in the field, so I resorted to dump’ing it’s contents:

select dump(date_field) from table_name t where t.id = <id of failing row>;

Typ=12 Len=7: 220,117,8,4,1,1,1

Uh-oh — the first byte is supposed to indicate the century (subtract 100 from the value to get the century), and legitimate centuries stop at 99. This value indicates a century value of 120. For an “actual” date of:

August 4th, 12017

I guess what was weird to me was that the PL/SQL to_char reported an error, while the SQL to_char didn’t (instead it returned an odd ’0000000′ result).

Going forward, it was relatively easy to find rows like this in the table:

select id, dump(date_field) from table_name t where date_field > to_date('99991231','YYYYMMDD');

We actually left the bad data alone and popped a view on top of the table using LEAST / GREATEST to cap the value at 9999-12-31, which minimized the code impact to our PL/SQL routines.

Still and all, I can’t wait to hop into my time machine with Jules Verne and see what happens on that date.