## Pascal matrix SQL puzzle solution

June 16th, 2012 — ddelmoliI’ve been impressed with the solutions to my little problem of generating a symmetric Pascal matrix using SQL. Charles Hooper in particular has provided some very nice commentary on the problem, complete with diagrams and 2 alternative solutions.

I thought I’d walk through my solution in order to explain my thought process and see if it resonates with anyone.

Usually when I think about generating rows with SQL I think about the Oracle “trick” of using the CONNECT BY clause to generate rows:

select level from dual connect by level <= k

You’ll see a lot of constructs like this in the solutions provided by commenters.

However, lately I’ve been tending more toward the ANSI SQL Recursive construct instead, which works in Oracle and many other SQL databases:

with f (n) as ( select 1 from dual union all select n+1 from f where n < k ) select n from f;

I particularly like the way this statement works, even if it’s technically not recursive, as it defines rows in terms of prior rows, which leads to all kinds of unique possibilities of ways to construct new rows.

I think the symmetric Pascal matrix lends itself to this kind of technique especially because it involves factorials.

Charles (and others) have used the Wikipedia definition of the Pascal matrix for each cell entry which is Aij = (i+j-2)! / ((i-1)!(j-1)!). And most of the solutions have used a clever way to calculate the necessary factorials:

select exp(sum(ln(level))) from dual connect by level <=n

However, the neat thing about factorials is that they are easily defined in recursive terms:

f(0) = 1

f(n+1) = (n+1) * f(n)

And this definition maps really nicely to the ANSI SQL Recursive construct:

with f(n,nf) as ( select 0,1 from dual union all select n+1, n+1 * nf from f where n < k ) select n,nf from from f;

Given this idea of using the Recursive construct, I set about seeing if I could use it to solve my challenge.

The first thing I did was to construct the row (i) and column (j) indices, doing all of the rows for each column — to do that I started with a simple row generator:

with m(i) as ( select 1 from dual union all select i+1 from m where i < s ) select i from m;

With s = 4 this generates a simple list from 1 to 4:

1

2

3

4

Next, I wanted to generate the columns, repeating the rows for each column as I moved from column to column. I did this by “re-setting” the row counter and incrementing the column counter after finishing each column:

with m(s,i,j) as ( select 4,1,1 from dual union all select s, case when i < s then i+1 else 1 end, case when i < s then j else j+1 end from m where i < s or j < s ) select i,j from m;

In this statement I included the size (s) in the initial query and “passed it down” each row in order to easily refer to it. This query produces:

1,1

2,1

3,1

4,1

1,2

2,2

…

4,4

It stops when both i and j reach 4.

Now that I’ve got the cell indicies I started to think about how to calculate the information necessary for the values. Based on the cell definition from Wikipedia, I need 3 values:

(i+j-2)!, (i-1)!, and (j-1)!

What’s interesting about the required values is that they look like they could be defined via information from the prior item since they refer to things like i-1 and factorial — both of which should be pretty easy to generate. Let’s try to do (i-1)! (which I’ll refer to as im1f or “i minus 1 factorial”)

with m(s,i,j,im1f) as ( select 4,1,1,1 from dual /* This is cell 1,1. (i-1)! is (1-1)! is 0! which equals 1 */ union all select s, case when i < s then i+1 else 1 end, case when i < s then j else j+1 end, case when i < s then im1f*i else 1 end from m where i < s or j < s ) select i,j,im1f from m;

Basically, this just multiplies the prior (i-1)! times i to get the new (i-1)!, and “resets” to 0! whenever we move to the next column. This produces:

1,1,1

2,1,1

3,1,2

4,1,6

1,2,1

2,2,1

3,2,2

4,2,6

…

4,4,6

We can get (j-1)! in a similar fashion:

with m(s,i,j,im1f,jm1f) as ( select 4,1,1,1,1 from dual union all select s, case when i < s then i+1 else 1 end, case when i < s then j else j+1 end, case when i < s then im1f*i else 1 end, case when i < s then jm1f else jm1f*j end from m where i < s or j < s ) select i,j,im1f,jm1f from m;

So far, so good — I’m liking this approach as each cell takes advantage of the prior cell’s information to simply get the necessary values. Now, how do I get (i+j-2)!

I started out seeing if I could simply get (i+j-2) or ipjm2:

with m(s,i,j,im1f,jm1f,ipjm2) as ( select 4,1,1,1,1,0 from dual union all select s, case when i < s then i+1 else 1 end, case when i < s then j else j+1 end, case when i < s then im1f*i else 1 end, case when i < s then jm1f else jm1f*j end, case when i < s then ipjm2+1 else j end from m where i < s or j < s ) select i,j,im1f,jm1f,ipjm2 from m;

What struck me about this was then when i=1 then i+j-2=j-1, and I’m already calculating (j-1)!, so it looks like I can use it when I “reset” back to i=1 when switching columns. Let’s see what the pattern looks like for the factorial as we go “down” the column row by row:

(j-1)! for the first row

previous row value * (i+j-1), for subsequent rows

Which is easy for us to write into our query now as ipjm2f:

with m(s,i,j,im1f,jm1f,ipjm2f) as ( select 4,1,1,1,1,1 from dual union all select s, case when i < s then i+1 else 1 end, case when i < s then j else j+1 end, case when i < s then im1f*i else 1 end, case when i < s then jm1f else jm1f*j end, case when i < s then ipjm2f*(i+j-1) else j*jm1f end from m where i < s or j < s ) select i,j,im1f,jm1f,ipjm2f from m;

Which now produces:

1,1,1,1,1

2,1,1,1,1

3,1,2,1,2

4,1,6,1,6

1,2,1,1,1

2,2,1,1,2

3,2,2,1,6

4,2,6,1,24

1,3,1,2,2

2,3,1,2,6

3,3,2,2,24

4,3,6,2,120

1,4,1,6,6

2,4,1,6,24

3,4,2,6,120

4,4,6,6,720

At this point, it’s simple to just put together the whole formula and display it as the value for each cell, noting that the initial value for every column (row 1) is just 1.

with m(s,i,j,im1f,jm1f,ipjm2f,v) as ( select 4,1,1,1,1,1,1 from dual union all select s, case when i<s then i+1 else 1 end, case when i<s then j else j+1 end, case when i<s then im1f*i else 1 end, case when i<s then jm1f else jm1f*j end, case when i<s then ipjm2f*(i+j-1) else j*jm1f end, case when i<s then (ipjm2f*(i+j-1)) / (i*im1f*jm1f) else 1 end from m where i<s or j<s ) select i,j,v from m;

Which works in Oracle 11g without any special Oracle functions or syntax to produce:

1,1,1

2,1,1

3,1,1

4,1,1

1,2,1

2,2,2

3,2,3

4,2,4

1,3,1

2,3,3

3,3,6

4,3,10

1,4,1

2,4,4

3,4,10

4,4,20

And also works perfectly well in PostgreSQL 9.1 with the addition of the “recursive” keyword:

with recursive m(s,i,j,im1f,jm1f,ipjm2f,v) as ( select 4,1,1,1,1,1,1 from dual union all select s, case when i<s then i+1 else 1 end, case when i<s then j else j+1 end, case when i<s then im1f*i else 1 end, case when i<s then jm1f else jm1f*j end, case when i<s then ipjm2f*(i+j-1) else j*jm1f end, case when i<s then (ipjm2f*(i+j-1)) / (i*im1f*jm1f) else 1 end from m where i<s or j<s ) select i,j,v from m;

And there you have it — an ANSI SQL version.

Now, while I think the code is pretty elegant, it does have a drawback — each cell value requires the prior cell value in order to be calculated. The cell calculations are NOT independent, which ends up making this a serial algorithm. Many of the other solutions in the comments compute the cell values independently of each other, which facilitates parallelizing the production if that’s necessary and should be a consideration when looking at the various techniques.

I hope you’ve enjoyed this rather long post describing my process for creating the query, and that such a process might be useful to you as you construct similar queries.

June 22nd, 2012 at 4:01 am

[...] Rahn has a great post about Linux 6 Transparent Huge Pages and Hadoop Workloads.Dominic Delmolinoâ€™ve been impressed with the solutions to his little problem of generating a symmetric Pascal [...]