Encyclopedia Spine Problem, one way…May 25th, 2007 — ddelmoli
From Perl, TMTOWTDI (There’s more than one way to do it).
I really owe a lot to the paper from Cornell by Todd Olson. This has been a minor problem that I’ve been playing with for a while, and I finally got an hour or two to explore it. I google’d hard for solutions to the commonality problem and Todd’s paper came right up when I looked for common leading substrings, which is really the “hard” part of this problem. However, once I read the paper, I really needed to brush up on my regex skills — I was excited to try out the regex stuff in 10g, but I needed a refresher.
I shouldn’t have been surprised that the Oracle documentation is excellent on this, though. As always, the Application Developer’s Guide – Fundamentals was a winner here. Chapter 4 covers this in detail. It’s funny how many of us don’t go back and read the fundamental and concept guides — sometimes I think we feel that we’re “experienced” and therefore don’t need to. I always find new stuff in there, every time.
I loved all of the answers I received — VERY inventive. Kudos to Jeremy Schneider for being the only one to tackle the paper and regex-style solution. I’ve got to admit that many of the solutions seemed kinda convoluted to me. One thing I strive for is KISS — Keep It Simple and Sane :-) And so, my solution is as small as possible, using a small cheat in that I’m using a “magic” character “@”, but you can replace that with CHR(0) if it makes you feel better.
select nvl(nullif(regexp_replace(min(cn)||’@’||min(pn), ‘^([^@]*)(.)[^@]*@\1[^@]*$’,’\1\2′)||’…’,min(cn)||’…’),min(cn)) bfn,
nvl(nullif(regexp_replace(max(cn)||’@’||max(nn), ‘^([^@]*)(.)[^@]*@\1[^@]*$’,’\1\2′)||’…’,max(cn)||’…’),max(cn)) btn
select column_name cn,
ntile(15) over (order by column_name) bk#,
lag(column_name) over (order by column_name) pn,
lead(column_name) over (order by column_name) nn
group by bk#
order by bk#