Encyclopedia Spine Problem, one way…
May 25th, 2007 — ddelmoliFrom Perl, TMTOWTDI (There’s more than one way to do it).
I really owe a lot to the paper from Cornell by Todd Olson. This has been a minor problem that I’ve been playing with for a while, and I finally got an hour or two to explore it. I google’d hard for solutions to the commonality problem and Todd’s paper came right up when I looked for common leading substrings, which is really the “hard” part of this problem. However, once I read the paper, I really needed to brush up on my regex skills — I was excited to try out the regex stuff in 10g, but I needed a refresher.
I shouldn’t have been surprised that the Oracle documentation is excellent on this, though. As always, the Application Developer’s Guide - Fundamentals was a winner here. Chapter 4 covers this in detail. It’s funny how many of us don’t go back and read the fundamental and concept guides — sometimes I think we feel that we’re “experienced” and therefore don’t need to. I always find new stuff in there, every time.
I loved all of the answers I received — VERY inventive. Kudos to Jeremy Schneider for being the only one to tackle the paper and regex-style solution. I’ve got to admit that many of the solutions seemed kinda convoluted to me. One thing I strive for is KISS — Keep It Simple and Sane :-) And so, my solution is as small as possible, using a small cheat in that I’m using a “magic” character “@”, but you can replace that with CHR(0) if it makes you feel better.
select nvl(nullif(regexp_replace(min(cn)||’@'||min(pn), ‘^([^@]*)(.)[^@]*@\1[^@]*$’,'\1\2′)||’…’,min(cn)||’…’),min(cn)) bfn,
nvl(nullif(regexp_replace(max(cn)||’@'||max(nn), ‘^([^@]*)(.)[^@]*@\1[^@]*$’,'\1\2′)||’…’,max(cn)||’…’),max(cn)) btn
from
(
select column_name cn,
ntile(15) over (order by column_name) bk#,
lag(column_name) over (order by column_name) pn,
lead(column_name) over (order by column_name) nn
from ac1
)
group by bk#
order by bk#
May 25th, 2007 at 11:40 am
Well it took me a little while to figure out that regex… but now that I’ve got it, I suddenly feel that I did more work than I needed to. Very cool, much more elegant than my enormous query.
May 25th, 2007 at 11:56 am
Oh now I see it, method #9 in the paper.
May 25th, 2007 at 12:21 pm
Wow, I learned something today! I didn’t know regexps could use backreferences inside the matching pattern. I thought they could only be used in the replacement text.
I loved the idea of an XOR-based solution (method #12), but between UTL_RAW.bit_xor being PL/SQL (which was against the rules) and the difference between characters and bytes (and UTF, oh my) it never became as beautiful as I’d hoped.
May 25th, 2007 at 2:42 pm
very impressive. Regular expressions look a lot like cuneiform writing to me - but they seem to be the greatest thing since sliced bread.
Regards
Martin Preiss
May 25th, 2007 at 3:21 pm
I can’t claim to be a regex expert — they’re not that intuitive to me yet. I also found this resource to be useful:
http://www.regular-expressions.info/
May 26th, 2007 at 6:34 am
Nice. (pronounced with a lot of tongue in cheeck, British humour). Although I do not pretend I have fully grasped the use of the reg_exp_replace string.
It made me think though what ‘modern’ uses could be found for presenting data in an Encyclopedia Spine style. Any ideas?
May 26th, 2007 at 8:30 am
The idea came from when I was at Network Solutions and they were looking at ways to present someone with their list of domains. If someone had a lot of domains, they wanted to provide them with tabs into their portfolio. Using letter tabs like A,B,C,D wouldn’t work well if all of their domains began with one or two letters. I thought it would be neat to have all tabs with the same number of domains, and yet conserve space on the tab labels…