Encyclopedia Spine Problem, one way…

From Perl, TMTOWTDI (There’s more than one way to do it).

I really owe a lot to the paper from Cornell by Todd Olson.  This has been a minor problem that I’ve been playing with for a while, and I finally got an hour or two to explore it.  I google’d hard for solutions to the commonality problem and Todd’s paper came right up when I looked for common leading substrings, which is really the “hard” part of this problem.  However, once I read the paper, I really needed to brush up on my regex skills — I was excited to try out the regex stuff in 10g, but I needed a refresher.

I shouldn’t have been surprised that the Oracle documentation is excellent on this, though.  As always, the Application Developer’s Guide – Fundamentals was a winner here.  Chapter 4 covers this in detail.  It’s funny how many of us don’t go back and read the fundamental and concept guides — sometimes I think we feel that we’re “experienced” and therefore don’t need to.  I always find new stuff in there, every time.

I loved all of the answers I received — VERY inventive.  Kudos to Jeremy Schneider for being the only one to tackle the paper and regex-style solution.  I’ve got to admit that many of the solutions seemed kinda convoluted to me.  One thing I strive for is KISS — Keep It Simple and Sane :-)  And so, my solution is as small as possible, using a small cheat in that I’m using a “magic” character “@”, but you can replace that with CHR(0) if it makes you feel better.

select nvl(nullif(regexp_replace(min(cn)||’@’||min(pn), ‘^([^@]*)(.)[^@]*@\1[^@]*$’,’\1\2′)||’…’,min(cn)||’…’),min(cn)) bfn,
nvl(nullif(regexp_replace(max(cn)||’@’||max(nn), ‘^([^@]*)(.)[^@]*@\1[^@]*$’,’\1\2′)||’…’,max(cn)||’…’),max(cn)) btn
from
(
select column_name cn,
ntile(15) over (order by column_name) bk#,
lag(column_name) over (order by column_name) pn,
lead(column_name) over (order by column_name) nn
from ac1
)
group by bk#
order by bk#

7 Responses to “Encyclopedia Spine Problem, one way…”

  1. Jeremy Schneider Says:

    Well it took me a little while to figure out that regex… but now that I’ve got it, I suddenly feel that I did more work than I needed to. Very cool, much more elegant than my enormous query. :)

  2. Jeremy Schneider Says:

    Oh now I see it, method #9 in the paper.

  3. Milo van der Leij Says:

    Wow, I learned something today! I didn’t know regexps could use backreferences inside the matching pattern. I thought they could only be used in the replacement text.

    I loved the idea of an XOR-based solution (method #12), but between UTL_RAW.bit_xor being PL/SQL (which was against the rules) and the difference between characters and bytes (and UTF, oh my) it never became as beautiful as I’d hoped.

  4. Martin Preiss Says:

    very impressive. Regular expressions look a lot like cuneiform writing to me – but they seem to be the greatest thing since sliced bread.

    Regards

    Martin Preiss

  5. ddelmoli Says:

    I can’t claim to be a regex expert — they’re not that intuitive to me yet. I also found this resource to be useful:

    http://www.regular-expressions.info/

  6. Eric Says:

    Nice. (pronounced with a lot of tongue in cheeck, British humour). Although I do not pretend I have fully grasped the use of the reg_exp_replace string.
    It made me think though what ‘modern’ uses could be found for presenting data in an Encyclopedia Spine style. Any ideas?

  7. ddelmoli Says:

    The idea came from when I was at Network Solutions and they were looking at ways to present someone with their list of domains. If someone had a lot of domains, they wanted to provide them with tabs into their portfolio. Using letter tabs like A,B,C,D wouldn’t work well if all of their domains began with one or two letters. I thought it would be neat to have all tabs with the same number of domains, and yet conserve space on the tab labels…

Leave a Reply

Posting code can be a pain. To make sure your code doesn't get eaten, you may want to pre-format it first by using HTML Encoder