NEOOUG Ohio, ODTUG Kscope12 and something unique from Redgate

If you’ve known me for a long time (and let’s face it, if you’re visiting this blog, you probably have), you know that I’m particularly proud of the work I was able to do at Network Solutions as the Director of Database Engineering and Development.

I had the privilege of working with some of the best software development colleagues I’ve ever encountered, including Pete Fox (Vice President of Engineering), Brian Taylor (Release Management), Eric Faulkner (Product/Portfolio Owner), Mike Cocozza (Application Tier), Mona Nahavandi (Presentation Tier), Sujata Nakhre (Billing), and Michelle Lee (CRM), as well as a host of other professionals in QA and Operations, to bring the Network Solutions website and IT applications to life.

I didn’t know it at the time (2002-2007) because it wasn’t well known yet, but we practiced what I now recognize as proto-Agile and proto-Helsinki development of a complex, data-centric system on timeboxed schedules. We regularly released features in three-month cycles based on release themes made up of capabilities and features, which were managed and prioritized by product owners.

Recently I had the privilege of presenting at the Northeast Ohio Oracle Users Group (NEOOUG) training day conference in Cleveland, Ohio, and during the sessions I started to hear the stirrings of the Oracle community becoming more aware of Agile and Helsinki as ways to improve our ability to deliver quality Oracle-based software systems more rapidly.

BTW, I couldn’t have been more impressed with the NEOOUG training day conference. I was pleasantly surprised, since I had assumed that attendees would be limited to northeast Ohio and that the conference itself would have only a minimal capacity for hosting a larger event.

I was wrong.

Held at Cleveland State University, the conference drew a crowd I estimated at 100-150 participants, in state-of-the-art venues including university classrooms and ballroom spaces. Excellent wifi coverage was the icing on the cake.

I thought the quality of the presentations was quite good as well, and I’ll definitely be returning in the future and encouraging folks from the Midwest and Great Lakes regions to attend. The folks at NEOOUG told me they’re aiming to be as good as the RMOUG Training Days conference, and I’d say they’re clearly on the right track.

As an aside, I think it’s a really good idea for Oracle technology conferences to be more closely identified with academia through the use of university and college facilities, as opposed to the Applications-focused conferences, which are more like vendor events. It more clearly aligns the content with the audience and venue.

I gave two presentations at the conference: Implementing MapReduce with SQL and PL/SQL; and An Examination of Schema Architecture Patterns for Edition-Based Redefinition.

I’m particularly proud of the MapReduce presentation, which has evolved nicely since I gave it at RMOUG (I’ll be reprising it at ODTUG KScope12 in June as well), but I was really pleasantly surprised at how well the EBR presentation went.

It’s probably because any discussion of EBR naturally aligns with how to use it to support a development process, and so I got to discuss some of my methods for doing database development:

1. Agile-style timeboxed deliveries
2. Helsinki-style database APIs through stored procedures
3. Describing the database as a “data server” accessed via an API (lately I prefer a RESTful API exposed via the APEX listener), instead of a “database server” with tables mapped into client-side ORMs
4. Versioning of database objects — historically by appending version numbers to programmatic objects (packages and procedures), now by using Edition-Based Redefinition
5. Extracting definitions of database objects to the file-system for source-code control check-in
6. DDL auditing everywhere — Development, QA, Production — to track all database structural changes (a minimal sketch follows this list)
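
Since #6 tends to raise the most eyebrows, here’s a minimal sketch of the kind of DDL auditing I have in mind. The table and trigger names here are hypothetical, and a production version would also want to capture the SQL text and handle errors:

create table ddl_audit_log (
  event_date    date,
  user_name     varchar2(30),
  object_owner  varchar2(30),
  object_name   varchar2(128),
  object_type   varchar2(30),
  sys_event     varchar2(30)
);

create or replace trigger trg_ddl_audit
after ddl on database
begin
  /* the ora_* event attribute functions describe the DDL being executed */
  insert into ddl_audit_log
  values (sysdate, ora_login_user, ora_dict_obj_owner,
          ora_dict_obj_name, ora_dict_obj_type, ora_sysevent);
end;
/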

#4, #5 and #6 have been areas of keen interest to me for over seven years now, and I’m happy to say that tools and vendors have really been responsive to my requests for improvements. I can’t say enough good things about Kris Rice and Bryn Llewellyn from Oracle, who listen to my requests for features in SQL Developer and my ideas on how I might use EBR. I’ve seen changes that make SQL Developer better and better at #4 and #5 in particular: now you can generate individual scripts per object and use the new Cart feature for deployment.

Today I want to talk about another vendor looking to provide capability in #4 and #5 in a unique way: through a live opportunity to engage the Oracle developer community in a rapid feedback loop on features at the upcoming ODTUG KScope12 conference in San Antonio, TX on June 24-28, 2012.

Redgate Software has been doing work in the SQL Server space for years and more recently has upped their game in the Oracle tools world with their Schema Compare, Data Compare and Schema Doc products, as well as by hosting the AllThingsOracle.com website. And at KScope they’re looking to really engage the development community on building source code control for Oracle in a live, agile fashion.

Instead of a simple product demonstration booth, Redgate will be using their vendor space as a mini-development shop, soliciting features from conference attendees and using an agile development process to actualize those features into a prototype product for Oracle Source Code Control.

Needless to say, I’m excited about it. I have definite ideas on what works and what doesn’t for Oracle source code control, and I can’t wait to try to get those things implemented while seeing what other developers and DBAs also want.

If database development is at all important to you, this sounds like a great opportunity to get your ideas out there.

Recursive Subqueries for FizzBuzz

So in yesterday’s post I mentioned that I wasn’t happy with my solution. Among several reasons were the limit to the first 100 numbers, and the fact that I didn’t use what I think is the more elegant approach: recursive subqueries.

Also, I received a comment about problems with the MarkDown plugin, which was also affecting my code posting, so I disabled MarkDown and can now handle code posting much, much better.

I couldn’t let it rest, so I dug into recursive subqueries with a vengeance and wrote up something more elegant.

with
fb0 (input) as (
select      'buzz,fizz,fizzbuzz,fizz,buzz,fizz,fizz' input
from        dual
),
fb2 (n,pos,n_start,n_str) as 
(
select      /* Anchor query of root nodes for recursion */
            fb1.column_value n, 
            1 pos,
            fb1.column_value n_start,
            to_char(fb1.column_value) n_str
from        fb0,
            table(sys.odcinumberlist(3,5,6,9,10,12,15)) fb1
where       /* Limit root nodes to ones that match the first word of input */
            case regexp_substr(fb0.input,'\w+')
              when 'fizz' then mod(fb1.column_value,3)
              when 'buzz' then mod(fb1.column_value,5)
              when 'fizzbuzz' then mod(fb1.column_value,15)
            end = 0
union all
select      /* Build "child" nodes by walking fizzbuzz matching to input string */
            case 
              when mod(fb2.n,15) in (5,9) then n+1
              when mod(fb2.n,15) in (3,10) then n+2
              when mod(fb2.n,15) in (0,6,12) then n+3
            end,
            fb2.pos + 1,
            fb2.n_start,
            fb2.n_str || ',' ||
            case 
              when mod(fb2.n,15) in (5,9) then n+1
              when mod(fb2.n,15) in (3,10) then n+2
              when mod(fb2.n,15) in (0,6,12) then n+3
            end
from        fb2,
            fb0
where       case 
              when mod(fb2.n,15) in (0,5,6,10) then 'fizz'
              when mod(fb2.n,15) in (3,9) then 'buzz'
              when mod(fb2.n,15) in (12) then 'fizzbuzz'
            end =
            regexp_substr(fb0.input,'\w+',1,fb2.pos + 1)
),
fb3 (input, n_str, n_length, n_start, min_length, min_start) as 
(
select      /* Measure lengths of resulting matches, discard ones that are not long enough */
            fb0.input,
            fb2.n_str,
            fb2.n - fb2.n_start n_length,
            fb2.n_start,
            min(fb2.n - fb2.n_start) over () min_length,
            min(fb2.n_start) over (partition by fb2.n - fb2.n_start) min_start
from        fb2,
            fb0
where       fb2.pos = nvl(length(regexp_replace(fb0.input,'\w')),0)+1
),
fb4 (input, n_str) as 
(
select      /* Retain the matches which are shortest and start at the smallest value */
            fb3.input,
            fb3.n_str
from        fb3
where       fb3.n_length = fb3.min_length
and         fb3.n_start = fb3.min_start
)
select      /* Display output */
            input,
            n_str
from        fb4;
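
If I’ve traced the recursion correctly, the input above should yield n_str = '10,12,15,18,20,21,24': buzz(10), fizz(12), fizzbuzz(15), fizz(18), buzz(20), fizz(21), fizz(24). The equally short run starting at 25 is discarded by the min_start tie-breaker.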

What’s all the buzz about?

During the course of recent Twitter / RSS reading, I came across an interesting post related to the old FizzBuzz problem. The post intrigued me for two reasons: it talks about solving a basic problem in Scala, and it emphasizes the use of functional programming. And just for grins and chuckles, it calls out OOP, citing John Hughes’s old 1984 paper Why Functional Programming Matters.

I’m a big fan of functional programming, even if my language of choice, SQL, is a poor example of it. Although I tend to think functionally when I write SQL, I know that the language is limited in that respect.

However, when I think about problems that need to be solved in an Oracle database system, I always try to exhaust SQL-based solutions before looking to PL/SQL (I’ve always enjoyed Tom Kyte’s mantra on the subject). It’s one of the reasons that systems I have influence over contain very few imperative PL/SQL constructs.

I also strive to write modular, relatively self-explanatory SQL to combat the reputation of SQL as a “difficult” language; all languages are difficult when you’re first confronted with them :-)

Anyway, the post under discussion describes an inverse FizzBuzz problem, with a description of working through various solutions using the functional constructs of Scala, and many people have had a go at solving it.

One of my favorites is in F#, Microsoft’s functional language.

Anyway, here’s my attempt at it — I’m not entirely satisfied with it yet, but it was fun to write:

with 
fb0 as (
select       'fizz,buzz' input
from         dual
),
fb1 as (
select       level n,
             case
               when mod(level,3) = 0 and mod(level,5) <> 0 then 'fizz'
               when mod(level,5) = 0 and mod(level,3) <> 0 then 'buzz'
               when mod(level,3) = 0 and mod(level,5) = 0  then 'fizzbuzz'
             end fb,
             lead(level,1) over (order by level) next_n
from         dual
where        mod(level,3) = 0 or mod(level,5) = 0
connect by   level <= 100
),
fb2 as (
select       connect_by_root n n_root,
             level pos,
             fb1.n,
             fb1.fb
from         fb0,
             fb1
where        level <= nvl(length(regexp_replace(fb0.input,'\w'))+1,1)
start with   fb1.fb = regexp_substr(fb0.input,'\w+')
connect by   prior fb1.next_n = fb1.n
),
fb3 as (
select       listagg(fb2.fb,',') within group (order by fb2.pos) fb_str,
             listagg(fb2.n,',') within group (order by fb2.pos) n_str,
             max(fb2.n) - min(fb2.n) n_length,
             min(fb2.n) n_min
from         fb0,
             fb2
group by     fb0.input,
             fb2.n_root
having       listagg(fb2.fb,',') within group (order by fb2.pos) = fb0.input
),
fb4 as (
select       fb3.n_length,
             fb3.n_min,
             fb3.fb_str,
             fb3.n_str,
             min(fb3.n_length) over () min_length,
             min(fb3.n_min) over (partition by fb3.n_length) min_start
from         fb3
),
fb5 as (
select       fb4.fb_str,
             fb4.n_str
from         fb4
where        fb4.n_length = fb4.min_length
and          fb4.n_min = fb4.min_start
)
select       *
from         fb5;

I’m not entirely happy with it. I’d like to combine fb2 with fb3, and fb4 with fb5. I’d also like to rewrite it with recursive subqueries instead of the CONNECT BYs and LISTAGGs, but it’s not too bad.

One of the things I’m starting to like about the WITH construct is how easy it makes debugging the overall statement: I can just replace the final SELECT with whichever subquery I want to test, in order to see what results it’s “passing” to the next subquery. Go ahead and try it and see!
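
For example, to see what fb3 is “passing” to fb4 in the statement above, just swap the final SELECT:

select       *
from         fb3;

(If your Oracle version complains about the now-unreferenced fb4 and fb5 subqueries, comment those out as well.)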

The Social Development Database

I’ve always been fascinated by development databases, sometimes more so than by huge, heavily utilized production ones, mainly because I’ve seen how the beginnings of a performance problem, or the start of an elegant solution, take shape within a development database. It’s one of the reasons why I love high levels of visibility through full DDL auditing within development. I love to SEE what database developers are thinking, and how they are implementing their ideas using specific shapes of data structures.

One of the concepts I’d love to see is a “river of news” panel within development tools, showing what is going on inside a development database. Some of the good distributed source code control systems do this now.

Here’s a good example of what I mean:

http://github-images.s3.amazonaws.com/blog/2011/mac-screenshots/commits-full.png

Imagine if there were a panel in SQL Developer or TOAD that looked something like that commit feed.

This wouldn’t be all that hard to implement if full DDL auditing were turned on…
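
For instance, assuming a DDL audit table like the hypothetical ddl_audit_log sketched in the first post above, the feed query behind such a panel could be as simple as:

select      to_char(a.event_date, 'Dy HH24:MI') event_time,
            a.user_name,
            a.sys_event || ' ' || a.object_type || ' ' ||
            a.object_owner || '.' || a.object_name headline
from        ddl_audit_log a
where       a.event_date > sysdate - 7   /* the last week of structural changes */
order by    a.event_date desc;

Each row is a ready-made feed item: who changed what, when, and how.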

The 3 Letters Which Can Put an American DBA to Sleep are NL…zzzzz

Pity us poor US DBAs — safely secure using our ancient, many-times-upgraded Oracle 6 databases with their US7ASCII character sets.

We knew that ASCII only covered 0-127, but who could blame us when we started putting WE8MSWIN1252 “international” characters into those fields — the database let us, and it felt kind of sexy putting in cool European characters with umlauts and accents on them.

Besides, all of our business was with other American companies, and if someone had some “funny” characters in their name, then they just had to change them!

Of course, all of this is said with tongue firmly planted in cheek.  Nowadays you’d better be able to handle Unicode in your database if you want to have a prayer of not being labeled as something older than teleprinters and typewriters.

I first encountered this situation when working with a US7ASCII database where we started using XMLTYPE columns. Little did I know that XMLTYPE columns actually validate the character set of the incoming XML document, and one of our fields was the country name.

Everything was fine until February 13th, 2004, the day ISO added an entry for the Åland Islands (which begins with an A with a diacritical ring above it).

We started seeing errors inserting our XML documents, all due to strict validation of the character set. Did we change character sets? No, we stopped using the XMLTYPE columns :-(

Fast forward a few years, and now I’m lucky enough to work with proper databases created with the AL32UTF8 character set, so now I can store my friend Mogens Noorgard’s name correctly (or I would, if I could spell it…)

However, little did I realize that I needed to declare my columns differently…

You see, back in the day, VARCHAR2(10) meant that I wanted to store up to 10 characters in the column, gosh darn it. I didn’t worry about bytes vs. characters; same thing, right?

Er, no.

So in a brand new database with an AL32UTF8 character set, why was I getting column-length errors trying to insert the three-character string H, U, ÿ (y with an umlaut) into a VARCHAR2(3) field?

Heck, isn’t y with an umlaut just another character? It’s just WE8MSWIN1252 character 255, right?

Don’t tell me it’s a character set issue — I’ve been trying to avoid opening up that NLS manual for years…

Ok, ok — open the manual and start reading about Unicode, specifically UTF-8. Uh-oh, I read the words “variable-length encoding” and the light starts to dawn…

Turns out that y with an umlaut is 1 byte in WE8MSWIN1252 (specifically 0xFF), but 2 bytes in UTF-8 (0xC3 0xBF).
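
A quick way to see this for yourself in an AL32UTF8 database (assuming your client passes the character through cleanly):

select      length('ÿ')   char_count,
            lengthb('ÿ')  byte_count,
            dump('ÿ',16)  encoding
from        dual;

/* expected: CHAR_COUNT = 1, BYTE_COUNT = 2, ENCODING = Typ=96 Len=2: c3,bf */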

But didn’t I declare the column to be 3 characters in length? So why does it care about the underlying encoding?

Enter NLS_LENGTH_SEMANTICS and the fact that the default is set to BYTE.

From the documentation:

http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch3globenv.htm#NLSPG235

NLS_LENGTH_SEMANTICS

Parameter type: String
Parameter scope: Environment variable, initialization parameter, and ALTER SESSION
Default value: BYTE
Range of values: BYTE or CHAR

By default, the character data types CHAR and VARCHAR2 are specified in bytes, not characters. Hence, the specification CHAR(20) in a table definition allows 20 bytes for storing character data.

This works well if the database character set uses a single-byte character encoding scheme because the number of characters is the same as the number of bytes. If the database character set uses a multibyte character encoding scheme, then the number of bytes no longer equals the number of characters because a character can consist of one or more bytes. Thus, column widths must be chosen with care to allow for the maximum possible number of bytes for a given number of characters. You can overcome this problem by switching to character semantics when defining the column size.

NLS_LENGTH_SEMANTICS enables you to create CHAR, VARCHAR2, and LONG columns using either byte or character length semantics. NCHAR, NVARCHAR2, CLOB, and NCLOB columns are always character-based. Existing columns are not affected.

You may be required to use byte semantics in order to maintain compatibility with existing applications.

NLS_LENGTH_SEMANTICS does not apply to tables in SYS and SYSTEM. The data dictionary always uses byte semantics.

Note that if the NLS_LENGTH_SEMANTICS environment variable is not set on the client, then the client session defaults to the value for NLS_LENGTH_SEMANTICS on the database server. This enables all client sessions on the network to have the same NLS_LENGTH_SEMANTICS behavior. Setting the environment variable on an individual client enables the server initialization parameter to be overridden for that client.


Sigh

Can anyone tell me why the default would be BYTE? Why would I want to declare character fields with BYTE lengths? Thank goodness it’s not in bits…

Anyway, we’ve now adjusted our standard to make sure that DDL always specifies BYTE or CHAR in the declaration:

VARCHAR2(10 CHAR) instead of VARCHAR2(10), so now we can be sure…
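
To make the difference concrete, here’s a minimal sketch against an AL32UTF8 database (a hypothetical table, with the default NLS_LENGTH_SEMANTICS of BYTE in effect):

create table length_semantics_demo (
  c_byte  varchar2(3 byte),   /* room for 3 bytes */
  c_char  varchar2(3 char)    /* room for 3 characters, up to 12 bytes in AL32UTF8 */
);

/* 'HUÿ' is 3 characters but 4 bytes in UTF-8 (H = 1 byte, U = 1 byte, ÿ = 2 bytes) */
insert into length_semantics_demo (c_char) values ('HUÿ');  /* succeeds */
insert into length_semantics_demo (c_byte) values ('HUÿ');  /* fails with ORA-12899: value too large for column */

You can also issue ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR to make unqualified declarations character-based, but spelling out BYTE or CHAR in the DDL keeps the intent obvious no matter what the session has set.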