NEOOUG Ohio, ODTUG Kscope12 and something unique from Redgate

If you’ve known me for a long time (and let’s face it, if you’re visiting this blog, you probably have), you know that I’m particularly proud of the work I was able to do at Network Solutions as the Director of Database Engineering and Development.

I had the privilege of working with some of the best software development colleagues I’ve ever encountered, including Pete Fox (Vice President of Engineering), Brian Taylor (Release Management), Eric Faulkner (Product/Portfolio Owner), Mike Cocozza (Application Tier) and Mona Nahavandi (Presentation Tier), Sujata Nakhre (Billing) and Michelle Lee (CRM), as well as a host of other professionals in QA and Operations to bring the Network Solutions website and IT applications to life.

I didn’t know it at the time (2002-2007) because it wasn’t well known yet, but we practiced what I now recognize as proto-Agile and proto-Helsinki development of a complex, data-centric system on a timeboxed schedule. We regularly released in three-month cycles, based on release themes made up of capabilities and features that were managed and prioritized by product owners.

Recently I had the privilege of presenting at the Northeast Ohio Oracle Users Group (NEOOUG) training day conference in Cleveland, Ohio, and during the sessions I started to hear stirrings of the Oracle community becoming more aware of Agile and Helsinki as ways to deliver quality Oracle-based software systems more rapidly.

BTW, I couldn’t have been more impressed with the NEOOUG training day conference. I was pleasantly surprised, since I had assumed attendance would be limited to northeast Ohio and that the conference wouldn’t have the capacity to host a larger event.

I was wrong.

The event was held at Cleveland State University, and I estimated the crowd at 100-150 participants in state-of-the-art venues, including university classrooms and ballroom spaces. Excellent wifi coverage was the icing on the cake.

I thought the quality of the presentations was pretty good as well, and I’ll definitely be returning in the future and encouraging folks from the Midwest and Great Lakes regions to attend. The folks at NEOOUG told me they’re aiming to be as good as the RMOUG Training Days conference, and I’d say they’re clearly on the right track.

As an aside, I think it’s a really good idea for Oracle technology conferences to be more closely identified with academia through the use of university and college facilities, as opposed to the Applications-focused conferences, which are more of a vendor event. It more clearly aligns the content with the audience and venue.

I gave 2 presentations at the conference: Implementing MapReduce with SQL and PL/SQL; and An Examination of Schema Architecture Patterns for Edition-Based Redefinition.

I’m particularly proud of the MapReduce presentation; it’s evolved nicely since I gave it at RMOUG, and I’ll be reprising it at ODTUG KScope12 in June. But I was really pleasantly surprised at how well the EBR presentation went.

It’s probably because any discussion of EBR naturally aligns itself with how to use it to support a development process, and so I got to discuss some of my methods for doing database development:

1. Agile-style timeboxed deliveries
2. Helsinki-style database APIs through stored procedures
3. Describing the database as a “data server” accessed via an API (lately I prefer a RESTful API exposed via the APEX listener), instead of a “database server” with tables mapped into client-side ORMs
4. Versioning of database objects — historically by appending version numbers to programmatic objects (packages and procedures), now by using Edition-Based Redefinition
5. Extracting definitions of database objects to the file system for source-code control check-in (see the sketch just after this list)
6. DDL auditing everywhere — Development, QA, Production — to track all database structural changes
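
To give a flavor of item 5, here’s a minimal sketch using DBMS_METADATA. SCOTT.EMP and the EMP_API package are just placeholder names, and a real extraction script would loop over USER_OBJECTS and spool one file per object:

-- SQL*Plus settings so the generated CLOBs come out whole
set long 1000000 pagesize 0 linesize 200

begin
  -- append SQL terminators so the generated scripts run as-is
  dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SQLTERMINATOR', true);
end;
/

-- spool each of these to its own file and check the file in
select dbms_metadata.get_ddl('TABLE', 'EMP', 'SCOTT') from dual;
select dbms_metadata.get_ddl('PACKAGE', 'EMP_API', 'SCOTT') from dual;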

#4, #5 and #6 have been areas of keen interest to me for over 7 years now, and I’m happy to say that tools and vendors have really been responsive to my requests for improvements. I can’t say enough good things about Kris Rice and Bryn Llewellyn from Oracle, who listen to my requests for SQL Developer features and to my thoughts on how I might use EBR. I’ve seen changes that make SQL Developer better and better at #4 and #5 in particular: you can now generate individual scripts per object and use the new Cart feature for deployment.

Today I want to talk about another vendor looking to provide capability in #4 and #5 in a unique way: by engaging the Oracle developer community in a live, rapid feedback loop on features at the upcoming ODTUG KScope12 conference in San Antonio, TX on June 24-28, 2012.

Redgate Software has been working in the SQL Server space for years and has more recently upped their game in the Oracle tools world with their Schema Compare, Data Compare and Schema Doc products, as well as by hosting the AllThingsOracle.com website. And at KScope they’re looking to really interact with the development community on building source code control for Oracle in a live, agile fashion.

Instead of a simple product demonstration booth, Redgate will be using their vendor space as a mini-development shop, soliciting features from conference attendees and using an agile development process to actualize those features into a prototype product for Oracle Source Code Control.

Needless to say, I’m excited about it — I have definite ideas on what works and what doesn’t work for Oracle source code control and I can’t wait to try and get those things implemented while seeing what other developers and DBAs also want.

If database development is at all important to you, this sounds like a great opportunity to get your ideas out there.

The 3 Letters Which Can Put an American DBA to Sleep are NL…zzzzz

Pity us poor US DBAs, safe and secure with our ancient, many-times-upgraded Oracle 6 databases and their US7ASCII character sets.

We knew that ASCII only covered 0-127, but who could blame us when we started putting WE8MSWIN1252 “international” characters into those fields — the database let us, and it felt kind of sexy putting in cool European characters with umlauts and accents on them.

Besides, all of our business was with other American companies, and if someone had some “funny” characters in their name, then they just had to change them!

Of course, all of this is said with tongue firmly planted in cheek.  Nowadays you’d better be able to handle Unicode in your database if you want to have a prayer of not being labeled as something older than teleprinters and typewriters.

I first encountered this situation while working with a US7ASCII database where we had started using XMLTYPE columns. Little did I know that XMLTYPE columns actually validate the character set of the incoming XML document, and one of our fields was the country name.

Everything was fine until February 13th, 2004, the day ISO added an entry for the Åland Islands, whose name begins with an A topped with a ring diacritic.

We started seeing errors inserting our XML documents — all due to strict validation of the character set.  Did we change character sets?  No — we stopped using the XMLTYPE columns :-(

Fast forward a few years, and now I’m lucky enough to work with proper databases created with the AL32UTF8 character set, so now I can store my friend Mogens Noorgard’s name correctly (or I would if I could spell it).

However, little did I realize that I needed to declare my columns differently…

You see, back in the day, VARCHAR2(10) meant that I wanted to store up to 10 characters in the column gosh darn it — I didn’t worry about bytes vs. characters — same thing right?

Er, no.

So in a brand new database with an AL32UTF8 character set, why was I getting column length errors trying to insert the three-character string H, U, Y-with-umlaut into a VARCHAR2(3) field?

Heck, isn’t Y with an umlaut just another character? It’s just WE8MSWIN1252 character 255, right?

Don’t tell me it’s a character set issue — I’ve been trying to avoid opening up that NLS manual for years…

Ok, ok — open the manual and start reading about Unicode — specifically UTF-8.  Uh-oh, I read the words “variable-length encoding” and the light starts to dawn…

Turns out that Y with an umlaut is 1 byte in WE8MSWIN1252 (specifically 0xFF), but it’s 2 bytes in UTF-8 (0xC3 0xBF).
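
You can see this right from SQL*Plus in an AL32UTF8 database, assuming your client NLS setup passes the character through untouched:

select length('ÿ')   as char_count,   -- 1
       lengthb('ÿ')  as byte_count,   -- 2
       dump('ÿ', 16) as hex_dump      -- Typ=96 Len=2: c3,bf
  from dual;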

But didn’t I declare the column to be 3 characters in length?  So why does it care about the underlying encoding?

Enter NLS_LENGTH_SEMANTICS and the fact that the default is set to BYTE.

From the documentation:

http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch3globenv.htm#NLSPG235

NLS_LENGTH_SEMANTICS

Parameter type: String
Parameter scope: Environment variable, initialization parameter, and ALTER SESSION
Default value: BYTE
Range of values: BYTE or CHAR

By default, the character data types CHAR and VARCHAR2 are specified in bytes, not characters. Hence, the specification CHAR(20) in a table definition allows 20 bytes for storing character data.

This works well if the database character set uses a single-byte character encoding scheme because the number of characters is the same as the number of bytes. If the database character set uses a multibyte character encoding scheme, then the number of bytes no longer equals the number of characters because a character can consist of one or more bytes. Thus, column widths must be chosen with care to allow for the maximum possible number of bytes for a given number of characters. You can overcome this problem by switching to character semantics when defining the column size.

NLS_LENGTH_SEMANTICS enables you to create CHAR, VARCHAR2, and LONG columns using either byte or character length semantics. NCHAR, NVARCHAR2, CLOB, and NCLOB columns are always character-based. Existing columns are not affected.

You may be required to use byte semantics in order to maintain compatibility with existing applications.

NLS_LENGTH_SEMANTICS does not apply to tables in SYS and SYSTEM. The data dictionary always uses byte semantics.

Note that if the NLS_LENGTH_SEMANTICS environment variable is not set on the client, then the client session defaults to the value for NLS_LENGTH_SEMANTICS on the database server. This enables all client sessions on the network to have the same NLS_LENGTH_SEMANTICS behavior. Setting the environment variable on an individual client enables the server initialization parameter to be overridden for that client.


Sigh

Can anyone tell me why the default would be BYTE?  Why would I want to declare character fields with BYTE lengths?  Thank goodness it’s not in bits…

Anyway, we adjusted our standard to make sure that DDL always specifies BYTE or CHAR in the declaration now:

VARCHAR2(10 CHAR) instead of VARCHAR2(10), so now we can be sure…
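
Here’s a quick sketch of the difference, with made-up table names:

create table t_byte (c varchar2(3));        -- byte semantics by default
create table t_char (c varchar2(3 char));   -- character semantics, explicitly

insert into t_byte values ('HUÿ');
-- fails with ORA-12899 (value too large for column): 'HUÿ' is 3 characters but 4 bytes in AL32UTF8

insert into t_char values ('HUÿ');
-- succeeds: 3 characters, even though they occupy 4 bytes of storage

You can also flip the default with ALTER SESSION SET NLS_LENGTH_SEMANTICS = 'CHAR', but being explicit in the DDL means the scripts behave the same no matter whose session runs them.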

Outing my source

I know, it’s been a while.  Here’s an older topic that I’ve been thinking about lately — and that’s dealing with outsourced code.

Outsourcing development is attractive for its context-based lower development costs and faster turnaround (I say “context-based” because often, if you relieved an in-house group of its non-productive responsibilities and allowed it to compete on an equal footing, you’d get the same level of cost and quality). Unfortunately, too many outsourced projects don’t take post-delivery activities into consideration. Often the contracts are written with the focus on functionality and time-to-completion, without corresponding rigor being applied to code maintainability and compatibility with existing maintenance processes.

It’s a real shame when the cost savings associated with the development and delivery are obliterated by cost increases in maintenance and operation.  The challenge is to make sure that the contract stipulates adherence to internal standards which will support internal takeover of the developed code.

Here are some questions to ask when you get a chance to provide input to an outsourced development contract:

  1. Will the code follow our development standards, naming conventions, packaging rules?  If not, why not?  If not, will we have to re-work the code upon delivery?
  2. Can the code be checked into our source-code control system when we get it?  If not, why not?
  3. Can the code be compiled, built and made installable by our local build system?
  4. Will the code use third party software that we’re not familiar with?  Will it require separate licensing?  Will we be able to get training on it?
  5. Who will do integration, system and/or scalability testing?
  6. Who generates the test data and who owns that testing data?
  7. Is the software built to support regression and/or scenario testing?  Will it run in our local testing environment?

In the end, it becomes your code and your responsibility.  It should be held to the same standards as your internal code development — and that shouldn’t be a problem.

Column Naming

Laurent’s post on column qualification reminded me of a conversation I had with one of my staff about column naming standards.  But before I get to that conversation, I’d like to share my favorite example about using table aliases.

I’m a big fan of always using table aliases and always qualifying column names.  My favorite example of why it’s a good idea to do it is as follows:

select * from v$session where saddr in (select saddr from v$transaction)

Can you spot the problem here? There isn’t an saddr column in v$transaction. But if you run that query, you don’t get an error. In fact, you get all of the sessions from v$session, when you probably only wanted the sessions with active transactions. If you were in the habit of always aliasing your tables and qualifying your column names, like so:

select s.* from v$session s where s.saddr in (select t.saddr from v$transaction t)

You’d get ORA-00904: “T”.”SADDR”: invalid identifier.

You can argue about whether or not the original statement should throw an error, but it doesn’t matter.  If you used the good practice of always aliasing tables and qualifying columns, you wouldn’t need to waste your breath.
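
For the record, the column that query was presumably reaching for is ses_addr in v$transaction, so the working version looks something like this:

select s.* from v$session s where s.saddr in (select t.ses_addr from v$transaction t)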

One “argument” I hear against qualifying column names comes from people who have a policy of making sure every column name in the system is unique, usually by including some kind of table name abbreviation prefix on every column in a table.  My staff member asked me my opinion of this.  Bleh.  I’ve got nothing really against it, other than I think it’s not feasible to enforce and it hampers readability.  I also don’t like how that policy treats foreign key columns — usually inconsistently.  And in the end, it seems to just be an excuse for not having to qualify columns in queries anyway :-)

Superscript

…back to Change Management.

In my earlier post, I talked about the strong change management ethic at my current company.  It was interesting to see how my predecessor here adapted database schema evolution to that ethic.

I need to add here that I find “ancient” software development organizational ideas like “chief” or “surgeon” programming teams to be fascinating, in particular because each model describes “super” programmers who end up being responsible for 80-90% of the code anyway due to sheer talent and dominant personality. Almost everywhere I’ve gone has had a local expert who seemed to qualify for that kind of title. The relevance of ideas from Mills and Brooks is commented on here.

The biggest thing I’ve taken away from those ideas of “super” programmers is that, as a manager, I need to help every member of my staff become as close to “super” as possible.  That means removing as many non-productive, non-programming, non-learning tasks as possible.

Another way to put this is to find out ways to reduce the effort and work required to do non-programming tasks.

This is where the DDL generation scripts came into the picture.

When I arrived, every programmer hand-coded the scripts which were checked into the source code control system.  Everyone had to remember to include the source code control tags, and script names weren’t obvious.  Also, it wasn’t always clear that every changed object and/or procedure was being captured with every release.

To address those issues, I:

  1. Required that all code checked into the source code control system had to be generated via the DDL generation scripts.  We modified the scripts to include the source code control tags automatically.  We also did things like include primary key indexes, grants and initial stats analysis in table creation scripts.  We also injected source code control tags into the stored procedure and package code so that they could be seen when browsing the database.
  2. Modified the scripts to include the object type, schema name and object name in the generated file name. So, a script that created the SCOTT.EMP table ended up being called mk-tbl-scott-emp.sql (we used mk for create, rm for drop and ch for alter).
  3. Turned on DDL auditing for all objects in the development database.  This allowed us to double-check and make sure we captured and verified which objects were changed leading up to a release.

A note on DDL auditing: I like to have this turned on in every database I work with. Development, Test, QA, Production, you name it. I think it’s always useful to have a log of structural changes to the database, and why code up a special trigger when the database will track it automatically? The only objection I’ve heard to this is space, and the space required is ridiculously small. Sometimes people complain about performance, and I say that if you have a system that relies on thousands of DDL statements per day, you’ve got a system that needs work. Who does DDL as part of an OLTP transaction?
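
As a rough sketch of what turning it on looks like with plain statement auditing, assuming the instance runs with AUDIT_TRAIL=DB (the exact set of audit options is a matter of taste and version):

-- audit the DDL we care about
audit table;        -- create table, drop table, truncate table
audit alter table;  -- alter table
audit procedure;    -- create/drop of procedures, functions and packages
audit index;
audit trigger;
audit sequence;

-- later, review what changed and when
select timestamp, username, owner, obj_name, action_name
  from dba_audit_trail
 where action_name like 'CREATE%'
    or action_name like 'ALTER%'
    or action_name like 'DROP%'
 order by timestamp;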

Our biggest remaining challenge with this approach was how to create the “master” installation “super” scripts.  I’ll cover that in a later post…

(One other concern is the sheer number of files this approach can generate. If Oracle did this for the SYS schema, you’d have over 4,000 individual files. On the other hand, this works well when you only need to re-create a single object without searching through 4-5 cat scripts to find the object definition. Other systems use object definition files which are then read by specialized programs to generate installation scripts; that seems like a lot of work for an internal corporate application.)