NLS, Part Deux

A guest post today, by Brian Ledbetter, a co-worker at Agilex:

On a customer’s database, we ran across a table that would not migrate.  It was admittedly a log table, containing long chunks of HTTP header data, but whenever we tried importing it into our 11gR2 database, we ended up getting:

IMP-00058: ORACLE error 1461 encountered
ORA-01461: can bind a LONG value only for insert into a LONG column

After looking at the table structure, the first thing we noticed was that there was a VARCHAR2(4000) column in the table.  Considering that this column was already the maximum size (in bytes) for a CHAR-based data type, it became the focus of our attention.

Looking online for solutions, we found references [1] suggesting that Oracle was implicitly converting this column to a VARCHAR2(4000 CHAR) type, creating a column whose characters can take up to 4 bytes each.[2]  Because this can overflow the 4000-byte limit on column length, Oracle then attempted to implicitly convert the datatype to LONG, which is apparently deprecated in 11gR2.[3]  (We're not sure why Oracle is still trying to make this conversion, if that's the case.)
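
To make the failure mode concrete, here's a minimal sketch (the table name is hypothetical; it assumes an AL32UTF8 target database with CHAR length semantics in effect):

-- hypothetical log table; the column picks up CHAR length semantics on the target
CREATE TABLE http_log_test (
  hdr VARCHAR2(4000 CHAR)   -- 4000 characters, but in 11gR2 still capped at 4000 bytes
);

-- In AL32UTF8 a single character can take up to 4 bytes, so a value close to 4000
-- characters can exceed 4000 bytes; the client then has to bind it as a LONG, which
-- appears to be where ORA-01461 comes from during the import.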

Anyway, we tried precreating the table with a CLOB datatype, and that didn’t work either, so as a workaround, we created a copy of the table with the data trimmed to 1000 characters (leaving plenty of room after UTF8 conversion):

create table tabname_migtmp as select col1, col2, substr(col3,1,1000) col3 from tabname;

We then used exp/imp to copy tabname_migtmp over to the 11gR2 server, and inserted the data from it into the final location.

insert into tabname select * from tabname_migtmp;

drop table tabname_migtmp;

[1] http://forums.oracle.com/forums/thread.jspa?threadID=1038043

[2] http://stackoverflow.com/questions/5230346/char-semantics-and-ora-01461

[3] http://forums.oracle.com/forums/thread.jspa?threadID=2230351

See Also: Technote 444171.1, https://supporthtml.oracle.com/ep/faces/secure/km/DocumentDisplay.jspx?id=444171.1


Hold it right there…

Looks like our MySQL migration work is on hold for now — the server is undergoing recovery.  But in any event, I think I'm going to abandon the approach of using ODBC through Oracle HS into Oracle XE for the conversion.  In all, I think that approach would work for a light-weight system (i.e., between 1-3GB in database size, and without TEXT / LONG columns).  But the MySQL database I'm looking at is about 80-100GB in size — small, but big enough (and it includes TEXT columns) that I'm probably going to move up to Oracle Migration Workbench and full-blown Oracle 10g.

Now, if I can only scrounge up a stable test / development system…

Overcoded

Those of you who know me at work know that I don't like complicated code — I call it "over-engineered".  My staff now all know who Rube Goldberg is, because that's the other name I have for over-engineered code — "Rube Goldberg code".

Anyway, you know you're going down that path when you can't get something to work — even though your code was copied straight from the docs?  And then you keep hammering away at it, adding more and more functions, wrappers, etc., until it finally works?  Now you have ugly code (but it works!) and you still don't understand why it doesn't behave the way the documentation says it should.

It’s often a lot of work, but worth it in the end to understand why it doesn’t work like it’s supposed to.

So, revisiting my migration story…

I get a call from the group and they've decided to use sqlldr and control files instead of external tables.  Ok, I guess.  I think the external tables are cooler, but Ok.  And they try to load 9M rows into a partitioned IOT using a single stream, conventional path.  Time to load = 28 minutes.  Too slow for me.  So I look at the control file, and I see something odd:

In the fields spec, I see:

VERSION_LOCK "to_number(trim(both ' ' from :VERSION_LOCK))"

The fields in the datafile are simple integers separated by tabs.  So why the complicated conversion routine?  They tell me they keep getting "invalid number" during the loads, so this is what they "HAD" to do.  Turns out that VERSION_LOCK is an optional column and some rows in the data source don't contain any data for it.  And TRAILING NULLCOLS wasn't taking care of it either.
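
For context, the control file looked roughly like this (a sketch; the file name, table name, and the other column are placeholders):

LOAD DATA
INFILE 'data.txt'
APPEND
INTO TABLE target_tab
FIELDS TERMINATED BY X'09'
TRAILING NULLCOLS
(
  ID,
  VERSION_LOCK "to_number(trim(both ' ' from :VERSION_LOCK))"
)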

This looks like overcoding to me.  So I did some digging.  And while I’m at it, I play some more with the external tables.  I declare the external table and lo and behold, I get the “invalid number” errors too.  So finally, I declare all of the external columns as CHAR to see what’s in them.  And then I use the good ole DUMP() function to see.

Guess what?  The column that was “empty” actually had a single blank character in it.  And here’s a simple quiz:

What does to_number('    100   ') return?  Answer: 100.
What does to_number('59') return?  Answer: 59.
What does to_number('86 ') return?  Answer: 86.
What does to_number(NULL) return?  Answer: NULL.
What does to_number(' ') return (single space inside the quotes)?

I think you could be forgiven if you guessed NULL — after all, it appears that to_number trims both leading and trailing spaces, and if it trims all the spaces, it should come up with NULL — but it doesn’t.  It says “invalid number”.
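
All of it is easy to verify from SQL*Plus, and DUMP() makes it obvious that a single blank is a real character rather than a NULL:

SELECT DUMP(' ') FROM dual;                -- Typ=96 Len=1: 32  (one blank character)
SELECT TO_NUMBER('    100   ') FROM dual;  -- 100
SELECT TO_NUMBER(NULL) FROM dual;          -- NULL
SELECT TO_NUMBER(' ') FROM dual;           -- ORA-01722: invalid number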

So, I need to convert the single space to NULL, because that’s what it is.  A simple LRTRIM option on the fields spec for the external table takes care of that — and now I can go back to declaring the columns as NUMBER because that’s what they are.
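
On the external table side, the fix is just the trim option in the access parameters.  A sketch, with placeholder names (the real table has many more columns):

CREATE TABLE stage_ext (
  id           NUMBER,
  version_lock NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY 0X'09' LRTRIM   -- tab-delimited; LRTRIM turns an all-blank field into NULL
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('data.txt')
)
REJECT LIMIT UNLIMITED;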

I do one other thing — I convert their partitioned IOT table into a simple heap table with a primary key index (actually I will create the index after the load is complete).  So far I haven’t gotten a reason why their 9M row table needs to be an IOT or partitioned, so I’ll try the heap one.
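
A sketch of the heap-table variant (names are placeholders and the real table has more columns; the point is load first, build the primary key after):

CREATE TABLE target_heap (
  id           NUMBER,
  version_lock NUMBER
);

INSERT /*+ APPEND */ INTO target_heap
SELECT * FROM stage_ext;   -- single-stream, direct-path load from the external table
COMMIT;

-- build the primary key only after the load finishes
ALTER TABLE target_heap ADD CONSTRAINT target_heap_pk PRIMARY KEY (id);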

Time to load 9M records = 90 seconds.  Time to index 9M records = 150 seconds.  Total time = roughly 4 minutes.  Much better.

Do I think the time savings came from removing the extra code?  No.  It was probably due to getting rid of the partitioned IOT.  But I still learned a lot.  Moral of the story?  One we all know and don't apply enough — check your assumptions before starting to write "Rube Goldberg code".

I’ll see your measly 6,000 tps and raise you…

Remember the other day when I was so impressed by updating roughly 6,000 rows per second?  What a rube I was.  That's nothing.

So, Saturday night (I know, I need to get out more), I decided to work on the data loading problem I described in the last post.  We have roughly 60,000,000 rows to load and are required to get the job done in 3-4 hours.  Round up a bit and that’s 15,000 seconds.   Do the math and we need to be able to load about 4,000 rows per second.  Should be pretty easy — heck, if I can do 6,000 rows per second on the big AIX/EMC box, I should be able to do 4,000 rows per second on this Linux box.

I decided to try external tables for the loading — haven’t had a chance to play with them yet, so why not?  Here’s the documentation link.

So, I create an empty table for my target.  And then I create a 1,000,000-row table of test data with an insert-as-select of 1,000 rows from dba_objects (selected twice, with a cartesian product between the two, to generate the 1,000,000 rows).
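
Roughly what that looked like (a sketch; the table and column names are illustrative):

CREATE TABLE load_src (
  object_id   NUMBER,
  object_name VARCHAR2(128),
  object_type VARCHAR2(30)
);

-- cartesian product of two 1,000-row slices of dba_objects = 1,000,000 rows
INSERT /*+ APPEND */ INTO load_src
SELECT a.object_id, a.object_name, a.object_type
  FROM (SELECT * FROM dba_objects WHERE ROWNUM <= 1000) a,
       (SELECT * FROM dba_objects WHERE ROWNUM <= 1000) b;
COMMIT;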

(At this point, I should have been prepared for the results — but it was late, I was tired, and I was just focused on getting to the final step of loading the table.  But, for those of you who want the foreshadowing: the insert-as-select of 1,000,000 rows from dba_objects took only 8 seconds.)

So then I use sqlplus to dump out the 1,000,000 rows to a flat file — took roughly 1 hour (no tuning of sqlplus fetching, forgot to turn termout off, etc.).
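
For reference, that dump is just a SQL*Plus spool along these lines (a sketch; the TERMOUT OFF and ARRAYSIZE settings are the tuning I skipped that night, and the file path is a placeholder):

SET HEADING OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET LINESIZE 500
SET TRIMSPOOL ON
SET COLSEP '|'
SET ARRAYSIZE 500
SET TERMOUT OFF
SPOOL /u01/load/load_test.dat
SELECT object_id, object_name, object_type FROM load_src;
SPOOL OFF
SET TERMOUT ON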

At this point it’s 12:30am and I’m really struggling to type commands correctly.  I finally get the external table created and fire up the insert /*+ append */ as select.
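
Sketching what that ended up looking like (directory path, file name, and table names are placeholders):

CREATE OR REPLACE DIRECTORY ext_dir AS '/u01/load';

CREATE TABLE load_src_ext (
  object_id   NUMBER,
  object_name VARCHAR2(128),
  object_type VARCHAR2(30)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|' LRTRIM
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('load_test.dat')
)
REJECT LIMIT UNLIMITED;

INSERT /*+ APPEND */ INTO load_target
SELECT * FROM load_src_ext;   -- the direct-path insert from the flat file
COMMIT;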

Total time to insert the 1,000,000 rows = 5.85 seconds.

I had to rub my eyes a bit to make sure I didn’t make a typo and only insert 100,000 rows.  But, nope — 1,000,000 rows in under 6 seconds.

I almost don’t believe it myself — at the Hotsos conference, Cary Millsap said: “Isn’t Oracle really fast?  Why is it so hard to make it go slow?  How come so many people are good at that?” :-)

So, that’s roughly 170,000 rows per second — or 40X faster than I need it to be.

Looks like we can load 60,000,000 rows in 6 minutes.

Where can I get more of these Linux boxes running 10gR2?

That’ll teach me

You know there are some questions for which the correct answer certainly isn’t the “right” answer.  Like when your 6-year old son comes over to you and says, “Dad, where do we keep the lawnmower?” — telling him: “It’s in the garage” probably won’t be the best thing.

You’d think I’d learn my lesson on this one.  True story — back in mid-95, 96 I was considered an expert on Oracle Replication.  And I get the following question: “Can I replicate the Oracle*CASE Repository?”.  Silly me, I step right into it by saying, “Well, it’s a regular Oracle schema, I don’t see why not…”.  2 weeks later I get an angry call from Oracle Support telling me that I shouldn’t go around saying Oracle supports replication of the CASE Repository unless I was willing to support it myself :-(

Anyway, today's example is similar — about 2 months ago I was asked by an internal group for help on migrating a Postgres database into Oracle.  The complaint was "There aren't any migration toolkits, what can we do?".  So I do what every savvy tech person does nowadays: I google / sourceforge / freshmeat on "Oracle Postgres Migration" and come up with 4 or 5 tools which claim to do the job.  And I send them to the questioner with the suggestion — "go check these out".  Silly me, again.

The group looks at them and picks the DTM Migration Kit – fair enough.  At first blush, looks fine.

Now at this point, I have to explain, I think there are 2 parts to any database migration — there’s the schema migration (structure conversion, often converting datatype definitions), and then there’s the data migration.  And it’s OK to use different tools for each one.  As a matter of fact, it’s probably a good idea.

Anyway, a couple of months go by, and things appear to be going swimmingly — they convert their schemas just fine.  And then, about 7 weeks prior to production cutover from Postgres to Oracle, I get a call — “It’s taking 40 hours to convert our production database”.  Mind you, the production database is only 10GB in size.  And they tell me — stop me if you’ve heard this one before — the Oracle database is the problem / bottleneck / source of all pain and misery :-)

So, I do more dumb stuff — I tell them: drop the indexes, disable the constraints, set to nologging — the whole shebang.  And gee — it still seems slow.  So I go take a look — and I just start shaking my head.  Imagine the slowest possible way to convert a set of data.  The migration toolkit connects to the datasource using ODBC, sucks out the data, and then connects to Oracle using ODBC and CONVERTS ONE ROW AT A TIME WITHOUT USING BIND VARIABLES.

Oh boy — every row its own special conversion routine — its own special, unique SQL statement, its own set of network roundtrips.  It was hilarious.  I asked the group to ask the DTM folks if there was an option to use bind variables (let alone do some bulk binds) — the response from the DTM people (their “architect”) — “hmm, that sounds interesting”.
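
For anyone who hasn't seen the difference, here's what bind variables and bulk binds look like in PL/SQL terms, versus a freshly parsed literal statement per row (a sketch; the table and columns are hypothetical):

-- Row-at-a-time with literals means a brand new SQL statement (and a hard parse)
-- for every single row:
--   INSERT INTO customers (id, name) VALUES (101, 'Smith');
--   INSERT INTO customers (id, name) VALUES (102, 'Jones');
-- With bind variables and a bulk bind, it's one statement executed many times:
DECLARE
  TYPE t_ids   IS TABLE OF NUMBER;
  TYPE t_names IS TABLE OF VARCHAR2(100);
  l_ids   t_ids   := t_ids(101, 102);
  l_names t_names := t_names('Smith', 'Jones');
BEGIN
  FORALL i IN 1 .. l_ids.COUNT
    INSERT INTO customers (id, name) VALUES (l_ids(i), l_names(i));
  COMMIT;
END;
/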

Ah well — I think we'll just use it for the schema migration.  My wife suggested we look at heterogeneous services — I love it when she talks like that :-)