Chrysopylae, Part 1

As an old hand at replication, I often like to read about advances in data replication technology and how different products approach various replication issues.  Core issues in data replication have remained the same for many years, including capturing changes to be replicated, potentially filtering or altering those changes en route to one or more destinations, and dealing with possible conflicts associated with keeping data synchronized.

The basis of my experience comes from my work with Oracle’s Symmetric Replication option from Oracle 7 – in which replication definitions were created and stored in database tables, changes were captured by triggers, stored in “outbox” tables, and pushed to other sites using an RPC-like mechanism.  I documented many of those changes in an Oracle whitepaper: Strategies and Techniques for Using Oracle7 Replication.

Even then, it was clear that a process which added significant overhead to capturing, storing and propagating data changes would interfere with application data processing.  Transaction log replication using products like SharePlex as well as the newer Oracle Streams option handled data capture and propagation in a much more efficient manner.

While Oracle Streams increased the capability and throughput of the concept of declared data replication, many people have found it difficult to configure, monitor, diagnose and debug.  The hybrid nature of external capture and internal queuing and propagation adds complexity to a process that is already full of technical and conceptual challenges.

Oracle’s acquisition of GoldenGate for replication seems to indicate that Oracle has recognized the need for a more straightforward way of configuring and managing replication.  With simple configuration files outside of the database, GoldenGate appeals to those of us who don’t want replication to complicate an already functioning application.

That’s not to say that the replication lessons of the past aren’t relevant anymore – you still need to take them into account, especially if you’ve been seduced into trying to achieve multi-way, multi-site, any-row update replication.

But if you’ve decided to embark on an Oracle data replication project in 2011, you’d best become familiar with GoldenGate.

I’ve been to Oracle’s training on GoldenGate and also worked through the manuals as they are – I can honestly say that additional introductory material is a welcome development.  Hence my recent reading of the Oracle GoldenGate 11g Implementer’s Guide by John Jeffries.

Below is my review.

I’m always curious about who contributes to books and I was surprised that 2 out of the 3 reviewers for this book were SOA architects – while I could see that the messaging aspect of replication could possibly relate to SOA, it seems like a stretch to me.

Chapter 1 – Getting Started

I find it interesting that this chapter starts out by referring to Oracle GoldenGate 10.4 (since the title refers to Oracle GoldenGate 11g) – I’m sure this is just due to the version shakeout, but nonetheless it left me a bit wary from the start.

I was surprised that the competitor section didn’t mention SharePlex, but seemed to focus more on storage replication products.

I found the five proposed data replication solutions to be a bit odd:

  1. High Availability
  2. Zero-downtime Upgrades and Migrations
  3. Live Reporting
  4. Operational Business Intelligence
  5. Transaction Data Integration

Items 3 and 4 seemed to be duplicative, and item 5 seemed to be the rationale for the tenuous link to SOA.  The accompanying diagram also seemed to emphasize one-way replication instead of some of the more generic replication environments related to data placement, data usage and data change.

The chapter then quickly jumps into the technology – leaving the reader with a scant 1 1/2 pages on possible replication architectures.

The technology overview is pretty good – covering the main building blocks of GoldenGate, although I wish the author had clearly noted that the GoldenGate Data Pump is not the same thing as Data Pump Export.  It’s also funny that the author refers to a Capture process which is later referred to by its proper name, the Extract process.

The descriptions of the processes are good, and I liked that the description of the Apply (or Replicat) process mentions the ability to replicate both DML and DDL (with the caveat that DDL can only be replicated in unidirectional configurations – reminds me of the old master definition site).

The process data flow diagram is also pretty good – although again one might be confused by the ckpt process listed, which is not the Oracle checkpoint process, but rather the name of a checkpoint file for GoldenGate.

From here we move on to a section called Oracle GoldenGate architecture, which now describes more topological solutions like:

  1. One-to-one source to target
  2. One-to-many
  3. Many-to-one
  4. Cascading
  5. Bi-directional active / active
  6. Bi-directional active / passive

These are much better descriptions than the solutions mentioned earlier in the chapter.

The many-to-one architecture description is good, and also brings mention of possible conflicts.
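For readers new to the idea, here is a minimal sketch of how an update conflict can be detected when several sources feed one target. It is a generic illustration, not GoldenGate’s own mechanism: the function name `apply_update` and the dictionary standing in for a table are assumptions made for the example. The apply side compares the row’s current value with the "before" value that the source saw, and refuses the change if they differ.

```python
def apply_update(target, key, before, after):
    """Apply an update only if the target row still holds the value the
    source saw before its change; otherwise report a conflict."""
    current = target.get(key)
    if current != before:
        # Another site changed the row first; leave it untouched.
        return ("conflict", current)
    target[key] = after
    return ("applied", after)


# Hypothetical consolidated table keyed by order id.
central = {42: "open"}

# Site A and site B both update order 42, each starting from "open".
result_a = apply_update(central, 42, before="open", after="closed")
result_b = apply_update(central, 42, before="open", after="cancelled")

print(result_a)
print(result_b)
```

What to do once a conflict is reported (discard, overwrite, or log for manual review) is a separate design decision that the sketch deliberately leaves out.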

The bi-directional architecture description is also good, and makes mention of avoiding change loops so that incoming replication changes aren’t captured and sent back to the source.
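The change-loop problem can be sketched in a few lines. This is a generic illustration under stated assumptions, not the book’s or GoldenGate’s configuration: the `LoggedChange` record, the `repl_apply` account and the `changes_to_ship` function are hypothetical names. The idea is that the apply process commits its work as a dedicated user, and the capture side skips anything committed by that user.

```python
from dataclasses import dataclass

APPLY_USER = "repl_apply"  # hypothetical account used only by the apply process


@dataclass(frozen=True)
class LoggedChange:
    table: str
    key: int
    value: str
    committed_by: str  # database user that committed the change


def changes_to_ship(log):
    """Return only locally originated changes, skipping anything the
    apply process wrote, so replicated rows are not sent back."""
    return [c for c in log if c.committed_by != APPLY_USER]


site_a_log = [
    LoggedChange("orders", 1, "new", "app_user"),
    LoggedChange("orders", 2, "shipped", APPLY_USER),  # arrived from site B
    LoggedChange("orders", 3, "cancelled", "app_user"),
]

shipped = changes_to_ship(site_a_log)
print([c.key for c in shipped])
```

Without the filter, the change to order 2 would travel back to site B, be applied there, be captured again, and so on indefinitely.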

After a section on certified version combinations, we encounter a section called “Process topology”.

I found this section to be extremely uneven – we begin with a semi-deep description of “the rules” for parallel Extract and/or Replicat processes – the information here seems out-of-place in Chapter 1.

After this blip, we have a quick look at the INFO ALL and STATS commands for looking at statistics before moving on to a section on Design considerations.

As we move through this section, the author does a nice job talking about good database schema design, deciding what data to replicate and some of the GoldenGate commands for selecting objects and data to replicate.  Requirements like supplemental logging and primary keys are also highlighted.  There is a brief mention of initial data loads, but not much information on making sure that you start capturing changes before instantiating remote systems.
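Why the ordering matters can be shown with a small sketch. It is a simplified model, not a GoldenGate procedure: the change numbers stand in for Oracle SCNs, and `instantiate_target` and the sample data are hypothetical. The target is built from a snapshot taken at a known point, and only captured changes committed after that point are applied on top of it.

```python
def instantiate_target(snapshot_rows, snapshot_scn, captured_changes):
    """Build the target from a snapshot taken at snapshot_scn, then apply
    only the captured changes committed after that point."""
    target = dict(snapshot_rows)
    for scn, key, value in sorted(captured_changes):
        if scn > snapshot_scn:
            target[key] = value
    return target


# Changes committed on the source as (scn, key, new_value).
source_changes = [(101, 1, "a"), (103, 2, "b"), (107, 1, "c"), (110, 3, "d")]

# Snapshot taken at SCN 105 already contains the changes at 101 and 103.
snapshot = {1: "a", 2: "b"}

# Capture started at SCN 100, before the snapshot: nothing is missed.
good = instantiate_target(snapshot, 105, source_changes)

# Capture started at SCN 108, after the snapshot: the change at 107 is lost.
late_capture = [c for c in source_changes if c[0] >= 108]
bad = instantiate_target(snapshot, 105, late_capture)

print(good)
print(bad)
```

In the second case the target silently keeps the stale value for key 1, which is exactly the kind of gap that is hard to spot after the fact.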

Chapter 2 – Installing and Preparing

I like how this chapter starts off by calling for DBAs who have Sys Admin privileges and skills.  It’s good to know before you start working with GoldenGate.

This chapter appears to be where the author comes into his own – the prerequisites and system requirements are well-documented.

I did find it odd that the recommended approach for installing on Unix calls for downloading a .zip file to Windows, unzipping it there, and then FTP’ing the resulting .tar file to the target Unix / Linux server.

It might have been a good idea to reference the fact that the installation directories are not under the control of the Oracle installer or inventory – a fact that might change where you want to place the files.

Past that, the chapter does a nice job of describing the directories, preparing databases for replication and setting up and starting the GoldenGate processes.

Chapter 3 – Design Considerations

Chapter 3 starts in with active / active, active / passive, cascading and use of GoldenGate in a Physical Data Guard configuration.

From those descriptions we quickly move to network reliability and detailed discussion of NIC teaming for network redundancy.

After the network discussion, we bounce up to various database architectures like single-server, grid and clusters.

From there we move to a discussion on “Changed data management” which talks about Point-in-Time Recovery, RMAN and Flashback.

I found this chapter to be very uneven with regard to design considerations.

Chapter 4 – Configuring Oracle GoldenGate

Here the author gets back on solid ground with a nice description of options for loading remote sites, with a good level of detail about the various ways to take a set of data, transport it to the remote site, and load it up while making sure to capture changes during this process.  Each option is explained in detail and examples are provided as well.

The data unload and load processes are covered in detail, as well as the change data capture configuration. 

Trail files are also covered in detail, including trail file purging.

I’ll be covering Chapters 5-10 in Part II of my review.
