Chrysopylae, Part 2

Part 2 of my review of the Oracle GoldenGate 11g Implementer’s Guide begins with Chapter 5, Configuration Options.

The Configuration Options chapter deals with more advanced options like batching, compression, encryption, triggering events, loop and conflict detection, and DDL replication.

Batching and how SQL statements are cached to support batching, along with error handling and fallback processing, are thoroughly explained.

Compression is also covered in some detail, with information about how GoldenGate cannot replicate data from Oracle compressed tables (including the EHCC compression from Exadata database machines).

In-flight (message) encryption and at-rest (trail) encryption are covered as well.

Event triggering is covered at a basic level, but gives a good insight as to what is possible – including the ability to have GoldenGate fire off a shell script in response to a particular set of values being detected in the capture process.

The discussion of bi-directional replication begins with a thorough list of items to be considered, including loops, conflict detection and resolution, sequences and triggers.

Conflict resolution options are slightly limited, and aren’t clearly defined – for example, applying a net difference instead of the after image is only useful in a subset of mathematical operations on numerical columns.  And there is no mention of prioritization by site (by which some sites’ updates always take precedence).  In truth, conflict resolution procedures can get pretty complicated, and I’m surprised there isn’t more information about them in this section or a referral to a later section (for example, Chapter 7 on Advanced Configuration).
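To make the two strategies concrete, here is a minimal Python sketch of my own (not from the book, and not GoldenGate syntax).  The Change record, the function names and the site ranking are all hypothetical; it only illustrates the difference between applying a net difference and letting a preferred site win.

```python
from dataclasses import dataclass


@dataclass
class Change:
    site: str    # site where the update originated (hypothetical label)
    before: int  # column value before the update (before image)
    after: int   # column value after the update (after image)


def resolve_by_delta(current_value, incoming):
    """Apply the incoming net difference instead of the after image."""
    return current_value + (incoming.after - incoming.before)


def resolve_by_site_priority(local, incoming, priority):
    """Keep the after image from the higher-priority site (lower rank wins)."""
    if priority[incoming.site] < priority[local.site]:
        return incoming.after
    return local.after


# Both sites start from a balance of 100 and update it concurrently.
local = Change(site="A", before=100, after=90)      # A subtracts 10
incoming = Change(site="B", before=100, after=120)  # B adds 20

delta_result = resolve_by_delta(local.after, incoming)
priority_result = resolve_by_site_priority(local, incoming, {"A": 1, "B": 2})
print(delta_result)     # 110
print(priority_result)  # 90
```

The net difference keeps both updates (110), which only makes sense for additive numeric columns.  Site priority simply discards the losing update, which is why it needs to be a deliberate design decision.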

The section on sequences is equally lacking in options, starting with a rather unclear statement about not supporting the replication of sequence values – what is really meant is that sequences themselves are not synchronized across multiple databases.  And the recommendation to use odd / even strategies is also rather simplistic – missing out on multi-master scenarios.  One can always reserve lower digits to enable more than 2 sites, and technically one can set up sequences to allow for an infinite number of sites as well…
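As a rough illustration of reserving the lower digits, the Python sketch below (my own, with a hypothetical site_sequence helper) gives each site a starting value equal to its site number and a step equal to the maximum number of sites.  In Oracle terms this corresponds roughly to creating each site's sequence with START WITH set to the site number and INCREMENT BY set to the site count.

```python
from itertools import count, islice


def site_sequence(site_id, max_sites):
    """Return an endless iterator of values that only this site generates."""
    if not 1 <= site_id <= max_sites:
        raise ValueError("site_id must be between 1 and max_sites")
    # Start at the site number and step by the number of sites, so the
    # value modulo max_sites always identifies the originating site.
    return count(start=site_id, step=max_sites)


site_3 = list(islice(site_sequence(3, 10), 4))
site_7 = list(islice(site_sequence(7, 10), 4))
print(site_3)  # [3, 13, 23, 33]
print(site_7)  # [7, 17, 27, 37]
```

With a step of 10 the last digit identifies the site, and the odd / even strategy is just the special case of two sites.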

Trigger handling advice is also rather simplistic – leading to more questions than answers as it talks about disabling triggers during the application of replicated data – there isn’t a mention of how that will affect an active / active system where local transactions are occurring.

There is a good discussion on DDL replication, with the information that the RECYCLEBIN must be disabled.

Chapter 6 – Configuring GoldenGate for HA

This chapter talks about GoldenGate in RAC environments, including the need for shared filesystems, and configuring GoldenGate with VIPs and with clusterware.  Sample scripts and commands are included – overall this chapter stays on point.

Chapter 7 – Advanced Configuration

In reality I’d call this chapter Configuration Details, but it does a very nice job of going through details around how to map objects in a replication configuration, as well as exploring the ability for GoldenGate to detect errors and execute SQL and/or stored procedures in response to those conditions.

Basic transformation is also covered.

Chapter 8 – Managing Oracle GoldenGate

This chapter covers basic command level security and spends a lot of time on the command interpreter GGSCI.  Also nice is a set of scripts and instructions to take performance output and format it for graphing in Excel.

Chapter 9 – Performance Tuning

The performance tuning chapter focuses on how to parallelize replication traffic and thoroughly exploit all system resources to increase throughput.  It also makes mention of the 11.1 release of GoldenGate.  Details about tuning DBFS are also included.

Like a lot of Performance Tuning advice, this section is more about throughput than performance optimization – in that vein it succeeds in covering ways to push more data more quickly.

Chapter 10 – Troubleshooting GoldenGate

The troubleshooting chapter begins with a good section on tracking down why replication may not be working – getting statistics on every process to see if they think they are capturing or sending data.  There is also good information on the CHECKPARAMS command which can be used to validate configuration files and the author also covers potential issues with the command.

The author covers looking at checkpoints and networks as well.

There is a good section on creating exception handlers to capture and diagnose duplicate and missing record errors, including capture of before and after images.

Finally the chapter goes into detail on the LOGDUMP utility which can be used to examine trail files for error conditions.

Summary

Overall I found the book to be a good companion to the GoldenGate manuals and training materials.  It’s obvious that the author has a lot of configuration and operational experience with GoldenGate.  I found the book weak on design and planning for replication environments, so if you’re new to replication I’d suggest adding another book to your library above and beyond this one.

Chrysopylae, Part 1

As an old hand at replication, I often like to read about advances in data replication technology and how different products approach various replication issues.  Core issues in data replication have remained the same for many years, including capturing of changes to be replicated, potentially filtering or altering those changes en-route to one or more destinations, and dealing with possible conflicts associated with keeping data synchronized.

The basis of my experience comes from my work with Oracle’s Symmetric Replication option from Oracle 7 – in which replication definitions were created and stored in database tables, changes were captured by triggers, stored in “outbox” tables, and pushed to other sites using an RPC-like mechanism.  I documented many of those changes in an Oracle whitepaper: Strategies and Techniques for Using Oracle7 Replication.

Even then, it was clear that a process which added significant overhead to capturing, storing and propagating data changes would interfere with application data processing.  Transaction log replication using products like SharePlex as well as the newer Oracle Streams option handled data capture and propagation in a much more efficient manner.

While Oracle Streams increased the capability and throughput of the concept of declared data replication, many people have found it difficult to configure, monitor, diagnose and debug.  The hybrid nature of external capture and internal queuing and propagation adds complexity to a process that is already full of technical conceptual challenges.

Oracle’s acquisition of GoldenGate for replication seems to indicate that Oracle has recognized the need for a more straightforward way of configuring and managing replication.  With simple configuration files outside of the database, GoldenGate appeals to those of us who don’t want replication to complicate an already functioning application.

That’s not to say that the replication lessons of the past aren’t relevant anymore – you still need to take them into account, especially if you’ve been seduced into trying to achieve multi-way, multi-site, any-row update replication.

But if you’ve decided to embark on an Oracle data replication project in 2011, you’d best become familiar with GoldenGate.

I’ve been to Oracle’s training on GoldenGate and also worked through the manuals as they are – I can honestly say that additional introductory material is a welcome development.  Hence my recent reading of the Oracle GoldenGate 11g Implementer’s Guide by John Jeffries.

Below is my review.

I’m always curious about who contributes to books and I was surprised that 2 out of the 3 reviewers for this book were SOA architects – while I could see that the messaging aspect of replication could possibly relate to SOA, it seems like a stretch to me.

Chapter 1 – Getting Started

I find it interesting that this chapter starts out by referring to Oracle GoldenGate 10.4 (since the title refers to Oracle GoldenGate 11g) – I’m sure this is just due to the version shakeout, but nonetheless it starts me off askance.

I was surprised that the competitor section didn’t mention SharePlex, but seemed to focus more on storage replication products.

I found the five proposed data replication solutions to be a bit odd:

  1. High Availability
  2. Zero-downtime Upgrades and Migrations
  3. Live Reporting
  4. Operational Business Intelligence
  5. Transaction Data Integration

3 & 4 seemed to be duplicative, and 5 seemed to be the rationale for the tenuous link to SOA.  The accompanying diagram also seemed to emphasize one-way replication instead of some of the more generic replication environments related to data placement, data usage and data change.

The chapter then quickly jumps into the technology – leaving the reader with a scant 1 1/2 pages on possible replication architectures.

The technology overview is pretty good – covering the main building blocks of GoldenGate (although I wish the author had clearly noted that the Data Pump is not the same thing as Data Pump Export.  It’s funny, since the author refers to a Capture process which is later referred to by its proper name as the Extract process).

The descriptions of the processes are good, and I liked that the Apply (or Replicat) process mentions the ability to replicate both DML and DDL (with the caveat that DDL can only be replicated in unidirectional configurations – reminds me of the old master definition site).

The process data flow diagram is also pretty good – although again one might be confused by the ckpt process listed, which is not the Oracle checkpoint process, but rather the name of a checkpoint file for GoldenGate.

From here we move on to a section called Oracle GoldenGate architecture, which now describes more topological solutions like:

  1. One-to-one source to target
  2. One-to-many
  3. Many-to-one
  4. Cascading
  5. Bi-directional active / active
  6. Bi-directional active / passive

These are much better descriptions than the solutions mentioned earlier in the chapter.

The many-to-one architecture description is good, and also brings mention of possible conflicts.

The bi-directional architecture description is also good, and makes mention of avoiding change loops so that incoming replication changes aren’t captured and sent back to the source.
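The idea behind loop avoidance can be sketched in a few lines of Python.  This is my own simplification, not GoldenGate configuration: it assumes each logged change records which database user made it, and that the apply process runs as a dedicated user (the name ggs_apply is hypothetical), so capture can skip anything that user wrote.

```python
from dataclasses import dataclass

REPLICATION_USER = "ggs_apply"  # hypothetical user that applies replicated changes


@dataclass(frozen=True)
class LoggedChange:
    table: str
    row_key: int
    applied_by: str  # database user that made the change


def changes_to_capture(log):
    """Skip changes written by the apply user so they are not sent back."""
    return [c for c in log if c.applied_by != REPLICATION_USER]


log = [
    LoggedChange("orders", 1, "app_user"),        # local transaction
    LoggedChange("orders", 2, REPLICATION_USER),  # arrived from the other site
    LoggedChange("orders", 3, "app_user"),        # local transaction
]
captured = changes_to_capture(log)
print([c.row_key for c in captured])  # [1, 3]
```

Only the local transactions (rows 1 and 3) are propagated; the change that arrived from the other site stops here instead of bouncing back to its source.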

After a section on certified version combinations, we encounter a section called “Process topology”.

I found this section to be extremely uneven – we begin with a semi-deep description of “the rules” for parallel Extract and/or Replicat processes – the information here seems out-of-place in Chapter 1.

After this blip, we have a quick look at the INFO ALL and STATS commands for looking at Statistics before moving on to a section on Design considerations.

As we move through this section, the author does a nice job talking about good database schema design, deciding what data to replicate and some of the GoldenGate commands for selecting objects and data to replicate.  Requirements like supplemental logging and primary keys are also highlighted.  There is a brief mention of initial data loads, but not much information about making sure that you start capturing data before instantiating remote systems.

Chapter 2 – Installing and Preparing

I like how this chapter starts off by calling for DBAs who have Sys Admin privileges and skills.  It’s good to know before you start working with GoldenGate.

This chapter appears to be where the author comes into his own – the prerequisites and system requirements are well-documented.

I did find it odd that the recommended approach for installing on Unix calls for downloading a .zip file to Windows, unzipping it there, and then FTP’ing the resulting .tar file to the target Unix / Linux server.

It might have been a good idea to reference the fact that the installation directories are not under the control of the Oracle installer or inventory – a fact that might change where you want to place the files.

Past that, the chapter does a nice job of describing the directories, preparing databases for replication and setting up and starting the GoldenGate processes.

Chapter 3 – Design Considerations

Chapter 3 starts in with active / active, active / passive, cascading and use of GoldenGate in a Physical Data Guard configuration.

From those descriptions we quickly move to network reliability and detailed discussion of NIC teaming for network redundancy.

After the network discussion, we bounce up to various database architectures like single-server, grid and clusters.

From there we move to a discussion on “Changed data management” which talks about Point-in-Time Recovery, RMAN and Flashback.

I found this chapter to be very uneven with regard to design considerations.

Chapter 4 – Configuring Oracle GoldenGate

Here the author gets back on solid ground with a nice description of options for loading remote sites, with a good level of detail about the various ways to take a set of data, transport it to the remote site, and load it up while making sure to capture changes during this process.  Each option is explained in detail and examples are provided as well.

The data unload and load processes are covered in detail, as well as the change data capture configuration. 

Trail files are also covered in detail, including trail file purging.

I’ll be covering Chapters 5-10 in Part 2 of my review.