Adventures in Setting up OEM Agents

So I’ve been working to push out OEM Agents from our Grid Control OMS on Windows XP to a couple of Linux servers running OEL 5.  It’s been a lot of fun, and I thought I’d share some of the obstacles I overcame:

1. SSH User Equivalence. 

You need to get this working before anything else will work.  Fortunately there’s a pretty good script in oracle/oms10g/sysman/prov/resources/scripts that will do the hard work for you.  However, my first run of this resulted in (BTW, this is all run from a Cygwin bash shell since all of the scripts are Unix-based):

./sshUserSetupNT.sh.org: line 17: $'\r': command not found
./sshUserSetupNT.sh.org: line 20: $'\r': command not found
./sshUserSetupNT.sh.org: line 27: $'\r': command not found
./sshUserSetupNT.sh.org: line 231: syntax error near unexpected token `elif'
./sshUserSetupNT.sh.org: line 231: `   elif ! test -f "$CLUSTER_CONFIGURATION_FILE"'

Couldn’t find much info in Metalink on this — turns out I needed to dos2unix the file before running it.
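If dos2unix isn’t installed in your Cygwin setup, plain tr can do the same job for this case.  A minimal sketch: strip_cr is a hypothetical helper name, and the output file name in the usage comment is my assumption, not something the OMS ships.

```shell
# Hypothetical helper: write a copy of a script with the DOS carriage
# returns removed, so bash stops reporting $'\r': command not found.
strip_cr() {
    # $1 = input file (CRLF line endings), $2 = output file (LF only)
    tr -d '\r' < "$1" > "$2"
}

# Usage (output name is an assumption):
#   strip_cr sshUserSetupNT.sh.org sshUserSetupNT.sh
```

Note that tr -d '\r' removes every carriage return in the file, not only the ones at line ends, which is fine for a shell script but worth knowing.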

Also, the documentation on running this script is pretty good — especially this section.

2. You have to be careful with your Cygwin installation — I usually have Cygwin running anyway, and I install the net tools like telnet, ssh, ping, etc. 

However, I got errors saying that the following command in the .sh script was failing:

ping $host -n 5 -w 5

With the following message:

Usage:  ping [-dfqrv] host [packetsize [count [preload]]]

Turns out the Cygwin ping command doesn’t implement the -n and -w options.

Solution was to replace the ping command in the script with /cygdrive/c/windows/system32/ping.exe
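Rather than hard-coding the path, the same idea can be sketched as a small lookup that only uses the native Windows ping if it is really there.  pick_ping is a hypothetical helper of mine, not part of the Oracle script; the path is the one above, and -n (count) and -w (timeout) are the Windows ping options the script expects.

```shell
# Hypothetical helper: print the first candidate that exists and is
# executable; return non-zero if none of them is usable.
pick_ping() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage inside the script (host variable as in the original command):
#   PING=$(pick_ping /cygdrive/c/windows/system32/ping.exe) || exit 1
#   "$PING" "$host" -n 5 -w 5
```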

3. It also turns out I didn’t actually have a Linux agent staged: even though I had a /oms10g/sysman/agent_download/10.2.0.2.0/linux directory, it had no oui/oui_linux.jar file.  Once I downloaded that and unzipped it cleanly, I still got some odd errors.

Searching Metalink, you’ll see some references to an oddly named file: OC4J~OC4J_EMPROV~default_island~1.  It’s a log file kept in /oms10g/opmn/logs and it has a lot of detail about the attempt to deploy the agent.  I was getting the following error:

error:  cannot open zipfile [ /app/oracle/product/agent/tmp/oui/oui_linux.jar ]
unzip:  cannot find /app/oracle/product/agent/tmp/oui/oui_linux.jar, /app/oracle/product/agent/tmp/oui/oui_linux.jar.zip or /app/oracle/product/agent/tmp/oui/oui_linux.jar.ZIP.
chmod: cannot access `/app/oracle/product/agent/tmp/oui/Disk1/': No such file or directory
chmod: cannot access `/app/oracle/product/agent/tmp/oui/Disk1/install/unzip': No such file or directory

This happened even though I had the oui_linux.jar file and it had clearly transferred to the target machine, where I could see it:

$ ls -l
total 42884
----------+ 1 oracle oinstall 43909460 May  5  2006 oui_linux.jar

Odd, why are the permissions 000 instead of 755?

Turns out the permissions on the Windows side were 000 and the script uses scp -p to transfer the file (and retain permissions), so the unjar command on the remote host couldn’t see the file.  I didn’t know you could create a file with 000 permissions on it, although I guess it makes sense — you’d have to chmod it to 000 after creating it though.  A quick chmod command on the Windows side fixed this.
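A check like the following, run on the Windows side from the Cygwin shell before the deploy, would have caught this.  It is only a sketch: mode_string and ensure_mode are hypothetical names of mine, and 755 is simply the mode I expected to see, not something the Oracle scripts require.

```shell
# Print the first ten characters of the ls -l listing (type plus the
# nine permission bits), dropping a trailing ACL marker such as "+".
mode_string() {
    ls -ld "$1" | cut -c1-10
}

# If a file has no permission bits set at all, chmod it to 755 so that
# scp -p does not carry mode 000 over to the target host.
ensure_mode() {
    if [ "$(mode_string "$1")" = "----------" ]; then
        chmod 755 "$1"
    fi
}

# Usage:
#   ensure_mode oui_linux.jar
```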

4. Finally I ran into something I haven’t solved yet — I’m running OEL 5 (redhat-5) and the Installer doesn’t recognize that as a supported O/S (it’s looking for redhat-3 and redhat-4).  I’m trying different ways around that but haven’t quite gotten it.

To be precise, we’re mistaken

“Bother!  We were mistaken!”
“To be precise: we’re a mistake.”
- Thomson and Thompson, Cigars of the Pharaoh

This post belongs to that category of dusty old bits of knowledge that lie deep in your head — stuff you never really thought would be useful, but you learned it anyway and it’s there, waiting for its turn in the sun.

So a friend of mine comes up to me and says, “What do you know about Excel?”  I say, “a little”.  And she asks me to come look at a problem she’s having.  She’s got a spreadsheet that she’s using to build a data entry form on (I know), and she’s got a cell where someone can put in an Account Number. 

“Watch”, she says.  And she types in 123456789123456789 and hits enter.

Excel immediately changes the cell contents to 123456789123456000, replacing the final 3 digits (789) with 3 zeros.

“Why is it doing that?”, she asks.

I make one change to the cell, and ask her to try it again.  This time it works without replacing the final 3 digits.

I give you a 4 question quiz:

  1. What did I do?
  2. Why did I do it?
  3. What about Excel was causing it to do what it did?
  4. Why would this knowledge be useful to me at all when dealing with databases?

Finding new features

Links to 2 things here.  I recently had an opportunity to throw an idea into a debate/quiz going on at Howard Rogers’ site.  I always find Howard’s quizzes to be provocative in that the debate is more interesting than either the question or the answer.  My interpretation of Howard’s question was: “Is there a single metric you can get from an Oracle database that will tell you the general well-being of the server hosting the database?” — no O/S tools allowed, SQL only, with a preference for a point-in-time number not requiring sampling.

Another way to say this is a “quick-and-dirty” answer :-)

Clearly, responsible adults are debating whether or not it’s a good idea to purport to hold up a single, non-sampled metric as the only thing you’ll ever need.  It’s not a good idea.  But if someone only allows you one command, what would you run?

Since Howard asked for the status of the server (not just the database), we need something that looks at the O/S from within Oracle.  We know that Oracle can do that — where do you think it gets sysdate from anyway? :-)

So, how do you find out if Oracle exposes monitoring statistics from the O/S?  I just read the reference manual, with an eye on the V$ views.  Gee, V$OSSTAT looks interesting, doesn’t it?  Oh well — how about:

select value from v$osstat where stat_name = 'LOAD';

Simple, and a reasonable thing to look at — as long as you know how many CPUs are on the box (and cpu_count is probably good enough for that).
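To make the “know how many CPUs” point concrete, here is a rough sketch of the arithmetic, assuming you have already pulled the LOAD value and cpu_count out of the database by whatever means.  load_per_cpu is a hypothetical helper, and the numbers in the usage comments are made up for illustration.

```shell
# Hypothetical helper: divide the reported load by the CPU count so the
# number can be compared across boxes of different sizes.
load_per_cpu() {
    # $1 = LOAD value from v$osstat, $2 = number of CPUs (e.g. cpu_count)
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) exit 1          # refuse to divide by zero
        printf "%.2f\n", load / cpus
    }'
}

# Usage (illustrative values only):
#   load_per_cpu 3 4    # prints 0.75
#   load_per_cpu 8 4    # prints 2.00
```

A load of 3 means something very different on a 4-CPU box than on a single-CPU one, which is why the raw value on its own only tells half the story.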

Most people hadn’t heard of the v$osstat view… It’s “new”.  Which brings me to an excellent podcast from Tom Kyte – in it he talks about “new features”.  He also talks about how people look for the “best” new feature.  Which always depends on their specific needs.  :-)

How do I find new features?  I re-read the Concepts Guide and I always, always re-read the Reference guide (specifically the V$ views).  Here’s the thing: any cool new feature generally has a way to be monitored.  And if it doesn’t then I don’t want to use it :-)  So I can usually work backward to a new feature by understanding the ways the Oracle engineers have exposed to monitor it.

(There are many exceptions to this — especially in the ways that Oracle extends SQL and PL/SQL — for those I look at the SQL Guide with special attention to the functions section, and also the PL/SQL Guide with special attention to the table of contents, and finally the Supplied Packages Guide — lotsa goodies in there).