Download Zipped scripts for this Automation (goldengate_dr_automation.zip)
Oracle GoldenGate
Disaster Recovery
Switchover / Failover
Automation
Author: Vladimir Grigorian
Review Date: March 20, 2015
Last Updated: March 20, 2015
Version: 2.0
Table of
Contents
Document Control ………………………………………………………………………. 3
Introduction ……………………………………………………………………………… 4
Overview of Standby and GoldenGate Configuration ……………………………….. 5
1.
GoldenGate Failover scripts Functionality and
Purpose……………………… 8
2.
Implementation …………………………………….…………………………… 10
3.
Testing of GoldenGate Failover Capaility.abase for
Target ………………….. 11
APPENDIX
……………………………………………………………………………….. 16
Change Record
Date |
Author |
Version |
Change Reference |
|
|
|
|
|
|
March 10, 2011 |
Vladimir Grigorian |
1.0 |
First Draft |
|
|
|
|
|
|
March 20, 2011 |
Vladimir Grigorian |
2.0 |
Changed scripts from 5 mins checks
from cron to on-demand run. Added first SCN after failover check. |
|
|
|
|
|
|
Name |
Position |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Distribution
Copy No. |
Name |
Location |
|
|
|
|
|
1
|
|
|
|
2
|
|
|
|
3 |
|
|
|
This publication may not be reproduced, stored in a
retrieval system, or transmitted in whole or in part, in any form, or by any
means, electronic, mechanical, photocopying, recording or otherwise, without
the prior written permission of Cognizant/ Xerox Company.
The purpose of this document is document is to outline a procedure for failing over GoldenGate from a primary database to its standby in an Oracle DataGuard configuration. This is a Proof of Concept setup.
Usually, production databases with a running GoldenGate configuration, which replicates to a DSS or another environment, also utilize a standby database for disaster recovery. The challenge most DBA’s are facing is that if a primary datacenter is inaccessible, the rebuilding of a new GoldenGate setup on the failedover standby instance either takes time, or becomes impossible because primary GoldenGate configuration files and trails are no longer available. If DataGuard failover takes minutes to complete, the reinstantiation of GoldenGate on the new primary could take weeks.
Therefore, the purpose of this proof of concept test is to implement and successfully test a functionality that will allow the following:
1) The periodic (every 5 minutes) parameter file sync up between a primary GoldenGate and standby. This will run from the primary database. It will first make backups of the files being replaced. The sync up should automatically change copied parameter files on the standby to reflect its specific parameters, such as $TNS_ADMIN, $ORACLE_SID, dirdat, etc. The sync up should not overwrite parameter files that do not exist on primary.
2) The on-demand script run which will get the first SCN after failover to ensure no data is lost after standby has been started as primary. The script will enable instance GoldenGate parameters, logon to GGSCI, enable and start processes synced up and changed to the standby’s specs. The script then will pass the SCN to the obey GoldenGate script. The script should then clean up the obey script, so the first SCN after failover is deleted.
This document is intended for:
· Oracle DBAs and GoldenGate Administrators.
Following assumptions have been verified prior to installing and configuring Oracle GoldenGate.
1. Oracle Dataguard has been installed and is functioning properly.
2. GoldenGate is installed on the primary and standby, but functioning on primary.
3. This is a test system which is acceptable to switchover for testing purposes.
Overview of Standby and GoldenGate
Configuration
The following outlines the intended
setup for this configuration.
Table
1. Server Information (Due to Hardware Constraint this POC was executed on a
Single Server)
|
Source |
Destination |
Hostname |
Oralinux6 |
Oralinux6 |
OS |
3.8.13-55.1.2.el7uek.x86_64 |
3.8.13-55.1.2.el7uek.x86_64 |
GoldenGate
Home |
/u02/gg/source |
/u02/gg/target |
Oracle
Instance |
PBMP |
STBMP |
Oracle
Version |
12.1.0.2.0 |
12.1.0.2.0 |
GoldenGate
version |
11.2.1.0.27 |
11.2.1.0.27 |
Table 2. GoldenGate Processes
Information
|
Primary |
Standby |
Manager
Port |
7809 |
7810 |
DataPump |
ESNDRT01 |
ESNDRT01 |
Failover
Scripts in |
/u02/gg/source/scripts |
/u02/gg/source/scripts |
Trails |
/u02/gg/source/dirdat/s1 |
/u02/gg/source/dirdat/s1 |
GoldenGate
Admin |
GGATE |
GGATE |
User
schemas |
SENDER |
SENDER |
Fig 1. Regular DataGuard and GoldenGate
Operation
Fig 2. Switchedover DataGuard and GoldenGate
Configuration
1.
GoldenGate
Failover Scripts Functionality and
Purpose
Script Name |
Purpose |
Function |
ggdr_sync_01_rsync.ksh |
Sync up |
Backup parameter files, syncup files, but exclude MGR |
ggdr_sync_02_rename_SID.ksh |
Sync up |
Rename SID to the standby’s or other STBY specific parameters |
ggdr_01_if_switchedover_enable_GG.ksh |
GG Failover |
Check STANDBY_BECAME_PRIMARY_SCN in V$DATABASE. Insert it into obey script (script #4) |
ggdr_02_SQL_Eable_GG_param.ksh |
GG Failover |
Login as SYSDBA and issue “ALTER SYSTEM SET enable_goldengate_replication=TRUE SCOPE=BOTH; then kick off the next script |
ggdr_03_failover_GoldenGate.ksh |
GG Failover |
This is a GG wrapper script that takes input from GGSCI script _obey_switchover.ksh |
ggdr_04_obey_switchover.ksh (identical to ggdr_04_obey_switchover_VANILLA.ksh when idle, except at run time when SCN is inserted into it) |
GG Failover |
This is the actual GG script that logs into db in GGSCI, starts manager, forcestops processes, adds new processes, adds exttrails, starts processes from STANDBY_BECAME_PRIMARY_SCN (from script #1). |
ggdr_05_rename_obey_afterGG_switchover.ksh |
GG Failover |
Deletes the obey script ggdr_04_obey_switchover.ksh with a STANDBY_BECAME_PRIMARY_SCN so it cannot run again. Copies ggdr_04_obey_switchover_VANILLA..ksh (which doesn’t have the SCN) to ggdr_04_obey_switchover.ksh so it can be used for future switchover. |
Table 3. Script Names and their Purpose
Fig 3. Sync Up Functionality
Fig 4. GoldenGate Failover Functionality
2.
Implementation
To enable automatic DataGuard and GoldenGate failover capability the following tasks are executed:
1) Alter scripts in the APPENDIX of this document so they reflect your environment ($GG, $SID, $ORACLE_HOME, etc.). Put them in $GG/scripts
2) Schedule the sync up script to run from cron (the failover script is run on-demand). The failover script can be run on-demand. To failover GoldenGate only the first script needs to be run - ggdr_01_if_switchedover_enable_GG.ksh
3) Maintain scripts , especially the obey scripts, to reflect new processes. All this information can be received from dirprm and periodic reports.
Once these tasks are executed, GoldenGate will failover automatically 5 minutes after standby is started as the primary.
Task |
Scripts |
Server |
Runs from |
Sync up |
ggdr_sync_01_rsync.ksh, ggdr_sync_02_rename_SID.ksh |
PRIMARY |
Cron, every 5 mins |
GG Failover |
ggdr_01_if_switchedover_enable_GG.ksh, ggdr_02_SQL_Eable_GG_param.ksh, ggdr_03_failover_GoldenGate.ksh, ggdr_04_obey_switchover.ksh, ggdr_05_rename_obey_afterGG_switchover.ksh |
STANDBY |
On-demand |
Table 5. DataGuard and GoldenGate
Failover Scripts Implementation
3.
Testing of
DataGuard GoldenGate Failover Capability
To
test this functionality simply switchover Dataguard and the sync up scripts
will be kicked off automatically. Don’t forget to disable sync up scripts from
the primary because they are based on the assumption that the primary server is
out of commission and they therefore cannot run, which is not the case in the
failover situation. The progress of the process can be derived from /tmp logs,
ggserr.log or simple email alerts as shown below.
Fig 6. Email Notification for
GoldenGate PRIM => STBY Parameters Sync Up
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Oracle GoldenGate Disaster Recovery
Failover + Step 3 of 5. + Start Goldengate wrapper OBEY script
from new primary GG home. + Proceed to step 4. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Oracle GoldenGate Command Interpreter
for Oracle Version 12.1.2.0.2 19269784 19683584_FBO Linux, x64, 64bit (optimized), Oracle
12c on Oct 19 2014 16:12:14 Operating system character set identified
as UTF-8. Copyright (C) 1995, 2014, Oracle and/or
its affiliates. All rights reserved. GGSCI (oralinux6) 1> Successfully
logged into database. GGSCI (oralinux6) 2> MGR is already
running. GGSCI (oralinux6) 3> ERROR: Invalid
command. GGSCI (oralinux6) 4> GGSCI (oralinux6) 5> Sending FORCESTOP request to EXTRACT
ESNDRT01 ... Request processed. GGSCI (oralinux6) 6> Deleted EXTRACT
ESNDRT01. GGSCI (oralinux6) 7> GGSCI (oralinux6) 8> EXTRACT added. GGSCI (oralinux6) 9> GGSCI (oralinux6) 10> EXTTRAIL added. GGSCI (oralinux6) 11> GGSCI (oralinux6) 12> GGSCI (oralinux6) 13> EXTRACT
altered. GGSCI (oralinux6) 14> GGSCI (oralinux6) 15> Sending START request to MANAGER ... EXTRACT ESNDRT01 starting GGSCI (oralinux6) 16> GGSCI (oralinux6) 17> GGSCI (oralinux6) 18> Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT STOPPED ESNDRT01 00:00:00 00:00:00 GGSCI (oralinux6) 19> GGSCI (oralinux6) 20> GGSCI (oralinux6) 21> bash: EOF:
command not found... +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Oracle GoldenGate Disaster Recovery
Failover + Step 4 of 5. + Wipe out old process. Add extract. Add
exttrail. Start mgr and extract. + Proceed to step 5. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Renaming GG OBEY file
ggdr_04_obey_switchover.ksh + to
ggdr_04_obey_switchover.ksh.already_run + to prevent in-loop processes dropping + and recreation. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Oracle GoldenGate Disaster Recovery
Failover + Step 5 of 5. + GoldenGate has now switched over and
runs on STANDY + Comments to
vladimir.grigorian@xerox.com. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Completed GoldenGate DR failover |
Fig 7. A
Completed GoldenGate Failover (from /tmp Log)
APPENDIX
Exclude_list.txt
mgr.prm /u02/gg/target/dirprm/mgr.prm jagent.prm /u02/gg/target/dirprm/jagent.prm |
ggdr_sync_01_rsync.ksh############################################ # Name: ggdr_sync_01_rsync.ksh # # Author: vladimir.grigorian@xerox.com ########################################### export GGATE=/opt/app/gg2/gg alias gate='clear;cd $GGATE;./ggsci' export
PATH=/opt/app/gg2/gg:/usr/sbin:/usr/lib/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:$PATH export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib:/opt/app/gg2/gg: LOGDIR=/opt/app/gg2/gg/log EMAILFile=/tmp/ggdr.log BOX=$(uname -a | awk '{print $2}') rm -rf /tmp/ggdr.log rm -rf /u02/gg/target/diprm echo -e "Starting GoldenGate
parameters sync between production MA $BOX and Disaster recovery server
tt0sludb07pma \n Refreshing dirdat\n " >> /tmp/ggdr.log rsync -avzhe ssh /u02/gg/source/dirprm
--exclude-from '/u02/gg/source/scripts/exclude_list.txt'
localhost:/u02/gg/target/ >> /tmp/ggdr.log echo -e " \n \n Changing DR GoldenGate
parameter files to reflect the Standby SID
\n \n " >> /tmp/ggdr.log /u02/gg/source/scripts/ggdr_sync_02_rename_SID.ksh
>> /tmp/ggdr.log echo -e " \n \n GoldenGate DR Sync
Completed successfully on $(date) on $BOX \n \n " >> /tmp/ggdr.log ########################################################## ## SENDING EMAIL IF ERRORS ARE IN LOGFILE
### ########################################################## mailx -s "PROD MA GoldenGate Disaster
Recovery Parameters Sync Completed on $BOX at $(date)"
grigorianvlad@gmail.com < $EMAILFile |
ggdr_sync_02_rename_SID.ksh############################################ # Name: ggs_lag.ksh # # Author: vladimir.grigorian@acs-in.com # PURPOSE: TO MONITOR LAG OF GOLDEN GATE # # NOTE: THIS SCRIPT CALLS ggs.ksh # # THIS SCRIPT NOTIFY IF LAG IS MORE THEN 30
MIN # # ONLY FOR FOR EXT AND PMP PROCESS GROUP # ########################################### export GGATE=/opt/app/gg2/gg alias gate='clear;cd $GGATE;./ggsci' export PATH=/opt/app/gg2/gg:/usr/sbin:/usr/lib/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:$PATH export
LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib:/opt/app/gg2/gg: LOGDIR=/opt/app/gg2/gg/log EMAILFile=/tmp/ggdr.log BOX=$(uname -a | awk '{print $2}') filepath="/u02/gg/target/dirprm" searchstring="PBMP" replacestring="STBMP" i=0; for file in $(grep -l -R $searchstring
$filepath) do cp
$file $file.bak sed
-e "s/$searchstring/$replacestring/ig" $file > tempfile.tmp mv
tempfile.tmp $file let
i++;
echo "Modified: " $file done |
ggdr_01_if_switchedover_enable_GG.ksh#!/bin/bash echo -e " \n \n +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n + Oracle GoldenGate Disaster Recovery Failover \n + Created 3/15/2015 vladimir.grigorian@xerox.com 5 \n + Step 1 of 5. Check if standby has failed or switched over, if it has get the first SCN for GoldenGate. \n + Proceed to step 2. \n ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n " . /home/oracle/stbmp.env export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 export ORACLE_SID=STBMP sqlplus -s sys/oracle12cDB@STBMP as sysdba <<EOF > get_STANDBY_BECAME_PRIMARY_SCN.log SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF trimspool on select STANDBY_BECAME_PRIMARY_SCN from v\$database; EXIT; EOF result=`cat get_STANDBY_BECAME_PRIMARY_SCN.log` #echo "alter extract ESNDRT01 SCN $result " > /u02/gg/source/scripts/SCN.log echo "alter extract ESNDRT01 SCN $result " >> /u02/gg/source/scripts/ggdr_04_obey_switchover.ksh echo -e " \n start e* \n " >> /u02/gg/source/scripts/ggdr_04_obey_switchover.ksh echo -e " \n info all \n " >> /u02/gg/source/scripts/ggdr_04_obey_switchover.ksh echo -e " \n exit \n " >> /u02/gg/source/scripts/ggdr_04_obey_switchover.ksh echo "The first after failover SCN is written to GoldenGate obey script $result " /u02/gg/source/scripts/ggdr_02_SQL_Eable_GG_param.ksh |
ggdr_02_SQL_Eable_GG_param.ksh#!/bin/sh echo -e " \n \n +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\n + Oracle GoldenGate Disaster Recovery Failover \n + Step 2 of 5. \n + Enable SQL
"enable_goldengate_replication=TRUE". \n + Proceed to step 3. \n
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n " export
ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 export ORACLE_SID=STBMP cd /home/oracle . ./stbmp.env sqlplus / as sysdba <<EOF ALTER SYSTEM SET
enable_goldengate_replication=TRUE SCOPE=BOTH; exit EOF #exit /u02/gg/source/scripts/ggdr_03_failover_GoldenGate.ksh
|
ggdr_03_failover_GoldenGate.ksh (identical to ggdr_04_obey_switchover_VANILLA.ksh)#!/bin/bash echo -e " \n \n
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n + Oracle
GoldenGate Disaster Recovery Failover \n + Step 3 of 5. \n + Start Goldengate wrapper OBEY script
from new primary GG home. \n + Proceed to step 4. \n
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n " . /home/oracle/stbmp.env /u02/gg/target/ggsci <
/u02/gg/source/scripts/ggdr_04_obey_switchover.ksh EOF echo -e " \n \n
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n + Oracle
GoldenGate Disaster Recovery Failover \n + Step 4 of 5. \n + Wipe out old process. Add extract. Add
exttrail. Start mgr and extract. \n + Proceed to step 5. \n
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n " /u02/gg/source/scripts/ggdr_05_rename_obey_afterGG_switchover.ksh echo "Completed GoldenGate DR
failover" exit |
ggdr_04_obey_switchover.kshdblogin userid ggate@stbmp, password oracle start mgr sleep 7 sh sleep 7 stop e*, forcestop! delete ESNDRT01 sh sleep 5 add extract ESNDRT01, tranlog, begin now sh sleep 5 add exttrail /u02/gg/source/dirdat/s1,
extract ESNDRT01 sh sleep 5 |
ggdr_05_rename_obey_afterGG_switchover.kshecho -e " \n \n
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n + Cleaning GG
OBEY file ggdr_04_obey_switchover.ksh \n + for the next failover \n
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n " rm -rf /u02/gg/source/scripts/ggdr_04_obey_switchover.ksh cp
/u02/gg/source/scripts/ggdr_04_obey_switchover_VANILLA.ksh
/u02/gg/source/scripts/ggdr_04_obey_switchover.ksh echo -e " \n \n
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n + Oracle
GoldenGate Disaster Recovery Failover \n + Step 5 of 5. \n + GoldenGate has now switched over and
runs on STANDY \n + Comments to vladimir.grigorian@xerox.com. \n
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \n " |
END OF DOCUMENT