Friday, March 31, 2017

Job Failover In Oracle RAC

In RAC, if a node crashes, a jobs running on that node can automatically failover to another node but that job should be mapped to a service.

# Create a  database service j_srvice
        preferred instance : orcl1
        available instance : orcl2


[oracle@rac1 ~]$ srvctl add service -s j_srvice -d orcl -r orcl1 -a orcl2

[oracle@rac1 ~]$ srvctl status service -d orcl
Service j_srvice is not running.


# Create tnsnames.ora entry for service j_srvice


j_service =
      (DESCRIPTION =
         (ADDRESS = (PROTOCOL = TCP)(HOST = rac-cluster.rac1.dba.com)(PORT = 1521))
         (LOAD_BALANCE = YES)
         (CONNECT_DATA =
           (SERVER = DEDICATED)
           (SERVICE_NAME = j_srvice)
         )
      )


# Start the service and check that service is running on host01 (preferred instance)


[oracle@rac1 admin]$ srvctl start service -d orcl
PRCC-1014 : j_srvice was already running

[oracle@rac1 admin]$ srvctl status service -d orcl
Service j_srvice is running on instance(s) orcl1


# Create job class JC which maps to the service j_srvice


SYS@orcl1>BEGIN
sys.dbms_scheduler.create_job_class(
service => 'j_srvice',
job_class_name => '"JC"');
END;
/  2    3    4    5    6 

PL/SQL procedure successfully completed.

SYS@orcl1>SELECT JOB_CLASS_NAME FROM DBA_SCHEDULER_JOB_CLASSES where JOB_CLASS_NAME LIKE 'J%';

JOB_CLASS_NAME
------------------------------
JC


# Create a job and attach it to job class JC which inserts a row in table test every 5 seconds


SYS@orcl1>
BEGIN
DBMS_SCHEDULER.CREATE_JOB (
   job_name             => 'sys.my_job1',
   job_type             => 'PLSQL_BLOCK',
   job_action           => 'begin insert into test values (sysdate);commit;end;',
   start_date           =>  systimestamp,
   repeat_interval      => 'FREQ=SECONDLY;INTERVAL=5',
   end_date             => '31-MAR-17 06.55.00AM US/Pacific',
   job_class            => '"JC"',
   enabled              =>  TRUE);
END;
/
PL/SQL procedure successfully completed.


# Check for the records


SYS@orcl1>select to_char(dte, 'dd-mon-yyyy hh24:mi:ss') from test;

TO_CHAR(DTE,'DD-MON-YYYYHH24:
-----------------------------
31-mar-2017 06:37:02
31-mar-2017 06:37:07
31-mar-2017 06:37:12
31-mar-2017 06:37:17
31-mar-2017 06:37:22
31-mar-2017 06:37:27
......
14 rows selected.


# Fetch the job details


SYS@orcl1>select owner,job_name,state,job_class from dba_scheduler_jobs where job_name='MY_JOB1';

OWNER                          JOB_NAME                       STATE           JOB_CLASS
------------------------------ ------------------------------ --------------- ------------------------------
SYS                            MY_JOB1                        SCHEDULED       JC

SYS@orcl1>select distinct JOB_NAME, status , INSTANCE_ID from ALL_SCHEDULER_JOB_RUN_DETAILS where job_name ='MY_JOB1';

JOB_NAME                       STATUS                         INSTANCE_ID
------------------------------ ------------------------------ -----------
MY_JOB1                        SUCCEEDED                                1


 To simulate the crash of host01, kill the database instance on host01


[oracle@rac1 admin]$ ps -ef | grep pmon
oracle    4329     1  0 05:46 ?        00:00:00 asm_pmon_+ASM1
oracle    5161     1  0 05:48 ?        00:00:00 ora_pmon_orcl1
oracle    8518  5388  0 06:45 pts/1    00:00:00 grep pmon

[oracle@rac1 admin]$ kill -9 5161


# Check that job has failed over to host02


[oracle@rac1 admin]$ srvctl status service -d orcl
Service j_srvice is running on instance(s) orcl2

SYS@orcl1>select distinct JOB_NAME, status , INSTANCE_ID from ALL_SCHEDULER_JOB_RUN_DETAILS where job_name ='MY_JOB1';

JOB_NAME                       STATUS                         INSTANCE_ID
------------------------------ ------------------------------ -----------
MY_JOB1                        SUCCEEDED                                2
MY_JOB1                        SUCCEEDED                                1


# Verify that job is still running


SYS@orcl1>select to_char(dte, 'dd-mon-yyyy hh24:mi:ss') from test;

31-mar-2017 06:47:12
31-mar-2017 06:47:17
31-mar-2017 06:47:22
31-mar-2017 06:47:27
31-mar-2017 06:47:32
31-mar-2017 06:47:37
31-mar-2017 06:47:42

128 rows selected.

SYS@orcl1>!date
Fri Mar 31 06:47:43 EDT 2017


So from this example you can clearly understand that a job can failover to another node in case of node failure, but that job should be mapped to a service


No comments:

Post a Comment