High Availability

OS & Virtualization

Monday, October 29, 2012

Active Database duplication

Oracle 11g introduced the ability to create duplicate databases directly, without the need for a backup. This is known as active database duplication. The process is similar to backup-based duplication, with the following requirements:
  • You don't need a backup of the source system, but it does have to be in archivelog mode
  • The passwords in the password files must match for both servers
  • Both the source and destination database servers require a "tnsnames.ora" entry for the destination database (see the sample entry after the listener configuration below)
  • The destination server requires static listener configuration in a "listener.ora" file
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (GLOBAL_DBNAME = DB11G.WORLD)
          (ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1)
          (SID_NAME = DB11G)
        )
      )

    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = ol5-112-dup2.localdomain)(PORT = 1521))
        )
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
        )
      )
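
A minimal sketch of the corresponding "tnsnames.ora" entry for the destination database is shown below; the alias, host and service name simply reuse values from the listener example above and are assumptions rather than values from the original post. A similar entry is needed on both servers.

    DB11G.WORLD =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = ol5-112-dup2.localdomain)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = DB11G.WORLD)
        )
      )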


The command is:

RMAN > DUPLICATE DATABASE TO DB11G
  FROM ACTIVE DATABASE
  SPFILE
  NOFILENAMECHECK;
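
Before issuing DUPLICATE, RMAN must be connected to both the source (TARGET) and the destination (AUXILIARY) instance, with the auxiliary instance started NOMOUNT. A minimal sketch, assuming a hypothetical "DB11G_SRC" alias for the source database:

$ rman TARGET sys/password@DB11G_SRC AUXILIARY sys/password@DB11G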

To create a standby database from active database files:



 
DUPLICATE TARGET DATABASE
  FOR STANDBY
  FROM ACTIVE DATABASE
  DORECOVER
  SPFILE
    SET "db_unique_name"="foou" COMMENT "Is a duplicate"
    SET LOG_ARCHIVE_DEST_2="service=inst3 ASYNC REGISTER
     VALID_FOR=(online_logfile,primary_role)"
    SET FAL_CLIENT="inst3" COMMENT "Is standby"
    SET FAL_SERVER="inst1" COMMENT "Is primary"
  NOFILENAMECHECK;
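
As a follow-up not shown in the example above, the new standby is typically placed in managed recovery once the duplication completes, for example:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;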


Format Oracle OCFS Drive

Take the OCFS drive offline. The drive must be offline before you can format it:

  1. c:\> ocfsutil /c offlinevol /m X:

To format OCFS on the O: drive

  1. assign the drive letter O: in Windows
  2. C:\oracle\1020\crs\cfs>ocfsformat /m o: /c 1024 /v data /f

Check ocr status

  1. ocrcheck

To do a manual export of the OCR

  1. ocrconfig -export <backup_file_name>

To import from the backup

  1. stop crs service first
  2. create an empty file data.ocr
  3. ocrconfig -import <OCR export_filename>

Add an OCR mirror disk

  1. create an empty file first
  2. ocrconfig -replace ocrmirror x:\cdata\orange_cluster\data.ocr

Remove ocr disk

  1. ocrconfig -replace ocr

Query voting disk

  1. C:\>crsctl query css votedisk
    0.     0    J:\cdata\orange_cluster\votedsk

Add voting disk

  1. create the directory first
  2. c:\> crsctl add css votedisk J:\cdata\orange_cluster\votedsk -force  

Delete voting disk

  1. stop the crs service before deleting voting disk
  2. crsctl delete css votedisk <VOTEDISK_LOCATION> -force
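
As a quick sanity check before and after the operations above, the state of the Clusterware stack and the CSS daemon can be confirmed (standard crsctl checks, not specific to these notes):

  1. crsctl check crs
  2. crsctl check css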

Monday, October 15, 2012

ADRCI: A survival guide for the DBA


Starting with 11gR1, we have a new way to deal with Oracle errors and tracefiles: there is now a special command line utility dedicated to that purpose called adrci (Automatic Diagnostic Repository Command Interpreter). This posting is intended to show you the (in my view) essential commands a DBA ought to know in order to use it. We will look at:

  1. Viewing the alert.log
  2. The relation between incident & problem
  3. Creation of Packages & ZIP files to send to Oracle Support
  4. Managing, especially purging tracefiles

I will first create a problem. Don't do that with your Production Database! Especially: never do DML on dictionary tables!

[oracle@uhesse ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed Jun 1 10:25:06 2011
Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0    Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> show parameter diagnostic

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
diagnostic_dest                      string      /u01/app/oracle

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.
SQL> create table t (n number);

Table created.

SQL> select object_id from user_objects;

 OBJECT_ID
----------
     75719

SQL> connect / as sysdba
Connected.
SQL> update tab$ set cols=2 where obj#=75719;

1 row updated.

SQL> commit;

Commit complete.

SQL> alter system flush shared_pool;

System altered.

SQL> connect adam/adam
Connected.
SQL> select * from t;
select * from t
              *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2236
Session ID: 29 Serial number: 9

I flushed the Shared Pool to empty the Data Dictionary Cache. Otherwise, the select might not have crashed the session as it did. Imagine the user calls me now on the phone. Our first idea as an experienced DBA: we look at the alert.log! Right so. Please notice that we now have two different versions of the alert.log.

One is present in the conventional text format, per OFA in $ORACLE_BASE/diag/rdbms/<name of the db>/<name of the instance>/trace. This location is determined by the new initialization parameter DIAGNOSTIC_DEST, while BACKGROUND_DUMP_DEST is deprecated in 11g.

1. Viewing the alert.log

The other one is in XML format, placed in $ORACLE_BASE/diag/rdbms/<name of the db>/<name of the instance>/alert. This version of the alert.log is accessed by adrci:

[oracle@uhesse ~]$ adrci

ADRCI: Release 11.2.0.2.0 - Production on Wed Jun 1 10:20:08 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"
adrci> show home
ADR Homes:
diag/tnslsnr/uhesse/listener
diag/rdbms/orcl/orcl

Please notice that we have different ADR Homes. In my case only two, because I am not using Grid Infrastructure on this Demo System, else there would be another one. I specify my Database Home first. Then I look at the alert.log. Good news if you are on Windows: Finally, you can tail -f your alert.log also :-)

adrci> set home diag/rdbms/orcl/orcl 
adrci> show alert -tail -f 
2011-06-01 10:16:35.337000 +02:00
db_recovery_file_dest_size of 4032 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Starting background process CJQ0
CJQ0 started with pid=21, OS id=2204
2011-06-01 10:18:42.668000 +02:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x90D891A, qcstda()+702] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2236.trc  (incident=6153):
ORA-07445: exception encountered: core dump [qcstda()+702] [SIGSEGV] [ADDR:0x0] [PC:0x90D891A] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_6153/orcl_ora_2236_i6153.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2011-06-01 10:18:47.518000 +02:00
Dumping diagnostic data in directory=[cdmp_20110601101847], requested by (instance=1, osid=2236), summary=[incident=6153].
2011-06-01 10:18:48.727000 +02:00
Sweep [inc][6153]: completed
Sweep [inc2][6153]: completed

2. The relation between Incident & Problem

You see the incident was recorded in the alert.log. And it tells you "Use ADRCI or Support Workbench to package the incident." We will soon see how to do that. First I'd like to explain the relation between incident and problem: an incident is the concrete occurrence of a problem. In other words: the same problem may have multiple incidents. To show that, I will open another terminal and run another select against the table t, while still tailing the alert.log from the first session.

Second terminal:

[oracle@uhesse ~]$ sqlplus adam/adam

SQL*Plus: Release 11.2.0.2.0 Production on Wed Jun 1 10:21:52 2011
Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from t where n=42;
select * from t where n=42
                      *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2299
Session ID: 36 Serial number: 11

First terminal:

2011-06-01 10:21:31.367000 +02:00
Starting background process SMCO
SMCO started with pid=19, OS id=2268
2011-06-01 10:22:08.781000 +02:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x90D891A, qcstda()+702] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2299.trc  (incident=6201):
ORA-07445: exception encountered: core dump [qcstda()+702] [SIGSEGV] [ADDR:0x0] [PC:0x90D891A] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2011-06-01 10:22:11.135000 +02:00
Dumping diagnostic data in directory=[cdmp_20110601102211], requested by (instance=1, osid=2299), summary=[incident=6201].
2011-06-01 10:22:13.370000 +02:00
Sweep [inc][6201]: completed
Sweep [inc2][6201]: completed

I have seen the second incident recorded. I exit out of the tail -f with CTRL+C and continue:

adrci> show problem  
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl: 
************************************************************************* 
PROBLEM_ID           PROBLEM_KEY                                                 LAST_INCIDENT        LASTINC_TIME
-------------------- ----------------------------------------------------------- -------------------- ----------------------------------------
1                    ORA 7445 [qcstda()+702]                                     6201                 2011-06-01 10:22:08.924000 +02:00
1 rows fetched

So I have one problem with the ID 1 and the last incident occurred at 10:22. Are there more?

adrci> show incident  
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl: 
************************************************************************* 
INCIDENT_ID          PROBLEM_KEY                                                 CREATE_TIME                               
-------------------- ----------------------------------------------------------- ---------------------------------------- 
6153                 ORA 7445 [qcstda()+702]                                     2011-06-01 10:18:42.995000 +02:00
6201                 ORA 7445 [qcstda()+702]                                     2011-06-01 10:22:08.924000 +02:00
2 rows fetched
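
To list only the incidents that belong to one problem, a predicate can be used. This is a sketch not shown in the original posting; it assumes the PROBLEM_ID field (visible in the detail output below) can be used in the predicate:

adrci> show incident -p "problem_id=1"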

I want to see some more detail about the incidents:

adrci> show incident -mode detail -p "incident_id=6201"  
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl: 
*************************************************************************
**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   6201
   STATUS                        ready
   CREATE_TIME                   2011-06-01 10:22:08.924000 +02:00
   PROBLEM_ID                    1
   CLOSE_TIME
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                ORA
   ERROR_NUMBER                  7445
   ERROR_ARG1                    qcstda()+702
   ERROR_ARG2                    SIGSEGV
   ERROR_ARG3                    ADDR:0x0
   ERROR_ARG4                    PC:0x90D891A
   ERROR_ARG5                    Address not mapped to object
   ERROR_ARG6
   ERROR_ARG7
   ERROR_ARG8
   ERROR_ARG9
   ERROR_ARG10
   ERROR_ARG11
   ERROR_ARG12
   SIGNALLING_COMPONENT          SQL_Parser
   SIGNALLING_SUBCOMPONENT
   SUSPECT_COMPONENT
   SUSPECT_SUBCOMPONENT
   ECID
   IMPACTS                       0
   PROBLEM_KEY                   ORA 7445 [qcstda()+702]
   FIRST_INCIDENT                6153
   FIRSTINC_TIME                 2011-06-01 10:18:42.995000 +02:00
   LAST_INCIDENT                 6201
   LASTINC_TIME                  2011-06-01 10:22:08.924000 +02:00
   IMPACT1                       0
   IMPACT2                       0
   IMPACT3                       0
   IMPACT4                       0
   KEY_NAME                      ProcId
   KEY_VALUE                     25.3
   KEY_NAME                      Client ProcId
   KEY_VALUE                     oracle@uhesse (TNS V1-V3).2299_140262306875136
   KEY_NAME                      PQ
   KEY_VALUE                     (0, 1306916528)
   KEY_NAME                      SID
   KEY_VALUE                     36.11
   OWNER_ID                      1
   INCIDENT_FILE                 /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2299.trc
   OWNER_ID                      1
   INCIDENT_FILE                 /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc
1 rows fetched

I want to look at the incident tracefile mentioned above:

adrci> show trace /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc
/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc
----------------------------------------------------------
LEVEL PAYLOAD
----- ----------------------------------------------------------------------------------------------------------------------------------------------
 Dump file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc
 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options
 ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
 System name:      Linux
 Node name:        uhesse
 Release:  2.6.32-100.28.5.el6.x86_64
 Version:  #1 SMP Wed Feb 2 18:40:23 EST 2011
 Machine:  x86_64
 Instance name: orcl
 Redo thread mounted by this instance: 1
 Oracle process number: 25
 Unix process pid: 2299, image: oracle@uhesse (TNS V1-V3)
 *** 2011-06-01 10:22:08.929
 *** SESSION ID:(36.11) 2011-06-01 10:22:08.929
 *** CLIENT ID:() 2011-06-01 10:22:08.929
 *** SERVICE NAME:(SYS$USERS) 2011-06-01 10:22:08.929
 *** MODULE NAME:(SQL*Plus) 2011-06-01 10:22:08.929
 *** ACTION NAME:() 2011-06-01 10:22:08.929
 Dump continued from file: /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2299.trc
 1>     ***** Error Stack *****
 ORA-07445: exception encountered: core dump [qcstda()+702] [SIGSEGV] [ADDR:0x0] [PC:0x90D891A] [Address not mapped to object] []
 1<     ***** Error Stack *****
 1>     ***** Dump for incident 6201 (ORA 7445 [qcstda()+702]) *****
 2>      ***** Beginning of Customized Incident Dump(s) *****
 Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x90D891A, qcstda()+702] [flags: 0x0, count: 1]
 Registers:
 %rax: 0x0000000000000000 %rbx: 0x00007f915c77f0e0 %rcx: 0x0000000000000007
 %rdx: 0x0000000000000000 %rdi: 0x00007f915c77be98 %rsi: 0x0000000000000000
 %rsp: 0x00007fffc65178e0 %rbp: 0x00007fffc6517960  %r8: 0x0000000000000028
  %r9: 0x0000000000002000 %r10: 0x00000000093849c0 %r11: 0x0000000000000168
 %r12: 0x00007f915c77ade8 %r13: 0x000000008edbb178 %r14: 0x00007f915c777da0
 %r15: 0x00007f915c77ae28 %rip: 0x00000000090d891a %efl: 0x0000000000010246
   qcstda()+686 (0x90d890a) mov -0x40(%rbp),%rdi
   qcstda()+690 (0x90d890e) mov %rdx,0x18(%rbx)
   qcstda()+694 (0x90d8912) mov 0x60(%r15),%rsi
   qcstda()+698 (0x90d8916) mov %ecx,0x8(%r15)
 > qcstda()+702 (0x90d891a) mov %ecx,(%rsi)
   qcstda()+704 (0x90d891c) mov 0x78(%rdi),%rdx
   qcstda()+708 (0x90d8920) test %rdx,%rdx
   qcstda()+711 (0x90d8923) jnz 0x90d8d03
   qcstda()+717 (0x90d8929) mov -0x70(%rbp),%rdi
 *** 2011-06-01 10:22:08.963
 dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
 3>       ***** Current SQL Statement for this session (sql_id=8r222qucmawdt) *****
           select * from t where n=42
 3<       ***** current_sql_statement *****

3. Creation of Packages & ZIP files to send to Oracle Support

I may not be able to solve the problem myself. Oracle Support will help me with that one. I gather all the required information with a method called “Incident Packaging Service” (IPS):

adrci> ips create package problem 1 correlate all  
Created package 2 based on problem id 1, correlation level all

This did not yet create a ZIP file and is therefore referred to as a "Logical Package". The ZIP file is then generated from the Logical Package that was created:

adrci> ips generate package 2 in "/home/oracle"   
Generated package 2 in file /home/oracle/ORA7445qc_20110601112533_COM_1.zip, 
mode complete
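
If further incidents of the same problem occur later, they can be added to the existing logical package before regenerating the ZIP file. A sketch, assuming package 2 from above and a hypothetical new incident id 6249:

adrci> ips add incident 6249 package 2
adrci> ips generate package 2 in "/home/oracle"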

4. Managing, especially purging of tracefiles

Now to the management of tracefiles. You may notice that 11g creates lots of tracefiles that need to be purged from time to time. In fact, this is done automatically, but you may want to change the default purge policy:

adrci> show tracefile -rt  
01-JUN-11 10:31:48  diag/rdbms/orcl/orcl/trace/orcl_mmon_2106.trc  
01-JUN-11 09:43:43  diag/rdbms/orcl/orcl/trace/orcl_ckpt_2100.trc  
01-JUN-11 09:22:13  diag/rdbms/orcl/orcl/trace/alert_orcl.log  
01-JUN-11 09:22:11  diag/rdbms/orcl/orcl/trace/orcl_diag_2088.trc  
01-JUN-11 09:22:10  diag/rdbms/orcl/orcl/trace/orcl_ora_2299.trc  
01-JUN-11 09:22:10  diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc  
01-JUN-11 09:18:47  diag/rdbms/orcl/orcl/trace/orcl_ora_2236.trc  
01-JUN-11 09:18:47  diag/rdbms/orcl/orcl/incident/incdir_6153/orcl_ora_2236_i6153.trc 
 
I have already got some tracefiles. How long are they going to stay?
adrci> show control  
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl: 
************************************************************************* 
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1335663986           720                  8760                 2011-05-27 10:16:46.997118 +02:00                                                                                          1                    2                    80                   1                    2011-05-27 10:16:46.997118 +02:00
1 rows fetched

The ordinary tracefiles will stay for 30 days (720 hours), while files like incident files stay one year (8760 hours) by default. We can change that policy, for example, with:

adrci> set control (SHORTP_POLICY = 360)  
adrci> set control (LONGP_POLICY = 2190)  
adrci> show control 
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl: 
************************************************************************* 
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
1335663986           360                  2190                 2011-06-01 11:42:17.208064 +02:00                                                                                          1                    2                    80                   1                    2011-05-27 10:16:46.997118 +02:00
1 rows fetched

Also, we may want to purge tracefiles manually. The following command will manually purge all tracefiles older than 2 days (2880 minutes):

adrci> purge -age 2880 -type trace 
adrci> show tracefile -rt    
01-JUN-11 10:46:54  diag/rdbms/orcl/orcl/trace/orcl_mmon_2106.trc    
01-JUN-11 09:43:43  diag/rdbms/orcl/orcl/trace/orcl_ckpt_2100.trc
01-JUN-11 09:22:13  diag/rdbms/orcl/orcl/trace/alert_orcl.log
01-JUN-11 09:22:11  diag/rdbms/orcl/orcl/trace/orcl_diag_2088.trc
01-JUN-11 09:22:10  diag/rdbms/orcl/orcl/incident/incdir_6201/orcl_ora_2299_i6201.trc
01-JUN-11 09:22:10  diag/rdbms/orcl/orcl/trace/orcl_ora_2299.trc
01-JUN-11 09:18:47  diag/rdbms/orcl/orcl/incident/incdir_6153/orcl_ora_2236_i6153.trc
01-JUN-11 09:18:47  diag/rdbms/orcl/orcl/trace/orcl_ora_2236.trc
01-JUN-11 09:17:19  diag/rdbms/orcl/orcl/trace/orcl_dbrm_2090.trc
01-JUN-11 09:16:44  diag/rdbms/orcl/orcl/trace/orcl_j002_2210.trc
01-JUN-11 09:16:30  diag/rdbms/orcl/orcl/trace/orcl_ora_2187.trc
01-JUN-11 09:16:19  diag/rdbms/orcl/orcl/trace/orcl_mman_2094.trc
01-JUN-11 09:16:16  diag/rdbms/orcl/orcl/trace/orcl_vktm_2082.trc
01-JUN-11 09:16:14  diag/rdbms/orcl/orcl/trace/orcl_ora_2016.trc
30-MAY-11 14:07:02  diag/rdbms/orcl/orcl/trace/orcl_mmon_2093.trc
30-MAY-11 11:15:30  diag/rdbms/orcl/orcl/trace/orcl_ora_3414.trc
30-MAY-11 11:00:01  diag/rdbms/orcl/orcl/trace/orcl_j000_2245.trc
30-MAY-11 10:56:58  diag/rdbms/orcl/orcl/trace/orcl_dbrm_2077.trc
30-MAY-11 10:56:20  diag/rdbms/orcl/orcl/trace/orcl_j002_2201.trc
30-MAY-11 10:56:06  diag/rdbms/orcl/orcl/trace/orcl_ora_2178.trc
30-MAY-11 10:55:58  diag/rdbms/orcl/orcl/trace/orcl_mman_2081.trc
30-MAY-11 10:55:55  diag/rdbms/orcl/orcl/trace/orcl_vktm_2069.trc
30-MAY-11 10:55:53  diag/rdbms/orcl/orcl/trace/orcl_ora_2006.trc

Conclusion: With adrci, we have a new and efficient utility to deal with Oracle errors, especially for collecting information to send to Oracle Support. This functionality is called Incident Packaging Service. 11g generates lots of tracefiles; we can control their purging policy with adrci. Finally, we can now tail -f our alert.log from any OS.

Tuesday, October 09, 2012

OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)


In this Document
Goal
Fix
Prepare the disks
1. Size
2. For raw or block device (pre 11.2)
3. For ASM disks (11.2+)
4. For cluster file system
5. Permissions
ADD/REMOVE/REPLACE/MOVE OCR Device
1. To add an OCRMIRROR device when only OCR device is defined:
2. To remove an OCR device
3. To replace or move the location of an OCR device
ADD/DELETE/MOVE Voting Disk
For 10gR2 release
For 11gR1 release
For 11gR2 release
References

Applies to:

Oracle Server - Enterprise Edition - Version 10.2.0.1 to 11.2.0.3 [Release 10.2 to 11.2]
Information in this document applies to any platform.

Goal

The goal of this note is to provide steps to add, remove, replace or move an Oracle Cluster Registry (OCR) or voting disk in Oracle Clusterware 10gR2, 11gR1 and 11gR2 environments. It also provides steps to move OCR / voting and ASM devices from raw devices to block devices.

This article is intended for DBAs and Support Engineers who need to modify or move OCR and voting disk files, and for customers who have an existing clustered environment deployed on a storage array and might want to migrate to a new storage array with minimal downtime.

Typically, one would simply cp or dd the files once the new storage has been presented to the hosts. In this case, it is a little more difficult because:

1. The Oracle Clusterware has the OCR and voting disks open and is actively using them (both primary and mirrors).
2. There is an API provided for this function (ocrconfig and crsctl), which is the more appropriate interface than the typical cp and/or dd commands.

It is highly recommended to take a backup of the voting disk, and OCR device before making any changes.

Oracle Cluster Registry (OCR) and Voting Disk Additional clarifications

For Voting disks (never use an even number of voting disks):
External redundancy means 1 voting disk
Normal redundancy means 3 voting disks
High redundancy means 5 voting disks
For OCR:
10.2 and 11.1: maximum 2 OCR devices: OCR and OCRMIRROR
11.2+: up to 5 OCR devices can be added

Fix

Prepare the disks


For OCR or voting disk addition or replacement, new disks need to be prepared. Please refer to the Clusterware/Grid Infrastructure installation guide for the relevant platform for the disk requirements and preparation.

1. Size

For 10.1:
OCR device minimum size (each): 100M
Voting disk minimum size (each): 20M
For 10.2:
OCR device minimum size (each): 256M
Voting disk minimum size (each): 256M
For 11.1:
OCR device minimum size (each): 280M
Voting disk minimum size (each): 280M
For 11.2:
OCR device minimum size (each): 300M
Voting disk minimum size (each): 300M

2. For raw or block device (pre 11.2)

Please refer to Clusterware installation guide on different platform for more details.
On windows platform the new raw device link is created via $CRS_HOME\bin\GUIOracleOBJManager.exe, for example:
\\.\VOTEDSK2
\\.\OCR2

3. For ASM disks (11.2+)

On Windows platform, please refer to Document 331796.1 How to setup ASM on Windows
On Linux platform, please refer to Document 580153.1 How To Setup ASM on Linux Using ASMLIB Disks, Raw Devices or Block Devices?
For other platforms, please refer to the Clusterware/Grid Infrastructure installation guide.

4. For cluster file system

If OCR is on a cluster file system, the new OCR or OCRMIRROR file must be touched before the add/replace command can be issued. Otherwise PROT-21: Invalid parameter (10.2/11.1) or PROT-30: The Oracle Cluster Registry location to be added is not accessible (11.2) will occur.
As root user:
# touch /cluster_fs/ocrdisk.dat
# touch /cluster_fs/ocrmirror.dat
# chown root:oinstall /cluster_fs/ocrdisk.dat /cluster_fs/ocrmirror.dat
# chmod 640 /cluster_fs/ocrdisk.dat /cluster_fs/ocrmirror.dat
It is not required to pre-touch the voting disk file on a cluster file system.
After the delete command is issued, the OCR/voting files on the cluster file system need to be removed manually.

5. Permissions

For OCR device:
chown root:oinstall <OCR device>
chmod 640 <OCR device>
For Voting device:
chown <crs/grid>:oinstall <Voting device>
chmod 644 <Voting device>
For ASM disks used for OCR/Voting disk:
chown griduser:asmadmin <asm disks>
chmod 660 <asm disks>

ADD/REMOVE/REPLACE/MOVE OCR Device

Note: You must be logged in as the root user, because root owns the OCR files. The "ocrconfig -replace" command can only be issued when CRS is running, otherwise "PROT-1: Failed to initialize ocrconfig" will occur.

Please ensure CRS is running on ALL cluster nodes during this operation, otherwise the change will not be reflected on the node where CRS is down, and CRS will have problems starting on that node. The "ocrconfig -repair" option will then be required to fix the ocr.loc file on that node.

For 11.2+ with OCR on an ASM diskgroup, due to unpublished Bug 8604794 - FAIL TO CHANGE OCR LOCATION TO DG WITH 'OCRCONFIG -REPAIR -REPLACE', "ocrconfig -repair" to change the OCR location to a different ASM diskgroup does not work currently. The workaround is to manually edit /etc/oracle/ocr.loc or /var/opt/ocr.loc or the Windows registry key HKEY_LOCAL_MACHINE\SOFTWARE\Oracle\ocr to point to the desired diskgroup.
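For reference, the 10.2/11.1 form of the repair command, run locally on the node where CRS is down, is sketched below (the raw device path is just an example reused from elsewhere in this note):
# ocrconfig -repair ocr /dev/raw/raw1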
Make sure there is a recent copy of the OCR file before making any changes:
# ocrconfig -showbackup
If there is not a recent backup copy of the OCR file, an export can be taken for the current OCR file. Use the following command to generate an export of the online OCR file:

In 10.2:
# ocrconfig -export <OCR export_filename> -s online
In 11.1 and 11.2:
# ocrconfig -manualbackup
node1 2008/08/06 06:11:58 /crs/cdata/crs/backup_20080807_003158.ocr
To recover using this file, the following command can be used:
# ocrconfig -import <OCR export_filename>

From 11.2+, please also refer How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems Document 1062983.1

To see whether OCR is healthy, run an ocrcheck, which should return output like below.
# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     497928
         Used space (kbytes)      :        312
         Available space (kbytes) :     497616
         ID                       :  576761409
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

For 11.1+, ocrcheck as root user should also show:
         Logical corruption check succeeded

1. To add an OCRMIRROR device when only OCR device is defined:

To add an OCR mirror device, provide the full path including file name.
10.2 and 11.1:
# ocrconfig -replace ocrmirror <filename>
eg:
# ocrconfig -replace ocrmirror /dev/raw/raw2
# ocrconfig -replace ocrmirror /dev/sdc1
# ocrconfig -replace ocrmirror /cluster_fs/ocrdisk.dat
> ocrconfig -replace ocrmirror \\.\OCRMIRROR2   - for Windows
11.2+: From 11.2 onwards, up to 4 ocrmirrors can be added.
# ocrconfig -add <filename>
eg:
# ocrconfig -add +OCRVOTE2
# ocrconfig -add /cluster_fs/ocrdisk.dat

2. To remove an OCR device

To remove an OCR device:
10.2 and 11.1:
# ocrconfig -replace ocr
11.2+:
# ocrconfig -delete <filename>
eg:
# ocrconfig -delete +OCRVOTE1

* Once an OCR device is removed, the ocrmirror device automatically becomes the OCR device.
* It is not allowed to remove the OCR device if only 1 OCR device is defined; the command will return PROT-16.

To remove an OCR mirror device:
10.2 and 11.1:
# ocrconfig -replace ocrmirror
11.2+:
# ocrconfig -delete <ocrmirror filename>
eg:
# ocrconfig -delete +OCRVOTE2
After removal, the old OCR/OCRMIRROR files can be deleted if they are on a cluster filesystem.

3. To replace or move the location of an OCR device

Note:
1. An ocrmirror must be in place before trying to replace the OCR device. ocrconfig will fail with PROT-16 if no ocrmirror exists.
2. If an OCR device is replaced with a device of a different size, the size of the new device will not be reflected until the clusterware is restarted.

10.2 and 11.1:
To replace the OCR device with <filename>, provide the full path including file name.
# ocrconfig -replace ocr <filename>
eg:
# ocrconfig -replace ocr /dev/sdd1
> ocrconfig -replace ocr \\.\OCR2   - for Windows
To replace the OCR mirror device with <filename>, provide the full path including file name.
# ocrconfig -replace ocrmirror <filename>
eg:
# ocrconfig -replace ocrmirror /dev/raw/raw4
> ocrconfig -replace ocrmirror \\.\OCRMIRROR2   - for Windows
11.2:
The command is the same to replace either the OCR or OCRMIRRORs (at least 2 OCR devices must exist for the replace command to work):
# ocrconfig -replace <current filename> -replacement <new filename>
eg:
# ocrconfig -replace /cluster_file/ocr.dat -replacement +OCRVOTE
# ocrconfig -replace +CRS -replacement +OCRVOTE

ADD/DELETE/MOVE Voting Disk

Note:
1. crsctl votedisk commands must be run as root for 10.2 and 11.1, but can be run as the grid user for 11.2+.
2. For 11.2, when using ASM disks for OCR and voting, the command is the same for Windows and Unix platforms.
To take a backup of a voting disk:
$ dd if=voting_disk_name of=backup_file_name
For Windows:
> ocopy \\.\votedsk1 o:\backup\votedsk1.bak

For 10gR2 release

Shut down the Oracle Clusterware stack (crsctl stop crs as root) on all nodes before making any modification to the voting disk. Determine the current voting disk location using:
crsctl query css votedisk

1. To add a Voting Disk, provide the full path including file name:
# crsctl add css votedisk <VOTEDISK_LOCATION> -force
eg:
# crsctl add css votedisk /dev/raw/raw1 -force
# crsctl add css votedisk /cluster_fs/votedisk.dat -force
> crsctl add css votedisk \\.\VOTEDSK2 -force   - for Windows
2. To delete a Voting Disk, provide the full path including file name:
# crsctl delete css votedisk <VOTEDISK_LOCATION> -force
eg:
# crsctl delete css votedisk /dev/raw/raw1 -force
# crsctl delete css votedisk /cluster_fs/votedisk.dat -force
> crsctl delete css votedisk \\.\VOTEDSK1 -force   - for Windows
3. To move a Voting Disk, provide the full path including file name, and add the new device before deleting the old one:
# crsctl add css votedisk <NEW_LOCATION> -force
# crsctl delete css votedisk <OLD_LOCATION> -force
eg:
# crsctl add css votedisk /dev/raw/raw4 -force
# crsctl delete css votedisk /dev/raw/raw1 -force
After modifying the voting disk, start the Oracle Clusterware stack on all nodes:
# crsctl start crs
Verify the voting disk location using:
# crsctl query css votedisk

For 11gR1 release

Starting with 11.1.0.6, the below commands can be performed online (CRS is up and running).

1. To add a Voting Disk, provide the full path including file name:
# crsctl add css votedisk <VOTEDISK_LOCATION>
eg:
# crsctl add css votedisk /dev/raw/raw1
# crsctl add css votedisk /cluster_fs/votedisk.dat
> crsctl add css votedisk \\.\VOTEDSK2   - for Windows
2. To delete a Voting Disk, provide the full path including file name:
# crsctl delete css votedisk <VOTEDISK_LOCATION>
eg:
# crsctl delete css votedisk /dev/raw/raw1
# crsctl delete css votedisk /cluster_fs/votedisk.dat
> crsctl delete css votedisk \\.\VOTEDSK1   - for Windows
3. To move a Voting Disk, provide the full path including file name:
# crsctl add css votedisk <NEW_LOCATION>
# crsctl delete css votedisk <OLD_LOCATION>
eg:
# crsctl add css votedisk /dev/raw/raw4
# crsctl delete css votedisk /dev/raw/raw1
Verify the voting disk location:
# crsctl query css votedisk

For 11gR2 release

From 11.2, the votedisk can be stored either in an ASM diskgroup or on a cluster file system. The following commands can only be executed when Grid Infrastructure is running. As the grid user:

1. To add a Voting Disk
a. When votedisk is on cluster file system:
$ crsctl add css votedisk <cluster_fs/filename>
b. When votedisk is on an ASM diskgroup, no add option is available.
The number of votedisks is determined by the diskgroup redundancy. If more copies of the votedisk are desired, one can move the votedisk to a diskgroup with higher redundancy. See step 4.
If a votedisk is removed from a normal or high redundancy diskgroup for an abnormal reason, it can be added back using:
alter diskgroup <vote diskgroup name> add disk '</path/name>' force;

2. To delete a Voting Disk
a. When votedisk is on cluster file system:
$ crsctl delete css votedisk <cluster_fs/filename>
b. When votedisk is on ASM, no delete option is available; one can only replace the existing votedisk diskgroup with another ASM diskgroup.

3. To move a Voting Disk on cluster file system
$ crsctl add css votedisk <new_cluster_fs/filename>
$ crsctl delete css votedisk <old_cluster_fs/filename>
4. To move a voting disk on ASM from one diskgroup to another due to a redundancy change or disk location change
$ crsctl replace votedisk <+diskgroup>|<vdisk>
The example here is moving from the external redundancy +OCRVOTE diskgroup to the normal redundancy +CRS diskgroup:

1. Create the new diskgroup +CRS as desired.

2. $ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   5e391d339a594fc7bf11f726f9375095 (ORCL:ASMDG02) [+OCRVOTE]
Located 1 voting disk(s).

3. $ crsctl replace votedisk +CRS
Successful addition of voting disk 941236c324454fc0bfe182bd6ebbcbff.
Successful addition of voting disk 07d2464674ac4fabbf27f3132d8448b0.
Successful addition of voting disk 9761ccf221524f66bff0766ad5721239.
Successful deletion of voting disk 5e391d339a594fc7bf11f726f9375095.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced

4. $ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   941236c324454fc0bfe182bd6ebbcbff (ORCL:CRSD1) [CRS]
 2. ONLINE   07d2464674ac4fabbf27f3132d8448b0 (ORCL:CRSD2) [CRS]
 3. ONLINE   9761ccf221524f66bff0766ad5721239 (ORCL:CRSD3) [CRS]
Located 3 voting disk(s).
5. To move a voting disk between an ASM diskgroup and cluster file system
a. Move from ASM diskgroup to cluster file system:
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6e5850d12c7a4f62bf6e693084460fd9 (ORCL:CRSD1) [CRS]
 2. ONLINE   56ab5c385ce34f37bf59580232ea815f (ORCL:CRSD2) [CRS]
 3. ONLINE   4f4446a59eeb4f75bfdfc4be2e3d5f90 (ORCL:CRSD3) [CRS]
Located 3 voting disk(s).

$ crsctl replace votedisk /rac_shared/oradata/vote.test3
Now formatting voting disk: /rac_shared/oradata/vote.test3.
CRS-4256: Updating the profile
Successful addition of voting disk 61c4347805b64fd5bf98bf32ca046d6c.
Successful deletion of voting disk 6e5850d12c7a4f62bf6e693084460fd9.
Successful deletion of voting disk 56ab5c385ce34f37bf59580232ea815f.
Successful deletion of voting disk 4f4446a59eeb4f75bfdfc4be2e3d5f90.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   61c4347805b64fd5bf98bf32ca046d6c (/rac_shared/oradata/vote.disk) []
Located 1 voting disk(s).

b. Move from cluster file system to ASM diskgroup:
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   61c4347805b64fd5bf98bf32ca046d6c (/rac_shared/oradata/vote.disk) []
Located 1 voting disk(s).

$ crsctl replace votedisk +CRS
CRS-4256: Updating the profile
Successful addition of voting disk 41806377ff804fc1bf1d3f0ec9751ceb.
Successful addition of voting disk 94896394e50d4f8abf753752baaa5d27.
Successful addition of voting disk 8e933621e2264f06bfbb2d23559ba635.
Successful deletion of voting disk 61c4347805b64fd5bf98bf32ca046d6c.
Successfully replaced voting disk group with +CRS.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced

[oragrid@auw2k4 crsconfig]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   41806377ff804fc1bf1d3f0ec9751ceb (ORCL:CRSD1) [CRS]
 2. ONLINE   94896394e50d4f8abf753752baaa5d27 (ORCL:CRSD2) [CRS]
 3. ONLINE   8e933621e2264f06bfbb2d23559ba635 (ORCL:CRSD3) [CRS]
Located 3 voting disk(s).

6. To verify:
$ crsctl query css votedisk



Thursday, October 04, 2012

Oracle 11g Grid Control Commands

What works? Commands

 

Set environment variables (emctl from the OMS home has to be used):

$ export ORACLE_HOME=<path to OMS installation>
$ export PATH=$ORACLE_HOME/bin:$PATH
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib


Start OMS

$ emctl start oms


This will start the OMS as well as the required WebLogic processes. [starts OPMN, HTTP Server, Node Manager]


Check Status of OMS

$ emctl status oms [-details]


=====================

$ export ORACLE_HOME=/u01/app/Middleware/oms11g
$ export PATH=$ORACLE_HOME/bin:$PATH
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib


gridcontrol:/u01/app/Middleware/oms11g []$ cd bin
gridcontrol:/u01/app/Middleware/oms11g/bin []$ ./emctl status oms
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
WebTier is Up
Oracle Management Server is Up


=====================


The "-details" option is very informative. It will ask for the SYSMAN password.


gridcontrol:/u01/app/Middleware/oms11g/bin []$ ./emctl status oms -details
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Enter Enterprise Manager Root (SYSMAN) Password :
Console Server Host : bnair.localdomain
HTTP Console Port   : 7788
HTTPS Console Port  : 7799
HTTP Upload Port    : 4889
HTTPS Upload Port   : 4900
SLB or virtual hostname: bnair.localdomain
Agent Upload is unlocked.
OMS Console is unlocked.
Active CA ID: 1


Stop OMS

$ emctl stop oms [-all]


Using the "-all" option stops the Node Manager and the Admin Server as well.


List OMS

$ emctl list oms


This provides the OMS name configured in the local ORACLE_HOME. A "*" is displayed next to the OMS name if the Admin Server is configured on the same host.


What "also" works?

Though opmnctl cannot be used to start oms, you can still use opmnctl to start | stop | check status of HTTP Server components.


$ export ORACLE_INSTANCE=/u01/gc_inst/WebTierIH1
$ cd $ORACLE_INSTANCE/bin
gridcontrol:/u01/gc_inst/WebTierIH1/bin []$ ./opmnctl status
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
ohs1                             | OHS                |    3962 | Alive


Log file locations


OMS Application logs:  <EM_INSTANCE_HOME>/sysman/log
OPMN logs:                   <Webtier Instance Home>/diagnostics/logs/OPMN/opmn
HTTP Server:                 <Webtier Instance Home>/diagnostics/logs/OHS/ohs1
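
A quick way to watch the main OMS application log from the location above (a sketch; emoms.trc/emoms.log are the usual file names, substitute your own EM instance home):

$ tail -f <EM_INSTANCE_HOME>/sysman/log/emoms.trc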


Directory Structure

The diagram (not reproduced here) shows the locations related to the OMS, agents and WebTier only, based on the more detailed diagram available at Oracle Support.