Oracle 数据库自动诊断库 ADR(Automatic Diagnostic Repository)简介
作者介绍
ADR 的作用
ADR(Automatic Diagnostic Repository)是一个基于文件的存储库,用于存储数据库诊断数据,如跟踪、转储、警报日志、运行状况监视报告等。它具有跨多个实例和多个产品的统一目录结构。为什么基于文件的原因是显而易见的,因为数据库出现故障的时候数据库可能无法打开,将诊断信息放到文件里面保证了随时可以查阅。数据库、Oracle自动存储管理(Oracle ASM)、侦听器、Oracle Clusterware和其他Oracle产品或组件将所有诊断数据存储在ADR中。每个产品的每个实例都将诊断数据存储在ADR中自己的主目录下。例如,在具有共享存储和 Oracle ASM 的 Oracle Real Application Clusters 环境中,每个数据库实例和每个 Oracle ASM 实例都有一个ADR主目录。
ADR 中的文件
trace:每个服务器和后台进程都可以写入关联的 trace 文件。trace 文件在流程的整个生命周期内定期更新,可以包含有关流程环境、状态、活动和错误的信息。此外,当进程检测到严重错误时,它会将有关该错误的信息写入其 trace 文件。
dumps:dumps 是一种特定类型的 trace 文件。它通常是针对事件(如事件)的诊断数据的一次性输出,而 trace file 往往是诊断数据的连续输出。
core:core 文件包含一个内存转储,采用全二进制的格式。
Alert Log 和 sbtio.log 在后面说明。
ADR 的目录结构
如果设置了环境变量 ORACLE_BASE,则 DIAGNOSTIC_DEST 设置为 ORACLE_BASE 指定的目录。
如果未设置环境变量 ORACLE_BASE,则 DIAGNOSTIC_DEST 设置为 ORACLE_HOME/log。
因此,目前一个常见的 ADR base 例子是 /u01/app/oracle。
在ADR base中,可以有多个 ADR home,其中每个 ADR home 是特定 Oracle 产品或组件的特定实例的所有诊断数据跟踪、转储、警报日志等的根目录。例如,在带有 Oracle ASM 的 Oracle Real Application Clusters 环境中,每个数据库实例、Oracle ASM实例和侦听器都有一个ADR home,结构如下:
一个方便的查询所有组件的家目录的方法是在 adrci 工具中用 show homes 的命令,例如:
ADR base = "/u01/app/grid"adrci> show homes;ADR Homes: diag/asm/+asm/+ASM1diag/crs/rac1/crsdiag/clients/user_grid/host_1874443374_110diag/clients/user_root/host_1874443374_110diag/tnslsnr/rac1/asmnet1lsnr_asmdiag/tnslsnr/rac1/listener_scan1diag/tnslsnr/rac1/listenerdiag/asmcmd/user_grid/rac1diag/asmcmd/user_oracle/rac1diag/kfod/rac1/kfod数据库的 ADR 家目录结构
diag/product_type/product_id/instance_id

每个子目录的作用如下:
· alert:XML格式的 alert log
· cdump:core 文件
· incident: 多个子目录,其中每个子目录都是针对特定事件命名的,每个子目录只包含与该事件相关的转储
· trace: 后台和服务器进程跟踪文件、SQL跟踪文件和文本格式的 alert log
· others: ADR home的其他子目录,存储事件包、运行状况监视报告、警报日志以外的日志(例如DDL日志和调试日志)以及其他信息
select * from v$diag_info INST_ID NAME VALUE CON_ID---------- ---------------------------- ---------------------------------------------------------------- ---------- 1 Diag Enabled TRUE 0 1 ADR Base /u01/app/oracle 0 1 ADR Home /u01/app/oracle/diag/rdbms/orcl/orcl 0 1 Diag Trace /u01/app/oracle/diag/rdbms/orcl/orcl/trace 0 1 Diag Alert /u01/app/oracle/diag/rdbms/orcl/orcl/alert 0 1 Diag Incident /u01/app/oracle/diag/rdbms/orcl/orcl/incident 0 1 Diag Cdump /u01/app/oracle/diag/rdbms/orcl/orcl/cdump 0 1 Health Monitor /u01/app/oracle/diag/rdbms/orcl/orcl/hm 0 1 Default Trace File /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_33364.trc 0 1 Active Problem Count 0 0 1 Active Incident Count 0 0 1 ORACLE_HOME /u01/app/oracle/product/19.0.0/db_1 0Alert Log
对于数据库,Alert Log 包含有关以下内容的消息:
严重错误(事件)
管理操作,如启动或关闭数据库、恢复数据库、创建或删除表空间等。
自动刷新实化视图时出错
其他数据库事件
对于我们前面列举的 orcl 数据库,这两类的 alert log 如下:
$ tail /u01/app/oracle/diag/rdbms/orcl/orcl/alert/log.xml <txt>Closing Resource Manager plan via scheduler window </txt></msg><msg time='2020-12-07T18:00:00.130+08:00' org_id='oracle' comp_id='rdbms' type='UNKNOWN' level='16' host_id='dell.scutech' host_addr='fe80::655e:8314:f61f:9bac%em1' pid='42245' con_uid='1784183627' con_id='5' con_name='SALESPDB'> <txt>Clearing Resource Manager plan via parameter </txt></msg>$ tail /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log 2020-12-07T18:00:00.077657+08:00PDB1(3):Closing scheduler windowPDB1(3):Closing Resource Manager plan via scheduler windowPDB1(3):Clearing Resource Manager plan via parameterHRPDB(4):Closing scheduler windowHRPDB(4):Closing Resource Manager plan via scheduler windowHRPDB(4):Clearing Resource Manager plan via parameterSALESPDB(5):Closing scheduler windowSALESPDB(5):Closing Resource Manager plan via scheduler windowSALESPDB(5):Clearing Resource Manager plan via parameter[grid@rac1 crs]$ tail /u01/app/grid/diag/crs/rac1/crs/alert/log.xml <txt>2020-12-03 18:52:29.951 [ORAAGENT(35496)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 35496 </txt></msg><msg time='2020-12-03T18:52:52.787+08:00' org_id='oracle' comp_id='crs' msg_id='clsdadr_process_queue:4927:2974305713' type='UNKNOWN' group='CLSDADR' level='16' host_id='rac1' host_addr='192.168.82.175' pid='36879'> <txt>2020-12-03 18:52:52.679 [ORAAGENT(36879)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 36879 </txt></msg>[grid@rac1 crs]$ tail /u01/app/grid/diag/crs/rac1/crs/trace/alert.log 2020-12-03 18:52:13.218 [OCTSSD(34536)]CRS-2401: The Cluster Time Synchronization Service started on host rac1.2020-12-03 18:52:13.218 [OCTSSD(34536)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac2.2020-12-03 18:52:20.552 [CRSD(34786)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 347862020-12-03 18:52:23.601 [CRSD(34786)]CRS-1012: The OCR service started on node rac1.2020-12-03 18:52:23.671 [CRSD(34786)]CRS-1201: CRSD started on node rac1.2020-12-03 18:52:24.356 [ORAAGENT(35004)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 350042020-12-03 18:52:24.420 [ORAROOTAGENT(35014)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 350142020-12-03 18:52:24.480 [ORAAGENT(35028)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 350282020-12-03 18:52:29.951 [ORAAGENT(35496)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 354962020-12-03 18:52:52.679 [ORAAGENT(36879)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 36879[grid@rac1 crs]$sbtio log
对于我们前面列举的 orcl 数据库,sbtio.log 的例子如下:
$ tail /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/sbtio.logSBT-19152 11/18/20 22:40:43 [LM_ERROR] 7501: Failed to start session with server: Protocol not supportedADRCI 工具
下面是生成一个问题报告的例子。
[grid@rac1 crs]$ adrciADRCI: Release 19.0.0.0.0 - Production on Tue Dec 8 11:26:34 2020Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.ADR base = "/u01/app/grid"adrci> show problemsADR Home = /u01/app/grid/diag/crs/rac1/crs:*************************************************************************PROBLEM_ID PROBLEM_KEY LAST_INCIDENT LASTINC_TIME -------------------- ----------------------------------------------------------- -------------------- ---------------------------------------- 1 CRS 8503 33 2020-12-03 18:34:59.547000 +08:00 1 row fetchedadrci> set homepath diag/crs/rac1/crsadrci> IPS CREATE PACKAGE PROBLEM 1Created package 1 based on problem id 1, correlation level typicaladrci> ips generate package 1 in /tmpGenerated package 1 in file /tmp/CRS8503_20201208114517_COM_1.zip, mode completeadrci>