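Repeated abnormal restarts of an Oracle 11g database on a Red Hat 6 host.
1. Before every abnormal restart of the database host, the FC links and the multipath path states were abnormal.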
... ...
Jan 1 00:20:52 zs2 kernel: rport-0:0-6: blocked FC remote port time out: removing rport
Jan 1 00:20:52 zs2 kernel: rport-2:0-6: blocked FC remote port time out: removing rport
Jan 1 00:20:52 zs2 kernel: EXT4-fs (sdak1): mounted filesystem with ordered data mode. Opts:
Jan 1 00:20:52 zs2 kernel: EXT4-fs (dm-20): mounted filesystem with ordered data mode. Opts:
... ...
Jan 1 00:21:50 zs2 multipathd: asm!.asm_ctl_vbg6: add path (uevent)
Jan 1 00:21:50 zs2 multipathd: asm!.asm_ctl_vbg6: failed to get path uid
Jan 1 00:21:50 zs2 multipathd: uevent trigger error
Jan 1 00:21:50 zs2 kernel: [Oracle ACFS] FCB hash size 2000000
Jan 1 00:21:50 zs2 kernel: [Oracle ACFS] buffer cache size 36184MB (5514771 buckets)
Jan 1 00:21:50 zs2 kernel: [Oracle ACFS] DLM hash size 2000000
Jan 1 00:21:50 zs2 kernel: ACFSK-0037: Module load succeeded. Build information: (LOW DEBUG) USM_11.2.0.4.0 ...
Jan 1 00:21:50 zs2 multipathd: ofsctl: add path (uevent)
Jan 1 00:21:50 zs2 multipathd: ofsctl: failed to get path uid
Jan 1 00:21:50 zs2 multipathd: uevent trigger error
Jan 1 00:21:50 zs2 kernel: OKSK-00010: Persistent OKS log opened at /u01/app/11.2.0/grid/log/zs2/acfs/kernel/acfs.log.0.
Jan 1 00:37:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.
Jan 1 00:37:02 zs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 1 00:37:02 zs2 kernel: multipathd D 0000000000000013 0 10201 1 0x00000000
Jan 1 00:37:02 zs2 kernel: ffff88804f79b968 0000000000000086 0000000000000000 ffffffff811666bc
Jan 1 00:37:02 zs2 kernel: ffff8800375103f0 ffff88204e22ae68 ffff88804f79b8e8 ffffffff8107c93b
Jan 1 00:37:02 zs2 kernel: ffff88804ab80638 ffff88804f79bfd8 000000000000fb88 ffff88804ab80638
Jan 1 00:37:02 zs2 kernel: Call Trace:
Jan 1 00:37:02 zs2 kernel: [<ffffffff811666bc>] ? transfer_objects+0x5c/0x80
Jan 1 00:37:02 zs2 kernel: [<ffffffff8107c93b>] ? round_jiffies_up+0x1b/0x20
... ...
Jan 1 00:37:02 zs2 kernel: [<ffffffff811955f1>] sys_ioctl+0x81/0xa0
Jan 1 00:37:02 zs2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Jan 1 00:38:49 zs2 multipathd: sdbh: couln't get asymmetric access state
Jan 1 00:38:49 zs2 multipathd: mpath05: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan 1 00:39:29 zs2 multipathd: sdao: couln't get asymmetric access state
Jan 1 00:41:40 zs2 multipathd: mpath06: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan 1 00:41:40 zs2 multipathd: sdau: couldn't get target port group
Jan 1 00:41:40 zs2 multipathd: mpath12: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan 1 00:41:40 zs2 multipathd: mpath05: load table [0 4294967296 multipath 1 queue_if_no_path 0 2 1 round-robin ...]
Jan 1 00:41:40 zs2 multipathd: sdbm: couldn't get target port group
Jan 1 00:41:41 zs2 multipathd: mpath12: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan 1 00:45:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.
Jan 1 00:45:02 zs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 1 00:45:02 zs2 kernel: multipathd D 0000000000000004 0 10201 1 0x00000000
Jan 1 00:45:02 zs2 kernel: ffff88804f79b968 0000000000000086 0000000000000000 ffff88000001bfc0
Jan 1 00:45:02 zs2 kernel: ffff8800375a7018 ffff884051116038 ffff88804f79b8e8 ffffffff8107c93b
Jan 1 00:45:02 zs2 kernel: ffff88804ab80638 ffff88804f79bfd8 000000000000fb88 ffff88804ab80638
Jan 1 00:45:02 zs2 kernel: Call Trace:
Jan 1 00:45:02 zs2 kernel: [<ffffffff8107c93b>] ? round_jiffies_up+0x1b/0x20
... ...
Jan 1 00:54:15 zs2 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Jan 1 00:54:15 zs2 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="10768" x-info="http://www.rsyslog.com"] start
Jan 1 00:54:15 zs2 kernel: Initializing cgroup subsys cpuset
Jan 1 00:54:15 zs2 kernel: Initializing cgroup subsys cpu
Jan 1 00:54:15 zs2 kernel: Linux version 2.6.32-358.el6.x86_64 (mockbuild@x86-022.build.eng.bos.redhat.com)
... ...
Jan 1 00:54:15 zs2 kernel: Command line: ro root=/dev/mapper/VolGroup-LogVol02 rd_NO_LUKS LANG=en_US.UTF-8
... ...
Jan 1 00:54:15 zs2 kernel: KERNEL supported cpus:
Jan 1 00:54:15 zs2 kernel: Intel GenuineIntel
Jan 1 00:54:15 zs2 kernel: AMD AuthenticAMD
Jan 1 00:54:15 zs2 kernel: Centaur CentaurHauls
Jan 1 00:54:15 zs2 kernel: BIOS-provided physical RAM map:
Jan 1 00:54:15 zs2 kernel: BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
... ...
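The sequence in the log excerpt above (FC remote port timeout, multipathd path errors, hung multipathd task, then a fresh boot banner) can be extracted mechanically from /var/log/messages. The following is a minimal Python sketch, not a definitive tool: the category names are made up for illustration, and treating the rsyslog "imklog ... started" line as a reboot marker is an assumption that only holds when rsyslog was not restarted by hand.

```python
import re

# Signatures copied from the log excerpt; the category names are illustrative.
SIGNATURES = {
    "fc_rport_timeout": re.compile(r"blocked FC remote port time out"),
    "multipathd_hung": re.compile(
        r"task multipathd:\d+ blocked for more than \d+ seconds"
    ),
    "multipathd_path_error": re.compile(
        r"multipathd: .*(failed to get path uid"
        r"|couln't get asymmetric access state"  # typo is in the real message
        r"|couldn't get target port group)"
    ),
    # Assumption: rsyslog starting up is taken as a sign of a reboot.
    "reboot": re.compile(r"kernel: imklog .* started"),
}

# Classic syslog timestamp, e.g. "Jan 1 00:20:52" (no year in this format).
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s")


def classify(lines):
    """Return (timestamp, category) pairs for lines matching a signature."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                events.append((ts.group(1), name))
                break
    return events


sample = [
    "Jan 1 00:20:52 zs2 kernel: rport-0:0-6: blocked FC remote port time out: removing rport",
    "Jan 1 00:21:50 zs2 multipathd: ofsctl: failed to get path uid",
    "Jan 1 00:37:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.",
    "Jan 1 00:38:49 zs2 multipathd: sdbh: couln't get asymmetric access state",
    "Jan 1 00:54:15 zs2 kernel: imklog 5.8.10, log source = /proc/kmsg started.",
]

events = classify(sample)
for ts, name in events:
    print(ts, name)
```

To run it against a real log, pass an open file object instead of the sample list, for example classify(open("/var/log/messages")).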
2. The FC switch reports a large number of enc_out errors on the FC link of host zs2 (port 14).
Index Port Address Media Speed State     Proto
==================================================
  14   14   010e00   id    N8   Online      FC  F-Port  10:00:00:10:9b:1a:ad:d8   # zs2_port1
... ...
  20   20   011400   id    N8   Online      FC  F-Port  50:0b:34:20:0f:5b:e0:02
  21   21   011500   id    N8   Online      FC  F-Port  50:0b:34:20:0f:5b:e6:02

SNS2124:admin> porterrshow
        frames        enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
... ...
14:    1.8g   4.2g      0      0      0      0      0      0   5.9m      0      0      0     28      0      0      0      0      0
... ...
20:    2.3g   1.5g      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
21:    4.1g 712.3m      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
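Picking out the ports with non-zero error counters from porterrshow output can be sketched in Python as follows. This is a rough illustration under stated assumptions: the underscore-joined column names are our own labels built from the two header rows, and the k/m/g suffixes are interpreted as decimal thousands, millions, and billions, so the resulting numbers are approximate because the switch has already rounded them.

```python
# Labels derived from the two-row porterrshow header (our own naming).
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
]

# Assumption: suffixes are decimal multipliers.
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_count(text):
    """Convert a counter such as '5.9m' or '28' to an approximate integer."""
    if text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return round(float(text))


def parse_row(row):
    """Split a row like '14: 1.8g 4.2g 0 ...' into (port, counters)."""
    port, values = row.split(":", 1)
    fields = values.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} counters, got {len(fields)}")
    return int(port), dict(zip(COLUMNS, map(parse_count, fields)))


def error_counters(counters):
    """Keep only non-zero counters, ignoring the plain frame totals."""
    return {
        name: value
        for name, value in counters.items()
        if name not in ("frames_tx", "frames_rx") and value > 0
    }


rows = [
    "14: 1.8g 4.2g 0 0 0 0 0 0 5.9m 0 0 0 28 0 0 0 0 0",
    "20: 2.3g 1.5g 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
    "21: 4.1g 712.3m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
]

flagged = {}
for row in rows:
    port, counters = parse_row(row)
    errors = error_counters(counters)
    if errors:
        flagged[port] = errors

print(flagged)
```

With the three rows shown in this report, only port 14 is flagged, with enc_out and loss_sig as the non-zero counters; ports 20 and 21 are clean.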
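3. Red Hat Bugzilla confirms that the multipathd errors are caused by a known system bug.
Poor optical fiber quality on the FC1 port of host zs2 caused the host to detect link timeouts, which triggered the multipath bug in the host operating system. The bug led to timeouts in multipath FC path discovery and to disk read/write timeouts on Red Hat, and ultimately to the repeated abnormal restarts of the database.
We recommend upgrading multipath to version 75, following Red Hat's official guidance.
The multipath upgrade should be performed by the customer, or by a Red Hat engineer arranged by the customer.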
multipathd: failed to get path uid, couln't get asymmetric access state, couldn't get target port group