Abnormal Reboots of an Oracle 11g Database on a Red Hat 6 Host
Author: Liu Yuan (刘渊). Published November 12, 2018 under Case Studies; edited November 12, 2018.


I. Network Topology


II. Problem Description

The Oracle 11g database on a Red Hat 6 host rebooted abnormally several times.


III. Analysis

1. Before every abnormal reboot of the database host, the system log shows FC link errors and abnormal multipath path states.

... ...
Jan  1 00:20:52 zs2 kernel: rport-0:0-6: blocked FC remote port time out: removing rport
Jan  1 00:20:52 zs2 kernel: rport-2:0-6: blocked FC remote port time out: removing rport
Jan  1 00:20:52 zs2 kernel: EXT4-fs (sdak1): mounted filesystem with ordered data mode. Opts: 
Jan  1 00:20:52 zs2 kernel: EXT4-fs (dm-20): mounted filesystem with ordered data mode. Opts: 
... ... 
Jan  1 00:21:50 zs2 multipathd: asm!.asm_ctl_vbg6: add path (uevent)
Jan  1 00:21:50 zs2 multipathd: asm!.asm_ctl_vbg6: failed to get path uid
Jan  1 00:21:50 zs2 multipathd: uevent trigger error
Jan  1 00:21:50 zs2 kernel: [Oracle ACFS] FCB hash size 2000000
Jan  1 00:21:50 zs2 kernel: [Oracle ACFS] buffer cache size 36184MB (5514771 buckets)
Jan  1 00:21:50 zs2 kernel: [Oracle ACFS] DLM hash size 2000000
Jan  1 00:21:50 zs2 kernel: ACFSK-0037: Module load succeeded. Build information: (LOW DEBUG) USM_11.2.0.4.0 ...
Jan  1 00:21:50 zs2 multipathd: ofsctl: add path (uevent)
Jan  1 00:21:50 zs2 multipathd: ofsctl: failed to get path uid
Jan  1 00:21:50 zs2 multipathd: uevent trigger error
Jan  1 00:21:50 zs2 kernel: OKSK-00010: Persistent OKS log opened at /u01/app/11.2.0/grid/log/zs2/acfs/kernel/acfs.log.0.
Jan  1 00:37:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.
Jan  1 00:37:02 zs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  1 00:37:02 zs2 kernel: multipathd    D 0000000000000013     0 10201      1 0x00000000
Jan  1 00:37:02 zs2 kernel: ffff88804f79b968 0000000000000086 0000000000000000 ffffffff811666bc
Jan  1 00:37:02 zs2 kernel: ffff8800375103f0 ffff88204e22ae68 ffff88804f79b8e8 ffffffff8107c93b
Jan  1 00:37:02 zs2 kernel: ffff88804ab80638 ffff88804f79bfd8 000000000000fb88 ffff88804ab80638
Jan  1 00:37:02 zs2 kernel: Call Trace:
Jan  1 00:37:02 zs2 kernel: [<ffffffff811666bc>] ? transfer_objects+0x5c/0x80
Jan  1 00:37:02 zs2 kernel: [<ffffffff8107c93b>] ? round_jiffies_up+0x1b/0x20
... ... 
Jan  1 00:37:02 zs2 kernel: [<ffffffff811955f1>] sys_ioctl+0x81/0xa0
Jan  1 00:37:02 zs2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Jan  1 00:38:49 zs2 multipathd: sdbh: couln't get asymmetric access state
Jan  1 00:38:49 zs2 multipathd: mpath05: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:39:29 zs2 multipathd: sdao: couln't get asymmetric access state
Jan  1 00:41:40 zs2 multipathd: mpath06: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:41:40 zs2 multipathd: sdau: couldn't get target port group
Jan  1 00:41:40 zs2 multipathd: mpath12: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:41:40 zs2 multipathd: mpath05: load table [0 4294967296 multipath 1 queue_if_no_path 0 2 1 round-robin ...]
Jan  1 00:41:40 zs2 multipathd: sdbm: couldn't get target port group
Jan  1 00:41:41 zs2 multipathd: mpath12: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:45:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.
Jan  1 00:45:02 zs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  1 00:45:02 zs2 kernel: multipathd    D 0000000000000004     0 10201      1 0x00000000
Jan  1 00:45:02 zs2 kernel: ffff88804f79b968 0000000000000086 0000000000000000 ffff88000001bfc0
Jan  1 00:45:02 zs2 kernel: ffff8800375a7018 ffff884051116038 ffff88804f79b8e8 ffffffff8107c93b
Jan  1 00:45:02 zs2 kernel: ffff88804ab80638 ffff88804f79bfd8 000000000000fb88 ffff88804ab80638
Jan  1 00:45:02 zs2 kernel: Call Trace:
Jan  1 00:45:02 zs2 kernel: [<ffffffff8107c93b>] ? round_jiffies_up+0x1b/0x20
... ...
Jan  1 00:54:15 zs2 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Jan  1 00:54:15 zs2 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="10768" x-info="http://www.rsyslog.com"] start
Jan  1 00:54:15 zs2 kernel: Initializing cgroup subsys cpuset
Jan  1 00:54:15 zs2 kernel: Initializing cgroup subsys cpu
Jan  1 00:54:15 zs2 kernel: Linux version 2.6.32-358.el6.x86_64 (mockbuild@x86-022.build.eng.bos.redhat.com) ... ...
Jan  1 00:54:15 zs2 kernel: Command line: ro root=/dev/mapper/VolGroup-LogVol02 rd_NO_LUKS LANG=en_US.UTF-8 ... ...
Jan  1 00:54:15 zs2 kernel: KERNEL supported cpus:
Jan  1 00:54:15 zs2 kernel:  Intel GenuineIntel
Jan  1 00:54:15 zs2 kernel:  AMD AuthenticAMD
Jan  1 00:54:15 zs2 kernel:  Centaur CentaurHauls
Jan  1 00:54:15 zs2 kernel: BIOS-provided physical RAM map:
Jan  1 00:54:15 zs2 kernel: BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
... ...
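
The signatures in the excerpt above (rport removal, multipathd uid and ALUA failures, hung-task warnings, then a fresh kernel boot banner) can be pulled out of a full syslog mechanically. The following is a minimal sketch, not a supported tool: the category names and both function names are made up for illustration, and treating the "Linux version" kernel line as the reboot marker is an assumption.

```python
import re

# Signatures copied from the log excerpt; category names are illustrative only.
PATTERNS = {
    "rport_removed": re.compile(r"blocked FC remote port time out: removing rport"),
    "path_uid_failure": re.compile(r"multipathd: .*failed to get path uid"),
    "alua_failure": re.compile(
        r"multipathd: .*(couln't get asymmetric access state|couldn't get target port group)"
    ),
    "hung_task": re.compile(r"INFO: task (\S+):\d+ blocked for more than \d+ seconds"),
    # Assumption: the kernel version banner marks the start of a new boot.
    "boot": re.compile(r"kernel: Linux version "),
}


def classify(lines):
    """Return (timestamp, category) for every line matching a known signature."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((line[:15], name))  # syslog timestamp is 15 characters
                break
    return events


def precursors_before_boot(events):
    """Return the categories seen before the first boot marker (empty set if none)."""
    seen = set()
    for _, name in events:
        if name == "boot":
            return seen
        seen.add(name)
    return set()


sample = [
    "Jan  1 00:20:52 zs2 kernel: rport-0:0-6: blocked FC remote port time out: removing rport",
    "Jan  1 00:21:50 zs2 multipathd: ofsctl: failed to get path uid",
    "Jan  1 00:37:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.",
    "Jan  1 00:38:49 zs2 multipathd: sdbh: couln't get asymmetric access state",
    "Jan  1 00:54:15 zs2 kernel: Linux version 2.6.32-358.el6.x86_64 ... ...",
]
events = classify(sample)
print(sorted(precursors_before_boot(events)))
```

Running the same functions over the log window before each reboot would show whether the same precursors recur every time, which is the pattern this step relies on.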


2. On the FC switch, the FC link of host zs2 (port 14) shows a large number of enc_out errors.

Index Port Address Media Speed  State   Proto
==================================================
  14  14   010e00   id    N8    Online  FC  F-Port  10:00:00:10:9b:1a:ad:d8 # zs2_port1
... ...
  20  20   011400   id    N8    Online  FC  F-Port  50:0b:34:20:0f:5b:e0:02
  21  21   011500   id    N8    Online  FC  F-Port  50:0b:34:20:0f:5b:e6:02
  
SNS2124:admin> porterrshow
             frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
          tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
... ...
  14:   1.8g   4.2g      0      0      0      0      0      0   5.9m      0      0      0     28      0      0      0      0      0
... ...
  20:   2.3g   1.5g      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  21:   4.1g 712.3m      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
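
To make the comparison between port 14 and the healthy ports explicit, the counter rows can be parsed and filtered for non-zero physical-layer errors. This is a rough sketch under stated assumptions: the column names are my own labels for the 18 counters in the header above, the k/m/g suffixes are assumed to mean thousands, millions and billions, and the choice of which counters count as "suspect" is illustrative.

```python
import re

# Labels for the 18 counters, in the order they appear in the porterrshow header.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
]

# Assumption: abbreviated counters use k = 10^3, m = 10^6, g = 10^9.
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def to_number(token):
    """Convert a counter such as '28' or '5.9m' to an integer."""
    token = token.lower()
    if token[-1] in SUFFIX:
        return round(float(token[:-1]) * SUFFIX[token[-1]])
    return int(token)


def parse_rows(text):
    """Map port number -> {counter name: value} for every complete counter row."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.*)", line)
        if not match:
            continue  # header lines and '... ...' elisions
        values = match.group(2).split()
        if len(values) != len(COLUMNS):
            continue
        rows[int(match.group(1))] = dict(zip(COLUMNS, map(to_number, values)))
    return rows


def suspect_ports(rows, counters=("enc_out", "crc_err", "link_fail", "loss_sync", "loss_sig")):
    """Return only the ports with a non-zero value in any of the given counters."""
    result = {}
    for port, row in rows.items():
        nonzero = {name: row[name] for name in counters if row[name] > 0}
        if nonzero:
            result[port] = nonzero
    return result


sample = """\
 14:    1.8g   4.2g   0   0   0   0   0   0   5.9m   0   0   0   28   0   0   0   0   0
... ...
 20:    2.3g   1.5g   0   0   0   0   0   0   0      0   0   0   0    0   0   0   0   0
 21:    4.1g 712.3m   0   0   0   0   0   0   0      0   0   0   0    0   0   0   0   0
"""
print(suspect_ports(parse_rows(sample)))
```

On the rows quoted above, only port 14 (the zs2_port1 link) is reported, with enc_out and loss_sig non-zero; the other two ports listed, 20 and 21, are clean. Because enc_out counts encoding errors seen outside of frames, a large value on a single port usually points at the physical link (fiber, SFP or HBA port) rather than at the fabric.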


3. Red Hat Bugzilla confirms that the multipathd errors are caused by a known bug in the operating system.






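IV. Root Cause

Poor fiber quality on the FC1 port of host zs2 caused the host to detect link timeouts. These timeouts triggered a multipath bug in the host operating system, which led to timeouts in multipath FC path discovery and in Red Hat disk reads and writes. The I/O timeouts ultimately caused the database to reboot abnormally several times.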


V. Solution

Upgrade multipath to version 75, following Red Hat's official guidance.
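
Before and after the upgrade it can help to confirm which multipath build is installed. The sketch below rests on an assumption the article does not state: that "version 75" refers to the rpm release number of the `device-mapper-multipath` package, as printed by `rpm -q device-mapper-multipath`. The function names and the sample package strings are hypothetical and only illustrate the comparison.

```python
import re


def multipath_release(package):
    """Extract the numeric rpm release from a string shaped like
    'device-mapper-multipath-<version>-<release>.el6.x86_64'."""
    match = re.match(
        r"device-mapper-multipath-(?P<version>[\d.]+)-(?P<release>\d+)", package
    )
    if match is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    return int(match.group("release"))


def needs_upgrade(package, target_release=75):
    """True if the installed release is lower than the target release.

    Assumption: target_release=75 mirrors the 'version 75' named in the text.
    """
    return multipath_release(package) < target_release


# Illustrative strings only, not taken from the affected host.
print(needs_upgrade("device-mapper-multipath-0.4.9-64.el6.x86_64"))
print(needs_upgrade("device-mapper-multipath-0.4.9-75.el6.x86_64"))
```

The target is a parameter rather than a constant so that it can be set to whatever release Red Hat's guidance actually names for the affected system.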


VI. Risk Notice

The multipath upgrade should be performed by the customer, or by Red Hat engineers engaged by the customer.


VII. Keywords

multipathd: failed to get path uid, couln't get asymmetric access state, couldn't get target port group

