Bug #12697

closed

[RBD] rbd_dev_id_put causes system crash

Added by Zhi Zhang over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
rbd
Target version:
-
% Done:
0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Today we hit two seemingly random system crashes. Both are related to rbd_dev_id_put. We do not yet know how to reproduce them consistently.

One crash:

[73872.509649] WARNING: at /root/rpmbuild/BUILD/kernel-tlinux2-3.10.83/kernel-tlinux2-3.10.83/block/genhd.c:352 unregister_blkdev+0x7c/0xd0()
[73872.771688] aufs au_opts_verify:1602:docker[5658]: dirperm1 breaks the protection by the permission bits on the lower branch
[73872.714870] Modules linked in: rbd libceph tcp_diag inet_diag iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype nf_nat nf_conntrack aufs iptable_filter ip_tables mlx4_core mperf crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd igb ptp pps_core i2c_algo_bit i2c_core ipmi_devintf shpchp ipmi_si ipmi_msghandler binfmt_misc autofs4
[73873.536311] CPU: 5 PID: 14957 Comm: systemd-udevd Not tainted 3.10.83-1-tlinux2-0019.lock_debug.tl2 #1
[73873.690205] Hardware name: LENOVO RD440X/ThinkServer RD440X                      , BIOS 1.01    02/11/2014
[73873.808858]  0000000000000009 ffff88074d615ce8 ffffffff81b211d1 ffff88074d615d20
[73873.934185]  ffffffff810407b1 0000000000000000 00000000ffff8810 ffff881052aed020
[73874.067679]  ffff881052aed200 000000000000101d ffff88074d615d30 ffffffff8104088a
[73874.200312] Call Trace:
[73874.244188]  [<ffffffff81b211d1>] dump_stack+0x19/0x1b
[73874.334915]  [<ffffffff810407b1>] warn_slowpath_common+0x61/0x80
[73874.439930]  [<ffffffff8104088a>] warn_slowpath_null+0x1a/0x20
[73874.537889]  [<ffffffff8145a2cc>] unregister_blkdev+0x7c/0xd0
[73874.619634]  [<ffffffffa01243aa>] rbd_dev_device_release+0x4a/0x80 [rbd]
[73874.734582]  [<ffffffff81547a72>] device_release+0x32/0xa0
[73874.833512]  [<ffffffff814733be>] kobject_release+0x7e/0x1b0
[73874.933003]  [<ffffffff81473278>] kobject_put+0x28/0x60
[73875.020895]  [<ffffffff81547d47>] put_device+0x17/0x20
[73875.112333]  [<ffffffffa012421e>] rbd_release+0x5e/0xa0 [rbd]
[73875.213413]  [<ffffffff811afd9c>] __blkdev_put+0x17c/0x1b0
[73875.313040]  [<ffffffff811b0740>] blkdev_put+0x50/0x160
[73875.404316]  [<ffffffff811b0908>] blkdev_close+0x28/0x30
[73875.497447]  [<ffffffff81178cf1>] __fput+0xe1/0x230
[73875.565605]  [<ffffffff81178e8e>] ____fput+0xe/0x10
[73875.645906]  [<ffffffff81064a87>] task_work_run+0xa7/0xe0
[73875.739910]  [<ffffffff81002a01>] do_notify_resume+0x61/0xa0
[73875.836912]  [<ffffffff81b34f2a>] int_signal+0x12/0x17
[73875.902015] ---[ end trace 66640b23ec7f7257 ]---
[73875.955190] BUG: unable to handle kernel NULL pointer dereference at           (null)
[73876.083015] IP: [<ffffffff81485fe9>] __list_del_entry+0x29/0xd0
[73876.189295] PGD 0
[73876.224324] Oops: 0000 [#1] SMP
[73876.282471] Modules linked in: rbd libceph tcp_diag inet_diag iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype nf_nat nf_conntrack aufs iptable_filter ip_tables mlx4_core mperf crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd igb ptp pps_core i2c_algo_bit i2c_core ipmi_devintf shpchp ipmi_si ipmi_msghandler binfmt_misc autofs4
[73876.878034] CPU: 5 PID: 14957 Comm: systemd-udevd Tainted: G        W    3.10.83-1-tlinux2-0019.lock_debug.tl2 #1
[73877.057781] Hardware name: LENOVO RD440X/ThinkServer RD440X                      , BIOS 1.01    02/11/2014
[73877.227944] task: ffff880834f08000 ti: ffff88074d614000 task.ti: ffff88074d614000
[73877.355785] RIP: 0010:[<ffffffff81485fe9>]  [<ffffffff81485fe9>] __list_del_entry+0x29/0xd0
[73877.504821] RSP: 0018:ffff88074d615d30  EFLAGS: 00010207
[73877.586095] RAX: 0000000000000000 RBX: ffff881052aed000 RCX: dead000000200200
[73877.700056] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881052aed1e0
[73877.819753] RBP: ffff88074d615d30 R08: 0000000000000000 R09: 0000000000000001
[73877.943362] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000052aea800
[73878.036153] R13: ffff881052aed1e0 R14: ffff881052aed200 R15: 000000000000101d
[73878.128912] FS:  00007fe962e86880(0000) GS:ffff88085f000000(0000) knlGS:0000000000000000
[73878.270284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73878.368922] CR2: 0000000000000000 CR3: 00000008174fc000 CR4: 00000000001407e0
[73878.492527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[73878.604607] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[73878.722464] Stack:
[73878.755548]  ffff88074d615d58 ffffffffa0123da5 ffff881052aed1f0 ffff881052aed000
[73878.883315]  ffff8810240d6600 ffff88074d615d78 ffffffffa01243bc ffff881052aed200
[73879.013149]  ffff881052aed1f0 ffff88074d615da0 ffffffff81547a72 ffff881052aed238
[73879.144726] Call Trace:
[73879.187458]  [<ffffffffa0123da5>] rbd_dev_id_put+0x45/0x140 [rbd]
[73879.295591]  [<ffffffffa01243bc>] rbd_dev_device_release+0x5c/0x80 [rbd]
[73879.409559]  [<ffffffff81547a72>] device_release+0x32/0xa0
[73879.506165]  [<ffffffff814733be>] kobject_release+0x7e/0x1b0
[73879.600984]  [<ffffffff81473278>] kobject_put+0x28/0x60
[73879.676484]  [<ffffffff81547d47>] put_device+0x17/0x20
[73879.763404]  [<ffffffffa012421e>] rbd_release+0x5e/0xa0 [rbd]
[73879.861950]  [<ffffffff811afd9c>] __blkdev_put+0x17c/0x1b0
[73879.956648]  [<ffffffff811b0740>] blkdev_put+0x50/0x160
[73880.039773]  [<ffffffff811b0908>] blkdev_close+0x28/0x30
[73880.124839]  [<ffffffff81178cf1>] __fput+0xe1/0x230
[73880.181037]  [<ffffffff81178e8e>] ____fput+0xe/0x10
[73880.241135]  [<ffffffff81064a87>] task_work_run+0xa7/0xe0
[73880.335833]  [<ffffffff81002a01>] do_notify_resume+0x61/0xa0
[73880.434315]  [<ffffffff81b34f2a>] int_signal+0x12/0x17

The other crash:

[87921.131214] WARNING: at /root/rpmbuild/BUILD/kernel-tlinux2-3.10.83/kernel-tlinux2-3.10.83/block/genhd.c:352 unregister_blkdev+0x7c/0xd0()
[87921.131367] Modules linked in: xt_nat veth rbd libceph tcp_diag inet_diag xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype nf_nat nf_conntrack aufs iptable_filter ip_tables mlx4_core mperf crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd igb ptp pps_core i2c_algo_bit i2c_core ipmi_devintf shpchp ipmi_si ipmi_msghandler binfmt_misc autofs4
[87921.131372] CPU: 21 PID: 13770 Comm: systemd-udevd Not tainted 3.10.83-1-tlinux2-0019.lock_debug.tl2 #1
[87921.131376] Hardware name: LENOVO RD440X/ThinkServer RD440X                      , BIOS 1.01    02/11/2014
[87921.131388]  0000000000000009 ffff880a3027bce8 ffffffff81b211d1 ffff880a3027bd20
[87921.131397]  ffffffff810407b1 0000000000000000 0000000000000000 ffff880d9657b020
[87921.131406]  ffff880d9657b200 000000000000101d ffff880a3027bd30 ffffffff8104088a
[87921.131409] Call Trace:
[87921.131487]  [<ffffffff81b211d1>] dump_stack+0x19/0x1b
[87921.131497]  [<ffffffff810407b1>] warn_slowpath_common+0x61/0x80
[87921.131502]  [<ffffffff8104088a>] warn_slowpath_null+0x1a/0x20
[87921.131509]  [<ffffffff8145a2cc>] unregister_blkdev+0x7c/0xd0
[87921.131519]  [<ffffffffa02223aa>] rbd_dev_device_release+0x4a/0x80 [rbd]
[87921.131527]  [<ffffffff81547a72>] device_release+0x32/0xa0
[87921.131539]  [<ffffffff814733be>] kobject_release+0x7e/0x1b0
[87921.131545]  [<ffffffff81473278>] kobject_put+0x28/0x60
[87921.131550]  [<ffffffff81547d47>] put_device+0x17/0x20
[87921.131557]  [<ffffffffa022221e>] rbd_release+0x5e/0xa0 [rbd]
[87921.131564]  [<ffffffff811afd9c>] __blkdev_put+0x17c/0x1b0
[87921.131571]  [<ffffffff811b0740>] blkdev_put+0x50/0x160
[87921.131575]  [<ffffffff811b0908>] blkdev_close+0x28/0x30
[87921.131584]  [<ffffffff81178cf1>] __fput+0xe1/0x230
[87921.131588]  [<ffffffff81178e8e>] ____fput+0xe/0x10
[87921.131598]  [<ffffffff81064a87>] task_work_run+0xa7/0xe0
[87921.131608]  [<ffffffff81002a01>] do_notify_resume+0x61/0xa0
[87921.131615]  [<ffffffff81b34f2a>] int_signal+0x12/0x17
[87921.131619] ---[ end trace 6cfe29f18ab78b3f ]---
[87921.131625] .Assertion failure in rbd_dev_id_put() at line 4404:..   rbd_assert(rbd_id > 0);.
[87921.131647] ------------[ cut here ]------------
[87921.131652] kernel BUG at /root/rpmbuild/BUILD/kernel-tlinux2-3.10.83/kernel-tlinux2-3.10.83/drivers/block/rbd.c:4404!
[87921.131656] invalid opcode: 0000 [#1] SMP
[87921.131715] Modules linked in: xt_nat veth rbd libceph tcp_diag inet_diag xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype nf_nat nf_conntrack aufs iptable_filter ip_tables mlx4_core mperf crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd igb ptp pps_core i2c_algo_bit i2c_core ipmi_devintf shpchp ipmi_si ipmi_msghandler binfmt_misc autofs4
[87921.131719] CPU: 21 PID: 13770 Comm: systemd-udevd Tainted: G        W    3.10.83-1-tlinux2-0019.lock_debug.tl2 #1
[87921.131721] Hardware name: LENOVO RD440X/ThinkServer RD440X                      , BIOS 1.01    02/11/2014
[87921.131723] task: ffff8808accf2540 ti: ffff880a3027a000 task.ti: ffff880a3027a000
[87921.131728] RIP: 0010:[<ffffffffa0221e58>]  [<ffffffffa0221e58>] rbd_dev_id_put+0xf8/0x140 [rbd]
[87921.131730] RSP: 0018:ffff880a3027bd40  EFLAGS: 00010292
[87921.131732] RAX: 000000000000004f RBX: ffff880d9657b000 RCX: 000000002e872e86
[87921.131734] RDX: 000000000000a461 RSI: 0000000000000001 RDI: ffffffff81042b2a
[87921.131736] RBP: ffff880a3027bd58 R08: 0000000000000000 R09: 0000000000000001
[87921.131741] R10: 0000000000000000 R11: 0000000000080000 R12: 0000000000000000
[87921.131744] R13: ffff88034d649e00 R14: ffff880d9657b200 R15: 000000000000101d
[87921.131748] FS:  00007fc0263f3880(0000) GS:ffff88105f000000(0000) knlGS:0000000000000000
[87921.131751] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[87921.131754] CR2: 0000000005ca6000 CR3: 0000000f154a1000 CR4: 00000000001407e0
[87921.131756] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[87921.131758] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[87921.131760] Stack:
[87921.131771]  ffff880d9657b1f0 ffff880d9657b000 ffff88034d649e00 ffff880a3027bd78
[87921.131782]  ffffffffa02223bc ffff880d9657b200 ffff880d9657b1f0 ffff880a3027bda0
[87921.131793]  ffffffff81547a72 ffff880d9657b238 ffffffff82108ac0 ffff88084f5d1600
[87921.131796] Call Trace:
[87921.131804]  [<ffffffffa02223bc>] rbd_dev_device_release+0x5c/0x80 [rbd]
[87921.131810]  [<ffffffff81547a72>] device_release+0x32/0xa0
[87921.131815]  [<ffffffff814733be>] kobject_release+0x7e/0x1b0
[87921.131819]  [<ffffffff81473278>] kobject_put+0x28/0x60
[87921.131824]  [<ffffffff81547d47>] put_device+0x17/0x20
[87921.131830]  [<ffffffffa022221e>] rbd_release+0x5e/0xa0 [rbd]
[87921.131835]  [<ffffffff811afd9c>] __blkdev_put+0x17c/0x1b0
[87921.131838]  [<ffffffff811b0740>] blkdev_put+0x50/0x160
[87921.131842]  [<ffffffff811b0908>] blkdev_close+0x28/0x30
[87921.131848]  [<ffffffff81178cf1>] __fput+0xe1/0x230
[87921.131852]  [<ffffffff81178e8e>] ____fput+0xe/0x10
[87921.131858]  [<ffffffff81064a87>] task_work_run+0xa7/0xe0
[87921.131864]  [<ffffffff81002a01>] do_notify_resume+0x61/0xa0
[87921.131869]  [<ffffffff81b34f2a>] int_signal+0x12/0x17


Files

docker-volume-ceph-rbd.diff (6.71 KB), Zhi Zhang, 08/24/2015 06:15 AM