Project

General

Profile

Actions

Bug #16502

closed

fio ended up with soft lockup - CPU#1 stuck for 22s!

Added by Shinobu Kinjo almost 8 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
krbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph Client
//DISTRO
Ubuntu 14.04.4 LTS - trusty

//KERNEL
4.2.0-36-generic #42~14.04.1-Ubuntu

//PACKAGE
0.94.5-5
ceph-common
libcephfs1
librados2
librbd1
python-cephfs
python-rados
python-rbd

//TRACE
[607086.738100] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:3:11351]
[607086.738167] Modules linked in: xt_recent nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache bcache rbd libceph libcrc32c ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables kvm_amd 8021q kvm garp mrp stp llc dm_multipath scsi_dh serio_raw amd64_edac_mod edac_core k10temp edac_mce_amd joydev i2c_piix4 input_leds shpchp 8250_fintek mac_hid lp parport igb ixgbe i2c_algo_bit raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 raid1 vxlan ip6_udp_tunnel multipath dca udp_tunnel pata_acpi hid_generic ptp ahci usbhid linear psmouse pata_atiixp hid libahci pps_core mdio dm_mirror dm_region_hash dm_log [last unloaded: libceph]
[607086.738189] CPU: 1 PID: 11351 Comm: kworker/1:3 Tainted: G L 4.2.0-36-generic #42~14.04.1-Ubuntu
[607086.738191] Hardware name: Supermicro H8DGU/H8DGU, BIOS 080016 03/18/2010
[607086.738203] Workqueue: rbd rbd_queue_workfn [rbd]
[607086.738205] task: ffff8800daa48000 ti: ffff880089b20000 task.ti: ffff880089b20000
[607086.738212] RIP: 0010:[<ffffffff810cf61e>] [<ffffffff810cf61e>] console_unlock+0x41e/0x4e0
[607086.738214] RSP: 0018:ffff880089b23a88 EFLAGS: 00000246
[607086.738216] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
[607086.738218] RDX: ffff880089b239f8 RSI: 0000000000000004 RDI: 0000000000000246
[607086.738220] RBP: ffff880089b23ad8 R08: 0000000000000000 R09: ffff880215344ec0
[607086.738222] R10: 000000000000a000 R11: ffffc900024ffff8 R12: ffffffff81ef97d2
[607086.738224] R13: ffff880200000051 R14: 00000050810cd7dc R15: ffff880089b239e8
[607086.738226] FS: 00007fc9db031740(0000) GS:ffff880217a40000(0000) knlGS:0000000000000000
[607086.738229] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[607086.738231] CR2: 00000000006f8df4 CR3: 0000000213a74000 CR4: 00000000000006e0
[607086.738232] Stack:
[607086.738236] 0000000000000282 0000000000000282 0000000089b23ab8 ffffffff81f3a490
[607086.738240] 0000000000000001 00000000ffffffff 0000000000000000 0000000000000000
[607086.738244] 000000000000006d 0000000000000006 ffff880089b23b58 ffffffff810cfa54
[607086.738245] Call Trace:
[607086.738251] [<ffffffff810cfa54>] vprintk_emit+0x374/0x4f0
[607086.738256] [<ffffffff810cfd29>] vprintk_default+0x29/0x40
[607086.738261] [<ffffffff817b5c9e>] printk+0x46/0x48
[607086.738267] [<ffffffff813d5467>] __dynamic_pr_debug+0x87/0xb0
[607086.738271] [<ffffffff813b6d00>] ? vsnprintf+0x2c0/0x510
[607086.738277] [<ffffffff817bf495>] ? __schedule+0x105/0x910
[607086.738281] [<ffffffff811d0c25>] ? kmem_cache_alloc+0x1e5/0x210
[607086.738290] [<ffffffffc03d9713>] ? rbd_obj_request_create+0x63/0x160 [rbd]
[607086.738299] [<ffffffffc03d9802>] rbd_obj_request_create+0x152/0x160 [rbd]
[607086.738308] [<ffffffffc03db0b4>] rbd_img_request_fill+0x204/0x830 [rbd]
[607086.738318] [<ffffffffc03dd811>] rbd_queue_workfn+0x291/0x3f0 [rbd]
[607086.738324] [<ffffffff81091f1d>] process_one_work+0x14d/0x3f0
[607086.738330] [<ffffffff8109269a>] worker_thread+0x11a/0x470
[607086.738335] [<ffffffff81092580>] ? rescuer_thread+0x310/0x310
[607086.738339] [<ffffffff81097c62>] kthread+0xd2/0xf0
[607086.738344] [<ffffffff81097b90>] ? kthread_create_on_node+0x1c0/0x1c0
[607086.738349] [<ffffffff817c39df>] ret_from_fork+0x3f/0x70
[607086.738353] [<ffffffff81097b90>] ? kthread_create_on_node+0x1c0/0x1c0
[607086.738394] Code: 48 8b 7d c8 83 e0 1f 89 05 48 ae e6 00 c6 07 00 0f 1f 40 00 48 c7 c6 60 77 ef 81 44 89 e7 48 89 d9 e8 07 ea ff ff 4c 89 f7 57 9d <0f> 1f 44 00 00 8b 45 c4 85 c0 0f 84 62 fc ff ff e8 cd 08 6f 00
[607114.830938] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:3:11351]
[607114.831014] Modules linked in: xt_recent nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache bcache rbd libceph libcrc32c ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables kvm_amd 8021q kvm garp mrp stp llc dm_multipath scsi_dh serio_raw amd64_edac_mod edac_core k10temp edac_mce_amd joydev i2c_piix4 input_leds shpchp 8250_fintek mac_hid lp parport igb ixgbe i2c_algo_bit raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 raid1 vxlan ip6_udp_tunnel multipath dca udp_tunnel pata_acpi hid_generic ptp ahci usbhid linear psmouse pata_atiixp hid libahci pps_core mdio dm_mirror dm_region_hash dm_log [last unloaded: libceph]
[607114.831040] CPU: 1 PID: 11351 Comm: kworker/1:3 Tainted: G L 4.2.0-36-generic #42~14.04.1-Ubuntu
[607114.831042] Hardware name: Supermicro H8DGU/H8DGU, BIOS 080016 03/18/2010
[607114.831057] Workqueue: rbd rbd_queue_workfn [rbd]
[607114.831059] task: ffff8800daa48000 ti: ffff880089b20000 task.ti: ffff880089b20000
[607114.831068] RIP: 0010:[<ffffffff810cf61e>] [<ffffffff810cf61e>] console_unlock+0x41e/0x4e0
[607114.831070] RSP: 0018:ffff880089b23a88 EFLAGS: 00000246
[607114.831072] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
[607114.831074] RDX: ffff880089b239f8 RSI: 0000000000000004 RDI: 0000000000000246
[607114.831076] RBP: ffff880089b23ad8 R08: 0000000000000000 R09: ffff880215344ec0
[607114.831078] R10: 000000000000a000 R11: ffffc900024ffff8 R12: ffffffff81ef97c0
[607114.831080] R13: ffff88020000003f R14: 0000003e810cd7dc R15: ffff880089b239e8
[607114.831083] FS: 00007fc9db031740(0000) GS:ffff880217a40000(0000) knlGS:0000000000000000
[607114.831085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[607114.831087] CR2: 00000000006f8df4 CR3: 0000000213a74000 CR4: 00000000000006e0
[607114.831088] Stack:
[607114.831093] 0000000000000282 0000000000000282 0000000089b23ab8 ffffffff81f3a490
[607114.831096] 0000000000000001 00000000ffffffff 0000000000000000 0000000000000000
[607114.831100] 000000000000006d 0000000000000006 ffff880089b23b58 ffffffff810cfa54
[607114.831101] Call Trace:
[607114.831109] [<ffffffff810cfa54>] vprintk_emit+0x374/0x4f0
[607114.831114] [<ffffffff810cfd29>] vprintk_default+0x29/0x40
[607114.831119] [<ffffffff817b5c9e>] printk+0x46/0x48
[607114.831125] [<ffffffff813d5467>] __dynamic_pr_debug+0x87/0xb0
[607114.831130] [<ffffffff813b6d00>] ? vsnprintf+0x2c0/0x510
[607114.831136] [<ffffffff817bf495>] ? __schedule+0x105/0x910
[607114.831141] [<ffffffff811d0c25>] ? kmem_cache_alloc+0x1e5/0x210
[607114.831150] [<ffffffffc03d9713>] ? rbd_obj_request_create+0x63/0x160 [rbd]
[607114.831159] [<ffffffffc03d9802>] rbd_obj_request_create+0x152/0x160 [rbd]
[607114.831168] [<ffffffffc03db0b4>] rbd_img_request_fill+0x204/0x830 [rbd]
[607114.831178] [<ffffffffc03dd811>] rbd_queue_workfn+0x291/0x3f0 [rbd]
[607114.831185] [<ffffffff81091f1d>] process_one_work+0x14d/0x3f0
[607114.831191] [<ffffffff8109269a>] worker_thread+0x11a/0x470
[607114.831196] [<ffffffff81092580>] ? rescuer_thread+0x310/0x310
[607114.831201] [<ffffffff81097c62>] kthread+0xd2/0xf0
[607114.831205] [<ffffffff81097b90>] ? kthread_create_on_node+0x1c0/0x1c0
[607114.831210] [<ffffffff817c39df>] ret_from_fork+0x3f/0x70
[607114.831215] [<ffffffff81097b90>] ? kthread_create_on_node+0x1c0/0x1c0
[607114.831255] Code: 48 8b 7d c8 83 e0 1f 89 05 48 ae e6 00 c6 07 00 0f 1f 40 00 48 c7 c6 60 77 ef 81 44 89 e7 48 89 d9 e8 07 ea ff ff 4c 89 f7 57 9d <0f> 1f 44 00 00 8b 45 c4 85 c0 0f 84 62 fc ff ff e8 cd 08 6f 00

FIO load should not be too much

// FIO - Recipe
[global]
ioengine=libaio
filename=/dev/rbd1
runtime=60
buffered=1
iodepth=32
size=16g
fsync_on_close=1
exec_prerun=clear_cache.sh

[rand_w_4k]
stonewall
rw=randwrite
bs=4k

// clear_cache.sh
#! /bin/bash

free -m
sync
echo 3 > /proc/sys/vm/drop_caches
free -m

Actions #1

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to rbd
  • Category deleted (26)
  • Status changed from New to Need More Info

AFAIK this could happen for a bunch of reasons, including an unresponsive cluster.

Actions #2

Updated by Jason Dillaman over 6 years ago

  • Status changed from Need More Info to Closed

Not enough information available.

Actions

Also available in: Atom PDF