Bug #6809

3.11 kernel panic: Workqueue: ceph-msgr con_work

Added by Loïc Dachary over 10 years ago. Updated about 9 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
libceph
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

The Ceph cluster is very unstable (hosts going up and down frequently) and has high latency (> 10 ms) on more than half of the nodes. The panic occurred while writing to a file system mounted on an RBD device. The OS is Ubuntu 12.04.3. I can give root access to the machine for diagnostic purposes, and can provide more information if asked.

Nov 20 15:47:43 bm0014 kernel: [98073.067174] CPU: 4 PID: 9898 Comm: kworker/4:0 Tainted: GF          O 3.11.0-13-generic #20~precise2-Ubuntu
Nov 20 15:47:43 bm0014 kernel: [98073.067219] Hardware name:                  /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
Nov 20 15:47:43 bm0014 kernel: [98073.067270] Workqueue: ceph-msgr con_work [libceph]
Nov 20 15:47:43 bm0014 kernel: [98073.067296] task: ffff880785c0ddc0 ti: ffff8807830b0000 task.ti: ffff8807830b0000
Nov 20 15:47:43 bm0014 kernel: [98073.067331] RIP: 0010:[<ffffffff8161c895>]  [<ffffffff8161c895>] kernel_sendpage+0x5/0x30
Nov 20 15:47:43 bm0014 kernel: [98073.067373] RSP: 0018:ffff8807830b1d10  EFLAGS: 00010207
Nov 20 15:47:43 bm0014 kernel: [98073.067399] RAX: ffffea00076bdf80 RBX: ffff88079f715878 RCX: 00000000000001c7
Nov 20 15:47:43 bm0014 kernel: [98073.068694] RDX: 0000000000000e39 RSI: ffffea00076bdf80 RDI: 0000000000000000
Nov 20 15:47:43 bm0014 kernel: [98073.069980] RBP: ffff8807830b1d78 R08: 00000000000040c0 R09: ffffea001cbfdc00
Nov 20 15:47:43 bm0014 kernel: [98073.071265] R10: ffffffff8161c763 R11: 0000000000000000 R12: ffff880421833830
Nov 20 15:47:43 bm0014 kernel: [98073.072551] R13: ffffea00076bdf80 R14: 0000000000000000 R15: ffff88079f7158f0
Nov 20 15:47:43 bm0014 kernel: [98073.073844] FS:  0000000000000000(0000) GS:ffff88081f300000(0000) knlGS:0000000000000000
Nov 20 15:47:43 bm0014 kernel: [98073.075167] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 20 15:47:43 bm0014 kernel: [98073.076347] CR2: 0000000000000028 CR3: 00000007f0360000 CR4: 00000000001407e0
Nov 20 15:47:43 bm0014 kernel: [98073.077713] Stack:
Nov 20 15:47:43 bm0014 kernel: [98073.078754]  ffffffffa06afe08 ffff880421833830 01000000a1f6fbae ffff8807830b1d58
Nov 20 15:47:43 bm0014 kernel: [98073.079373]  0000000000000e39 00000000000001c7 00ff88079f715760 ffff88079f7156a8
Nov 20 15:47:43 bm0014 kernel: [98073.080019]  ffff880421833830 0000000000001000 ffff8804218339d8 0000000000000000
Nov 20 15:47:43 bm0014 kernel: [98073.080702] Call Trace:
Nov 20 15:47:43 bm0014 kernel: [98073.081324]  [<ffffffffa06afe08>] ? write_partial_message_data+0xe8/0x210 [libceph]
Nov 20 15:47:43 bm0014 kernel: [98073.081962]  [<ffffffffa06b2425>] try_write+0x105/0x420 [libceph]
Nov 20 15:47:43 bm0014 kernel: [98073.082633]  [<ffffffffa06b3eeb>] con_work+0xeb/0x3d0 [libceph]
Nov 20 15:47:43 bm0014 kernel: [98073.083263]  [<ffffffff810810b0>] process_one_work+0x170/0x4a0
Nov 20 15:47:43 bm0014 kernel: [98073.083892]  [<ffffffff81082171>] worker_thread+0x121/0x390
Nov 20 15:47:43 bm0014 kernel: [98073.084524]  [<ffffffff81082050>] ? manage_workers.isra.20+0x170/0x170
Nov 20 15:47:43 bm0014 kernel: [98073.085229]  [<ffffffff81089030>] kthread+0xc0/0xd0
Nov 20 15:47:43 bm0014 kernel: [98073.085928]  [<ffffffff81088f70>] ? flush_kthread_worker+0xb0/0xb0
Nov 20 15:47:43 bm0014 kernel: [98073.086580]  [<ffffffff8174ec6c>] ret_from_fork+0x7c/0xb0
Nov 20 15:47:43 bm0014 kernel: [98073.087216]  [<ffffffff81088f70>] ? flush_kthread_worker+0xb0/0xb0
Nov 20 15:47:43 bm0014 kernel: [98073.088212] Code: 00 00 00 48 89 e5 48 8b 47 28 48 8b 80 a8 00 00 00 48 85 c0 74 04 ff d0 5d c3 48 c7 c0 ea ff ff ff 5d c3 0f 1f 00 0f 1f 44 00 00 <48> 8b 47 28 55 48 89 e5 48 8b 80 a0 00 00 00 48 85 c0 74 07 ff
Nov 20 15:47:43 bm0014 kernel: [98073.091364] RIP  [<ffffffff8161c895>] kernel_sendpage+0x5/0x30
Nov 20 15:47:43 bm0014 kernel: [98073.092899]  RSP <ffff8807830b1d10>
Nov 20 15:47:43 bm0014 kernel: [98073.094429] CR2: 0000000000000028
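
A note on the oops: the bytes after the <48> marker in the Code line decode to mov 0x28(%rdi),%rax, i.e. the load of sock->ops at the top of kernel_sendpage(). With RDI (the sock argument) equal to 0 and CR2 = 0x28, the messenger called kernel_sendpage() with a NULL struct socket *, presumably a socket already torn down while the cluster was flapping. For reference, a sketch of kernel_sendpage() as it reads in 3.11-era kernels (quoted from memory, so treat it as illustrative rather than authoritative):

int kernel_sendpage(struct socket *sock, struct page *page, int offset,
                    size_t size, int flags)
{
        /* <48> 8b 47 28 is the load of sock->ops; with sock == NULL it
         * faults at offset 0x28, exactly the CR2 reported above. The
         * following 48 8b 80 a0 00 00 00 would be the ops->sendpage load. */
        if (sock->ops->sendpage)
                return sock->ops->sendpage(sock, page, offset, size, flags);

        return sock_no_sendpage(sock, page, offset, size, flags);
}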

syslog.gz - full syslog (236 KB) Loïc Dachary, 11/20/2013 07:09 AM


Related issues

Related to Linux kernel client - Bug #8087: libceph: null deref in osd_reset -> __reset_osd -> __remove_osd Resolved 04/12/2014

History

#1 Updated by Michael Kidd over 10 years ago

I am having similar issues on kernels 3.11.7, 3.11.9, and 3.11.11 with Fedora 19.

I've tested across Ceph versions:
0.67.4
0.72.1
0.73 (just released to the 'testing' channel)

Nov 30 20:18:28 server kernel: [174396.174344] kworker/7:3: page allocation failure: order:4, mode:0x10c050
Nov 30 20:18:28 server kernel: [174396.174349] CPU: 7 PID: 14470 Comm: kworker/7:3 Tainted: PF O 3.11.9-200.fc19.x86_64 #1
Nov 30 20:18:28 server kernel: [174396.174351] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2103 11/06/2013
Nov 30 20:18:28 server kernel: [174396.174359] Workqueue: ceph-msgr con_work [libceph]
Nov 30 20:18:28 server kernel: [174396.174361] 0000000000000000 ffff880102223998 ffffffff8164764b 000000000010c050
Nov 30 20:18:28 server kernel: [174396.174364] ffff880102223a20 ffffffff81141870 0000004000000000 ffff88083efe7b38
Nov 30 20:18:28 server kernel: [174396.174366] ffffffff81143e56 000000000000000f 00000004810c70a3 0010c05000000010
Nov 30 20:18:28 server kernel: [174396.174368] Call Trace:
Nov 30 20:18:28 server kernel: [174396.174374] [<ffffffff8164764b>] dump_stack+0x45/0x56
Nov 30 20:18:28 server kernel: [174396.174377] [<ffffffff81141870>] warn_alloc_failed+0xf0/0x160
Nov 30 20:18:28 server kernel: [174396.174380] [<ffffffff81143e56>] ? drain_local_pages+0x16/0x20
Nov 30 20:18:28 server kernel: [174396.174382] [<ffffffff81145bd7>] __alloc_pages_nodemask+0x827/0xa30
Nov 30 20:18:28 server kernel: [174396.174386] [<ffffffff81183149>] alloc_pages_current+0xa9/0x170
Nov 30 20:18:28 server kernel: [174396.174388] [<ffffffff811406d1>] __get_free_pages+0x21/0x70
Nov 30 20:18:28 server kernel: [174396.174390] [<ffffffff8118cc8e>] kmalloc_order_trace+0x2e/0xa0
Nov 30 20:18:28 server kernel: [174396.174392] [<ffffffff8118f0aa>] __kmalloc+0x1ca/0x250
Nov 30 20:18:28 server kernel: [174396.174401] [<ffffffffa10c837d>] dispatch+0x11ad/0x1750 [ceph]
Nov 30 20:18:28 server kernel: [174396.174403] [<ffffffff8153458a>] ? kernel_recvmsg+0x3a/0x50
Nov 30 20:18:28 server kernel: [174396.174408] [<ffffffffa1065d57>] ? read_partial.isra.22+0x57/0x80 [libceph]
Nov 30 20:18:28 server kernel: [174396.174412] [<ffffffffa1068527>] con_work+0x1727/0x2d00 [libceph]
Nov 30 20:18:28 server kernel: [174396.174415] [<ffffffff8109ef87>] ? dequeue_entity+0x107/0x520
Nov 30 20:18:28 server kernel: [174396.174418] [<ffffffff810810f5>] process_one_work+0x175/0x430
Nov 30 20:18:28 server kernel: [174396.174420] [<ffffffff81081d1b>] worker_thread+0x11b/0x3a0
Nov 30 20:18:28 server kernel: [174396.174422] [<ffffffff81081c00>] ? rescuer_thread+0x340/0x340
Nov 30 20:18:28 server kernel: [174396.174425] [<ffffffff81088660>] kthread+0xc0/0xd0
Nov 30 20:18:28 server kernel: [174396.174427] [<ffffffff810885a0>] ? insert_kthread_work+0x40/0x40
Nov 30 20:18:28 server kernel: [174396.174430] [<ffffffff816567ac>] ret_from_fork+0x7c/0xb0
Nov 30 20:18:28 server kernel: [174396.174431] [<ffffffff810885a0>] ? insert_kthread_work+0x40/0x40
Nov 30 20:18:28 server kernel: [174396.174433] Mem-Info:
Nov 30 20:18:28 server kernel: [174396.174434] Node 0 DMA per-cpu:
Nov 30 20:18:28 server kernel: [174396.174435] CPU 0: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174436] CPU 1: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174437] CPU 2: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174438] CPU 3: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174439] CPU 4: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174440] CPU 5: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174441] CPU 6: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174442] CPU 7: hi: 0, btch: 1 usd: 0
Nov 30 20:18:28 server kernel: [174396.174442] Node 0 DMA32 per-cpu:
Nov 30 20:18:28 server kernel: [174396.174444] CPU 0: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174445] CPU 1: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174446] CPU 2: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174447] CPU 3: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174448] CPU 4: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174448] CPU 5: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174449] CPU 6: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174450] CPU 7: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174451] Node 0 Normal per-cpu:
Nov 30 20:18:28 server kernel: [174396.174452] CPU 0: hi: 186, btch: 31 usd: 155
Nov 30 20:18:28 server kernel: [174396.174453] CPU 1: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174454] CPU 2: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174455] CPU 3: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174456] CPU 4: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174457] CPU 5: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174458] CPU 6: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174459] CPU 7: hi: 186, btch: 31 usd: 0
Nov 30 20:18:28 server kernel: [174396.174462] active_anon:2057707 inactive_anon:376163 isolated_anon:0
Nov 30 20:18:28 server kernel: [174396.174462] active_file:625823 inactive_file:2815813 isolated_file:0
Nov 30 20:18:28 server kernel: [174396.174462] unevictable:8 dirty:4325 writeback:585 unstable:0
Nov 30 20:18:28 server kernel: [174396.174462] free:295950 slab_reclaimable:1246932 slab_unreclaimable:289179
Nov 30 20:18:28 server kernel: [174396.174462] mapped:20432 shmem:4996 pagetables:13979 bounce:0
Nov 30 20:18:28 server kernel: [174396.174462] free_cma:0
Nov 30 20:18:28 server kernel: [174396.174464] Node 0 DMA free:15904kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 30 20:18:28 server kernel: [174396.174468] lowmem_reserve[]: 0 2890 32057 32057
Nov 30 20:18:28 server kernel: [174396.174470] Node 0 DMA32 free:1057860kB min:6088kB low:7608kB high:9132kB active_anon:459236kB inactive_anon:506452kB active_file:25260kB inactive_file:62096kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3042108kB managed:2964432kB mlocked:0kB dirty:28kB writeback:4kB mapped:116kB shmem:4kB slab_reclaimable:714536kB slab_unreclaimable:114788kB kernel_stack:48kB pagetables:2428kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov 30 20:18:28 server kernel: [174396.174473] lowmem_reserve[]: 0 0 29166 29166
Nov 30 20:18:28 server kernel: [174396.174475] Node 0 Normal free:110036kB min:61456kB low:76820kB high:92184kB active_anon:7771592kB inactive_anon:998200kB active_file:2478032kB inactive_file:11201156kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:30392316kB managed:29866760kB mlocked:32kB dirty:17272kB writeback:2336kB mapped:81612kB shmem:19980kB slab_reclaimable:4273192kB slab_unreclaimable:1041928kB kernel_stack:6544kB pagetables:53488kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
Nov 30 20:18:28 server kernel: [174396.174478] lowmem_reserve[]: 0 0 0 0
Nov 30 20:18:28 server kernel: [174396.174480] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15904kB
Nov 30 20:18:28 server kernel: [174396.174487] Node 0 DMA32: 47057*4kB (UE) 47169*8kB (UE) 30521*16kB (UE) 121*32kB (EMR) 1*64kB (R) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1057852kB
Nov 30 20:18:28 server kernel: [174396.174493] Node 0 Normal: 21641*4kB (UEM) 621*8kB (UEM) 978*16kB (UEM) 120*32kB (UEM) 5*64kB (E) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 111340kB
Nov 30 20:18:28 server kernel: [174396.174501] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 30 20:18:28 server kernel: [174396.174502] 3447879 total pagecache pages
Nov 30 20:18:28 server kernel: [174396.174503] 1149 pages in swap cache
Nov 30 20:18:28 server kernel: [174396.174504] Swap cache stats: add 27501, delete 26352, find 9219/9398
Nov 30 20:18:28 server kernel: [174396.174505] Free swap = 8185068kB
Nov 30 20:18:28 server kernel: [174396.174505] Total swap = 8273916kB
Nov 30 20:18:28 server kernel: [174396.265124] 8384511 pages RAM
Nov 30 20:18:28 server kernel: [174396.265128] 178657 pages reserved
Nov 30 20:18:28 server kernel: [174396.265129] 3819306 pages shared
Nov 30 20:18:28 server kernel: [174396.265130] 6248640 pages non-shared
Nov 30 20:18:28 server kernel: [174396.265132] ceph: problem parsing dir contents -12
Nov 30 20:18:28 server kernel: [174396.265134] ceph: mds parse_reply err -12
Nov 30 20:18:28 server kernel: [174396.265146] ceph: mdsc_handle_reply got corrupt reply mds0(tid:15679476)

I am using kernel cephfs mounts on this box. Any other details can be provided.
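
The trigger here differs from the original report: this is an order-4 (64 KiB) physically contiguous allocation failing in the ceph dispatch path, and the -12 in "problem parsing dir contents -12" is that allocation's -ENOMEM propagating up. The buddy lists above show almost nothing free at 64 KiB and above, so the kmalloc can fail even with gigabytes of free RAM. A minimal sketch of the usual mitigation, using a hypothetical alloc_reply_buf() helper (this is not the actual ceph/libceph fix):

#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Hypothetical helper: try the fast, physically contiguous path first,
 * then fall back to vmalloc, which only needs virtually contiguous
 * pages and therefore survives fragmentation. */
static void *alloc_reply_buf(size_t len)
{
        void *buf = kmalloc(len, GFP_NOFS | __GFP_NOWARN);

        if (!buf)
                buf = vmalloc(len);
        return buf;
}

(The free side then needs a matching is-it-vmalloc check, since vmalloc memory must not be passed to kfree().)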

#2 Updated by Eric Eastman about 10 years ago

I am seeing a similar issue with kernel 3.12.1 on Ubuntu 13.10.

Jan 14 03:09:27 gw2 kernel: [204578.112175] libceph: osd143 10.15.2.26:6875 socket closed (con state OPEN)
Jan 14 03:09:27 gw2 kernel: [204578.112303] libceph: osd143 10.15.2.26:6875 socket closed (con state CONNECTING)
Jan 14 03:09:27 gw2 kernel: [204578.415612] libceph: osd143 10.15.2.26:6875 socket closed (con state CONNECTING)
Jan 14 03:09:28 gw2 kernel: [204579.468421] libceph: osd143 10.15.2.26:6875 socket closed (con state CONNECTING)
Jan 14 03:09:30 gw2 kernel: [204581.446328] libceph: osd143 10.15.2.26:6875 socket closed (con state CONNECTING)
Jan 14 03:09:34 gw2 kernel: [204585.436294] libceph: osd143 down
Jan 14 03:09:34 gw2 kernel: [204585.437946] libceph: osd143 10.15.2.26:6875 socket closed (con state CONNECTING)
Jan 14 03:10:03 gw2 kernel: [204585.442869] BUG: unable to handle kernel NULL pointer dereference at (null)
Jan 14 03:10:03 gw2 kernel: [204585.442896] IP: [<ffffffff8136fdd3>] rb_erase+0x1a3/0x370
Jan 14 03:10:03 gw2 kernel: [204585.442917] PGD 0
Jan 14 03:10:03 gw2 kernel: [204585.442929] Oops: 0000 [#1] SMP
Jan 14 03:10:03 gw2 kernel: [204585.442947] Modules linked in: xfs nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache ext2 gpio_ich psmouse serio_raw sb_edac rbd ioatdma edac_core lpc scsi_transport_fc ptp pps_core scsi_tgtac_hid acpi_power_meter lp parport qla2xxx be2net hpsa tg33
Jan 14 03:10:03 gw2 kernel: [204585.443135] CPU: 0 PID: 6142 Comm: kworker/0:1 Not tainted 3.11.7-031107-generic #201311040853
Jan 14 03:10:03 gw2 kernel: [204585.443150] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/14/2012
Jan 14 03:10:03 gw2 kernel: [204585.443172] Workqueue: ceph-msgr con_work [libceph]
Jan 14 03:10:03 gw2 kernel: [204585.443187] task: ffff880bfd0f9770 ti: ffff8817fb006000 task.ti: ffff8817fb006000
Jan 14 03:10:03 gw2 kernel: [204585.443200] RIP: 0010:[<ffffffff8136fdd3>] [<ffffffff8136fdd3>] rb_erase+0x1a3/0x370
Jan 14 03:10:03 gw2 kernel: [204585.443219] RSP: 0018:ffff8817fb007cd8 EFLAGS: 00010246
Jan 14 03:10:03 gw2 kernel: [204585.443231] RAX: ffff8817fb1fb018 RBX: ffff8817fc3bb000 RCX: 0000000000000000
Jan 14 03:10:03 gw2 kernel: [204585.443244] RDX: 0000000000000000 RSI: ffff880bfb3497d0 RDI: ffff8817fc3bb018
Jan 14 03:10:03 gw2 kernel: [204585.443269] RBP: ffff8817fb007cd8 R08: 0000000000000000 R09: ffffea002fee5000
Jan 14 03:10:03 gw2 kernel: [204585.443305] R10: ffffffff815f7263 R11: 0000000000000000 R12: ffff880bfb349750
Jan 14 03:10:03 gw2 kernel: [204585.443340] R13: ffff8817fc3bb498 R14: ffff8817fb007d48 R15: ffff8817fc3bb000
Jan 14 03:10:03 gw2 kernel: [204585.443377] FS: 0000000000000000(0000) GS:ffff880c0fa00000(0000) knlGS:0000000000000000
Jan 14 03:10:03 gw2 kernel: [204585.443414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 14 03:10:03 gw2 kernel: [204585.443437] CR2: 0000000000000000 CR3: 0000000001c0d000 CR4: 00000000000407f0
Jan 14 03:10:03 gw2 kernel: [204585.443472] Stack:
Jan 14 03:10:03 gw2 kernel: [204585.443491] ffff8817fb007cf8 ffffffffa010e16c ffff8817fc3bb000 ffff880bfb349750
Jan 14 03:10:03 gw2 kernel: [204585.443544] ffff8817fb007d28 ffffffffa010e45f ffff8817fc3bb000 ffff880bfb349750
Jan 14 03:10:03 gw2 kernel: [204585.443596] ffff880bfb349760 ffff8817fb007d48 ffff8817fb007d88 ffffffffa010feb0
Jan 14 03:10:03 gw2 kernel: [204585.443649] Call Trace:
Jan 14 03:10:03 gw2 kernel: [204585.443677] [<ffffffffa010e16c>] __remove_osd+0x3c/0xa0 [libceph]
Jan 14 03:10:03 gw2 kernel: [204585.443706] [<ffffffffa010e45f>] __reset_osd+0x12f/0x170 [libceph]
Jan 14 03:10:03 gw2 kernel: [204585.443735] [<ffffffffa010feb0>] __kick_osd_requests+0x40/0x250 [libceph]
Jan 14 03:10:03 gw2 kernel: [204585.443763] [<ffffffff815f7263>] ? sock_destroy_inode+0x33/0x40
Jan 14 03:10:03 gw2 kernel: [204585.443792] [<ffffffffa0110117>] osd_reset+0x57/0xa0 [libceph]
Jan 14 03:10:03 gw2 kernel: [204585.443820] [<ffffffffa0109f5c>] con_work+0x15c/0x3d0 [libceph]
Jan 14 03:10:03 gw2 kernel: [204585.443846] [<ffffffff810810b0>] process_one_work+0x170/0x4a0
Jan 14 03:10:03 gw2 kernel: [204585.443871] [<ffffffff81082171>] worker_thread+0x121/0x390
Jan 14 03:10:03 gw2 kernel: [204585.443896] [<ffffffff81082050>] ? manage_workers.isra.20+0x170/0x170
Jan 14 03:10:03 gw2 kernel: [204585.443921] [<ffffffff81089030>] kthread+0xc0/0xd0
Jan 14 03:10:03 gw2 kernel: [204585.444228] [<ffffffff81088f70>] ? flush_kthread_worker+0xb0/0xb0
Jan 14 03:10:03 gw2 kernel: [204585.444264] [<ffffffff8172946c>] ret_from_fork+0x7c/0xb0
Jan 14 03:10:03 gw2 kernel: [204585.444288] [<ffffffff81088f70>] ? flush_kthread_worker+0xb0/0xb0
Jan 14 03:10:03 gw2 kernel: [204585.444311] Code: 10 f6 c2 01 0f 84 4e 01 00 00 48 83 e2 fc 0f 84 10 ff ff ff 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 0f 85 71 ff ff ff 48 8b 50 10 <f6> 02 01 75 3a 48 8b 7a 08 48 89 c1 48 83 c9 01 48 89 78 10 48
Jan 14 03:10:03 gw2 kernel: [204585.444311] RIP [<ffffffff8136fdd3>] rb_erase+0x1a3/0x370
Jan 14 03:10:03 gw2 kernel: [204585.444688] RSP <ffff8817fb007cd8>
Jan 14 03:10:03 gw2 kernel: [204585.444708] CR2: 0000000000000000
Jan 14 03:10:03 gw2 kernel: [204585.445244] ---[ end trace 0926e2bbdb50532b ]---

The RBD device was used as storage for an XFS file system, and I had multiple write streams going to that file system.

cat /proc/version
Linux version 3.12.1-031201-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201311201654 SMP Wed Nov 20 21:54:49 UTC 2013

modinfo rbd
filename: /lib/modules/3.12.1-031201-generic/kernel/drivers/block/rbd.ko
license: GPL
author: Jeff Garzik <>
description: rados block device
author: Yehuda Sadeh <>
author: Sage Weil <>
author: Alex Elder <>
srcversion: A993220E8E5D714D1F1429C
depends: libceph
intree: Y
vermagic: 3.12.1-031201-generic SMP mod_unload modversions

#3 Updated by Ian Colle about 10 years ago

  • Assignee set to Ilya Dryomov

#4 Updated by Josh Durgin almost 10 years ago

Another instance of the crash from the last comment, on Ubuntu 12.04 with a 3.13 kernel and XFS on top of RBD, while OSDs were going up and down:

[<ffffffffa03a98cc>] __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa03a9a3f>] __reset_osd+0x10f/0x150 [libceph]
[<ffffffffa03abbbd>] kick_requests+0x24d/0x440 [libceph]
[<ffffffffa03ac910>] ceph_osdc_handle_map+0x260/0x560 [libceph]
[<ffffffffa03a80b8>] dispatch+0x2e8/0x780 [libceph]
[<ffffffffa03a393b>] try_read+0x4ab/0x10d0 [libceph]
[<ffffffff8110ef1c>] ? acct_account_cputime+0x1c/0x20
[<ffffffff8101b773>] ? native_sched_clock+0x13/0x80
[<ffffffffa03a5809>] con_work+0xb9/0x640 [libceph]
[<ffffffff81080702>] process_one_work+0x182/0x450
[<ffffffff810814a1>] worker_thread+0x121/0x410
[<ffffffff81081380>] ? rescuer_thread+0x3e0/0x3e0
[<ffffffff810880f2>] kthread+0xd2/0xf0
[<ffffffff81088020>] ? kthread_create_on_node+0x190/0x190
[<ffffffff81726cfc>] ret_from_fork+0x7c/0xb0
[<ffffffff81088020>] ? kthread_create_on_node+0x190/0x190

#5 Updated by Ilya Dryomov about 9 years ago

Comments 2 and 4 are instances of #8087, which is resolved.
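
For anyone landing here from a search: the #8087 crash is a double removal of an OSD from the OSD client's red-black tree; the second rb_erase() walks the stale node's child pointers, which is why comment 2 faults at address 0 (CR2 = 0). A simplified sketch of the shape of the fix, assuming the libceph field names of that era (see #8087 for the actual patch):

#include <linux/rbtree.h>
#include <linux/ceph/osd_client.h>

/* Make removal idempotent: a second call for the same OSD is a no-op. */
static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
{
        if (RB_EMPTY_NODE(&osd->o_node))        /* already unlinked */
                return;

        rb_erase(&osd->o_node, &osdc->osds);
        RB_CLEAR_NODE(&osd->o_node);            /* mark as unlinked */
}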
