Bug #39649 (closed)

kernel BUG at fs/ceph/file.c:1476!

Added by Jeff Layton almost 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I was running xfstests generic/129 against a vstart cluster and hit ^c to cancel the test. The kernel then crashed here:

[ 1344.885281] run fstests generic/129 at 2019-05-09 08:48:14
[ 1344.999231] libceph: mon0 192.168.1.3:40010 session established
[ 1345.060496] libceph: client4219 fsid b45ff1a6-831c-4730-b237-be57984ef165
[ 1345.207877] libceph: mon0 192.168.1.3:40010 session established
[ 1345.268959] libceph: client4220 fsid b45ff1a6-831c-4730-b237-be57984ef165
[ 1349.813151] libceph: mon0 192.168.1.3:40010 session established
[ 1349.873605] libceph: client4221 fsid b45ff1a6-831c-4730-b237-be57984ef165
[ 1379.942403] ------------[ cut here ]------------
[ 1379.943952] kernel BUG at fs/ceph/file.c:1476!
[ 1379.945303] invalid opcode: 0000 [#1] SMP NOPTI
[ 1379.946662] CPU: 3 PID: 17258 Comm: looptest Tainted: G           O      5.1.0-rc4+ #119
[ 1379.948730] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 1379.951509] RIP: 0010:ceph_write_iter+0xa90/0xbd0 [ceph]
[ 1379.953307] Code: 48 8b 80 90 00 00 00 e9 8a f6 ff ff 8b 95 38 ff ff ff 48 c7 c6 60 97 87 c0 48 c7 c7 88 aa 89 c0 e8 05 32 c8 d0 e9 c9 fa ff ff <0f> 0b c7 85 7c ff ff ff e4 ff ff ff e9 6a fe ff ff 48 8d bb c8 fc
[ 1379.960585] RSP: 0018:ffffbee291727d00 EFLAGS: 00010246
[ 1379.963100] RAX: ffffa061e4b4ca80 RBX: 0000000000019000 RCX: 0000000000000000
[ 1379.964683] RDX: ffffa061e4b4ca80 RSI: 0000000000000202 RDI: 0000000000000000
[ 1379.966535] RBP: ffffbee291727e30 R08: ffffa061e4b4ca48 R09: ffffbee291727c94
[ 1379.968259] R10: 00000000000025dd R11: 0000000000000001 R12: 0000000000000000
[ 1379.969845] R13: ffffa061e4b4cc00 R14: 0000000013ed9000 R15: ffffbee291727e58
[ 1379.971375] FS:  00007fbeb3ea0740(0000) GS:ffffa0622f8c0000(0000) knlGS:0000000000000000
[ 1379.973054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1379.974376] CR2: 00007fce57ee86c0 CR3: 00000003fa232000 CR4: 00000000000006e0
[ 1379.975883] Call Trace:
[ 1379.976780]  ? wait_for_completion_killable_timeout+0x150/0x180
[ 1379.978167]  ? wake_up_q+0x60/0x60
[ 1379.979379]  ? mutex_lock+0xe/0x30
[ 1379.980411]  ? new_sync_write+0x124/0x190
[ 1379.981498]  new_sync_write+0x124/0x190
[ 1379.982562]  vfs_write+0xb6/0x1a0
[ 1379.983565]  ksys_write+0x57/0xd0
[ 1379.984546]  do_syscall_64+0x5b/0x150
[ 1379.985607]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1379.986868] RIP: 0033:0x7fbeb408bb05
[ 1379.987889] Code: 00 00 75 05 48 83 c4 58 c3 e8 17 4c ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 8b 05 46 e9 00 00 85 c0 75 12 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 48 83 ec 28 48 89 54 24 18 48 89

That is this block in ceph_write_iter():

                spin_lock(&ci->i_ceph_lock);
                if (__ceph_have_pending_cap_snap(ci)) {
                        struct ceph_cap_snap *capsnap =
                                        list_last_entry(&ci->i_cap_snaps,
                                                        struct ceph_cap_snap,
                                                        ci_item);
                        snapc = ceph_get_snap_context(capsnap->context);
                } else {
                        BUG_ON(!ci->i_head_snapc);    <<<< CRASH HERE
                        snapc = ceph_get_snap_context(ci->i_head_snapc);
                }
                spin_unlock(&ci->i_ceph_lock);
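
For context, this block sits in the synchronous/O_DIRECT branch of ceph_write_iter() and only runs after ceph_get_caps() has reported success. A heavily condensed sketch of the surrounding flow (abridged from the kernel source, not verbatim; error handling and most details omitted):

        /*
         * Condensed sketch of the surrounding flow in ceph_write_iter()
         * (abridged, not the verbatim kernel code). The quoted snapc
         * selection only executes once ceph_get_caps() reports success,
         * so a NULL i_head_snapc at that point means the cap state is
         * inconsistent with a write in progress -- hence the BUG_ON.
         */
        err = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, pos + count, &got, NULL);
        if (err < 0)
                goto out;       /* the only intended way to skip the write */

        if ((got & (CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO)) == 0 ||
            (iocb->ki_flags & IOCB_DIRECT) || (fi->flags & CEPH_F_SYNC)) {
                /* ... the block quoted above: pick snapc under i_ceph_lock ... */
                /* ... then ceph_direct_read_write() or ceph_sync_write() ... */
        } else {
                written = generic_perform_write(file, from, pos);
        }

        ceph_put_cap_refs(ci, got);     /* drop the references taken above */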
#1

Updated by Jeff Layton almost 5 years ago

This happened on a kernel that is pretty close to what's in the ceph-client/testing branch today (4695b5b0d754147f3e01afa2f97bb3ae96437383, plus another unrelated patch that I sent this morning).

I've had no luck reproducing this so far. When I first hit it, the test was running as part of the 'quick' group check in xfstests, so it's possible that the behaviour of earlier tests helped trigger it, or it may just be a tight race. In any case, I wasn't doing anything with snapshots here, so that should not have been a factor.

#2

Updated by Zheng Yan almost 5 years ago

  • Status changed from New to Resolved

Fixed by "ceph: fix error handling in ceph_get_caps()".
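
That fix addresses ceph_get_caps() reporting success even when its wait for caps was interrupted, which matches the ^c in this report: the writer then proceeds without actually holding cap references and trips the BUG_ON above. A minimal userspace model of that failure shape (my own sketch; the stub names like try_get_refs() are made up, and this is not the actual kernel code):

        #include <stdio.h>

        #define ERESTARTSYS 512

        /* Stubs standing in for kernel primitives; purely illustrative. */
        static int checks;
        static int try_get_refs(void)    { return 0; }            /* caps never granted */
        static int signal_pending(void)  { return ++checks > 1; } /* ^c on 2nd check */
        static void wait_for_wakeup(void){ }

        /*
         * Sketch of the suspected pre-fix shape of the code: the retry
         * loop stores its result in a local 'err', so an interrupted
         * wait leaves the stale 'ret' of 0 in place and the caller
         * believes it holds cap references.
         */
        static int get_caps_buggy(void)
        {
                int ret = try_get_refs();       /* 0: caps not available yet */

                if (!ret) {
                        int err;

                        while (!(err = try_get_refs())) {
                                if (signal_pending()) {
                                        err = -ERESTARTSYS;     /* ^c arrives */
                                        break;
                                }
                                wait_for_wakeup();
                        }
                        /* BUG: err == -ERESTARTSYS, but ret is still 0 */
                }
                return ret;     /* caller treats 0 as "caps held" */
        }

        int main(void)
        {
                printf("get_caps_buggy() = %d\n", get_caps_buggy()); /* prints 0 */
                return 0;
        }

Presumably the fix makes the loop propagate that error into the function's return value, so the write path bails out with -ERESTARTSYS instead of reaching the snapc selection with a NULL i_head_snapc.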

