Project

General

Profile

Actions

Bug #18474

closed

oops in __unregister_request

Added by Jeff Layton over 7 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
fs/ceph
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I left a xfstests run going overnight last night and when I came back I saw this oops on the console:

[14231.902682] run fstests generic/205 at 2017-01-09 23:11:24
[14339.010567] ceph:  dropping unsafe request 18446621567284867072
[14339.011516] ------------[ cut here ]------------
[14339.012188] kernel BUG at fs/ceph/mds_client.c:576!
[14339.012816] invalid opcode: 0000 [#1] SMP
[14339.013338] Modules linked in: loop ceph(OE) libceph fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_raw ip6table_mangle ip6table_security ip6table_
nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_raw iptable_mangle iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables nfsd auth_rp
cgss crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nfs_acl lockd ppdev joydev acpi_cpufreq parport_pc tpm_tis pcspkr parport i2c_piix4 tpm_tis_core tpm virtio_balloon qemu_fw_cfg grace sunrpc xfs libcrc32c virtio_blk virtio_net v
irtio_console qxl drm_kms_helper ata_generic ttm serio_raw crc32c_intel drm pata_acpi virtio_pci virtio_ring virtio floppy
[14339.026549] CPU: 1 PID: 29659 Comm: kworker/1:2 Tainted: G           OE   4.9.0+ #13
[14339.027572] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[14339.029143] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[14339.029916] task: ffff9094ea818000 task.stack: ffffb500420b4000
[14339.030676] RIP: 0010:[<ffffffffc05a41bd>]  [<ffffffffc05a41bd>] __unregister_request+0x1ad/0x1b0 [ceph]
[14339.032275] RSP: 0018:ffffb500420b7c08  EFLAGS: 00010246
[14339.033134] RAX: 0000000000000000 RBX: ffff9094c012f400 RCX: 0000000000000000
[14339.034282] RDX: 0000000000000000 RSI: ffff9094efb87100 RDI: ffff9094efb87000
[14339.035108] RBP: ffffb500420b7c20 R08: 0000000000000000 R09: 0000000000000000
[14339.036325] R10: 0000000000000c00 R11: 0000000000000479 R12: ffff9094efb87000
[14339.037455] R13: ffff9094c012f408 R14: ffff9094efd00db8 R15: ffff9094c012f738
[14339.038532] FS:  0000000000000000(0000) GS:ffff9094ffc80000(0000) knlGS:0000000000000000
[14339.039827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14339.040674] CR2: 0000555bf06f56b8 CR3: 00000001311a5000 CR4: 00000000000406e0
[14339.041755] Stack:
[14339.042082]  ffff9094efd00800 ffff9094c012f400 ffff9094efb87000 ffffb500420b7c60
[14339.043306]  ffffffffc05a4247 ffff9094efb87008 ffff9094efb87000 0000000000000003
[14339.044514]  ffff9094eee4af00 ffff9094efd00800 ffff9094eeea3420 ffffb500420b7d18
[14339.045718] Call Trace:
[14339.046109]  [<ffffffffc05a4247>] cleanup_session_requests+0x87/0x130 [ceph]
[14339.047199]  [<ffffffffc05a78ee>] dispatch+0x69e/0x16d0 [ceph]
[14339.048059]  [<ffffffffc052920b>] ceph_con_workfn+0x5fb/0x2ef0 [libceph]
[14339.049099]  [<ffffffffad0daa2c>] ? dequeue_task_fair+0x5ec/0x940
[14339.050035]  [<ffffffffad05cf4e>] ? kvm_sched_clock_read+0x1e/0x30
[14339.051029]  [<ffffffffad02fc49>] ? sched_clock+0x9/0x10
[14339.051807]  [<ffffffffad025728>] ? __switch_to+0x2a8/0x5c0
[14339.052652]  [<ffffffffad0bbe54>] process_one_work+0x184/0x430
[14339.053572]  [<ffffffffad0bc14e>] worker_thread+0x4e/0x490
[14339.054364]  [<ffffffffad0bc100>] ? process_one_work+0x430/0x430
[14339.055220]  [<ffffffffad0bc100>] ? process_one_work+0x430/0x430
[14339.056336]  [<ffffffffad0c1b59>] kthread+0xd9/0xf0
[14339.057270]  [<ffffffffad0c1a80>] ? kthread_park+0x60/0x60
[14339.058275]  [<ffffffffad0c1a80>] ? kthread_park+0x60/0x60
[14339.059293]  [<ffffffffad80cb15>] ret_from_fork+0x25/0x30
[14339.060293] Code: 11 00 00 74 e6 48 8b 40 f8 49 89 84 24 f8 00 00 00 e9 96 fe ff ff 4c 8b a3 88 00 00 00 4d 85 e4 0f 85 56 ff ff ff e9 cb fe ff ff <0f> 0b 90 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 49
[14339.064785] RIP  [<ffffffffc05a41bd>] __unregister_request+0x1ad/0x1b0 [ceph]
[14339.065884]  RSP <ffffb500420b7c08>
[14339.067866] ---[ end trace 8a0bb333836080c5 ]---

...the debugger says:

(gdb) list *(__unregister_request+0x1ad)
0x231ed is in __unregister_request (fs/ceph/mds_client.c:576).
571        put_request_session(req);
572        ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
573        kfree(req);
574    }
575    
576    DEFINE_RB_FUNCS(request, struct ceph_mds_request, r_tid, r_node)
577    
578    /*
579     * lookup session, bump ref if found.
580     *

Which may be the BUG_ON in DEFINE_RB_INSDEL_FUNCS:

        BUG_ON(!RB_EMPTY_NODE(&t->nodefld));

...unfortunately I didn't get a vmcore, so it's hard to know for sure.

Actions #1

Updated by Jeff Layton over 7 years ago

On the next attempt:

$ sudo ./check generic/205
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 cephfs 4.9.0+
MKFS_OPTIONS  -- 192.168.1.149:/scratch
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 192.168.1.149:/scratch /mnt/scratch

generic/205     [not run] Reflink not supported by scratch filesystem type: ceph
Not run: generic/205
Passed all 0 tests

...with no crash. So it seems likely that this occurred during the mount or unmount perhaps? The message about a dropped unsafe request may be a hint there.

Actions #2

Updated by Ilya Dryomov over 7 years ago

  • Category set to fs/ceph
Actions #3

Updated by Jeff Layton about 7 years ago

Hit a very similar crash in testing today. This time it crashed while doing the list_del_init in cleanup_session_requests(). Almost the same spot, but not exactly. I think we may have a memory scribble.

I'll plan to add some strategic BUG_ONs -- see if we're maybe freeing the reqs while they're still on s_unsafe? I'll also see if I can set up kdump and get a vmcore. That might tell us a bit more about how we get a zeroed out object here.

It might also not hurt to make a dedicated slabcache for ceph_mds_request objects. That way we could do some memory poisoning of just those objects too.

Actions #4

Updated by Zheng Yan about 7 years ago

if op == CEPH_SESSION_CLOSE, handle_session() unregisters the session. Maybe it's use-after-free bug. (I didn't check it carefully)

Actions #5

Updated by Jeff Layton about 7 years ago

Yeah, could be a use-after-free, or maybe a refcounting imbalance in the session handling? I kicked off another xfstests run and it didn't occur, so I'm guessing it may be a race of some sort.

xfstests does do a bunch of mounting and unmounting in quick succession and this oops is occurring in the session teardown code, so maybe it's related to that somehow?

Actions #6

Updated by Jeff Layton about 7 years ago

I added this just before the kfree(req) in ceph_mdsc_release_request:

BUG_ON(!list_empty(&req->r_unsafe_item));

...and kicked off an xfstests run before bedtime last night. This was on the console this morning:

[43351.124014] run fstests generic/205 at 2017-02-06 22:52:49
[43415.014713] ------------[ cut here ]------------
[43415.015243] kernel BUG at fs/ceph/mds_client.c:587!
[43415.015759] invalid opcode: 0000 [#1] SMP
[43415.016190] Modules linked in: ceph(OE) libceph(OE) loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev joydev acpi_cpufreq i2c_piix4 parport_pc pcspkr nfsd tpm_tis parport virtio_balloon tpm_tis_core qemu_fw_cfg tpm auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c qxl virtio_console drm_kms_helper virtio_blk virtio_net ttm floppy drm ata_generic virtio_pci
[43415.023469]  crc32c_intel virtio_ring serio_raw pata_acpi virtio [last unloaded: libceph]
[43415.024331] CPU: 0 PID: 7618 Comm: umount Tainted: G           OE   4.10.0-rc5+ #8
[43415.025125] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[43415.026033] task: ffff9ee239c9d880 task.stack: ffffc275833b8000
[43415.026663] RIP: 0010:ceph_mdsc_release_request+0x169/0x170 [ceph]
[43415.027313] RSP: 0018:ffffc275833bbdc0 EFLAGS: 00010287
[43415.027860] RAX: ffff9ee2335f7b30 RBX: ffff9ee2335f4f30 RCX: 0000000100000000
[43415.028607] RDX: ffff9ee22e30d688 RSI: ffff9ee2335f4f58 RDI: ffff9ee22eb3fc00
[43415.029355] RBP: ffffc275833bbdd0 R08: ffff9ee229ea0a40 R09: 0000000000000001
[43415.030120] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9ee2335f4c00
[43415.030857] R13: ffff9ee229eec010 R14: ffff9ee22eb3fc08 R15: ffff9ee2335f4c00
[43415.031638] FS:  00007fe9e3c86340(0000) GS:ffff9ee23fc00000(0000) knlGS:0000000000000000
[43415.032484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43415.033090] CR2: 00005623e6999c28 CR3: 000000012cd0c000 CR4: 00000000000406f0
[43415.033830] Call Trace:
[43415.034108]  __unregister_request+0xc3/0x200 [ceph]
[43415.034628]  ceph_mdsc_pre_umount+0x131/0x1f0 [ceph]
[43415.035163]  ceph_kill_sb+0x2b/0x70 [ceph]
[43415.035600]  deactivate_locked_super+0x43/0x70
[43415.036074]  deactivate_super+0x5a/0x60
[43415.036481]  cleanup_mnt+0x3f/0x90
[43415.036845]  __cleanup_mnt+0x12/0x20
[43415.037232]  task_work_run+0x80/0xa0
[43415.037611]  exit_to_usermode_loop+0xaa/0xb0
[43415.038069]  syscall_return_slowpath+0xaa/0xb0
[43415.038540]  entry_SYSCALL_64_fastpath+0xa7/0xa9
[43415.039031] RIP: 0033:0x7fe9e2ceffd7
[43415.039414] RSP: 002b:00007ffe6835a1b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[43415.040200] RAX: 0000000000000000 RBX: 000055c9a6dea080 RCX: 00007fe9e2ceffd7
[43415.040940] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055c9a6df2350
[43415.041679] RBP: 000055c9a6df2350 R08: 000055c9a6df2370 R09: 0000000000000014
[43415.042424] R10: 000055c9a6dede40 R11: 0000000000000246 R12: 00007fe9e3a6d184
[43415.043374] R13: 0000000000000000 R14: 000055c9a6dea260 R15: 00007ffe6835a440
[43415.044341] Code: 8b bb 58 fd ff ff 48 8d b3 90 00 00 00 48 83 c3 68 e8 dc 20 ff ff 48 8b 03 48 39 c3 75 0d 4c 89 e7 e8 4c 85 9c e1 5b 41 5c 5d c3 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 49 
[43415.046730] RIP: ceph_mdsc_release_request+0x169/0x170 [ceph] RSP: ffffc275833bbdc0
[43415.049061] ---[ end trace e740b673f6130a3f ]---

So it seems likely that we have a ceph_mds_request refcounting imbalance somewhere that is causing reqs to be freed before it has been removed from the s_unsafe list.

In my testing, it always happens in the unmount codepath during generic/205 test. I'll plan to have a look at that test and see whether we can narrow it down to certain codepaths.

Actions #7

Updated by Jeff Layton about 7 years ago

  • Assignee set to Jeff Layton
Actions #8

Updated by Zheng Yan about 7 years ago

I think wait_requests() should remove request from unsafe list before calling __unregister_request()

Actions #9

Updated by Jeff Layton about 7 years ago

Zheng Yan wrote:

I think wait_requests() should remove request from unsafe list before calling __unregister_request()

Maybe. Would it be better instead to do it in __unregister_request? It's protected by the same lock you need for the stuff in that function, so removing it there would make sense to me.

Is there ever a time you'd want to remove it from the request tree but leave it on the s_unsafe list?

Actions #10

Updated by Zheng Yan about 7 years ago

Jeff Layton wrote:

Is there ever a time you'd want to remove it from the request tree but leave it on the s_unsafe list?

I don't think there is

Actions #11

Updated by Jeff Layton about 7 years ago

On second thought, I don't really like that since __register_request doesn't put the thing on the list. I'm going to go with your original idea of list_del_init'ing it in wait_requests, and then add a WARN_ON_ONCE() in __unregister_request if the list isn't empty at that point. I ran xfstests on ceph last night with a patch like that and got no panics and no warnings, so I think it's safe enough.

Actions #12

Updated by Jeff Layton about 7 years ago

  • Status changed from New to Resolved
Actions

Also available in: Atom PDF