Bug #5428
libceph: null deref in ceph_auth_reset
Status: Closed
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
<4>[19534.802099] libceph: mon2 10.214.131.7:6790 socket closed (con state CONNECTING)
<1>[19534.809633] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[19534.817523] IP: [<ffffffff8163353e>] mutex_lock_nested+0xee/0x360
<4>[19534.823650] PGD 0
<4>[19534.825688] Oops: 0002 [#1] SMP

[dumpcommon]kdb> -bt
Stack traceback for pid 13893
0xffff88020cbe3f20    13893        2  1    0   R  0xffff88020cbe43a8 *kworker/0:0
 ffff880211275b28 0000000000000018 ffffffff8163351a ffffffffa081f3c6
 ffff880224f3b598 ffff88021112ea80 0000000000000246 ffff88021112ea80
 0000000000000000 1111111111111111 ffff880211275b48 ffff880211275b98
Call Trace:
 [<ffffffff8163351a>] ? mutex_lock_nested+0xca/0x360
 [<ffffffffa081f3c6>] ? ceph_auth_reset+0x26/0x80 [libceph]
 [<ffffffffa081f3c6>] ? ceph_auth_reset+0x26/0x80 [libceph]
 [<ffffffffa0812776>] ? __close_session+0x76/0xa0 [libceph]
 [<ffffffffa0812e33>] ? mon_fault+0x53/0xe0 [libceph]
 [<ffffffffa080ee21>] ? con_work+0x571/0x2d50 [libceph]
 [<ffffffff81080bb3>] ? idle_balance+0x133/0x180
 [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
 [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
 [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
 [<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
 [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
 [<ffffffff810605bc>] ? worker_thread+0x11c/0x370
 [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
 [<ffffffff8106727a>] ? kthread+0xea/0xf0
 [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
 [<ffffffff8163ff9c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
The run was:
ubuntu@teuthology:/a/teuthology-2013-06-22_01:00:51-kernel-next-testing-basic/42855$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  install:
    ceph:
      sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  s3tests:
    branch: next
  workunit:
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - rbd/map-unmap.sh
Updated by Sage Weil almost 11 years ago
First guess was a shutdown race, but ceph_monc_stop() flushes the msgr wq. Also, no other threads appear to be in ceph code at this time.
OK, looking at the test output, all rbds have long since been unmapped (unless there is a bug in the test script), so this is most likely a leaked msgr socket.
Updated by Sage Weil almost 11 years ago
Focusing on the warning leading up to this first: it looks like the socket callback is happening while the socket is in the CLOSED state, which is always preceded by a sock->ops->shutdown() call. The best theory is that shutdown isn't serialized against the callbacks. Alternatively, there is some ugly use-after-free going on, but that seems less likely.
Updated by Sage Weil over 10 years ago
- Status changed from New to Can't reproduce