Bug #5043
Oops in remove_osd
Status: Closed
Description
Stack output:
Stack traceback for pid 29892
0xffff88022140bf20 29892 2 1 6 R 0xffff88022140c3a8 *kworker/6:1
ffff88020ad79b48 0000000000000018 ffff88022140c640 ffff88020ad79b68
ffffffffa00bfa0c ffff88020a285000 ffff8802244a4950 ffff88020ad79b98
ffffffffa00bfcff ffff88020a285030 ffff8802244a4950 0000000000000005
Call Trace:
[<ffffffffa00bfa0c>] ? __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa00bfcff>] ? __reset_osd+0x12f/0x170 [libceph]
[<ffffffffa00c0e3e>] ? osd_reset+0x7e/0x2b0 [libceph]
[<ffffffffa00b8e81>] ? con_work+0x571/0x2d40 [libceph]
[<ffffffff816724c0>] ? _raw_spin_unlock_irq+0x30/0x40
[<ffffffff81074aa1>] ? process_one_work+0x161/0x4b0
[<ffffffff810b8c05>] ? trace_hardirqs_on_caller+0x105/0x190
[<ffffffff81074b12>] ? process_one_work+0x1d2/0x4b0
[<ffffffff81074aa1>] ? process_one_work+0x161/0x4b0
[<ffffffff81076778>] ? worker_thread+0x118/0x340
[<ffffffff81076660>] ? manage_workers+0x320/0x320
[<ffffffff8107c46a>] ? kthread+0xea/0xf0
[<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0
[<ffffffff8167b4ac>] ? ret_from_fork+0x7c/0xb0
[<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0
This was the job:
kernel: &id001
  kdb: true
  sha1: b5b09be30cf99f9c699e825629f02e3bce555d44
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        filestore flush min: 0
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 6c1e4791782ce2b3e101ee80640d896bcda684de
  s3tests:
    branch: next
  workunit:
    sha1: 6c1e4791782ce2b3e101ee80640d896bcda684de
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana19.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0IExVUuQEnwBzUbN6+jgEb/qLLsSDFLAe0OPv+R3Q3uUZn+4QjB+FQ3sMMEwaGpEhUfbpWn3xZbCmi4NngTsgPDjImYJPMeMecaxvVXqAJt4IM6gdBN0415lrKXtbXaGBJCmeFFDB+xN+JN6Qhk8DWN62DnJB8MS5vKn9u0S3HvtGzY/QrnutT3AX9I4097isbTepLFRC4n2CoAC9srXaxAprFgLgOIYHm386B3W7yK3yfIImWofvZxYpPCsr/7ws8DSgdRX9eDucGQfY2WcYKCEIgZclGPBhExm7q/ahHUqYKrzWY3RD93AuXJVgOJk5Yp8C1Ryx8vyJkAxBW4nb
  ubuntu@plana25.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuXajaQgHe9XnbLOzI8WWFYVz6+TnOiTzbkIJPGOZpzQEjnUtJraQIEt5ABSeovMjiEj+V4XvunfyuSmEd0H9giRSyjmCHTPGlpndfTeCdVtCBpNqf5GkUqHaEY1Hp57XPbya2rGlwtFm0NeIDYx6pfkejKnsTOUqwhgUb6950TRhjHQhMjFgyALSyfAm/4y6vGZfjm57+yyih6XgDkqWiiQ6Y/aJVR2n+iCzvqEzV7JSCU+Brn+k8IQLHho1fadYqc5PjYct5BaVlHcP6c+T8nJE/DvqGwZ4gQaVJcuWJiDfLOPPYo1g/0AFicxauLwVNJ6HFR9FjLLGtGU+2DcVN
  ubuntu@plana45.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDp3cwfZhOipCot6NiKX4cRMn4zx43QY0+5HdqzCQU2y7OrOJt3d0qvifnZPyeq8/d+aW2WL2OM8m4taz380JsP0SLmlpY8D0pGY/tN0pQDqIFd8EboMtKY6tR8unQrVzuczMqup/tkKSfdRp0zAeTiJ8qH7l9MaVcOw6WfRACb8f7APJE2gVRBrzPAdbqKzAphTRzZSz0cq722AX7XQDPT2dz7NoTp5Tk7xaQdDu2II+78B1H27IWdyYeonfy17yf9N+IA2Xzna/g5zu8apg7UvzyFmHunLyjr78dhPtR39201A0QJ5x5Qli9/UaB3LwiqnbCiGfx4xWFazdUFzxiD
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
It's on plana45, which will not be reclaimed until the end of the week at the earliest, if someone wants to check it out.
Updated by Alex Elder almost 11 years ago
- Status changed from New to In Progress
- Assignee set to Alex Elder
I'm taking this for the time being.
This is in rb_erase(), which means the osd client's red-black
tree may have gotten corrupted.
I'll dig a little deeper to see if I can find out more.
Updated by Alex Elder almost 11 years ago
The ceph_osd_client in question is osdc @ 0xffff8802244a4950
The ceph_osd is osd @ 0xffff88020a285000
(struct rb_root *) &osdc->osds.rb_node is ffff8802218a1018 -> {
long unsigned int __rb_parent_color = 0000000000000001
struct rb_node * rb_right = ffff880063513018 ->
struct rb_node * rb_left = ffff880209db4818
}
(struct rb_node *) &osd->node is 0xffff88020a285018 = {
long unsigned int __rb_parent_color = ffff880209db2819
struct rb_node * rb_right = 0000000000000000
struct rb_node * rb_left = 0000000000000000
}
The osd getting removed has empty requests and linger requests
lists, and otherwise looks completely fine (though it does have
two references).
struct ceph_osd @ 0xffff88020a285000 -> {
atomic_t o_ref = { 00000002 }
(pad) = 00000000
struct ceph_osd_client * o_osdc = ffff8802244a4950
int o_osd = 00000005
int o_incarnation = 00000001
struct rb_node o_node = {
parent_color = ffff880209db2819
right = 0000000000000000
left = 0000000000000000
}
struct ceph_connection o_con = {
ffff88020a285000 ffffffffa00cf1c0
...
0000000000000000
0000000000000032
}
struct list_head o_requests = ffff88020a285538 = {
next = ffff88020a285538
prev = ffff88020a285538
}
struct list_head o_linger_requests
ffff88020a285548 = {
next = ffff88020a285548
prev = ffff88020a285548
}
struct list_head o_osd_lru ffff88020a285558 = {
next = ffff88020a285558
prev = ffff88020a285558
}
struct ceph_auth_handshake o_auth =
ffff88020a285568 = {
struct ceph_authorizer * authorizer = 0
void * authorizer_buf = 0
size_t authorizer_buf_len = 0
void * authorizer_reply_buf = 0
size_t authorizer_reply_buf_len = 0
}
long unsigned int lru_ttl = 00000000
int o_marked_for_keepalive = 00000000
struct list_head o_keepalive_item @ ffff88020a285590 = {
next = ffff88020a285590
prev = ffff88020a285590
}
};
Updated by Alex Elder almost 11 years ago
I think I may have just found something.
It appears as though access to the osd client's osd red-black
tree is supposed to be protected using osdc->request_mutex.
An osd is added to that tree only in __insert_osd(). That
function is only called by __map_request(), which does no
additional locking before calling that function. There
are three places __map_request() is called: kick_requests()
(twice); and ceph_osdc_start_request().
In ceph_osdc_start_request(), osdc->request_mutex is held during
the __map_request() call. In kick_requests(), osdc->request_mutex
is taken within that function before calling __map_request()
So all insertions are protected by the request mutex.
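To make the insertion-side locking concrete, here is a minimal
sketch of the pattern described above (simplified signatures; not
the actual kernel source):

int ceph_osdc_start_request(struct ceph_osd_client *osdc,
                            struct ceph_osd_request *req)
{
        mutex_lock(&osdc->request_mutex);
        __map_request(osdc, req);       /* may call __insert_osd() */
        mutex_unlock(&osdc->request_mutex);
        return 0;
}

In kick_requests(), likewise, both __map_request() calls are made
between the mutex_lock() and mutex_unlock() of osdc->request_mutex.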
An osd is removed from an osd client's tree only in __remove_osd().
There are three callers of __remove_osd(): remove_all_osds();
remove_old_osds(); and __reset_osd(). The first two of these
take and hold osdc->request_mutex while __remove_osd() is called.
But __reset_osd() does not. That function is called in two places:
__kick_osd_requests() and reset_changed_osds(). Along the first
path, osdc->request_mutex is held when __remove_osd() is called.
But reset_changed_osds() does not take the mutex, and its one
caller, kick_requests(), drops the mutex before making the call.
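Putting the pieces together, the unprotected removal path looks
roughly like this (a simplified sketch based on the description
above, not the actual source):

static void kick_requests(struct ceph_osd_client *osdc)
{
        mutex_lock(&osdc->request_mutex);
        /* ... __map_request() may insert osds into the tree ... */
        mutex_unlock(&osdc->request_mutex);

        /* mutex no longer held here */
        reset_changed_osds(osdc);       /* -> __reset_osd() -> __remove_osd()
                                         * -> rb_erase() on the shared tree,
                                         * racing with the insertions above */
}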
So...
I think what happened is that kick_requests() got called and
changed the osd client's red-black tree concurrently with another
modification (in this case a call to __remove_osd(), though there
are other possible failure modes), and that concurrent access
corrupted the osd client's osd red-black tree.
The problem can be seen in this commit:
e6d50f67 libceph: always reset osds when kicking
In that commit, the call to reset_changed_osds() was moved inside
kick_requests() rather than being made after it.
However, even there the kick_requests() call was protected only
by osdc->map_sem (taken for writing), and that situation seems to
go back to when net/ceph/osd_client.c was first added to the tree.
One last point: it's possible that osdc->map_sem was intended
to protect updates to the red-black tree. But that is not
sufficient either, because there appear to be cases where the
semaphore is taken only for reading while the tree can still
get updated.
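To illustrate that last point: a read-write semaphore taken for
reading excludes writers but not other readers, so two code paths
that each hold map_sem for reading can still modify the tree
concurrently. A hypothetical example of the unsafe pattern:

        down_read(&osdc->map_sem);      /* shared: other readers still run */
        __remove_osd(osdc, osd);        /* rb_erase() with no exclusive lock */
        up_read(&osdc->map_sem);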
I think the simplest fix is to move the reset_changed_osds()
call in kick_requests() up a few lines, so that it happens before
the request mutex is dropped. The call would then follow a call to
ceph_monc_request_next_osdmap(), but I think that's fine, because
that appears to be an asynchronous request to the monitor and
shouldn't affect the osd client.
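In code terms, the proposed reordering would look roughly like
this (a sketch, not the exact patch):

static void kick_requests(struct ceph_osd_client *osdc)
{
        int needmap = 0;

        mutex_lock(&osdc->request_mutex);
        /* ... remap requests via __map_request(); may set needmap ... */
        if (needmap)
                ceph_monc_request_next_osdmap(&osdc->client->monc);
        reset_changed_osds(osdc);       /* moved up: now runs under the mutex */
        mutex_unlock(&osdc->request_mutex);
}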
Updated by Alex Elder almost 11 years ago
- Status changed from In Progress to Fix Under Review
The following patch has been posted for review.
[PATCH] libceph: must hold mutex for reset_changed_osds()
Updated by Alex Elder almost 11 years ago
- Status changed from Fix Under Review to Resolved
The following has been committed to the "testing" branch
of the ceph-client git repository:
14d2f38 libceph: must hold mutex for reset_changed_osds()
It has also been queued to send to Linus for the 3.10
Linux release cycle.