Bug #5043

Oops in remove_osd

Added by Sandon Van Ness about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

Stack output:

Stack traceback for pid 29892
0xffff88022140bf20 29892 2 1 6 R 0xffff88022140c3a8 *kworker/6:1
ffff88020ad79b48 0000000000000018 ffff88022140c640 ffff88020ad79b68
ffffffffa00bfa0c ffff88020a285000 ffff8802244a4950 ffff88020ad79b98
ffffffffa00bfcff ffff88020a285030 ffff8802244a4950 0000000000000005
Call Trace:
[<ffffffffa00bfa0c>] ? __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa00bfcff>] ? __reset_osd+0x12f/0x170 [libceph]
[<ffffffffa00c0e3e>] ? osd_reset+0x7e/0x2b0 [libceph]
[<ffffffffa00b8e81>] ? con_work+0x571/0x2d40 [libceph]
[<ffffffff816724c0>] ? _raw_spin_unlock_irq+0x30/0x40
[<ffffffff81074aa1>] ? process_one_work+0x161/0x4b0
[<ffffffff810b8c05>] ? trace_hardirqs_on_caller+0x105/0x190
[<ffffffff81074b12>] ? process_one_work+0x1d2/0x4b0
[<ffffffff81074aa1>] ? process_one_work+0x161/0x4b0
[<ffffffff81076778>] ? worker_thread+0x118/0x340
[<ffffffff81076660>] ? manage_workers+0x320/0x320
[<ffffffff8107c46a>] ? kthread+0xea/0xf0
[<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0
[<ffffffff8167b4ac>] ? ret_from_fork+0x7c/0xb0
[<ffffffff8107c380>] ? flush_kthread_work+0x1a0/0x1a0

This was the job:

kernel: &id001
  kdb: true
  sha1: b5b09be30cf99f9c699e825629f02e3bce555d44
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        filestore flush min: 0
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 6c1e4791782ce2b3e101ee80640d896bcda684de
  s3tests:
    branch: next
  workunit:
    sha1: 6c1e4791782ce2b3e101ee80640d896bcda684de
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  : ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0IExVUuQEnwBzUbN6+jgEb/qLLsSDFLAe0OPv+R3Q3uUZn+4QjB+FQ3sMMEwaGpEhUfbpWn3xZbCmi4NngTsgPDjImYJPMeMecaxvVXqAJt4IM6gdBN0415lrKXtbXaGBJCmeFFDB+xN+JN6Qhk8DWN62DnJB8MS5vKn9u0S3HvtGzY/QrnutT3AX9I4097isbTepLFRC4n2CoAC9srXaxAprFgLgOIYHm386B3W7yK3yfIImWofvZxYpPCsr/7ws8DSgdRX9eDucGQfY2WcYKCEIgZclGPBhExm7q/ahHUqYKrzWY3RD93AuXJVgOJk5Yp8C1Ryx8vyJkAxBW4nb
  : ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuXajaQgHe9XnbLOzI8WWFYVz6+TnOiTzbkIJPGOZpzQEjnUtJraQIEt5ABSeovMjiEj+V4XvunfyuSmEd0H9giRSyjmCHTPGlpndfTeCdVtCBpNqf5GkUqHaEY1Hp57XPbya2rGlwtFm0NeIDYx6pfkejKnsTOUqwhgUb6950TRhjHQhMjFgyALSyfAm/4y6vGZfjm57+yyih6XgDkqWiiQ6Y/aJVR2n+iCzvqEzV7JSCU+Brn+k8IQLHho1fadYqc5PjYct5BaVlHcP6c+T8nJE/DvqGwZ4gQaVJcuWJiDfLOPPYo1g/0AFicxauLwVNJ6HFR9FjLLGtGU+2DcVN
  : ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDp3cwfZhOipCot6NiKX4cRMn4zx43QY0+5HdqzCQU2y7OrOJt3d0qvifnZPyeq8/d+aW2WL2OM8m4taz380JsP0SLmlpY8D0pGY/tN0pQDqIFd8EboMtKY6tR8unQrVzuczMqup/tkKSfdRp0zAeTiJ8qH7l9MaVcOw6WfRACb8f7APJE2gVRBrzPAdbqKzAphTRzZSz0cq722AX7XQDPT2dz7NoTp5Tk7xaQdDu2II+78B1H27IWdyYeonfy17yf9N+IA2Xzna/g5zu8apg7UvzyFmHunLyjr78dhPtR39201A0QJ5x5Qli9/UaB3LwiqnbCiGfx4xWFazdUFzxiD
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

It's on plana45, which will not be reclaimed until the end of the week at the earliest, if someone wants to check it out.

History

#1 Updated by Alex Elder about 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Alex Elder

I'm taking this for the time being.

This is in rb_erase(), which suggests the osd client's red-black
tree got corrupted.

I'll dig a little deeper to see if I can find out more.

#2 Updated by Alex Elder about 7 years ago

The ceph_osd_client in question is osdc @ 0xffff8802244a4950

The ceph_osd is osd @ 0xffff88020a285000
(struct rb_root *) &osdc->osds.rb_node is ffff8802218a1018 -> {
long unsigned int __rb_parent_color = 0000000000000001
struct rb_node * rb_right = ffff880063513018 ->
struct rb_node * rb_left = ffff880209db4818
}

(struct rb_node *) &osd->node is 0xffff88020a285018 = {
long unsigned int __rb_parent_color = ffff880209db2819
struct rb_node * rb_right = 0000000000000000
struct rb_node * rb_left = 0000000000000000
}

The osd getting removed has empty requests and linger requests
lists, and otherwise looks completely fine (though it does have
two references).

struct ceph_osd @ 0xffff88020a285000 -> {
atomic_t o_ref = { 00000002 }
(pad) = 00000000
struct ceph_osd_client * o_osdc = ffff8802244a4950

int                        o_osd = 00000005
int o_incarnation = 00000001
struct rb_node o_node = {
parent_color = ffff880209db2819
right = 0000000000000000
left = 0000000000000000
}
struct ceph_connection o_con = {
ffff88020a285000 ffffffffa00cf1c0
...
0000000000000000
0000000000000032
}
struct list_head o_requests = ffff88020a285538 = {
next = ffff88020a285538
prev = ffff88020a285538
}
struct list_head o_linger_requests
ffff88020a285548 = {
next = ffff88020a285548
prev = ffff88020a285548
}
struct list_head o_osd_lru ffff88020a285558 = {
next = ffff88020a285558
prev = ffff88020a285558
}
struct ceph_auth_handshake o_auth =
ffff88020a285568 = {
struct ceph_authorizer * authorizer = 0
void * authorizer_buf = 0
size_t authorizer_buf_len = 0
void * authorizer_reply_buf = 0
size_t authorizer_reply_buf_len = 0
}
long unsigned int lru_ttl = 00000000
int o_marked_for_keepalive = 00000000
struct list_head o_keepalive_item @ ffff88020a285590 = {
next = ffff88020a285590
prev = ffff88020a285590
}
};

#3 Updated by Alex Elder about 7 years ago

I think I may have just found something.

It appears as though access to the osd client's osd red-black
tree is supposed to be protected using osdc->request_mutex.

An osd is added to that tree only in __insert_osd(). That
function is called only by __map_request(), which does no
additional locking of its own. __map_request() is called from
three places: twice in kick_requests() and once in
ceph_osdc_start_request().

In ceph_osdc_start_request(), osdc->request_mutex is held during
the __map_request() call. In kick_requests(), osdc->request_mutex
is taken within that function before calling __map_request()

So all insertions are protected by the request mutex.

An osd is removed from an osd client's tree only in __remove_osd().
There are three callers of __remove_osd(): remove_all_osds();
remove_old_osds(); and __reset_osd(). The first two of these
take and hold osdc->request_mutex while __remove_osd() is called.

But __reset_osd() does not. That function is called in two
places: __kick_osd_requests() and reset_changed_osds(). The
first is reached with osdc->request_mutex already held, so
__remove_osd() is protected on that path. But
reset_changed_osds() does not take the mutex, and the one place
it is called, kick_requests(), drops the mutex before making
the call.

So...

I think what happened is that kick_requests() got called, ultimately
changing the osd client's red-black tree, concurrent with another
call (in this case a call to __remove_osd() but there are other
possible failure modes), and that ultimately led to the corruption
in the osd client's osd red-black tree.

The problem can be seen in this commit:
e6d50f67 libceph: always reset osds when kicking
In that, the call to reset_changed_osds() was moved inside
kick_requests() rather than being called after it.

However, even there, the kick_requests() call was only protected
by osdc->map_sem (taken for writing), and that situation seems
to go back to antiquity (to when net/ceph/osd_client.c first got
added to the tree).

One last point. It's possible that osdc->map_sem was intended
to protect updates to the red-black tree. However that's wrong
too, because there appear to be cases where the semaphore is
taken for reading when in fact the tree could get updated.

I think the simplest fix is to move the call to
reset_changed_osds() in kick_requests() up a few lines, to
before the request mutex is dropped. The call would then follow
a call to ceph_monc_request_next_osdmap(), but I think that's
fine because that appears to be an asynchronous request to the
monitor and shouldn't impact the osd client.

#4 Updated by Alex Elder about 7 years ago

  • Status changed from In Progress to Fix Under Review

The following patch has been posted for review.

[PATCH] libceph: must hold mutex for reset_changed_osds()

#5 Updated by Alex Elder about 7 years ago

  • Status changed from Fix Under Review to Resolved

The following has been committed to the "testing" branch
of the ceph-client git repository:

14d2f38 libceph: must hold mutex for reset_changed_osds()

It has also been queued to send to Linus for the 3.10
Linux release cycle.
