Bug #16630 (Closed): rbd map vs rbd_watch_errcb deadlock

Added by Ilya Dryomov almost 8 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Category: rbd
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor

Description

On Fri, Jul 8, 2016 at 3:28 AM, Patrick McLean <patrickm@gaikai.com> wrote:
> This is on linus git master as of 2016/07/01
>
> [...]
>
> # grep -C 40 rbd /proc/*/stack
> /proc/14109/stack-[<ffffffff8111c7c4>] __queue_work+0x144/0x420
> /proc/14109/stack-[<ffffffff8112c1c0>] default_wake_function+0x0/0x10
> /proc/14109/stack:[<ffffffffa01561ee>]
> rbd_obj_method_sync.constprop.48+0x1be/0x290 [rbd]
> /proc/14109/stack:[<ffffffffa0156aea>] rbd_dev_header_info+0x15a/0x930 [rbd]
> /proc/14109/stack:[<ffffffffa0157450>] rbd_watch_cb+0x0/0xa0 [rbd]
> /proc/14109/stack:[<ffffffffa0157586>]
> rbd_dev_image_probe.part.42+0x96/0x910 [rbd]
> /proc/14109/stack:[<ffffffffa015640e>] rbd_dev_image_id+0x14e/0x1b0 [rbd]
> /proc/14109/stack:[<ffffffffa015828f>] do_rbd_add.isra.43+0x48f/0xbb0 [rbd]
> /proc/14109/stack-[<ffffffff8123ba27>] __kmalloc+0x27/0x170
> /proc/14109/stack-[<ffffffff812b7f3a>] kernfs_fop_write+0x10a/0x190
> /proc/14109/stack-[<ffffffff8124dc63>] __vfs_write+0x23/0x120
> /proc/14109/stack-[<ffffffff8124e8f3>] vfs_write+0xb3/0x1a0
> /proc/14109/stack-[<ffffffff8124fbd2>] SyS_write+0x42/0xa0
> /proc/14109/stack-[<ffffffff81816f72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> /proc/14109/stack-[<ffffffffffffffff>] 0xffffffffffffffff
> --
> /proc/29744/stack-[<ffffffff813c7c63>] call_rwsem_down_write_failed+0x13/0x20
> /proc/29744/stack:[<ffffffffa01572dd>] rbd_dev_refresh+0x1d/0xf0 [rbd]
> /proc/29744/stack:[<ffffffffa0157413>] rbd_watch_errcb+0x33/0x70 [rbd]
> /proc/29744/stack-[<ffffffffa0126a2e>] do_watch_error+0x2e/0x40 [libceph]
> /proc/29744/stack-[<ffffffff8111d935>] process_one_work+0x145/0x3c0
> /proc/29744/stack-[<ffffffff8111dbfa>] worker_thread+0x4a/0x470
> /proc/29744/stack-[<ffffffff8111dbb0>] worker_thread+0x0/0x470
> /proc/29744/stack-[<ffffffff81122e4d>] kthread+0xbd/0xe0
> /proc/29744/stack-[<ffffffff8181717f>] ret_from_fork+0x1f/0x40
> /proc/29744/stack-[<ffffffff81122d90>] kthread+0x0/0xe0
> /proc/29744/stack-[<ffffffffffffffff>] 0xffffffffffffffff
> --
> /proc/3426/stack-[<ffffffff8115ea31>] try_to_del_timer_sync+0x41/0x60
> /proc/3426/stack-[<ffffffff8115ea94>] del_timer_sync+0x44/0x50
> /proc/3426/stack-[<ffffffff8112c1c0>] default_wake_function+0x0/0x10
> /proc/3426/stack-[<ffffffff8111b35f>] flush_workqueue+0x12f/0x540
> /proc/3426/stack:[<ffffffffa015376b>] do_rbd_remove.isra.25+0xfb/0x190 [rbd]
> /proc/3426/stack-[<ffffffff812b7f3a>] kernfs_fop_write+0x10a/0x190
> /proc/3426/stack-[<ffffffff8124dc63>] __vfs_write+0x23/0x120
> /proc/3426/stack-[<ffffffff8124e8f3>] vfs_write+0xb3/0x1a0
> /proc/3426/stack-[<ffffffff8124fbd2>] SyS_write+0x42/0xa0
> /proc/3426/stack-[<ffffffff81816f72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> /proc/3426/stack-[<ffffffffffffffff>] 0xffffffffffffffff
>
>  # ps aux | egrep '(14109|29744|3426)'
> root      3426  0.0  0.0 181256 10488 ?        Dl   Jul06   0:00 rbd
> unmap /dev/rbd0
> root     14109  0.0  0.0 246704 10228 ?        Sl   Jul03   0:01 rbd
> map --pool XXXX XXXXXXX
> root     29744  0.0  0.0      0     0 ?        D    Jul05   0:00 [kworker/u16:2]
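
Reading the three stacks together: "rbd map" (pid 14109) is parked in rbd_obj_method_sync() waiting for an OSD reply, the kworker (pid 29744) running rbd_watch_errcb() -> rbd_dev_refresh() is stuck acquiring a rwsem for write, and "rbd unmap" (pid 3426) is stuck in flush_workqueue() behind that kworker. Below is a minimal userspace model of that dependency chain; pthreads stand in for the kernel primitives, and the idea that the map path holds the rwsem the worker wants is an inference from the stacks and the issue title, not something taken from rbd.c.

/* deadlock-model.c: userspace sketch of the hang implied by the stacks.
 * Build: cc -pthread -o deadlock-model deadlock-model.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t header_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t reply_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reply_cond = PTHREAD_COND_INITIALIZER;
static int reply_arrived;     /* stays 0: the OSD reply never comes */
static pthread_t watch_worker;

/* pid 14109, "rbd map": takes the header lock (assumed), then waits
 * forever in the stand-in for rbd_obj_method_sync(). */
static void *map_path(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&header_rwsem);
	pthread_mutex_lock(&reply_mutex);
	while (!reply_arrived)
		pthread_cond_wait(&reply_cond, &reply_mutex);
	pthread_mutex_unlock(&reply_mutex);
	pthread_rwlock_unlock(&header_rwsem);
	return NULL;
}

/* kworker 29744, rbd_watch_errcb() -> rbd_dev_refresh(): needs the
 * same lock for the header refresh (call_rwsem_down_write_failed). */
static void *watch_error_path(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&header_rwsem);
	pthread_rwlock_unlock(&header_rwsem);
	return NULL;
}

/* pid 3426, "rbd unmap": flush_workqueue() modelled as joining the
 * stuck worker. */
static void *unmap_path(void *arg)
{
	(void)arg;
	pthread_join(watch_worker, NULL);
	return NULL;
}

int main(void)
{
	pthread_t map_thread, unmap_thread;

	pthread_create(&map_thread, NULL, map_path, NULL);
	sleep(1);  /* let the map path take the lock first */
	pthread_create(&watch_worker, NULL, watch_error_path, NULL);
	pthread_create(&unmap_thread, NULL, unmap_path, NULL);
	sleep(2);
	puts("map, watch-error and unmap are all blocked, as in the report");
	return 0;  /* exit with the three threads still stuck, on purpose */
}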

Files

dmesg-T.txt (184 KB) Victor Payno, 07/27/2016 01:47 AM
dmesg-T-sysrq_t.txt (305 KB) Victor Payno, 07/27/2016 01:47 AM
dmesg-T-sysrq_w.txt (256 KB) Victor Payno, 07/27/2016 01:47 AM
ceph-osd.2.log.trimed.txt-aa (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ab (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ad (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ac (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ae (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-af (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ag (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ah (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ai (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-aj (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ak (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-al (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-am (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ao (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-an (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ap (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-aq (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-ar (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-as (1000 KB) Victor Payno, 07/27/2016 02:04 AM
ceph-osd.2.log.trimed.txt-at (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-au (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-av (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-aw (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-ax (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-ay (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-az (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-ba (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bb (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bc (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-be (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bd (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bf (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bg (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bh (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bi (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bj (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bl (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bk (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bm (1000 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bo (914 KB) Victor Payno, 07/27/2016 02:05 AM
ceph-osd.2.log.trimed.txt-bn (1000 KB) Victor Payno, 07/27/2016 02:05 AM
dmesg-full.txt-aa (1000 KB) Victor Payno, 07/27/2016 03:20 AM
dmesg-full.txt-ab (1000 KB) Victor Payno, 07/27/2016 03:20 AM
dmesg-full.txt-ac (1000 KB) Victor Payno, 07/27/2016 03:20 AM
dmesg-full.txt-ad (1000 KB) Victor Payno, 07/27/2016 03:20 AM
dmesg-full.txt-ae (1000 KB) Victor Payno, 07/27/2016 03:20 AM
dmesg-full.txt-af (221 KB) Victor Payno, 07/27/2016 03:20 AM

Related issues 1 (1 open, 0 closed)

Related to Linux kernel client - Feature #12874: krbd: get rid of RBD_MAX_SNAP_COUNT (New, Ilya Dryomov, 08/31/2015)

Actions #1

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from New to Need More Info

ceph-devel thread: http://www.spinics.net/lists/ceph-devel/msg31257.html

On Wed, Jul 13, 2016 at 9:38 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> On Wed, Jul 13, 2016 at 2:38 AM, Patrick McLean <patrickm@gaikai.com> wrote:
>> On Sat, Jul 9, 2016 at 7:30 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>>
>>> [...]
>>>
>>> Can you set "debug ms = 1" on all OSDs and try to reproduce?
>>> I'd also like to see the content of /sys/kernel/debug/ceph/*/osdc when
>>> it happens again.
>>
>> Here is the contents of the osdc debug file on a machine that had been
>> in that state for 3 days.
>>
>> # cat /sys/kernel/debug/ceph/*/osdc
>> REQUESTS 1 homeless 0
>> 7 osd38 4.27a8d388 [38,16,47]/38 [38,16,47]/38 rbd_header.bea4a16ebd6a9a 0x400011 1 0'0 call
>> LINGER REQUESTS
>> 1 osd38 4.27a8d388 [38,16,47]/38 [38,16,47]/38 rbd_header.bea4a16ebd6a9a 0x24 0 WC/0
>
> OK, so it's waiting for a reply.  Do you have a debug ms = 1 log for
> osd38?

Updated by Victor Payno almost 8 years ago

# cat /sys/kernel/debug/ceph/*/osdc
REQUESTS 1 homeless 0
322452 osd2 37.a819967 [2]/2 [2]/2 rbd_header.d960431b30e2f 0x400019 3 0'0 call
LINGER REQUESTS
167 osd2 37.a819967 [2]/2 [2]/2 rbd_header.d960431b30e2f 0x24 2 WC/0

Updated by Victor Payno almost 8 years ago

Apologies for the 41 log files.

We were unable to upload the single file compressed with xz, gz or bzip2 (1.1MB, 1.9MB, 1.4MB).

The uncompressed file was also too large at 40MB.

Actions #8

Updated by Patrick McLean almost 8 years ago

I will also note here that enabling lockdep in the kernel on the client side makes the problem go away for some reason. No logs or reports from lockdep either.

Updated by Victor Payno almost 8 years ago

Same problem again: couldn't upload a 5 MB file, even though the form says "(Maximum size: 73.4 MB)".

Actions #10

Updated by Ilya Dryomov almost 8 years ago

Sorry about that - I've raised the upload issue internally. For now, just email the logs directly to me.

You've trimmed too much - I need the context for tid 322452, so I'm looking for "client.1246958" and ":322452" or so. The easiest thing would be to compress and send the whole log from 2016-07-26 23:06:46 to 2016-07-27 00:56:59.

Also, just to validate the log - that image has hundreds of snapshots, right?

Actions #11

Updated by Victor Payno over 7 years ago

I'll attach the wider log to an email.

Yes, it has 697 snapshots. The overall average snapshot count per rbd is 355; the lowest is 3 and the highest is 708. Because of our load testing the number of snapshots could

Interestingly enough we can't map that rbd anymore.

$ rbd info test-rbd/3f370dbabff91bbb7ff23ae7a96e5cb414cac3408013cefed6d4b627b5eed9c7-willnotmap
rbd image '3f370dbabff91bbb7ff23ae7a96e5cb414cac3408013cefed6d4b627b5eed9c7-willnotmap':
size 9536 MB in 2385 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.11e601379f78e7
format: 2
features: layering, striping
flags:
stripe unit: 4096 kB
stripe count: 1

When we try to mount it, it hangs here:

setsockopt(3, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
open("/sys/bus/rbd/add_single_major", O_WRONLY) = 4
write(4, "192.168.2.63:6789,192.168.3.63:6789,"..., 147

Other rbds mount fine.

The cluster health is HEALTH_OK.
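
For context, the strace shows what "rbd map" boils down to: it writes a one-line device spec into /sys/bus/rbd/add_single_major, and the kernel performs the whole map inside that write(), which is why the process just sits in the write when something goes wrong. A bare-bones equivalent is sketched below; the spec field layout ("mon_addrs options pool image snap") and the placeholder values are assumptions for illustration, not taken from this report.

/* map-by-hand.c: roughly what "rbd map" does, per the strace above.
 * Spec layout and values are assumptions/placeholders. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	const char *spec = "192.168.2.63:6789,192.168.3.63:6789 "
			   "name=admin,secret=XXX "
			   "test-rbd someimage -";
	int fd = open("/sys/bus/rbd/add_single_major", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* For the 697-snapshot image, this write() is the call that
	 * never returns on the affected kernels. */
	if (write(fd, spec, strlen(spec)) < 0)
		perror("write");
	close(fd);
	return 0;
}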

From Cliff:

Sifting through the osd2's log for "d960431b30e2f" I see this:

2016-07-27 00:47:47.199252 7f1ac7dbb700 1 -- 10.9.228.101:6814/25892 <== client.1252427 10.9.72.23:0/273648291 13 ==== osd_op(client.1252427.0:19 rbd_header.d960431b30e2f [call lock.lock] 37.a819967 ondisk+write+known_if_redirected e107453) v6 ==== 210+0+51 (4053684278 0 659001106) 0x55c9ec362000 con 0x55c9f0101600
2016-07-27 00:47:47.199515 7f1ad3539700 1 -- 10.9.228.101:6814/25892 --> 10.9.72.23:0/273648291 -- osd_op_reply(19 rbd_header.d960431b30e2f [call lock.lock] v0'0 uv0 ondisk = -16 ((16) Device or resource busy)) v6 - ?+0 0x55c9ee4b8b00 con 0x55c9f0101600

Actions #12

Updated by Victor Payno over 7 years ago

> Because of our load testing the number of snapshots could

...very well make it to 10k per rbd.

Actions #13

Updated by Ilya Dryomov over 7 years ago

That's exactly the problem.

For obscure implementation reasons, the kernel client currently supports only 510 snapshots per image. Mapping an image with more than that hangs

[ 4986.527680] libceph: get_reply osd2 tid 322452 data 5332 > preallocated 4092, skipping

because the reply message is simply dropped on the floor and "rbd map" needs to be Ctrl-C'ed. I guess you just didn't notice it among all those ext4 warnings.

Lifting this limit is on the TODO list, but having this many snapshots is going to affect your bandwidth, among other things - snapshot contexts are transmitted with every write.
If you aren't just stress-testing and have a real-world use case, please share it.
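
For the record, both byte counts in that libceph line are consistent with a snapshot context encoded as a u64 snap_seq, a u32 snap_count and one u64 per snapshot id. Assuming that layout (an assumption for illustration, not a quote from the source), the arithmetic checks out:

/* snapc-math.c: cross-checking "data 5332 > preallocated 4092",
 * assuming a snapcontext of u64 snap_seq + u32 snap_count + one u64
 * per snapshot id (wire-format layout is an assumption). */
#include <stdio.h>

int main(void)
{
	int cap = 510;   /* the kernel client's current per-image limit */
	int snaps = 665; /* count implied by the log line; the image had
	                  * grown to 697 by the time it was inspected */

	printf("preallocated: %d bytes\n", 8 + 4 + 8 * cap);   /* 4092 */
	printf("reply needs:  %d bytes\n", 8 + 4 + 8 * snaps); /* 5332 */
	return 0;
}

Under that assumption the odd-looking 510 also makes sense: it's the largest count whose snapcontext still fits in a single 4096-byte buffer (8 + 4 + 8 * 510 = 4092). By the same math, 10k snapshots would attach roughly 80 KB of snapshot context to every write.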

Actions #14

Updated by Ilya Dryomov over 7 years ago

  • Related to Feature #12874: krbd: get rid of RBD_MAX_SNAP_COUNT added
Actions #15

Updated by Victor Payno over 7 years ago

Thanks. I did see those messages, but with so much noise from all the activity it was hard to go "hey, that's the problem", especially since they look harmless.

For our use case we're going to limit it to 300ish snapshots for now, and we may end up going down to 50ish snapshots (which we do need for various reasons) after we do some benchmarking.

As for the EXT4 errors, we have code to fsck the file systems as part of the mounting process; it's just disabled so we could see what kinds of corruption show up. Most of them seem to come from all the hangs (which forced us to reset systems to unmap the devices) damaging the file systems.

Actions #16

Updated by Ilya Dryomov over 7 years ago

  • Status changed from Need More Info to Closed