Project

General

Profile

Bug #8818

IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]

Added by Greg Wilson over 7 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

While conducting dd read and write tests to a raw rbd device for baseline performance tests we experience an IO hang. The result is the dd never finishes and can only be cleared by a reboot. We started testing with firefly on Ubuntu with a 3.15 kernel but have upgraded to 3.16 in case there were changes that might have impacted this situation:

  1. cat /proc/version
    Linux version 3.16.0-031600rc4-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201407061635 SMP Sun Jul 6 20:36:26 UTC 2014
  1. ceph -v
    ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)

The script we are running does a dd write and then a dd read using various block sizes. The script consistently hangs but never on the same dd command or block size and has failed on both a dd write and a dd read. When the error occurs the following messages appear in the system log:

Jul 11 16:35:37 ks2-p1 kernel: [ 4325.515421] INFO: task kworker/6:1:2739 blocked for more than 120 seconds.
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.523177] Not tainted 3.16.0-031600rc4-generic #201407061635
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.530193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538849] kworker/6:1 D 0000000000000006 0 2739 2 0x00000000
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538856] Workqueue: ceph-msgr con_work [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538857] ffff880bf8573b58 0000000000000002 ffff880bf8573c08 ffff880bf8573fd8
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538858] 0000000000014440 0000000000014440 ffff8817f96e64c0 ffff880bf9061930
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538860] 0000000000000000 ffff880bfc7567d0 ffff880bfc7567d4 ffff880bf9061930
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538862] Call Trace:
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538864] [<ffffffff8179d2b9>] schedule+0x29/0x70
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538866] [<ffffffff8179d5de>] schedule_preempt_disabled+0xe/0x10
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538868] [<ffffffff8179f3f5>] __mutex_lock_slowpath+0xd5/0x1c0
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538870] [<ffffffff8179f503>] mutex_lock+0x23/0x37
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538875] [<ffffffffc061808d>] get_reply.isra.30+0x3d/0x240 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538886] [<ffffffffc0618322>] alloc_msg+0x92/0xa0 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538894] [<ffffffffc0610d01>] ceph_con_in_msg_alloc+0x71/0x1e0 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538898] [<ffffffffc0611080>] read_partial_message+0x210/0x4e0 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538900] [<ffffffff8166c926>] ? kernel_recvmsg+0x46/0x60
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538904] [<ffffffffc060cfd8>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538920] [<ffffffffc0611608>] try_read+0x2b8/0x5a0 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538928] [<ffffffffc0611c21>] con_work+0x91/0x290 [libceph]
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538931] [<ffffffff8108e6ff>] process_one_work+0x17f/0x4c0
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538933] [<ffffffff8108f46b>] worker_thread+0x11b/0x3f0
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538935] [<ffffffff8108f350>] ? create_and_start_worker+0x80/0x80
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538937] [<ffffffff81096479>] kthread+0xc9/0xe0
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538940] [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538942] [<ffffffff817a13fc>] ret_from_fork+0x7c/0xb0
Jul 11 16:35:37 ks2-p1 kernel: [ 4325.538944] [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0

I have attached sample dd script and a pared down version of the kernel.log.

bug_10jul2014.txt.gz (10.9 KB) Greg Wilson, 07/11/2014 09:22 PM

10GB_raw_dd.sh View (2.6 KB) Greg Wilson, 07/11/2014 09:22 PM

dump.txt View (428 KB) Xavier Trilla, 07/16/2014 11:52 AM

kern-log.tar.gz (115 KB) Greg Wilson, 07/16/2014 04:34 PM

configs.tar.gz (123 KB) Greg Wilson, 07/25/2014 07:52 AM


Related issues

Duplicated by Linux kernel client - Bug #8464: krbd: deadlock Resolved 05/29/2014

History

#1 Updated by Xavier Trilla over 7 years ago

Hi,

We are experimenting exactly the same problem when running several concurrent dd operations to a kernel mounted rbd image.

We have tested ubuntu kernels 3.15.0 and 3.15.5 with same results. We have also updated all machines (OSDs, MONs and clients) to ceph to 0.80.4 but the error persists.

In our setup it's quite easy to reproduce the error, we just launch against the same rbd image 4 simultaneous instances of dd.

dd command line:

dd if=/dev/zero of=test-1 bs=64k count=160k conv=fdatasync &
dd if=/dev/zero of=test-2 bs=64k count=160k conv=fdatasync &
dd if=/dev/zero of=test-3 bs=64k count=160k conv=fdatasync &
dd if=/dev/zero of=test-4 bs=64k count=160k conv=fdatasync &

(Each file should create a 10gb file)

RBD image filesystem is ext4.

Thanks.

#2 Updated by Zheng Yan over 7 years ago

when this happens again, execute 'echo t > /proc/sysrq-trigger' and upload the kernel message.

By the way, are there the rbd clients and OSDs on the same machine?

#3 Updated by Xavier Trilla over 7 years ago

Hi Zheng,

No problem, in our setup it's pretty easy to reproduce the error.

Please find the output attached. (Using Ubuntu kernel 3.15.5)

OSDs and RBD clients are on different machines.

If you need anything else please just let me know.

Thanks

#4 Updated by Greg Wilson over 7 years ago

The problem is also very repeatable at our site as well. Attached is the kern.log file after running the requested command.

The rbd client we are running the dd tests on is a separate machine from our OSD servers.

#5 Updated by Zheng Yan over 7 years ago

  • Status changed from New to 12
  • Assignee set to Ilya Dryomov
  • Priority changed from Normal to Urgent

looks like a ABBA deadlock between ceph_connection->mutex and ceph_osd_client->request_mutex.


 kworker/5:2     D 0000000000000005     0  2762      2 0x00000000
 Workqueue: ceph-msgr con_work [libceph]
  ffff880bf3a47b58 0000000000000002 ffff880bf3a47c08 ffff880bf3a47fd8
  0000000000014440 0000000000014440 ffff8817f9704b90 ffff880bf9681930
  0000000000000000 ffff880bfc7b17d0 ffff880bfc7b17d4 ffff880bf9681930
 Call Trace:
  [<ffffffff8179d2b9>] schedule+0x29/0x70
  [<ffffffff8179d5de>] schedule_preempt_disabled+0xe/0x10
  [<ffffffff8179f3f5>] __mutex_lock_slowpath+0xd5/0x1c0
  [<ffffffff8179f503>] mutex_lock+0x23/0x37                                <- acquiring request mutex
  [<ffffffffc049408d>] get_reply.isra.30+0x3d/0x240 [libceph]
  [<ffffffffc0494322>] alloc_msg+0x92/0xa0 [libceph]
  [<ffffffffc048cd01>] ceph_con_in_msg_alloc+0x71/0x1e0 [libceph]
  [<ffffffffc048d080>] read_partial_message+0x210/0x4e0 [libceph]
  [<ffffffff8166c926>] ? kernel_recvmsg+0x46/0x60
  [<ffffffffc0488fd8>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
  [<ffffffffc048d608>] try_read+0x2b8/0x5a0 [libceph]
  [<ffffffffc048dc21>] con_work+0x91/0x290 [libceph]                        <- connection mutex is acquired
  [<ffffffff8108e6ff>] process_one_work+0x17f/0x4c0
  [<ffffffff8108f46b>] worker_thread+0x11b/0x3f0
  [<ffffffff8108f350>] ? create_and_start_worker+0x80/0x80
  [<ffffffff81096479>] kthread+0xc9/0xe0
  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
  [<ffffffff817a13fc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
 dd              D ffffffff81814ae0     0  2779   2615 0x00000000
  ffff880036777548 0000000000000002 0000000000000002 ffff880036777fd8
  0000000000014440 0000000000014440 ffff8817f8f23260 ffff880bf73e8000
  ffff88003662cb90 ffff880bf6e6c1b8 ffff88003662cb90 ffff880bf73e8000
 Call Trace:
  [<ffffffff8179d2b9>] schedule+0x29/0x70
  [<ffffffff8179d5de>] schedule_preempt_disabled+0xe/0x10
  [<ffffffff8179f4d5>] __mutex_lock_slowpath+0x1b5/0x1c0
  [<ffffffffc049ad1e>] ? ceph_calc_pg_acting+0x10e/0x160 [libceph]
  [<ffffffff8179f503>] mutex_lock+0x23/0x37                                          <- acquiring connection mutex
  [<ffffffffc048de7c>] ceph_con_send+0x5c/0x200 [libceph]
  [<ffffffffc0491ffb>] ? __map_request+0x17b/0x5b0 [libceph]
  [<ffffffffc04933ce>] ? ceph_osdc_build_request+0x14e/0x280 [libceph]
  [<ffffffffc0493acc>] __send_request+0x10c/0x170 [libceph]
  [<ffffffffc0493b7e>] __send_queued+0x4e/0x90 [libceph]
  [<ffffffffc049522b>] __ceph_osdc_start_request+0x5b/0x110 [libceph]
  [<ffffffffc0506c75>] ? rbd_img_request_fill+0xb5/0x800 [rbd]
  [<ffffffffc0495331>] ceph_osdc_start_request+0x51/0x80 [libceph]                   <- request mutex is acquired
  [<ffffffffc05027f3>] rbd_obj_request_submit+0x33/0x70 [rbd]
  [<ffffffffc05076ba>] rbd_img_obj_request_submit+0xaa/0x100 [rbd]
  [<ffffffffc0507766>] rbd_img_request_submit+0x56/0x80 [rbd]
  [<ffffffffc05081ac>] rbd_request_fn+0x2ac/0x350 [rbd]
  [<ffffffff81355ce7>] __blk_run_queue+0x37/0x50
  [<ffffffff813565cd>] queue_unplugged+0x3d/0xc0
  [<ffffffff8135bcf2>] blk_flush_plug_list+0x1d2/0x210
  [<ffffffff8179d378>] io_schedule+0x78/0xd0
  [<ffffffff81359de0>] get_request+0x150/0x350
  [<ffffffff8135b397>] ? bio_attempt_back_merge+0x37/0x100
  [<ffffffff810bab60>] ? __wake_up_sync+0x20/0x20
  [<ffffffff8135be4c>] blk_queue_bio+0xcc/0x380
  [<ffffffff81356de5>] generic_make_request.part.75+0x75/0xb0
  [<ffffffff81357408>] generic_make_request+0x68/0x70
  [<ffffffff81357496>] submit_bio+0x86/0x160
  [<ffffffff8121bd48>] do_blockdev_direct_IO+0xa88/0xcf0
  [<ffffffff810aa415>] ? sched_clock_cpu+0x85/0xc0
  [<ffffffff81216ab0>] ? I_BDEV+0x10/0x10
  [<ffffffff8121bffc>] __blockdev_direct_IO+0x4c/0x50
  [<ffffffff81216ab0>] ? I_BDEV+0x10/0x10
  [<ffffffff81217481>] blkdev_direct_IO+0x51/0x60
  [<ffffffff81216ab0>] ? I_BDEV+0x10/0x10
  [<ffffffff8116cc8b>] generic_file_direct_write+0xbb/0x180
  [<ffffffff812006c8>] ? __mnt_want_write+0x58/0x70
  [<ffffffff8116cff7>] __generic_file_write_iter+0x2a7/0x360
  [<ffffffff81216e46>] blkdev_write_iter+0x36/0xb0
  [<ffffffff811e041b>] new_sync_write+0x7b/0xb0
  [<ffffffff811e1137>] vfs_write+0xc7/0x1f0
  [<ffffffff811e163f>] SyS_write+0x4f/0xb0
  [<ffffffff817a16bf>] tracesys+0xe1/0xe6

#6 Updated by Josh Durgin over 7 years ago

Somone reported seeing this same issue in 3.15.2 but not in 3.13.7

#7 Updated by Xavier Trilla over 7 years ago

Josh Durgin wrote:

Somone reported seeing this same issue in 3.15.2 but not in 3.13.7

Ubuntu Kernel 3.14.9 also works well. Issue seems to be introduced in 3.15.

#8 Updated by Greg Wilson over 7 years ago

We also confirmed the issue is with 3.15 above as we back rev'd our system to 3.14.13 and were able to successfully complete our tests.

#9 Updated by Ilya Dryomov over 7 years ago

I have been trying to reproduce this with both parallel dds and the attached script, but so far no luck. We've seen this deadlock before, but then it just stopped happening. The interesting observation is that the first occurrence coincided with the rebase of our testing branch to 3.15.

Xavier or Greg, do you have CONFIG_LOCKDEP enabled in your kernels?

#10 Updated by Greg Wilson over 7 years ago

We are using the standard kernels as they come from the ubuntu site. That being said when I look in the config files I do see CONFIG_LOCKDEP_SUPPORT set to y:

  1. grep LOCKDEP config-3.1*
    config-3.13.0-24-generic:CONFIG_LOCKDEP_SUPPORT=y
    config-3.14.13-031413-generic:CONFIG_LOCKDEP_SUPPORT=y
    config-3.15.0-031500-generic:CONFIG_LOCKDEP_SUPPORT=y

These config files are from a different machine in our cluster that we started using as an NFS gateway earlier this week. It also hit the same issue on Wednesday when running a 3.15 kernel. We moved it back to 3.14.13 and have been running without issue since. I'm uploading the kernel config files for you to see.

#11 Updated by Ilya Dryomov over 7 years ago

OK, so lockdep is disabled. Since you are able to reproduce it so reliably, can you try our testing kernel? The package is at

http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/testing/linux-image-3.16.0-rc4-ceph-00033-gf18244b_3.16.0-rc4-ceph-00033-gf18244b-1_amd64.deb

It's based on Ubuntu config, so you should be able to boot it.

#13 Updated by Greg Wilson over 7 years ago

Hi Ilya,

I can reliably reproduce the error when running this generic kernel with no changes:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16-rc4-utopic/linux-image-3.16.0-031600rc4-generic_3.16.0-031600rc4.201407061635_amd64.deb

It usually takes less than 15 minutes to hit the issue; even faster recently.

I downloaded and installed the ceph kernel you linked to and have been running it without a failure since Saturday night. It looks like whatever has been changed prevents the issue. I'm going to compare the differences between the config files but is there documentation on what changes exist between the generic and your ceph image?

#14 Updated by Ilya Dryomov over 7 years ago

I'm pretty sure it's the disabled lockdep that affects this. Our testing kernel is built with lockdep enabled, Ubuntu stock kernels - aren't. I'll try reproducing it some more and will get back to you tomorrow. Thanks for your help so far.

#15 Updated by Karl Austin over 7 years ago

I can reproduce this with 100% certainty now on Trusty, 3.15.6-031506-generic.

Running:

bonnie++ -n 512

against a mounted rbd (OSDs on 3 other servers) that comprises 6 x SSD OSD. When it was 3 OSD in the pool it was okay, the move to 6 triggered it every time.

Anything we can do to help/test, let me know.

#16 Updated by Ilya Dryomov over 7 years ago

Update: At this point I'm almost certain this is not an rbd/ceph problem. Trying to track down the exact culprit.

#18 Updated by Eric Eastman over 7 years ago

The deadlock-bad kernel showed the error after a few minutes of running multiple dd writes to rbd device. Here is one of many entries in the /var/log/kern.log file:

Jul 30 22:07:16 t16 kernel: [ 601.225863] INFO: task kworker/7:7:1959 blocked for more than 120 seconds.
Jul 30 22:07:16 t16 kernel: [ 601.225926] Tainted: G E 3.16.0-rc7-ceph-00100-g1e9afaa #1
Jul 30 22:07:16 t16 kernel: [ 601.225988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 22:07:16 t16 kernel: [ 601.226066] kworker/7:7 D 0000000000000000 0 1959 2 0x00000000
Jul 30 22:07:16 t16 kernel: [ 601.226074] Workqueue: ceph-msgr con_work [libceph]
Jul 30 22:07:16 t16 kernel: [ 601.226078] ffff8800318e7c18 0000000000000046 0000000000000000 0000000000000000
Jul 30 22:07:16 t16 kernel: [ 601.226086] ffff8800368cb240 0000000000012fc0 ffff8800318e7fd8 0000000000012fc0
Jul 30 22:07:16 t16 kernel: [ 601.226087] ffff88007c0e9920 ffff8800368cb240 0000000000000000 ffff880036a1d7c8
Jul 30 22:07:16 t16 kernel: [ 601.226088] Call Trace:
Jul 30 22:07:16 t16 kernel: [ 601.226090] [<ffffffff816ba789>] schedule+0x29/0x70
Jul 30 22:07:16 t16 kernel: [ 601.226091] [<ffffffff816baa5e>] schedule_preempt_disabled+0xe/0x10
Jul 30 22:07:16 t16 kernel: [ 601.226093] [<ffffffff816bc4c3>] __mutex_lock_slowpath+0xd3/0x1c0
Jul 30 22:07:16 t16 kernel: [ 601.226094] [<ffffffff816bc5d3>] mutex_lock+0x23/0x37
Jul 30 22:07:16 t16 kernel: [ 601.226097] [<ffffffffa01fa88f>] alloc_msg+0xcf/0x230 [libceph]
Jul 30 22:07:16 t16 kernel: [ 601.226100] [<ffffffffa01f5fd6>] con_work+0x19d6/0x2020 [libceph]
Jul 30 22:07:16 t16 kernel: [ 601.226101] [<ffffffff81092c4c>] ? put_prev_entity+0x3c/0x320
Jul 30 22:07:16 t16 kernel: [ 601.226102] [<ffffffff81095c72>] ? pick_next_task_fair+0x3e2/0x4e0
Jul 30 22:07:16 t16 kernel: [ 601.226104] [<ffffffff81070f34>] process_one_work+0x174/0x4a0
Jul 30 22:07:16 t16 kernel: [ 601.226105] [<ffffffff81071c5b>] worker_thread+0x11b/0x530
Jul 30 22:07:16 t16 kernel: [ 601.226107] [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
Jul 30 22:07:16 t16 kernel: [ 601.226108] [<ffffffff81078829>] kthread+0xc9/0xe0
Jul 30 22:07:16 t16 kernel: [ 601.226110] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
Jul 30 22:07:16 t16 kernel: [ 601.226112] [<ffffffff816be12c>] ret_from_fork+0x7c/0xb0
Jul 30 22:07:16 t16 kernel: [ 601.226113] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0

The deadlock-good kernel has now been running the same test, in a loop, for over 3 hours without any errors reported in /var/log/kern.log. I will let it run overnight and will update the ticket only if it starts reporting issues.

#19 Updated by Greg Wilson over 7 years ago

I have done some testing and I am seeing the same thing as Eric. With the deadlock-bad kernel I hit the deadlock issue in less than 15 minutes. With the deadlock-good kernel I have been running for several hours and have not had an issue. I will be running against it all weekend and will update the ticket if an issue is found.

#20 Updated by Ilya Dryomov over 7 years ago

No need to do that just yet. I now fully understand the problem and working on a proper fix that I'd like you to test later. deadlock-good is simply a revert of a core kernel commit that changed the semantics of blocking lock primitives and made this deadlock possible.

#22 Updated by Eric Eastman over 7 years ago

The 00036 "bad" kernel started showing the problem in the /var/log/kern.log file within minutes of starting my test script which runs multiple dd writes to the raw rbd device. An example of the failure reported in kern.log:

Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764346] INFO: task kworker/1:0:15 blocked for more than 120 seconds.
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764355] Tainted: G E 3.16.0-ceph-00036-gbd78ef2 #1
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764358] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764361] kworker/1:0 D 0000000000000000 0 15 2 0x00000000
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764405] Workqueue: ceph-msgr con_work [libceph]
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764410] ffff880bfce87c18 0000000000000046 0000000000000296 0000000000000000
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764415] ffff880bfce6cb60 0000000000012fc0 ffff880bfce87fd8 0000000000012fc0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764420] ffff880bfb251920 ffff880bfce6cb60 0000000000000000 ffff880bf796a7c8
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764424] Call Trace:
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764440] [<ffffffff816ba789>] schedule+0x29/0x70
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764446] [<ffffffff816baa5e>] schedule_preempt_disabled+0xe/0x10
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764452] [<ffffffff816bc4c3>] __mutex_lock_slowpath+0xd3/0x1c0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764457] [<ffffffff816bc5d3>] mutex_lock+0x23/0x37
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764469] [<ffffffffa017788f>] alloc_msg+0xcf/0x230 [libceph]
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764479] [<ffffffffa0172fd6>] con_work+0x19d6/0x2020 [libceph]
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764487] [<ffffffff81095c56>] ? pick_next_task_fair+0x3c6/0x4e0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764493] [<ffffffff81070f34>] process_one_work+0x174/0x4a0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764498] [<ffffffff81071c5b>] worker_thread+0x11b/0x530
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764503] [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764509] [<ffffffff81078829>] kthread+0xc9/0xe0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764515] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764521] [<ffffffff816be12c>] ret_from_fork+0x7c/0xb0
Aug 5 17:38:42 ic1gw01 kernel: [ 3243.764526] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0

The "good" 00037 kernel has been running the same dd write test script for about 2 hours now without reporting any issues. It looks like your changes to the rbd driver has fixed the problem. I will leave it run overnight, and only update the bug report if it fails.

#23 Updated by Greg Wilson over 7 years ago

I switched to the good kernel (3.16.0-ceph-00037-g0532581) yesterday and re-ran my scripts overnight. The scripts completed without issue so I concur with Eric that the changes in 00037 seem to have fixed the mutex issue.

#24 Updated by Ilya Dryomov over 7 years ago

  • Status changed from 12 to Resolved

OK, thanks everybody.

rbd: rework rbd_request_fn()

#25 Updated by German Anders over 7 years ago

Upgrade to kernel: 3.16.0 and got the same problem:

...
[ 70.858716] Key type ceph registered
[ 70.858800] libceph: loaded (mon/osd proto 15/24)
[ 70.863011] rbd: loaded (major 251)
[ 70.867935] libceph: client5437 fsid 56e56e4c-ea59-4157-8b98-acae109bebe1
[ 70.869338] libceph: mon1 10.97.10.2:6789 session established
[ 70.879934] rbd0: unknown partition table
[ 70.880354] rbd: rbd0: added with size 0x3200000000
[ 164.737028] do_IRQ: 6.104 No irq handler for vector (irq -1)
[ 360.696573] INFO: task fio:3119 blocked for more than 120 seconds.
[ 360.699202] Tainted: G I 3.16.1-031601-generic #201408140014
[ 360.702248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.705625] fio D 0000000000000000 0 3119 3117 0x00000000
[ 360.705631] ffff880234347818 0000000000000086 ffff880234347828 ffff880234347fd8
[ 360.705639] 0000000000014400 0000000000014400 ffff880139f4c6e0 ffff8800b9e7b2a0
[ 360.705643] ffff8802343477e8 ffff88013bc14d00 ffff8800b9e7b2a0 0000000000000000
[ 360.705646] Call Trace:
[ 360.705658] [<ffffffff81790129>] schedule+0x29/0x70
[ 360.705667] [<ffffffff817901ff>] io_schedule+0x8f/0xd0
[ 360.705673] [<ffffffff8135a9b0>] get_request+0x150/0x350
[ 360.705679] [<ffffffff8135bf67>] ? bio_attempt_back_merge+0x37/0x100
[ 360.705687] [<ffffffff810bab00>] ? __wake_up_sync+0x20/0x20
[ 360.705691] [<ffffffff8135ca1c>] blk_queue_bio+0xcc/0x380
[ 360.705697] [<ffffffff813579b5>] generic_make_request.part.75+0x75/0xb0
[ 360.705701] [<ffffffff81357fd8>] generic_make_request+0x68/0x70
[ 360.705704] [<ffffffff81358066>] submit_bio+0x86/0x160
[ 360.705711] [<ffffffff8177d4a1>] ? put_compound_page.part.21+0xf8/0x12f
[ 360.705718] [<ffffffff8121b668>] do_direct_IO+0xa78/0xf60
[ 360.705722] [<ffffffff8121beaa>] do_blockdev_direct_IO+0x35a/0xcc0
[ 360.705727] [<ffffffff81217370>] ? I_BDEV+0x10/0x10
[ 360.705731] [<ffffffff8121c85c>] __blockdev_direct_IO+0x4c/0x50
[ 360.705734] [<ffffffff81217370>] ? I_BDEV+0x10/0x10
[ 360.705738] [<ffffffff81217d41>] blkdev_direct_IO+0x51/0x60
[ 360.705741] [<ffffffff81217370>] ? I_BDEV+0x10/0x10
[ 360.705747] [<ffffffff8116d9f3>] generic_file_read_iter+0x143/0x150
[ 360.705750] [<ffffffff812174e0>] ? ioctl_by_bdev+0x50/0x50
[ 360.705754] [<ffffffff81217517>] blkdev_read_iter+0x37/0x40
[ 360.705758] [<ffffffff81229c57>] aio_run_iocb+0x207/0x3a0
[ 360.705761] [<ffffffff81229f0b>] io_submit_one+0x11b/0x260
[ 360.705768] [<ffffffff81013730>] ? __switch_to+0x260/0x5c0
[ 360.705771] [<ffffffff8122a424>] do_io_submit+0x104/0x1d0
[ 360.705775] [<ffffffff8122a500>] SyS_io_submit+0x10/0x20
[ 360.705780] [<ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
...

Then I upgrade to 3.16.1 and the error continues. The test is with "fio", I'm running the following command:

fio --filename=/dev/rbd0 --direct=1 --rw=read --bs=8m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=file1

#26 Updated by Eric Eastman over 7 years ago

The fix looks like it made it into 3.17rc1. I have been testing this kernel since Sunday, and have not triggered the problem.
I do not think it has been pushed into any 3.16.x kernel yet

#27 Updated by Ilya Dryomov over 7 years ago

Eric is correct, the fix isn't in 3.16 stable yet, and unfortunately won't be in 3.15 at all - Linus pulled it into his tree on the day the last 3.15 stable release was cut.

OTOH, the stack trace doesn't look the same to me. German, what exactly are you seeing - does all rbd activity halt eventually? Are there any other hung tasks in dmesg?

#28 Updated by German Anders over 7 years ago

Ok, I've applied the "..." with Kernel 3.16.0 and the error continues:

...
Aug 21 14:38:45 mail02-old kernel: [ 7206.274363] INFO: task fio:3590 blocked for more than 120 seconds.
Aug 21 14:38:45 mail02-old kernel: [ 7206.276954] Tainted: G I 3.16.1-031601-generic #201408140014
Aug 21 14:38:45 mail02-old kernel: [ 7206.279969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 21 14:38:45 mail02-old kernel: [ 7206.283310] fio D 0000000000000002 0 3590 3580 0x00000000
Aug 21 14:38:45 mail02-old kernel: [ 7206.283316] ffff880139513768 0000000000000086 ffff880139513778 ffff880139513fd8
Aug 21 14:38:45 mail02-old kernel: [ 7206.283320] 0000000000014400 0000000000014400 ffff8802366a0000 ffff880234982880
Aug 21 14:38:45 mail02-old kernel: [ 7206.283323] ffff880139513738 ffff88013bc54d00 ffff880234982880 0000000000000010
Aug 21 14:38:45 mail02-old kernel: [ 7206.283327] Call Trace:
Aug 21 14:38:45 mail02-old kernel: [ 7206.283339] [<ffffffff81790129>] schedule+0x29/0x70
Aug 21 14:38:45 mail02-old kernel: [ 7206.283342] [<ffffffff817901ff>] io_schedule+0x8f/0xd0
Aug 21 14:38:45 mail02-old kernel: [ 7206.283348] [<ffffffff8135a9b0>] get_request+0x150/0x350
Aug 21 14:38:45 mail02-old kernel: [ 7206.283352] [<ffffffff8135bf67>] ? bio_attempt_back_merge+0x37/0x100
Aug 21 14:38:45 mail02-old kernel: [ 7206.283360] [<ffffffff810bab00>] ? __wake_up_sync+0x20/0x20
Aug 21 14:38:45 mail02-old kernel: [ 7206.283363] [<ffffffff8135ca1c>] blk_queue_bio+0xcc/0x380
Aug 21 14:38:45 mail02-old kernel: [ 7206.283370] [<ffffffff813579b5>] generic_make_request.part.75+0x75/0xb0
Aug 21 14:38:45 mail02-old kernel: [ 7206.283374] [<ffffffff81357fd8>] generic_make_request+0x68/0x70
Aug 21 14:38:45 mail02-old kernel: [ 7206.283377] [<ffffffff81358066>] submit_bio+0x86/0x160
Aug 21 14:38:45 mail02-old kernel: [ 7206.283384] [<ffffffff8177d4a1>] ? put_compound_page.part.21+0xf8/0x12f
Aug 21 14:38:45 mail02-old kernel: [ 7206.283391] [<ffffffff8121b668>] do_direct_IO+0xa78/0xf60
Aug 21 14:38:45 mail02-old kernel: [ 7206.283395] [<ffffffff8121beaa>] do_blockdev_direct_IO+0x35a/0xcc0
Aug 21 14:38:45 mail02-old kernel: [ 7206.283400] [<ffffffff81217370>] ? I_BDEV+0x10/0x10
Aug 21 14:38:45 mail02-old kernel: [ 7206.283404] [<ffffffff8121c85c>] __blockdev_direct_IO+0x4c/0x50
Aug 21 14:38:45 mail02-old kernel: [ 7206.283407] [<ffffffff81217370>] ? I_BDEV+0x10/0x10
Aug 21 14:38:45 mail02-old kernel: [ 7206.283411] [<ffffffff81217d41>] blkdev_direct_IO+0x51/0x60
Aug 21 14:38:45 mail02-old kernel: [ 7206.283414] [<ffffffff81217370>] ? I_BDEV+0x10/0x10
Aug 21 14:38:45 mail02-old kernel: [ 7206.283420] [<ffffffff8116d3cb>] generic_file_direct_write+0xbb/0x180
Aug 21 14:38:45 mail02-old kernel: [ 7206.283423] [<ffffffff8116d737>] __generic_file_write_iter+0x2a7/0x360
Aug 21 14:38:45 mail02-old kernel: [ 7206.283427] [<ffffffff812176d0>] ? bdput+0x20/0x20
Aug 21 14:38:45 mail02-old kernel: [ 7206.283431] [<ffffffff81217706>] blkdev_write_iter+0x36/0xb0
Aug 21 14:38:45 mail02-old kernel: [ 7206.283435] [<ffffffff81229c57>] aio_run_iocb+0x207/0x3a0
Aug 21 14:38:45 mail02-old kernel: [ 7206.283438] [<ffffffff81229f0b>] io_submit_one+0x11b/0x260
Aug 21 14:38:45 mail02-old kernel: [ 7206.283441] [<ffffffff8122a424>] do_io_submit+0x104/0x1d0
Aug 21 14:38:45 mail02-old kernel: [ 7206.283445] [<ffffffff8122a500>] SyS_io_submit+0x10/0x20
Aug 21 14:38:45 mail02-old kernel: [ 7206.283451] [<ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
Aug 21 14:38:51 mail02-old kernel: [ 7212.391288] libceph: osd35 10.97.20.4:6824 socket closed (con state CONNECTING)
...

Same test "fio --filename=/dev/rbd0 --direct=1 --rw=read --bs=8m --size=10G --iodepth=16 --ioengine=libaio --runtime=60 --group_reporting --name=file1"

Any ideas?

#30 Updated by Ilya Dryomov over 7 years ago

Does fio complete eventually? Are there any other hung tasks in dmesg? A task blocking for more than 120 seconds is not an error in itself.

#31 Updated by German Anders over 7 years ago

It seems that using kernel 3.14.9 resolved the issue. Have tested for more than two days with lot of perf tests and run fine.

#32 Updated by Ilya Dryomov over 7 years ago

Well, the issue didn't get resolved, you simply found a kernel it doesn't show up on. I'm still interested to hear answers to the questions I posted above.

#33 Updated by Xavier Trilla over 7 years ago

German,

3.14.x will work well, but it will not work with TUNABLES V3, specifically chooseleaf_vary_r turned on. For that you need kernel >= 3.5.x and those are the kernels affected by the issue.

Ilya, on our experience all Debian and ubuntu kernels above 3.14.x are affected. I did not have the time to test 3.16.1 or the 3.16.0 you posted, but for what I've read the problem is solved on that specific one.

Do you have any idea about when the issue will be solved in >= 3.16.x mainstream kernels?

Thanks!

#34 Updated by Ilya Dryomov over 7 years ago

Xavier Trilla wrote:

German,

3.14.x will work well, but it will not work with TUNABLES V3, specifically chooseleaf_vary_r turned on. For that you need kernel >= 3.5.x and those are the kernels affected by the issue.

Ilya, on our experience all Debian and ubuntu kernels above 3.14.x are affected. I did not have the time to test 3.16.1 or the 3.16.0 you posted, but for what I've read the problem is solved on that specific one.

Do you have any idea about when the issue will be solved in >= 3.16.x mainstream kernels?

Let's not mix the two "issues". The deadlock, which this ticket refers
to, was introduced in 3.15 and has been fixed. The fix has been marked
for stable, and I imagine it should make the next stable release, that
is 3.16.2. (As I mentioned earlier, 3.16.1 along with 3.15.10 were
released on the day Linus pulled our fix into mainline, so 3.15 kernels
will stay affected forever - 3.15.10 was the last 3.15 stable release.)

Now, there is something else reported by German, which appears to have
been introduced in 3.15 as well. However German hasn't replied with
the details I asked for yet, so we don't know if it's the same issue,
a different issue or if it's an issue at all.

#35 Updated by Xavier Trilla over 7 years ago

Ilya sorry, I did a mistyping, I was trying to give some background to German about why it's important to have stable >= 3.15.x kernel, as it's the minor version needed for TUNNABLES V3, but I typed 3.5 instead of 3.15.x :/

Ok, well try 3.16.2 once it's released and we will let you know.

Thanks!

#36 Updated by German Anders about 7 years ago

Ilya Dryomov wrote:

Well, the issue didn't get resolved, you simply found a kernel it doesn't show up on. I'm still interested to hear answers to the questions I posted above.

The fio didn't finished, and a hard-reset of the server is needed.

#37 Updated by Chris Armstrong about 7 years ago

We're seeing this on CoreOS, kernel 3.15.8. Upgrading the kernel ourselves isn't an option, so this'll block RBD usage until they upgrade to a fixed kernel.

#38 Updated by Ilya Dryomov about 7 years ago

I'll try to reproduce German's fio problem.

To those who are stuck on 3.15 for some reason but can rebuild their kernel, rebuild with CONFIG_DEBUG_MUTEXES=y or try to get a "debug" kernel. This problem showed up due to a subtle change to the mutex fastpath, DEBUG_MUTEXES works around it.

#39 Updated by Guillaume Subiron about 7 years ago

Xavier Trilla wrote:

Ok, well try 3.16.2 once it's released and we will let you know.

Sadly, your patch has not been integrated in 3.16.2, so this new kernel release doesn't fix the but.

#40 Updated by Ilya Dryomov about 7 years ago

Guillaume Subiron wrote:

Xavier Trilla wrote:

Ok, well try 3.16.2 once it's released and we will let you know.

Sadly, your patch has not been integrated in 3.16.2, so this new kernel release doesn't fix the but.

Yeah, I've already sent an inquiry to the stable team. Some bad luck with this patch, huh.

#41 Updated by Ilya Dryomov about 7 years ago

The patch is in the queue, just haven't been applied yet: https://lkml.org/lkml/2014/9/3/918.

#42 Updated by Andrei Mikhailovsky about 7 years ago

Ilya,

I've tried the kernel that was patched, but I am still experiencing hang tasks. Although less often than before. Here is a recent hang task:

Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455012] INFO: task kworker/u26:2:721 blocked for more than 120 seconds.
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455028] Tainted: G E 3.16.0-ceph-00037-g0532581 #1
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455044] kworker/u26:2 D 0000000000000000 0 721 2 0x00000000
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455057] Workqueue: writeback bdi_writeback_workfn (flush-251:0)
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455060] ffff8806717fb7b8 0000000000000046 ffff8806717fb778 ffffffff810bdc5c
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455063] ffff88066ed53240 0000000000012fc0 ffff8806717fbfd8 0000000000012fc0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455066] ffff880673de1920 ffff88066ed53240 ffff8806717fb788 ffff88067fd538d0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455080] Call Trace:
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455089] [<ffffffff810bdc5c>] ? ktime_get_ts+0x4c/0xe0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455094] [<ffffffff81142fc0>] ? lock_page+0x70/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455102] [<ffffffff816ba789>] schedule+0x29/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455104] [<ffffffff816ba85f>] io_schedule+0x8f/0xd0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455107] [<ffffffff81142fce>] sleep_on_page+0xe/0x20
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455110] [<ffffffff816bac73>] __wait_on_bit_lock+0x73/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455114] [<ffffffff81143a9b>] ? find_get_pages_tag+0xcb/0x170
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455116] [<ffffffff81142fb7>] __lock_page+0x67/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455122] [<ffffffff8109ba60>] ? wake_atomic_t_function+0x40/0x40
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455127] [<ffffffff81249a57>] mpage_prepare_extent_to_map+0x267/0x320
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455131] [<ffffffff8124fedd>] ext4_writepages+0x46d/0xca0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455138] [<ffffffff8133cd97>] ? __blk_run_queue+0x37/0x50
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455144] [<ffffffff8114e3c6>] ? generic_writepages+0x56/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455147] [<ffffffff8114fdb0>] do_writepages+0x20/0x40
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455150] [<ffffffff811d7995>] __writeback_single_inode+0x45/0x290
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455153] [<ffffffff811d85d6>] writeback_sb_inodes+0x216/0x450
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455156] [<ffffffff811d88ae>] __writeback_inodes_wb+0x9e/0xd0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455159] [<ffffffff811d908b>] wb_writeback+0x27b/0x320
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455162] [<ffffffff811d917e>] ? wb_check_old_data_flush+0x4e/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455165] [<ffffffff811db899>] bdi_writeback_workfn+0x209/0x460
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455171] [<ffffffff81070f34>] process_one_work+0x174/0x4a0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455174] [<ffffffff81071c5b>] worker_thread+0x11b/0x530
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455176] [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455181] [<ffffffff81078829>] kthread+0xc9/0xe0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455184] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455188] [<ffffffff816be12c>] ret_from_fork+0x7c/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455191] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455585] INFO: task jbd2/rbd0p1-8:17420 blocked for more than 120 seconds.
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455589] Tainted: G E 3.16.0-ceph-00037-g0532581 #1
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455595] jbd2/rbd0p1-8 D 0000000000000009 0 17420 2 0x00000000
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455598] ffff88017bc87c58 0000000000000046 ffff8800794922a0 0000000300000001
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455601] ffff8804af43cb60 0000000000012fc0 ffff88017bc87fd8 0000000000012fc0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455604] ffff880469d9b240 ffff8804af43cb60 ffff88017bc87c68 ffff88017bc87d90
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455606] Call Trace:
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455610] [<ffffffff816ba789>] schedule+0x29/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455617] [<ffffffff8129c0d3>] jbd2_journal_commit_transaction+0x223/0x18f0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455620] [<ffffffff81091921>] ? dequeue_entity+0x181/0x520
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455623] [<ffffffff81092c4c>] ? put_prev_entity+0x3c/0x320
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455626] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455630] [<ffffffff812a0a83>] kjournald2+0xb3/0x230
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455633] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455636] [<ffffffff812a09d0>] ? commit_timeout+0x10/0x10
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455639] [<ffffffff81078829>] kthread+0xc9/0xe0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455642] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455646] [<ffffffff816be12c>] ret_from_fork+0x7c/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455649] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455716] INFO: task dd:23350 blocked for more than 120 seconds.
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455720] Tainted: G E 3.16.0-ceph-00037-g0532581 #1
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455726] dd D 0000000000000009 0 23350 8521 0x00000002
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455728] ffff88001e793908 0000000000000086 ffff88001e793988 ffffffff81085544
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455731] ffff8800041ae480 0000000000012fc0 ffff88001e793fd8 0000000000012fc0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455733] ffff880673dd6480 ffff8800041ae480 ffff88001e7938d8 ffff88067fcd38d0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455736] Call Trace:
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455740] [<ffffffff81085544>] ? set_task_cpu+0x74/0x1c0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455743] [<ffffffff816ba789>] schedule+0x29/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455746] [<ffffffff816ba85f>] io_schedule+0x8f/0xd0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455748] [<ffffffff8133e264>] get_request+0x414/0x800
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455751] [<ffffffff810854c6>] ? ttwu_do_activate.constprop.121+0x66/0x70
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455754] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455758] [<ffffffff8134292c>] blk_queue_bio+0xcc/0x360
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455760] [<ffffffff8133ecd0>] generic_make_request+0xc0/0x100
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455762] [<ffffffff8133ed85>] submit_bio+0x75/0x140
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455766] [<ffffffff8114e8ef>] ? __test_set_page_writeback+0x16f/0x1c0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455769] [<ffffffff81251a69>] ext4_io_submit+0x29/0x50
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455772] [<ffffffff81251c61>] ext4_bio_write_page+0x1a1/0x2d0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455781] [<ffffffff81248c08>] mpage_submit_page+0x68/0x90
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455784] [<ffffffff812492bf>] mpage_map_and_submit_buffers+0x15f/0x260
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455787] [<ffffffff812501ab>] ext4_writepages+0x73b/0xca0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455792] [<ffffffff811ce292>] ? mnt_clone_write+0x12/0x30
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455798] [<ffffffff816b3d8d>] ? __slab_free+0xee/0x234
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455801] [<ffffffff81095c56>] ? pick_next_task_fair+0x3c6/0x4e0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455803] [<ffffffff8114fdb0>] do_writepages+0x20/0x40
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455806] [<ffffffff81144a79>] __filemap_fdatawrite_range+0x59/0x60
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455809] [<ffffffff811456cc>] filemap_flush+0x1c/0x20
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455812] [<ffffffff8124b769>] ext4_alloc_da_blocks+0x49/0xa0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455814] [<ffffffff812447d9>] ext4_release_file+0x79/0xc0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455819] [<ffffffff811b2155>] __fput+0xb5/0x200
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455822] [<ffffffff811b22ee>] _
_fput+0xe/0x10
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455824] [<ffffffff81075317>] task_work_run+0xa7/0xe0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455828] [<ffffffff81002f0a>] do_notify_resume+0xaa/0xc0
Sep 14 15:33:07 arh-ibstorage1-ib kernel: [ 1920.455832] [<ffffffff816be462>] int_signal+0x12/0x17

This is using 3.16.0-ceph-00037-g0532581 kernel. My rbd cache settings are turned off. I've ran the following command to cause the hang task:

for i in {1..100} ; do dd if=/dev/zero of=2G bs=4K count=500K ; done

The hang tasks happened after about 40 dd runs, whereas before it was hanging on a 3rd or 4th run. So, definitely an improvement from the ubuntu 3.15 and 3.16 kernels, but not perfect ((

Thanks

Andrei
Thanks

#43 Updated by Ilya Dryomov about 7 years ago

Do those dds eventually complete?

#44 Updated by Andrei Mikhailovsky about 7 years ago

Ilya,

Not sure as I've not left it to run for a long time. There was no progress for about 30 minutes, so I've rebooted the server.

I am running some more tests with this kernel and will report on the progress. I am doing various block sizes and oflag=direct as well.

If you would like me to run some more tests, please let me know.

Andrei

#45 Updated by Andrei Mikhailovsky about 7 years ago

Okay, just tried with block size 4M and after about 20 dd runs I got a hang task.

Also, I've noticed that the speed of dd runs is gradually decreasing with every run. The dds start with about 60MB/s and at the end the speed is only around 20MB/s. Something odd is going on here.

Andrei

#46 Updated by Andrei Mikhailovsky about 7 years ago

Done more testing. The hang tasks happen regardless of the block size (tried 4K, 1M ,4M). Without rbd cache enabled the hang tasks tend to happen sooner (I am seeing the first hang task after about 40 dd runs, whereas with rbd cache enabled it happens after about run 70-75).

Andrei

#47 Updated by Andrei Mikhailovsky about 7 years ago

Ilya,

I was wondering if you've managed to verify my findings? Has anyone experienced similar behaviour as I am?

Thanks

#48 Updated by Ilya Dryomov about 7 years ago

Andrei,

No, I haven't, but plan to try harder. I am however seeing an extreme slowdown, will open a ticket to take a deeper look.

The stacktraces you provided don't have any rbd calls in them. Can you try to see if it's the actual hang or there is some progress? Also, running clients on OSD nodes is not recommended and may lead to lockups, although what you are seeing is not related. Try doing 'echo w > /proc/sysrq-trigger' if/when it locks up?

#49 Updated by Christian Eichelmann about 6 years ago

Hi!

Are there any news here?
We are still having exactly the same issues with our cluster. And our ONLY usecase is to use the kernel rbd driver. What happened to this bug?

Regards,
Christian

#50 Updated by Ilya Dryomov about 6 years ago

  • Regression set to No

The issue in this ticket was a problem triggered by a change to the core kernel and is fixed. The follow up comments in this ticket may or may not be related, I can't remember off hand - it's been a year.
Please file a new ticket under "Ceph kernel client" and include your kernel and ceph versions, your cluster config and topology, description of your workload, any splats from dmesg, etc.

Also available in: Atom PDF