Bug #23978 (closed): ceph kernel mount hangs

Added by Niklas Hambuechen about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Crash signature (v1): -
Crash signature (v2): -

Description

I have a testing CephFS setup on Ubuntu 16.04 with Ceph Luminous.

I wrote to the kernel mount until the OSDs were full and I got OSD-full warnings.

Then I deleted files from the CephFS, so the OSDs are no longer full.

However, on one machine that is both a Ceph server and has the CephFS mounted via the kernel module, the mount hangs forever.

`df` is hanging, `ls /thecephmount` is hanging, and so on.
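
For reference, debug output from the `ceph` and `libceph` kernel modules can be switched on with the kernel's dynamic debug facility (a sketch, assuming root, a kernel built with CONFIG_DYNAMIC_DEBUG, and debugfs mounted at its default path):

```shell
# Enable pr_debug output for both CephFS kernel modules (root required).
echo 'module ceph +p'    | sudo tee /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' | sudo tee /sys/kernel/debug/dynamic_debug/control

# Follow the resulting log lines as they appear.
sudo dmesg --follow
```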

I enabled debug logging of the kernel module and got:

# dmesg
...
[    7.542528] new mount options do not match the existing superblock, will be ignored
[87967.250276] FS-Cache: Loaded
[87967.260628] Key type ceph registered
[87967.261290] libceph: loaded (mon/osd proto 15/24)
[87967.270062] FS-Cache: Netfs 'ceph' registered for caching
[87967.270075] ceph: loaded (mds proto 32)
[87967.280200] libceph: mon0 88.99.127.158:6789 session established
[87967.282153] libceph: client864430 fsid f59e49e7-5539-42fd-8706-9b937a517c5a
[88119.744710] libceph: osd1 88.99.127.157:6800 socket closed (con state OPEN)
[88119.759614] libceph: osd2 88.99.127.158:6801 socket closed (con state OPEN)
[88123.004723] libceph: FULL or reached pool quota
[88123.005245] libceph: FULL or reached pool quota
[88123.039028] libceph: FULL or reached pool quota
[88123.091190] libceph: FULL or reached pool quota
[88123.091916] libceph: FULL or reached pool quota
[88123.128915] libceph: FULL or reached pool quota
[88123.145629] libceph: FULL or reached pool quota
[88123.181391] libceph: FULL or reached pool quota
[88123.201528] libceph: FULL or reached pool quota
[88123.202155] libceph: FULL or reached pool quota
[88325.267160] INFO: task kworker/0:2:25706 blocked for more than 120 seconds.
[88325.269249]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88325.270994] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88325.272689] kworker/0:2     D    0 25706      2 0x80000000
[88325.272723] Workqueue: events handle_osds_timeout [libceph]
[88325.272725] Call Trace:
[88325.272735]  __schedule+0x3d6/0x8b0
[88325.272737]  schedule+0x36/0x80
[88325.272740]  rwsem_down_write_failed+0x237/0x3b0
[88325.272745]  call_rwsem_down_write_failed+0x17/0x30
[88325.272748]  ? call_rwsem_down_write_failed+0x17/0x30
[88325.272750]  down_write+0x2d/0x40
[88325.272760]  handle_osds_timeout+0x47/0x130 [libceph]
[88325.272765]  process_one_work+0x15b/0x410
[88325.272768]  worker_thread+0x4b/0x460
[88325.272771]  kthread+0x10c/0x140
[88325.272773]  ? process_one_work+0x410/0x410
[88325.272776]  ? kthread_create_on_node+0x70/0x70
[88325.272778]  ret_from_fork+0x35/0x40
[88325.272785] INFO: task kworker/1:2:27139 blocked for more than 120 seconds.
[88325.274209]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88325.276067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88325.278688] kworker/1:2     D    0 27139      2 0x80000000
[88325.278724] Workqueue: events delayed_work [ceph]
[88325.278777] Call Trace:
[88325.278788]  __schedule+0x3d6/0x8b0
[88325.278795]  ? release_sock+0x8f/0xa0
[88325.278800]  schedule+0x36/0x80
[88325.278804]  rwsem_down_read_failed+0x10a/0x170
[88325.278813]  call_rwsem_down_read_failed+0x18/0x30
[88325.278816]  ? call_rwsem_down_read_failed+0x18/0x30
[88325.278819]  down_read+0x20/0x40
[88325.278829]  ceph_send_cap_releases+0x51/0x350 [ceph]
[88325.278836]  ? __queue_delayed_work+0x68/0xa0
[88325.278845]  delayed_work+0x1a3/0x210 [ceph]
[88325.278849]  process_one_work+0x15b/0x410
[88325.278852]  worker_thread+0x4b/0x460
[88325.278855]  kthread+0x10c/0x140
[88325.278858]  ? process_one_work+0x410/0x410
[88325.278869]  ? kthread_create_on_node+0x70/0x70
[88325.278876]  ret_from_fork+0x35/0x40
[88325.278881] INFO: task kworker/0:1:27429 blocked for more than 120 seconds.
[88325.281285]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88325.283323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88325.286003] kworker/0:1     D    0 27429      2 0x80000000
[88325.286035] Workqueue: events handle_timeout [libceph]
[88325.286037] Call Trace:
[88325.286046]  __schedule+0x3d6/0x8b0
[88325.286049]  schedule+0x36/0x80
[88325.286052]  rwsem_down_write_failed+0x237/0x3b0
[88325.286057]  call_rwsem_down_write_failed+0x17/0x30
[88325.286060]  ? call_rwsem_down_write_failed+0x17/0x30
[88325.286062]  down_write+0x2d/0x40
[88325.286074]  handle_timeout+0x97/0x6a0 [libceph]
[88325.286080]  process_one_work+0x15b/0x410
[88325.286083]  worker_thread+0x4b/0x460
[88325.286086]  kthread+0x10c/0x140
[88325.286089]  ? process_one_work+0x410/0x410
[88325.286091]  ? kthread_create_on_node+0x70/0x70
[88325.286094]  ret_from_fork+0x35/0x40
[88325.286100] INFO: task kworker/u4:2:27501 blocked for more than 120 seconds.
[88325.288580]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88325.290654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88325.293370] kworker/u4:2    D    0 27501      2 0x80000000
[88325.293389] Workqueue: writeback wb_workfn (flush-ceph-1)
[88325.293395] Call Trace:
[88325.293406]  __schedule+0x3d6/0x8b0
[88325.293411]  ? out_of_line_wait_on_atomic_t+0x110/0x110
[88325.293413]  schedule+0x36/0x80
[88325.293415]  bit_wait+0x11/0x60
[88325.293417]  __wait_on_bit+0x63/0x90
[88325.293440]  ? __cap_delay_requeue+0x74/0xf0 [ceph]
[88325.293443]  __inode_wait_for_writeback+0xaa/0xe0
[88325.293455]  ? bit_waitqueue+0x40/0x40
[88325.293461]  inode_wait_for_writeback+0x26/0x40
[88325.293468]  evict+0xb5/0x1a0
[88325.293470]  iput+0x1d2/0x220
[88325.293480]  ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
[88325.293494]  writepages_finish+0x2d3/0x410 [ceph]
[88325.293510]  __complete_request+0x26/0x60 [libceph]
[88325.293520]  complete_request+0x2e/0x70 [libceph]
[88325.293528]  __submit_request+0x256/0x330 [libceph]
[88325.293536]  submit_request+0x2b/0x30 [libceph]
[88325.293544]  ceph_osdc_start_request+0x25/0x40 [libceph]
[88325.293552]  ceph_writepages_start+0xdfe/0x1320 [ceph]
[88325.293558]  do_writepages+0x1f/0x70
[88325.293560]  __writeback_single_inode+0x45/0x330
[88325.293563]  writeback_sb_inodes+0x26a/0x600
[88325.293566]  __writeback_inodes_wb+0x92/0xc0
[88325.293568]  wb_writeback+0x274/0x330
[88325.293573]  ? find_next_bit+0xb/0x10
[88325.293576]  wb_workfn+0x2d5/0x3b0
[88325.293577]  ? wb_workfn+0x2d5/0x3b0
[88325.293582]  process_one_work+0x15b/0x410
[88325.293585]  worker_thread+0x4b/0x460
[88325.293588]  kthread+0x10c/0x140
[88325.293590]  ? process_one_work+0x410/0x410
[88325.293592]  ? kthread_create_on_node+0x70/0x70
[88325.293596]  ret_from_fork+0x35/0x40
[88325.293600] INFO: task kworker/1:1:27573 blocked for more than 120 seconds.
[88325.295922]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88325.298025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88325.300585] kworker/1:1     D    0 27573      2 0x80000000
[88325.300615] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[88325.300617] Call Trace:
[88325.300626]  __schedule+0x3d6/0x8b0
[88325.300628]  schedule+0x36/0x80
[88325.300631]  rwsem_down_read_failed+0x10a/0x170
[88325.300635]  call_rwsem_down_read_failed+0x18/0x30
[88325.300643]  ? call_rwsem_down_read_failed+0x18/0x30
[88325.300651]  down_read+0x20/0x40
[88325.300664]  ceph_osdc_update_epoch_barrier+0x20/0x60 [libceph]
[88325.300684]  ceph_handle_caps+0x209/0x18c0 [ceph]
[88325.300704]  ? ceph_x_check_message_signature+0x6b/0xe0 [libceph]
[88325.300716]  dispatch+0x1ed/0xa80 [ceph]
[88325.300728]  ceph_con_workfn+0x95b/0x1560 [libceph]
[88325.300731]  ? __schedule+0x3de/0x8b0
[88325.300736]  process_one_work+0x15b/0x410
[88325.300739]  worker_thread+0x4b/0x460
[88325.300742]  kthread+0x10c/0x140
[88325.300745]  ? process_one_work+0x410/0x410
[88325.300747]  ? kthread_create_on_node+0x70/0x70
[88325.300751]  ret_from_fork+0x35/0x40
[88446.093514] INFO: task kworker/0:2:25706 blocked for more than 120 seconds.
[88446.096095]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88446.098331] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88446.104021] kworker/0:2     D    0 25706      2 0x80000000
[88446.104053] Workqueue: events handle_osds_timeout [libceph]
[88446.104055] Call Trace:
[88446.104065]  __schedule+0x3d6/0x8b0
[88446.104068]  schedule+0x36/0x80
[88446.104071]  rwsem_down_write_failed+0x237/0x3b0
[88446.104075]  call_rwsem_down_write_failed+0x17/0x30
[88446.104078]  ? call_rwsem_down_write_failed+0x17/0x30
[88446.104080]  down_write+0x2d/0x40
[88446.104091]  handle_osds_timeout+0x47/0x130 [libceph]
[88446.104097]  process_one_work+0x15b/0x410
[88446.104100]  worker_thread+0x4b/0x460
[88446.104103]  kthread+0x10c/0x140
[88446.104105]  ? process_one_work+0x410/0x410
[88446.104108]  ? kthread_create_on_node+0x70/0x70
[88446.104110]  ret_from_fork+0x35/0x40
[88446.104117] INFO: task kworker/1:2:27139 blocked for more than 120 seconds.
[88446.106809]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88446.108894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88446.110570] kworker/1:2     D    0 27139      2 0x80000000
[88446.110603] Workqueue: events delayed_work [ceph]
[88446.110610] Call Trace:
[88446.110622]  __schedule+0x3d6/0x8b0
[88446.110628]  ? release_sock+0x8f/0xa0
[88446.110631]  schedule+0x36/0x80
[88446.110634]  rwsem_down_read_failed+0x10a/0x170
[88446.110638]  call_rwsem_down_read_failed+0x18/0x30
[88446.110641]  ? call_rwsem_down_read_failed+0x18/0x30
[88446.110643]  down_read+0x20/0x40
[88446.110655]  ceph_send_cap_releases+0x51/0x350 [ceph]
[88446.110661]  ? __queue_delayed_work+0x68/0xa0
[88446.110671]  delayed_work+0x1a3/0x210 [ceph]
[88446.110676]  process_one_work+0x15b/0x410
[88446.110682]  worker_thread+0x4b/0x460
[88446.110694]  kthread+0x10c/0x140
[88446.110699]  ? process_one_work+0x410/0x410
[88446.110704]  ? kthread_create_on_node+0x70/0x70
[88446.110708]  ret_from_fork+0x35/0x40
[88446.110713] INFO: task kworker/0:1:27429 blocked for more than 120 seconds.
[88446.112300]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88446.113631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88446.115277] kworker/0:1     D    0 27429      2 0x80000000
[88446.115309] Workqueue: events handle_timeout [libceph]
[88446.115311] Call Trace:
[88446.115319]  __schedule+0x3d6/0x8b0
[88446.115322]  schedule+0x36/0x80
[88446.115325]  rwsem_down_write_failed+0x237/0x3b0
[88446.115330]  call_rwsem_down_write_failed+0x17/0x30
[88446.115333]  ? call_rwsem_down_write_failed+0x17/0x30
[88446.115335]  down_write+0x2d/0x40
[88446.115346]  handle_timeout+0x97/0x6a0 [libceph]
[88446.115351]  process_one_work+0x15b/0x410
[88446.115354]  worker_thread+0x4b/0x460
[88446.115357]  kthread+0x10c/0x140
[88446.115359]  ? process_one_work+0x410/0x410
[88446.115361]  ? kthread_create_on_node+0x70/0x70
[88446.115364]  ret_from_fork+0x35/0x40
[88446.115368] INFO: task kworker/u4:2:27501 blocked for more than 120 seconds.
[88446.116748]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88446.118928] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88446.121602] kworker/u4:2    D    0 27501      2 0x80000000
[88446.121616] Workqueue: writeback wb_workfn (flush-ceph-1)
[88446.121618] Call Trace:
[88446.121627]  __schedule+0x3d6/0x8b0
[88446.121630]  ? out_of_line_wait_on_atomic_t+0x110/0x110
[88446.121632]  schedule+0x36/0x80
[88446.121635]  bit_wait+0x11/0x60
[88446.121637]  __wait_on_bit+0x63/0x90
[88446.121657]  ? __cap_delay_requeue+0x74/0xf0 [ceph]
[88446.121661]  __inode_wait_for_writeback+0xaa/0xe0
[88446.121665]  ? bit_waitqueue+0x40/0x40
[88446.121668]  inode_wait_for_writeback+0x26/0x40
[88446.121670]  evict+0xb5/0x1a0
[88446.121673]  iput+0x1d2/0x220
[88446.121683]  ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
[88446.121693]  writepages_finish+0x2d3/0x410 [ceph]
[88446.121712]  __complete_request+0x26/0x60 [libceph]
[88446.121724]  complete_request+0x2e/0x70 [libceph]
[88446.121735]  __submit_request+0x256/0x330 [libceph]
[88446.121746]  submit_request+0x2b/0x30 [libceph]
[88446.121756]  ceph_osdc_start_request+0x25/0x40 [libceph]
[88446.121768]  ceph_writepages_start+0xdfe/0x1320 [ceph]
[88446.121775]  do_writepages+0x1f/0x70
[88446.121779]  __writeback_single_inode+0x45/0x330
[88446.121783]  writeback_sb_inodes+0x26a/0x600
[88446.121787]  __writeback_inodes_wb+0x92/0xc0
[88446.121789]  wb_writeback+0x274/0x330
[88446.121793]  ? find_next_bit+0xb/0x10
[88446.121796]  wb_workfn+0x2d5/0x3b0
[88446.121800]  ? wb_workfn+0x2d5/0x3b0
[88446.121807]  process_one_work+0x15b/0x410
[88446.121810]  worker_thread+0x4b/0x460
[88446.121813]  kthread+0x10c/0x140
[88446.121816]  ? process_one_work+0x410/0x410
[88446.121820]  ? kthread_create_on_node+0x70/0x70
[88446.121825]  ret_from_fork+0x35/0x40
[88446.121830] INFO: task kworker/1:1:27573 blocked for more than 120 seconds.
[88446.124279]       Not tainted 4.13.0-38-generic #43~16.04.1-Ubuntu
[88446.126344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88446.129154] kworker/1:1     D    0 27573      2 0x80000000
[88446.129184] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[88446.129186] Call Trace:
[88446.129195]  __schedule+0x3d6/0x8b0
[88446.129197]  schedule+0x36/0x80
[88446.129200]  rwsem_down_read_failed+0x10a/0x170
[88446.129205]  call_rwsem_down_read_failed+0x18/0x30
[88446.129208]  ? call_rwsem_down_read_failed+0x18/0x30
[88446.129210]  down_read+0x20/0x40
[88446.129221]  ceph_osdc_update_epoch_barrier+0x20/0x60 [libceph]
[88446.129235]  ceph_handle_caps+0x209/0x18c0 [ceph]
[88446.129247]  ? ceph_x_check_message_signature+0x6b/0xe0 [libceph]
[88446.129259]  dispatch+0x1ed/0xa80 [ceph]
[88446.129268]  ceph_con_workfn+0x95b/0x1560 [libceph]
[88446.129272]  ? __schedule+0x3de/0x8b0
[88446.129276]  process_one_work+0x15b/0x410
[88446.129279]  worker_thread+0x4b/0x460
[88446.129283]  kthread+0x10c/0x140
[88446.129285]  ? process_one_work+0x410/0x410
[88446.129287]  ? kthread_create_on_node+0x70/0x70
[88446.129291]  ret_from_fork+0x35/0x40
[88459.148819] libceph: mon0 88.99.127.158:6789 session lost, hunting for new mon
[89022.975681] libceph: osd1 88.99.127.157:6800 socket closed (con state OPEN)
[89023.117774] libceph: osd2 88.99.127.158:6801 socket closed (con state OPEN)
[89783.794660] libceph: osd0 88.99.127.192:6802 socket closed (con state OPEN)
[92910.213622] TCP: request_sock_TCP: Possible SYN flooding on port 7000. Sending cookies.  Check SNMP counters.
[95553.746898] sysrq: SysRq : This sysrq operation is disabled.
[95661.957059] sysrq: SysRq : Changing Loglevel
[95661.958086] sysrq: Loglevel set to 9
[95783.750306] libceph:  monc delayed_work
[95783.756173] libceph:  delayed_work continuing hunt
[95783.756176] libceph:  __close_session closing mon0
[95783.756187] libceph:  ceph_msg_revoke ffffa00cb4c38288 msg ffffa00dab3d7b00 - was on queue
[95783.756190] libceph:  ceph_msg_put ffffa00dab3d7b00 (was 2)
[95783.756193] libceph:  ceph_msg_revoke_incoming ffffa00cb4c38288 in_msg           (null) msg ffffa00dab3d7000 no-op
[95783.756196] libceph:  ceph_msg_revoke_incoming msg ffffa00dab3d7300 null con
[95783.756200] libceph:  con_close ffffa00cb4c38288 peer 88.99.127.157:6789
[95783.756202] libceph:  reset_connection ffffa00cb4c38288
[95783.756206] libceph:  cancel_con ffffa00cb4c38288
[95783.756208] libceph:  con_close_socket on ffffa00cb4c38288 sock           (null)
[95783.756211] libceph:  con_sock_state_closed con ffffa00cb4c38288 sock 1 -> 1
[95783.756212] libceph:  auth_reset ffffa00db3b69360
[95783.756215] libceph:  pick_new_mon mon0 -> mon2 out of 3 mons
[95783.756216] libceph:  __open_session opening mon2
[95783.756218] libceph:  con_open ffffa00cb4c38288 88.99.127.192:6789
[95783.756224] libceph:  queue_con_delay ffffa00cb4c38288 0
[95783.756225] libceph:  con_keepalive ffffa00cb4c38288
[95783.756226] libceph:  queue_con_delay ffffa00cb4c38288 - already queued
[95783.756227] libceph:  auth_build_hello
[95783.756229] libceph:  ceph_msg_get ffffa00dab3d7b00 (was 1)
[95783.756233] libceph:  ----- ffffa00dab3d7b00 to mon2 17=auth len 60+0+0 -----
[95783.756234] libceph:  __schedule_delayed after 7500
[95792.449628] ceph:  do_getattr inode ffffa00db2e608a8 mask pAsLsXsFs mode 040755
[95792.453274] ceph:  __cap_is_valid ffffa00db2e608a8 cap ffffa00db352ae10 issued pAsLsXsFs but STALE (gen 0 vs 0)
[95792.458183] ceph:  do_request on ffffa00d1ea9c800
[95792.460964] ceph:  reserve caps ctx=ffffa00d1ea9cb58 need=1
[95792.463719] ceph:  reserve caps ctx=ffffa00d1ea9cb58 673 = 666 used + 7 resv + 0 avail
[95792.467646] ceph:  __register_request ffffa00d1ea9c800 tid 2717
[95792.470251] ceph:  __choose_mds ffffa00db2e608a8 is_hash=0 (0) mode 0
[95792.473463] ceph:  choose_mds ffffa00db2e608a8 1.fffffffffffffffe mds0 (auth cap ffffa00db352ae10)
[95792.479113] ceph:  lookup_mds_session ffffa00cb4c44000 52
[95792.482308] ceph:  mdsc get_session ffffa00cb4c44000 52 -> 53
[95792.485570] ceph:  mdsc get_session ffffa00cb4c44000 53 -> 54
[95792.489008] ceph:  do_request mds0 session ffffa00cb4c44000 state open
[95792.492634] ceph:  prepare_send_request ffffa00d1ea9c800 tid 2717 getattr (attempt 1)
[95792.496284] ceph:   inode ffffa00db2e608a8 1.fffffffffffffffe
[95792.499474] libceph:  ceph_msg_new ffffa00db0554900 front 122
[95792.502780] ceph:   r_parent =           (null)
[95792.505312] libceph:  ceph_msg_get ffffa00db0554900 (was 1)
[95792.508951] ceph:  mdsc get_session ffffa00cb4c44000 54 -> 55
[95792.512752] ceph:  mdsc con_get ffffa00cb4c44000 ok (55)
[95792.516389] libceph:  ----- ffffa00db0554900 to mds0 24=client_request len 122+0+0 -----
[95792.520218] ceph:  mdsc put_session ffffa00cb4c44000 55 -> 54
[95792.523514] ceph:  do_request waiting
[95814.468949] libceph:  monc delayed_work
[95814.472123] libceph:  delayed_work continuing hunt
[95814.474739] libceph:  __close_session closing mon2
[95814.477673] libceph:  ceph_msg_revoke ffffa00cb4c38288 msg ffffa00dab3d7b00 - was on queue
[95814.482075] libceph:  ceph_msg_put ffffa00dab3d7b00 (was 2)
[95814.487230] libceph:  ceph_msg_revoke_incoming ffffa00cb4c38288 in_msg           (null) msg ffffa00dab3d7000 no-op
[95814.492508] libceph:  ceph_msg_revoke_incoming msg ffffa00dab3d7300 null con
[95814.495530] libceph:  con_close ffffa00cb4c38288 peer 88.99.127.192:6789
[95814.498344] libceph:  reset_connection ffffa00cb4c38288
[95814.501111] libceph:  cancel_con ffffa00cb4c38288
[95814.503335] libceph:  con_close_socket on ffffa00cb4c38288 sock           (null)
[95814.506412] libceph:  con_sock_state_closed con ffffa00cb4c38288 sock 1 -> 1
[95814.510191] libceph:  auth_reset ffffa00db3b69360
[95814.513644] libceph:  pick_new_mon mon2 -> mon0 out of 3 mons
[95814.516440] libceph:  __open_session opening mon0
[95814.519267] libceph:  con_open ffffa00cb4c38288 88.99.127.157:6789
[95814.522297] libceph:  queue_con_delay ffffa00cb4c38288 0
[95814.525328] libceph:  con_keepalive ffffa00cb4c38288
[95814.528086] libceph:  queue_con_delay ffffa00cb4c38288 - already queued
[95814.531315] libceph:  auth_build_hello
[95814.533870] libceph:  ceph_msg_get ffffa00dab3d7b00 (was 1)
[95814.537250] libceph:  ----- ffffa00dab3d7b00 to mon0 17=auth len 60+0+0 -----
[95814.540325] libceph:  __schedule_delayed after 7500
[95845.187637] libceph:  monc delayed_work
[95845.190925] libceph:  delayed_work continuing hunt
[95845.197910] libceph:  __close_session closing mon0
[95845.201677] libceph:  ceph_msg_revoke ffffa00cb4c38288 msg ffffa00dab3d7b00 - was on queue
[95845.210742] libceph:  ceph_msg_put ffffa00dab3d7b00 (was 2)
[95845.213427] libceph:  ceph_msg_revoke_incoming ffffa00cb4c38288 in_msg           (null) msg ffffa00dab3d7000 no-op
[95845.218287] libceph:  ceph_msg_revoke_incoming msg ffffa00dab3d7300 null con
[95845.221060] libceph:  con_close ffffa00cb4c38288 peer 88.99.127.157:6789
[95845.225086] libceph:  reset_connection ffffa00cb4c38288
[95845.228104] libceph:  cancel_con ffffa00cb4c38288
[95845.230694] libceph:  con_close_socket on ffffa00cb4c38288 sock           (null)
[95845.234997] libceph:  con_sock_state_closed con ffffa00cb4c38288 sock 1 -> 1
[95845.239191] libceph:  auth_reset ffffa00db3b69360
[95845.242613] libceph:  pick_new_mon mon0 -> mon1 out of 3 mons
[95845.246232] libceph:  __open_session opening mon1
[95845.249631] libceph:  con_open ffffa00cb4c38288 88.99.127.158:6789
[95845.253454] libceph:  queue_con_delay ffffa00cb4c38288 0
[95845.256989] libceph:  con_keepalive ffffa00cb4c38288
[95845.260374] libceph:  queue_con_delay ffffa00cb4c38288 - already queued
[95845.264198] libceph:  auth_build_hello
[95845.267007] libceph:  ceph_msg_get ffffa00dab3d7b00 (was 1)
[95845.270484] libceph:  ----- ffffa00dab3d7b00 to mon1 17=auth len 60+0+0 -----
[95845.274565] libceph:  __schedule_delayed after 7500
[95875.906187] libceph:  monc delayed_work
[95875.908780] libceph:  delayed_work continuing hunt
[95875.910460] libceph:  __close_session closing mon1
[95875.912110] libceph:  ceph_msg_revoke ffffa00cb4c38288 msg ffffa00dab3d7b00 - was on queue
[95875.914966] libceph:  ceph_msg_put ffffa00dab3d7b00 (was 2)
[95875.917787] libceph:  ceph_msg_revoke_incoming ffffa00cb4c38288 in_msg           (null) msg ffffa00dab3d7000 no-op
[95875.922113] libceph:  ceph_msg_revoke_incoming msg ffffa00dab3d7300 null con
[95875.925070] libceph:  con_close ffffa00cb4c38288 peer 88.99.127.158:6789
[95875.928833] libceph:  reset_connection ffffa00cb4c38288
[95875.931914] libceph:  cancel_con ffffa00cb4c38288
[95875.934227] libceph:  con_close_socket on ffffa00cb4c38288 sock           (null)
[95875.936974] libceph:  con_sock_state_closed con ffffa00cb4c38288 sock 1 -> 1
[95875.939952] libceph:  auth_reset ffffa00db3b69360
[95875.943983] libceph:  pick_new_mon mon1 -> mon0 out of 3 mons
[95875.947279] libceph:  __open_session opening mon0
[95875.951523] libceph:  con_open ffffa00cb4c38288 88.99.127.157:6789
[95875.955596] libceph:  queue_con_delay ffffa00cb4c38288 0
[95875.962698] libceph:  con_keepalive ffffa00cb4c38288
[95875.966101] libceph:  queue_con_delay ffffa00cb4c38288 - already queued
[95875.969498] libceph:  auth_build_hello
[95875.971888] libceph:  ceph_msg_get ffffa00dab3d7b00 (was 1)
[95875.981640] libceph:  ----- ffffa00dab3d7b00 to mon0 17=auth len 60+0+0 -----
[95875.985765] libceph:  __schedule_delayed after 7500

What could be going on here?

How do I debug this further?
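
A few starting points I can think of for a hang like this (a sketch only; it assumes an admin keyring on the node and debugfs mounted, and the exact warning names are from Luminous-era tooling):

```shell
# Cluster side: is the cluster (or a pool) still flagged full?
ceph health detail   # look for full / nearfull warnings
ceph osd df          # per-OSD utilisation

# Client side: in-flight requests of the hung kernel mount live under debugfs.
# The directory name is <fsid>.client<id>; both appear in the dmesg output above.
cat /sys/kernel/debug/ceph/*/osdc   # pending OSD (data) requests
cat /sys/kernel/debug/ceph/*/mdsc   # pending MDS (metadata) requests

# Kernel stack of one hung process, to see where it is blocked:
cat /proc/$(pgrep -x df | head -n1)/stack
```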


Files

Screenshot from 2018-05-03 02-19-01.png (175 KB): ceph module hangs machine shutdown 1 (Niklas Hambuechen, 05/03/2018 12:48 AM)
Screenshot from 2018-05-03 02-18-42.png (176 KB): ceph module hangs machine shutdown 2 (Niklas Hambuechen, 05/03/2018 12:48 AM)