Project

General

Profile

Bug #42712

ObjectStore/StoreTest.ColSplitTest0/2 hangs

Added by Sage Weil 11 days ago. Updated 8 days ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Target version:
-
Start date:
11/08/2019
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

2019-11-08T00:50:21.449 INFO:teuthology.orchestra.run.smithi042.stdout:[ RUN      ] ObjectStore/StoreTest.ColSplitTest0/2
2019-11-08T12:17:03.053 DEBUG:teuthology.exit:Got signal 15; running 2 handlers...
2019-11-08T12:17:03.081 DEBUG:teuthology.task.console_log:Killing console logger for smithi042

/a/sage-2019-11-07_22:38:52-rados-wip-sage-testing-2019-11-07-1412-distro-basic-smithi/4480785

History

#1 Updated by Sage Weil 11 days ago

/a/sage-2019-11-08_19:27:06-rados-master-distro-basic-smithi/4484836

#2 Updated by Sage Weil 11 days ago

I think the problem is ff71ad472e94e14f392c618b6eb5e8608afec94f

Here's the log:

2019-11-08T19:57:21.888+0000 7f20f8c14d40 15 bluestore(bluestore.test_temp_dir) _split_collection 52.0_head to 52.20_head  bits 6
2019-11-08T19:57:21.888+0000 7f20f8c14d40 10 bluestore(bluestore.test_temp_dir) _osr_drain_preceding 0x5617866b7200 osr 0x561738f63900

so we're waiting for things to drain..
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore(bluestore.test_temp_dir) _kv_finalize_thread kv_committed <0x561726f44900,0x561739207b00,0x561739207800,0x561726f42c00,0x561726f42600,0x561739a93500,0x561739a92300,0x561739a92900,0x561739a92c00,0x561739a92000,0x561739a93200,0x561739a92f00,0x56173a6b1800,0x56173a6b1b00,0x56173a6b0000,0x5617866b6c00,0x5617866b6f00,0x5617866b7800,0x5617866b7b00,0x5617866b6000>
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore(bluestore.test_temp_dir) _kv_finalize_thread deferred_stable <>
2019-11-08T19:57:21.959+0000 7f20e2f9a700 10 bluestore(bluestore.test_temp_dir) _txc_state_proc txc 0x561726f44900 kv_submitted
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore(bluestore.test_temp_dir) _txc_committed_kv txc 0x561726f44900
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore(bluestore.test_temp_dir) _deferred_queue txc 0x561726f44900 osr 0x561738f63900
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore.DeferredBatch(0x561738f62b40) prepare_write seq 1 0x10000~1000 crc 369964be
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore.DeferredBatch(0x561738f62b40) prepare_write seq 1 0x20000~1000 crc 369964be

the kv txns commit and deferred gets queued... but notably that first txn, 0x561726f44900 gets stuck
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore(bluestore.test_temp_dir) _txc_finish 0x5617866b6000 onodes 
2019-11-08T19:57:21.959+0000 7f20e2f9a700 20 bluestore(bluestore.test_temp_dir) _txc_finish  txc 0x561726f44900 deferred_queued

a bit later we submit some other osr's deferred work,
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) deferred_try_submit 1 osrs, 19 txcs
2019-11-08T19:57:24.893+0000 7f20e2593700 10 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock osr 0x561738f63900 20 ios pending 
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 1 0x10000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x10000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 1 0x20000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x20000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 2 0x30000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x30000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 3 0x40000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x40000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 4 0x50000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x50000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 5 0x60000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x60000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 6 0x70000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x70000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 7 0x80000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x80000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 8 0x90000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x90000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 9 0xa0000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0xa0000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 10 0xb0000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0xb0000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 11 0xc0000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0xc0000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 12 0xd0000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0xd0000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 13 0xe0000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0xe0000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 14 0xf0000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0xf0000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 15 0x100000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x100000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 16 0x110000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x110000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 17 0x120000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x120000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 18 0x130000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x130000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock   seq 19 0x140000~1000
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore(bluestore.test_temp_dir) _deferred_submit_unlock write 0x140000~1000 crc 369964be
2019-11-08T19:57:24.893+0000 7f20e2593700 20 bluestore.MempoolThread(0x561738efea08) _resize_shards cache_size: 134217728 kv_alloc: 67108864 kv_used: 976 meta_alloc: 67108864 meta_used: 28658 data_alloc: 67108864 data_used: 0
2019-11-08T19:57:24.898+0000 7f20e0362700 10 bluestore(bluestore.test_temp_dir) _deferred_aio_finish osr 0x561738f63900
2019-11-08T19:57:24.898+0000 7f20e0362700 20 bluestore(bluestore.test_temp_dir) _deferred_aio_finish dequeueing

but the osr we are block on, 0x561738f63900, makes no progress, and we deadlock.

I think there are two problems with that patch: first, we do want to be aggressive with deferred or else we wait seconds for the deferred work to decide to wake up. and second, because we aren't in aggressive mode, the kv thread doesn't wake up again, which means the other txc's don't get completed, which leaves us stuck.

Reverting!

#3 Updated by Sage Weil 11 days ago

  • Pull request ID set to 31503

#4 Updated by Sage Weil 11 days ago

  • Status changed from Verified to Need Review

#5 Updated by Sage Weil 8 days ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF