Project

General

Profile

Actions

Bug #38195

open

osd-backfill-space.sh exposes rocksdb hang

Added by David Zafman over 5 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After increasing the timeout for backfills finish for http://tracker.ceph.com/issues/38027 we see this kill_daemons() error with ceph-mon.

dzafman-2019-02-04_11:24:54-rados-wip-zafman-testing2-distro-basic-smithi/3549933

Before mon shutdowns it appears to hang in rocksdb before SIGKILL sent by teardown. This run exposes the kill_daemons() execution.

teuthology.log

2019-02-04T20:24:27.598 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.c35707: //home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:255: kill_daemon:  cat td/osd-backfill-space/mon.a.pid
2019-02-04T20:24:27.598 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:255: kill_daemon:  local pid=126389
2019-02-04T20:24:27.598 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:256: kill_daemon:  local send_signal=TERM
2019-02-04T20:24:27.598 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:257: kill_daemon:  local 'delays=0.1 0.2 1 1 1 2 3 5 5 5 10 10 20 60 60 60 120'
2019-02-04T20:24:27.598 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:258: kill_daemon:  local exit_code=1
2019-02-04T20:24:27.598 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:259: kill_daemon:  for try in '$delays'
....
2019-02-04T20:30:25.590 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:259: kill_daemon:  for try in '$delays'
2019-02-04T20:30:25.591 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:260: kill_daemon:  kill -0 126389
2019-02-04T20:30:25.591 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:261: kill_daemon:  exit_code=1
2019-02-04T20:30:25.591 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:266: kill_daemon:  send_signal=0
2019-02-04T20:30:25.591 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:267: kill_daemon:  sleep 120
2019-02-04T20:30:25.591 INFO:tasks.workunit.client.0.smithi194.stderr:35707: /home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:269: kill_daemon:  return 1
2019-02-04T20:30:25.594 INFO:tasks.workunit.client.0.smithi194.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1866: run_in_background:  return 1

mon.a.log

...
2019-02-04 20:24:25.296 7f1b39312700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.257518) [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/memtable_list.cc:377] [default] Level-0 commit table #19 started
2019-02-04 20:24:25.296 7f1b39312700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.299890) [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/memtable_list.cc:424] [default] Level-0 commit table #19: memtable #1 done
2019-02-04 20:24:25.296 7f1b39312700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.299929) EVENT_LOG_v1 {"time_micros": 1549311865299904, "job": 9, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [1, 0, 0, 0, 0, 0, 1], "immutable_memtables": 0}
2019-02-04 20:24:25.296 7f1b39312700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.299950) [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:177] [default] Level summary: base level 6 max bytes base 268435456 files[1 0 0 0 0 0 1] max score 0.25

2019-02-04 20:24:25.296 7f1b39312700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_files.cc:347] [JOB 9] Try to delete WAL files size 9979119, prev total WAL file size 10025925, number of live WAL files 2.

2019-02-04 20:24:25.296 7f1b30300700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1060] [default] Manual compaction starting
2019-02-04 20:24:25.296 7f1b39b13700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.300155) [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1847] [default] Manual compaction from level-0 to level-6 from 'paxos .. 'paxos; will stop at (end)

2019-02-04 20:24:25.296 7f1b39b13700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/compaction_job.cc:1636] [default] [JOB 10] Compacting 1@0 + 1@6 files to L6, score -1.00
2019-02-04 20:24:25.296 7f1b39b13700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/compaction_job.cc:1640] [default] Compaction start summary: Base version 9 Base level 0, inputs: [19(9580KB)], [17(18MB)]

2019-02-04 20:24:25.296 7f1b39b13700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1549311865300218, "job": 10, "event": "compaction_started", "compaction_reason": "ManualCompaction", "files_L0": [19], "files_L6": [17], "score": -1, "input_data_size": 29158732}
2019-02-04 20:24:25.524 7f1b39b13700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/compaction_job.cc:1324] [default] [JOB 10] Generated table #20: 2415 keys, 25628909 bytes
2019-02-04 20:24:25.524 7f1b39b13700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1549311865525468, "cf_name": "default", "job": 10, "event": "table_file_creation", "file_number": 20, "file_size": 25628909, "table_properties": {"data_size": 25510081, "index_size": 52487, "filter_size": 65628, "raw_key_size": 44917, "raw_average_key_size": 18, "raw_value_size": 25439814, "raw_average_value_size": 10534, "num_data_blocks": 1603, "num_entries": 2415, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2019-02-04 20:24:25.548 7f1b39b13700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/compaction_job.cc:1390] [default] [JOB 10] Compacted 1@0 + 1@6 files to L6 => 25628909 bytes
2019-02-04 20:24:25.548 7f1b39b13700  3 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/version_set.cc:2578] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
2019-02-04 20:24:25.616 7f1b39b13700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.617853) [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/compaction_job.cc:752] [default] compacted to: base level 6 max bytes base 268435456 files[0 0 0 0 0 0 1] max score 0.00, MB/sec: 129.4 rd, 113.8 wr, level 6, files in(1, 1) out(1) MB in(9.4, 18.5) out(24.4), read-write-amplify(5.6) write-amplify(2.6) OK, records in: 2929, records dropped: 514 output_compression: NoCompression

2019-02-04 20:24:25.616 7f1b39b13700  4 rocksdb: (Original Log Time 2019/02/04-20:24:25.617896) EVENT_LOG_v1 {"time_micros": 1549311865617881, "job": 10, "event": "compaction_finished", "compaction_time_micros": 225308, "output_level": 6, "num_output_files": 1, "total_output_size": 25628909, "num_input_records": 2929, "num_output_records": 2415, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 0, 0, 0, 1]}
2019-02-04 20:24:25.616 7f1b39b13700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1549311865619390, "job": 10, "event": "table_file_deletion", "file_number": 19}
2019-02-04 20:24:25.620 7f1b39b13700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1549311865621737, "job": 10, "event": "table_file_deletion", "file_number": 17}
2019-02-04 20:24:25.620 7f1b30300700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1060] [default] Manual compaction starting
2019-02-04 20:24:25.620 7f1b30300700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1060] [default] Manual compaction starting
2019-02-04 20:24:25.620 7f1b30300700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1060] [default] Manual compaction starting
2019-02-04 20:24:25.620 7f1b30300700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1060] [default] Manual compaction starting
2019-02-04 20:24:25.620 7f1b30300700  4 rocksdb: [/build/ceph-14.0.1-3131-ge01bb11/src/rocksdb/db/db_impl_compaction_flush.cc:1060] [default] Manual compaction starting
<END OF LOG>


Related issues 1 (0 open1 closed)

Related to RADOS - Bug #38027: osd/osd-backfill-space.sh failsResolvedDavid Zafman01/23/2019

Actions
Actions

Also available in: Atom PDF