Project

General

Profile

Bug #36610

filestore merge collection replay problem

Added by Sage Weil over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2018-10-27_02:10:33-rados-wip-sage-testing-2018-10-26-1411-distro-basic-smithi/3188976

osd.3 was partway through merge_collection, restarted, and the merge didn't replay/complete.

History

#1 Updated by Sage Weil over 5 years ago

the osd is stopped during the merge operation:

2018-10-27 04:11:25.837 7f9ad9d45700  5 filestore(/var/lib/ceph/osd/ceph-3) _do_op(2204): 0x5588d28342a0 seq 13136 osr(2.12_head) start
2018-10-27 04:11:25.837 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _do_transaction(2735): on 0x5588d580b8c0
2018-10-27 04:11:25.837 7f9ad9d45700 15 filestore(/var/lib/ceph/osd/ceph-3) _remove(3603): 2.32_head/#2:4c000000::::head#
2018-10-27 04:11:25.837 7f9ad9d45700 20 filestore(/var/lib/ceph/osd/ceph-3) lfn_unlink(485): clearing omap on #2:4c000000::::head# in cid 2.32_head
2018-10-27 04:11:25.837 7f9ad9d45700 10 filestore oid: #2:4c000000::::head# not skipping op, *spos 13136.0.0
2018-10-27 04:11:25.837 7f9ad9d45700 10 filestore  > header.spos 0.0.0
2018-10-27 04:11:25.837 7f9ad9d45700 20 filestore remove_map_header: removing 105 oid #2:4c000000::::head#
2018-10-27 04:11:25.837 7f9ad9d45700 20 filestore clear_header: clearing seq 105
2018-10-27 04:11:25.838 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _remove(3605): 2.32_head/#2:4c000000::::head# = 0
2018-10-27 04:11:25.838 7f9ad9d45700 15 filestore(/var/lib/ceph/osd/ceph-3) _merge_collection(5770): 2.32_head 2.12_head bits 5
2018-10-27 04:11:25.838 7f9ad9d45700 15 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4995): /var/lib/ceph/osd/ceph-3/current/2.32_head
2018-10-27 04:11:25.838 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4999): /var/lib/ceph/osd/ceph-3/current/2.32_head = 0
2018-10-27 04:11:25.838 7f9ad9d45700 15 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4995): /var/lib/ceph/osd/ceph-3/current/2.12_head
2018-10-27 04:11:25.838 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4999): /var/lib/ceph/osd/ceph-3/current/2.12_head = 0
2018-10-27 04:11:25.838 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _collection_set_bits(5047): /var/lib/ceph/osd/ceph-3/current/2.12_head 5
2018-10-27 04:11:25.838 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _collection_set_bits(5060): /var/lib/ceph/osd/ceph-3/current/2.12_head 5 = 4
2018-10-27 04:11:25.838 7f9ad9d45700 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:25.857 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _set_global_replay_guard(2506): 13136.0.1 done
2018-10-27 04:11:25.857 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2565): 13136.0.1 START
2018-10-27 04:11:25.857 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2597): 13136.0.1 done
2018-10-27 04:11:25.857 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2565): 13136.0.1 START
2018-10-27 04:11:25.857 7f9ad9d45700 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2597): 13136.0.1 done
2018-10-27 04:11:25.857 7f9ad9d45700 20 _merge bits 5
2018-10-27 04:11:25.857 7f9ad9d45700 20 _merge pre-splitting to shared level 1
2018-10-27 04:11:25.857 7f9ad9d45700 20 split_dirs [] target level: 1
2018-10-27 04:11:25.857 7f9ad9d45700  1 split_dirs [] has 11 objects, 0 level, starting split in pg 2.32_head.
2018-10-27 04:11:25.858 7f9ad9d45700  1 split_dirs [] split completed in pg 2.32_head.
2018-10-27 04:11:25.858 7f9ad9d45700 20 split_dirs [2] target level: 1
2018-10-27 04:11:25.859 7f9ad9d45700 20 split_dirs [] target level: 1
2018-10-27 04:11:25.859 7f9ad9d45700  1 split_dirs [] has 18 objects, 0 level, starting split in pg 2.12_head.
2018-10-27 04:11:25.874 7f9ad9d45700  1 split_dirs [] split completed in pg 2.12_head.
2018-10-27 04:11:25.874 7f9ad9d45700 20 split_dirs [2] target level: 1
2018-10-27 04:11:25.874 7f9ad9d45700 20 _merge_dirs path []
2018-10-27 04:11:25.874 7f9ad9d45700 20 _merge_dirs path [2]
2018-10-27 04:11:25.877 7f9ad9d45700 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-189 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_d80d3105105e3263e6f8_0_long
2018-10-27 04:11:25.881 7f9ad9d45700 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-201 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_ca3f968ca42f1d343b7a_0_long
2018-10-27 04:11:25.883 7f9ad9d45700 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-209 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_31e55d082a2a04ee5161_0_long
2018-10-27 04:11:25.883 7f9ad9d45700 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-261 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_c104e76ad8419f61f806_0_long
2018-10-27 04:11:25.883 7f9ad9d45700 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-305 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_5a12811cf4d944eb2fa2_0_long

note that the seq is 13136

the replay happens in remote/smithi191/log/objectstore_tool.132434.log

2018-10-27 04:11:26.347 7fe3390c2c40  2 journal read_entry 103981056 : seq 13136 41421 bytes
2018-10-27 04:11:26.347 7fe3390c2c40  3 journal journal_replay: applying op seq 13136
2018-10-27 04:11:26.347 7fe3390c2c40 10 journal op_apply_start 13136 open_ops 0 -> 1
2018-10-27 04:11:26.347 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _do_transaction(2735): on 0x5606f7f154a0
2018-10-27 04:11:26.347 7fe3390c2c40 15 filestore(/var/lib/ceph/osd/ceph-3) _merge_collection(5770): 2.32_head 2.12_head bits 5
2018-10-27 04:11:26.347 7fe3390c2c40 15 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4995): /var/lib/ceph/osd/ceph-3/current/2.32_head
2018-10-27 04:11:26.347 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4999): /var/lib/ceph/osd/ceph-3/current/2.32_head = 0
2018-10-27 04:11:26.347 7fe3390c2c40 15 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4995): /var/lib/ceph/osd/ceph-3/current/2.12_head
2018-10-27 04:11:26.347 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) collection_stat(4999): /var/lib/ceph/osd/ceph-3/current/2.12_head = 0
2018-10-27 04:11:26.347 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _check_replay_guard(2715): object has 13136.0.1 == current pos 13136.0.1, in_progress=true, CONDITIONAL REPLAY
2018-10-27 04:11:26.347 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _check_replay_guard(2715): object has 13136.0.1 == current pos 13136.0.1, in_progress=true, CONDITIONAL REPLAY
2018-10-27 04:11:26.347 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _check_replay_guard(2715): object has 13136.0.1 == current pos 13136.0.1, in_progress=true, CONDITIONAL REPLAY
2018-10-27 04:11:26.347 7fe3390c2c40 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:26.348 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_global_replay_guard(2506): 13136.0.1 done
2018-10-27 04:11:26.348 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2565): 13136.0.1 START
2018-10-27 04:11:26.348 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2597): 13136.0.1 done
2018-10-27 04:11:26.348 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2565): 13136.0.1 START
2018-10-27 04:11:26.348 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2597): 13136.0.1 done
2018-10-27 04:11:26.348 7fe3390c2c40 20 _merge bits 5
2018-10-27 04:11:26.348 7fe3390c2c40 20 _merge pre-splitting to shared level 1
2018-10-27 04:11:26.348 7fe3390c2c40 20 split_dirs [] target level: 1
2018-10-27 04:11:26.348 7fe3390c2c40  1 split_dirs [] has 0 objects, 0 level, starting split in pg 2.32_head.
2018-10-27 04:11:26.348 7fe3390c2c40  1 split_dirs [] split completed in pg 2.32_head.
2018-10-27 04:11:26.348 7fe3390c2c40 20 split_dirs [2] target level: 1
2018-10-27 04:11:26.348 7fe3390c2c40 20 split_dirs [] target level: 1
2018-10-27 04:11:26.348 7fe3390c2c40  1 split_dirs [] has 0 objects, 0 level, starting split in pg 2.12_head.
2018-10-27 04:11:26.348 7fe3390c2c40  1 split_dirs [] split completed in pg 2.12_head.
2018-10-27 04:11:26.348 7fe3390c2c40 20 split_dirs [2] target level: 1
2018-10-27 04:11:26.348 7fe3390c2c40 20 _merge_dirs path []
2018-10-27 04:11:26.348 7fe3390c2c40 20 _merge_dirs path [2]
2018-10-27 04:11:26.349 7fe3390c2c40 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-793 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_7f1463cb19edfedb014e_0_long
2018-10-27 04:11:26.349 7fe3390c2c40 20 LFNIndex(/var/lib/ceph/osd/ceph-3/current/2.32_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-3/current/2.32_head/DIR_2/smithi19138632-797 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_6c8ba1008b2221e72644_0_long
2018-10-27 04:11:26.349 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2621): 13136.0.1
2018-10-27 04:11:26.349 7fe3390c2c40 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:26.350 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2647): 13136.0.1 done
2018-10-27 04:11:26.350 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2621): 13136.0.1
2018-10-27 04:11:26.350 7fe3390c2c40 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2647): 13136.0.1 done
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _check_replay_guard(2724): object has 5573.0.0 < current pos 13136.0.1, in past, will replay
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _check_replay_guard(2724): object has 8249.0.1 < current pos 13136.0.1, in past, will replay
2018-10-27 04:11:26.351 7fe3390c2c40 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_global_replay_guard(2506): 13136.0.1 done
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2565): 13136.0.1 START
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2597): 13136.0.1 done
2018-10-27 04:11:26.351 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2565): 13136.0.1 START
2018-10-27 04:11:26.352 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard(2597): 13136.0.1 done
2018-10-27 04:11:26.352 7fe3390c2c40 20 _merge bits 5
2018-10-27 04:11:26.352 7fe3390c2c40 20 _merge pre-splitting to shared level 1
2018-10-27 04:11:26.352 7fe3390c2c40 20 split_dirs [] target level: 1
2018-10-27 04:11:26.352 7fe3390c2c40  1 split_dirs [] has 0 objects, 0 level, starting split in pg 2.32_TEMP.
2018-10-27 04:11:26.352 7fe3390c2c40  1 split_dirs [] split completed in pg 2.32_TEMP.
2018-10-27 04:11:26.352 7fe3390c2c40 20 split_dirs [] target level: 1
2018-10-27 04:11:26.352 7fe3390c2c40  1 split_dirs [] has 0 objects, 0 level, starting split in pg 2.12_TEMP.
2018-10-27 04:11:26.352 7fe3390c2c40  1 split_dirs [] split completed in pg 2.12_TEMP.
2018-10-27 04:11:26.352 7fe3390c2c40 20 _merge_dirs path []
2018-10-27 04:11:26.352 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2621): 13136.0.1
2018-10-27 04:11:26.352 7fe3390c2c40 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:26.353 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2647): 13136.0.1 done
2018-10-27 04:11:26.353 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2621): 13136.0.1
2018-10-27 04:11:26.353 7fe3390c2c40 20 filestore dbobjectmap: seq is 215
2018-10-27 04:11:26.353 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _close_replay_guard(2647): 13136.0.1 done
2018-10-27 04:11:26.353 7fe3390c2c40 10 filestore(/var/lib/ceph/osd/ceph-3) _check_replay_guard(2719): object has 13136.0.1 == current pos 13136.0.1, in_progress=false, SKIPPING REPLAY
2018-10-27 04:11:26.353 7fe3390c2c40 15 filestore(/var/lib/ceph/osd/ceph-3) _merge_collection(5892): 2.32_head 2.12_head bits 5 = 0

note the last two lines!

#2 Updated by Sage Weil over 5 years ago

  • Status changed from 12 to Fix Under Review

#3 Updated by Sage Weil over 5 years ago

  • Status changed from Fix Under Review to Resolved

Also available in: Atom PDF