Project

General

Profile

Actions

Bug #21387

closed

mark_unfound_lost hangs

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-09-12 23:54:12.018214 7f414d38d700 20 osd.2 27 do_command tid 2 [{"mulcmd": "delete", "prefix": "pg", "cmd": "mark_unfound_lost", "pgid": "2.e"}]
2017-09-12 23:54:12.018310 7f414d38d700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] all_unfound_are_queried_or_lost all of might_have_unfound 0,1 have been queried or are marked lost
2017-09-12 23:54:12.018350 7f414d38d700  3 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] mark_all_unfound_lost l_delete
2017-09-12 23:54:12.018373 7f414d38d700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] 27'324 (19'3) l_delete 2:70b3bf12:::existing_4:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0
2017-09-12 23:54:12.018402 7f414d38d700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] 27'325 (19'4) l_delete 2:78623b1d:::existing_6:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0
2017-09-12 23:54:12.018420 7f414d38d700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] submit_log_entries 27'324 (19'3) l_delete 2:70b3bf12:::existing_4:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0,27'325 (19'4) l_delete 2:78623b1d:::existing_6:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0
2017-09-12 23:54:12.018445 7f414d38d700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] new_repop: repgather(0x7f4181e1cd00 27'325 rep_tid=1881 committed?=0 applied?=0 r=0)
2017-09-12 23:54:12.018472 7f414d38d700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'323 lc 17'2 (0'0,27'323] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] merge_new_log_entries 27'324 (19'3) l_delete 2:70b3bf12:::existing_4:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0,27'325 (19'4) l_delete 2:78623b1d:::existing_6:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0
2017-09-12 23:54:12.018512 7f414d38d700 20 update missing, append 27'324 (19'3) l_delete 2:70b3bf12:::existing_4:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0
2017-09-12 23:54:12.018533 7f414d38d700 20 update missing, append 27'325 (19'4) l_delete 2:78623b1d:::existing_6:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0
2017-09-12 23:54:12.018692 7f414d38d700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 27'324, trimmed: , trimmed_dups: , clear_divergent_priors: 0
2017-09-12 23:54:12.018977 7f414d38d700 20 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 lua=27'323 crt=27'323 mlcod 17'2 active+degraded m=2 u=2] merge_new_log_entries peer_missing for 0 = missing(2 may_include_deletes = 1)
2017-09-12 23:54:12.019020 7f414d38d700 20 osd.2 27 share_map_peer 0x7f4180d3f800 already has epoch 27
2017-09-12 23:54:12.019026 7f414d38d700  1 -- 172.21.15.140:6813/7771 --> 172.21.15.140:6809/6813 -- pg_update_log_missing(2.e epoch 27/26 rep_tid 1881 entries 27'324 (19'3) l_delete 2:70b3bf12:::existing_4:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0,27'325 (19'4) l_delete 2:78623b1d:::existing_6:head by unknown.0.0:0 2017-09-12 23:54:12.018369 0) v2 -- 0x7f418107cb40 con 0
...
2017-09-12 23:54:12.019616 7f4165bf4700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 lua=27'323 crt=27'323 mlcod 17'2 active+degraded m=2] op_applied version 27'325
...
2017-09-12 23:54:23.291911 7f415639f700 10 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2] handle_peering_event: epoch_sent: 27 epoch_requested: 27 DoRecovery
2017-09-12 23:54:23.291954 7f415639f700  5 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2] exit Started/Primary/Active/NotRecovering 30.000150 1 0.000046
2017-09-12 23:54:23.291976 7f415639f700  5 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+degraded m=2] enter Started/Primary/Active/WaitLocalRecoveryReserved
2017-09-12 23:54:23.291992 7f415639f700 20 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+recovery_wait+degraded m=2] get_recovery_priority recovery priority for pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+recovery_wait+degraded m=2] is 180, state is 4195330
2017-09-12 23:54:23.292022 7f415639f700 15 osd.2 pg_epoch: 27 pg[2.e( v 27'325 lc 17'2 (0'0,27'325] local-lis/les=26/27 n=321 ec=12/12 lis/c 26/16 les/c/f 27/17/0 26/26/26) [2,0] r=0 lpr=26 pi=[12,26)/2 luod=27'320 crt=27'323 mlcod 17'2 active+recovery_wait+degraded m=2] publish_stats_to_osd 27:345
...
2017-09-12 23:54:35.246475 7f416abfe700  0 log_channel(cluster) log [WRN] : slow request 30.782548 seconds old, received at 2017-09-12 23:54:04.463818: osd_op(client.4359.0:4965 2.e 2:796691b6:::benchmark_data_smithi140_8857_object4964:head [set-alloc-hint object_size 4096 write_size 4096,write 0~4096] snapc 0=[] ondisk+write+known_if_redirected e27) currently op_applied

/a/sage-2017-09-12_23:24:57-rados-wip-sage-testing2-2017-09-09-2248-distro-basic-smithi/1624615
Actions #1

Updated by David Zafman over 6 years ago

  • Assignee set to David Zafman
Actions #2

Updated by David Zafman over 6 years ago

  • Status changed from 12 to Can't reproduce

Multiple fixes to mark_all_unfound_lost() has fixed this. Possibly the most important master branch commit is 689bff354a6bfa1cf47b0d0a04fa9f1b2ef68f75

Actions

Also available in: Atom PDF