Bug #11687

stuck incomplete

Added by Samuel Just almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-05-20 08:06:31.986367 7f6e76e53700 5 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetInfo 0.139473 4 0.000455
2015-05-20 08:06:31.986380 7f6e76e53700 5 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2015-05-20 08:06:31.986398 7f6e76e53700 10 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] calc_acting osd.0 1.4e( v 473'302 (292'200,473'302] local-les=473 n=4 ec=5 les/c 473/473 556/556/556)
2015-05-20 08:06:31.986413 7f6e76e53700 10 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] calc_acting osd.1 1.4e( v 473'302 (293'202,473'302] lb 0//0//-1 local-les=477 n=0 ec=5 les/c 473/473 556/556/556)
2015-05-20 08:06:31.986428 7f6e76e53700 10 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] calc_acting osd.4 1.4e( v 473'302 (120'121,473'302] local-les=473 n=4 ec=5 les/c 473/473 556/556/556)
2015-05-20 08:06:31.986441 7f6e76e53700 10 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] calc_acting osd.5 1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556)
2015-05-20 08:06:31.986455 7f6e76e53700 10 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] choose_acting failed
2015-05-20 08:06:31.986471 7f6e76e53700 5 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetLog 0.000090 0 0.000000
2015-05-20 08:06:31.986485 7f6e76e53700 15 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] publish_stats_to_osd 556:10
2015-05-20 08:06:31.986496 7f6e76e53700 5 osd.5 pg_epoch: 556 pg[1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556) [5,0] r=0 lpr=556 pi=471-555/9 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/Incomplete

osd.0 or osd.4 should have been chosen as authoritative, but osd.1 has an info.les == 477 (local-les=477 in its calc_acting line above). It seems that osd.4 went down after sending the activation message, but before recording its own info.les and before osd.0 got the message, leaving osd.1 as the only osd with a high enough info.les to go active. The bug appears to be in find_best_info: info.last_epoch_started should not be used to determine the minimum acceptable last_update; only info.history.last_epoch_started should be.
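
The selection problem described above can be illustrated with a small model. This is a hedged sketch, not Ceph's find_best_info: the PeerInfo class, the acceptable_peers function, and the use_info_les flag are hypothetical names introduced here. The numbers are copied from the calc_acting lines in the log, and reading the "lb" marker on osd.1 as "incomplete, so not eligible as authoritative" is an assumption. The sketch leaves out the last_update comparison because every non-empty peer reports the same 473'302.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PeerInfo:
    info_les: int      # info.last_epoch_started ("local-les" in the log)
    history_les: int   # info.history.last_epoch_started (first number of "les/c")
    incomplete: bool   # assumption: True when the log line carries an "lb" marker


# Values copied from the calc_acting lines for pg 1.4e.
INFOS: Dict[int, PeerInfo] = {
    0: PeerInfo(info_les=473, history_les=473, incomplete=False),
    1: PeerInfo(info_les=477, history_les=473, incomplete=True),
    4: PeerInfo(info_les=473, history_les=473, incomplete=False),
    5: PeerInfo(info_les=0, history_les=473, incomplete=False),
}


def acceptable_peers(infos: Dict[int, PeerInfo], use_info_les: bool) -> List[int]:
    """Return the OSD ids that could serve as the authoritative peer.

    use_info_les=True models the behaviour reported in this ticket, where a
    peer's own info.last_epoch_started can raise the required epoch.
    use_info_les=False models the proposal to rely only on
    history.last_epoch_started when computing that bound.
    """
    max_les = max(p.history_les for p in infos.values())
    if use_info_les:
        max_les = max(max_les, max(p.info_les for p in infos.values()))
    # A candidate must be complete and must have started at or after max_les.
    return sorted(
        osd
        for osd, p in infos.items()
        if not p.incomplete and p.info_les >= max_les
    )


if __name__ == "__main__":
    print(acceptable_peers(INFOS, use_info_les=True))   # []
    print(acceptable_peers(INFOS, use_info_les=False))  # [0, 4]
```

With info.les included, osd.1 raises the bound to 477 and no complete peer qualifies, which corresponds to "choose_acting failed" in the log. With only history.les, the bound stays at 473 and osd.0 and osd.4 both qualify.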


Related issues

Related to Ceph - Bug #11110: updated history.last_epoch_started but not info.last_epoch_started Resolved 03/13/2015
Copied to Ceph - Backport #12362: stuck incomplete Resolved 05/20/2015

History

#1 Updated by Samuel Just almost 9 years ago

  • Assignee set to Samuel Just
  • Priority changed from Normal to Urgent

#2 Updated by Samuel Just almost 9 years ago

  • Status changed from New to 7

#3 Updated by Samuel Just almost 9 years ago

  • Backport set to hammer

Related: 0712d8d90b4eb455ae56cce5eafdce3e50de39e0 and 9a2ff34d75cc69759584fb802f903068669f6233

#5 Updated by Samuel Just over 8 years ago

  • Status changed from 7 to Pending Backport

#6 Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Resolved
