Bug #38159

ec does not recover below min_size

Added by Sage Weil about 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES    LOG STATE               SINCE VERSION REPORTED    UP        ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP           
2.5      11        0         0       0 20559523   0 remapped+incomplete    8h 808'465 11030:12032 [7,0,1]p7 [7,0,3]p7 2019-02-03 21:02:10.599802 2019-02-03 20:59:35.124447 

/a/sage-2019-02-03_18:58:17-rados-wip-sage2-testing-2019-02-03-1047-distro-basic-smithi/3545666

History

#1 Updated by Sage Weil about 5 years ago

  • Subject changed from ec pg stuck incomplete to ec does not recover below min_size
  • Priority changed from Urgent to Normal
2019-02-03 20:47:14.528 7ff368abf700  5 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] enter Started/Primary/Peering/GetLog
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] choose_acting all_info osd.0(1) 2.5s1( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19)
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] choose_acting all_info osd.7(0) 2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19)
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] calc_acting prefer osd.7(0) because it is current primary
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] For position 0:  selecting up[i]: 7(0)
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] recoverable_and_ge_min_size failed, below min size
2019-02-03 20:47:14.528 7ff368abf700  5 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] exit Started/Primary/Peering/GetLog 0.000147 0 0.000000

the relevant code:
  // We go incomplete if below min_size for ec_pools since backfill
  // does not currently maintain rollbackability
  // Otherwise, we will go "peered", but not "active" 
  if (num_want_acting < pool.info.min_size &&
      (pool.info.is_erasure() ||
       !cct->_conf->osd_allow_recovery_below_min_size)) {
    dout(10) << __func__ << " failed, below min size" << dendl;
    return false;
  }
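
To make the failure concrete, here is a minimal standalone sketch of that condition applied to the values seen in this run: up set [7,0,2147483647] and min_size 3 on an erasure-coded pool. It is not the Ceph implementation. PoolInfo, count_want_acting and passes_min_size_check are hypothetical names, and the sketch assumes num_want_acting counts the slots that hold a real OSD, with 2147483647 (0x7fffffff) being the placeholder for an unfilled slot.

```cpp
#include <cstdint>
#include <vector>

// Placeholder for an unfilled slot; it shows up as 2147483647 in the
// up set [7,0,2147483647] logged above.
constexpr int32_t kNoOsd = 0x7fffffff;

// Hypothetical stand-in for the two pool fields the quoted check reads.
struct PoolInfo {
  unsigned min_size;
  bool erasure;
};

// Assumption: num_want_acting is the number of slots holding a real OSD.
unsigned count_want_acting(const std::vector<int32_t>& want) {
  unsigned n = 0;
  for (int32_t osd : want) {
    if (osd != kNoOsd) {
      ++n;
    }
  }
  return n;
}

// Mirrors only the quoted condition, not the rest of peering.
// Returns false where the quoted code logs "failed, below min size".
bool passes_min_size_check(const std::vector<int32_t>& want,
                           const PoolInfo& pool,
                           bool allow_recovery_below_min_size) {
  const unsigned num_want_acting = count_want_acting(want);
  if (num_want_acting < pool.min_size &&
      (pool.erasure || !allow_recovery_below_min_size)) {
    return false;
  }
  return true;
}
```

With want = {7, 0, kNoOsd} and PoolInfo{3, true} the check returns false whatever the value of the flag, which matches the "failed, below min size" line in the log. The same inputs on a replicated pool pass when the flag is true, so the option only helps replicated pools.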

#2 Updated by Sage Weil about 5 years ago

We could perhaps point the finger at the min_size choice:

2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout:        {
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout:            "pool": 2,
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout:            "pool_name": "unique_pool_0",
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout:            "create_time": "2019-02-03 20:44:22.552415",
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout:            "flags": 16389,
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout:            "flags_names": "hashpspool,ec_overwrites,pool_snaps",
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout:            "type": 3,
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout:            "size": 3,
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout:            "min_size": 3,

size and min_size are both 3

see https://github.com/ceph/ceph/pull/26894

#3 Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
