Bug #38159
ec does not recover below min_size
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
2.5 11 0 0 0 20559523 0 remapped+incomplete 8h 808'465 11030:12032 [7,0,1]p7 [7,0,3]p7 2019-02-03 21:02:10.599802 2019-02-03 20:59:35.124447
/a/sage-2019-02-03_18:58:17-rados-wip-sage2-testing-2019-02-03-1047-distro-basic-smithi/3545666
History
#1 Updated by Sage Weil about 5 years ago
- Subject changed from ec pg stuck incomplete to ec does not recover below min_size
- Priority changed from Urgent to Normal
2019-02-03 20:47:14.528 7ff368abf700 5 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] enter Started/Primary/Peering/GetLog
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] choose_acting all_info osd.0(1) 2.5s1( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19)
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] choose_acting all_info osd.7(0) 2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19)
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] calc_acting prefer osd.7(0) because it is current primary
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] For position 0: selecting up[i]: 7(0)
2019-02-03 20:47:14.528 7ff368abf700 10 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] recoverable_and_ge_min_size failed, below min size
2019-02-03 20:47:14.528 7ff368abf700 5 osd.7 pg_epoch: 189 pg[2.5s0( v 182'165 (0'0,182'165] local-lis/les=149/150 n=4 ec=19/19 lis/c 149/149 les/c/f 150/150/0 189/189/19) [7,0,2147483647]p7(0) r=0 lpr=189 pi=[149,189)/1 crt=182'165 lcod 158'163 mlcod 0'0 peering mbc={} ps=[9b~1,9e~1]] exit Started/Primary/Peering/GetLog 0.000147 0 0.000000
the relevant code:
    // We go incomplete if below min_size for ec_pools since backfill
    // does not currently maintain rollbackability
    // Otherwise, we will go "peered", but not "active"
    if (num_want_acting < pool.info.min_size &&
        (pool.info.is_erasure() ||
         !cct->_conf->osd_allow_recovery_below_min_size)) {
      dout(10) << __func__ << " failed, below min size" << dendl;
      return false;
    }
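To make the effect of that condition on this PG concrete, here is a minimal standalone sketch of the same predicate. It is not the Ceph implementation: `PoolSketch`, `count_want_acting`, `passes_min_size_check` and `kItemNone` are hypothetical names introduced only for illustration, and the sketch assumes that the `2147483647` entry printed in `[7,0,2147483647]` marks a shard slot with no OSD assigned and is therefore not counted toward `num_want_acting`.

```cpp
#include <cstddef>
#include <vector>

// Assumption: 2147483647 (as printed in the log above) marks an empty slot.
constexpr int kItemNone = 2147483647;

// Hypothetical stand-in for the two pool properties the check reads.
struct PoolSketch {
  std::size_t min_size;
  bool is_erasure;
};

// Count the slots of the wanted acting set that have a real OSD assigned.
std::size_t count_want_acting(const std::vector<int>& want) {
  std::size_t n = 0;
  for (int osd : want) {
    if (osd != kItemNone) {
      ++n;
    }
  }
  return n;
}

// Mirrors the quoted condition: returns false when peering would stop with
// "failed, below min size". For an erasure-coded pool the
// allow_recovery_below_min_size flag makes no difference.
bool passes_min_size_check(const std::vector<int>& want,
                           const PoolSketch& pool,
                           bool allow_recovery_below_min_size) {
  const std::size_t num_want_acting = count_want_acting(want);
  if (num_want_acting < pool.min_size &&
      (pool.is_erasure || !allow_recovery_below_min_size)) {
    return false;
  }
  return true;
}
```

With the wanted set `{7, 0, kItemNone}` and `min_size` 3, the sketch returns false for an erasure-coded pool whatever the flag is set to, while a replicated pool with the flag enabled would pass the check.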
#2 Updated by Sage Weil about 5 years ago
We could perhaps point the finger at the min_size choice:
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout: {
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout: "pool": 2,
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout: "pool_name": "unique_pool_0",
2019-02-03T20:45:58.815 INFO:teuthology.orchestra.run.smithi131.stdout: "create_time": "2019-02-03 20:44:22.552415",
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout: "flags": 16389,
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout: "flags_names": "hashpspool,ec_overwrites,pool_snaps",
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout: "type": 3,
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout: "size": 3,
2019-02-03T20:45:58.816 INFO:teuthology.orchestra.run.smithi131.stdout: "min_size": 3,
size == min_size == 3, so losing a single shard puts the PG below min_size.
#3 Updated by Patrick Donnelly over 4 years ago
- Status changed from 12 to New