Bug #35924
choose_acting picked want > pool size
Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2018-09-10 21:28:37.713 7f3d9523e700 5 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] enter Started/Primary/Peering/GetLog
2018-09-10 21:28:37.713 7f3d9523e700 10 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] choose_acting all_info osd.0 4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
2018-09-10 21:28:37.713 7f3d9523e700 10 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] choose_acting all_info osd.1 4.c( v 154'691 lc 123'110 (0'0,154'691] local-lis/les=144/145 n=691 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
2018-09-10 21:28:37.713 7f3d9523e700 10 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] choose_acting all_info osd.4 4.c( v 159'745 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
2018-09-10 21:28:37.713 7f3d9523e700 10 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] find_best_info prefer osd.4 because it is complete while best has missing
2018-09-10 21:28:37.713 7f3d9523e700 10 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] calc_replicated_acting newest update on osd.4 with 4.c( v 159'745 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
calc_replicated_acting primary is osd.4 with 4.c( v 159'745 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
 osd.0 (up) accepted 4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
 osd.1 (up) accepted 4.c( v 154'691 lc 123'110 (0'0,154'691] local-lis/les=144/145 n=691 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161)
2018-09-10 21:28:37.713 7f3d9523e700 20 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] choose_async_recovery_replicated candidates by cost are:
2018-09-10 21:28:37.713 7f3d9523e700 20 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] choose_async_recovery_replicated result want=[4,0,1] async_recovery=
2018-09-10 21:28:37.713 7f3d9523e700 10 osd.0 pg_epoch: 161 pg[4.c( v 159'745 lc 146'579 (0'0,159'745] local-lis/les=156/157 n=745 ec=117/117 lis/c 156/117 les/c/f 157/118/0 160/161/161) [0,1] r=0 lpr=161 pi=[117,161)/2 crt=159'745 lcod 146'578 mlcod 0'0 peering m=112 mbc={}] choose_acting want [4,0,1] != acting [0,1], requesting pg_temp change
But in that epoch, the pool size is only 2:
pool 4 'unique_pool_2' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 118 flags hashpspool,creating stripe_width 0 application rados
/a/sage-2018-09-10_17:11:45-rados-wip-sage-testing-2018-09-10-0917-distro-basic-smithi/3002911
Here want has three OSDs ([4,0,1]) while the pool size is 2. The PG gets stuck because the mon now rejects pg_temps that are larger than the pool size.
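To spot the same condition in other runs, a small log check along the following lines might help. This is only a sketch in Python, not Ceph code: the helper names (parse_want, parse_pool_size, want_exceeds_pool_size) are hypothetical, and it assumes the message and pool line keep the exact format quoted in this report.

```python
import re


def parse_want(log_line):
    """Extract the want set from a 'choose_acting want [...]' log message."""
    m = re.search(r"choose_acting want \[([\d,]*)\]", log_line)
    if m is None:
        return None
    return [int(x) for x in m.group(1).split(",") if x]


def parse_pool_size(pool_line):
    """Extract the replicated size from a pool line of the osdmap dump."""
    m = re.search(r"\breplicated size (\d+)\b", pool_line)
    return int(m.group(1)) if m else None


def want_exceeds_pool_size(log_line, pool_line):
    """Return True when want lists more OSDs than the pool size allows."""
    want = parse_want(log_line)
    size = parse_pool_size(pool_line)
    if want is None or size is None:
        raise ValueError("could not parse want set or pool size")
    return len(want) > size


# Inputs copied (shortened) from the log and pool line in this report.
log_line = "choose_acting want [4,0,1] != acting [0,1], requesting pg_temp change"
pool_line = "pool 4 'unique_pool_2' replicated size 2 min_size 2 crush_rule 0"

print(want_exceeds_pool_size(log_line, pool_line))
```

For the lines quoted above, the check reports that want (three OSDs) is larger than the pool size (2), which is the case the mon rejects.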
Updated by Sage Weil over 5 years ago
- Status changed from 12 to Fix Under Review
- Priority changed from Immediate to Urgent
Updated by Josh Durgin over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #35962: luminous: choose_acting picked want > pool size added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #35963: mimic: choose_acting picked want > pool size added
Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to Resolved
Updated by Nathan Cutler over 4 years ago
- Related to Bug #42577: acting_recovery_backfill won't catch all up peers added