Bug #9389 (closed)
ec pg stuck peering, did not send query for one shard
Status: Duplicate
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
"recovery_state": [
  { "name": "Started\/Primary\/Peering\/GetInfo",
    "enter_time": "2014-09-08 08:10:05.258543",
    "requested_info_from": [
      { "osd": "2(0)"}]},
...
of
"probing_osds": [ "0(1)", "1(2)", "2(0)", "4(0)", "5(3)"],
and it tries to send it:
2014-09-08 08:10:05.258639 7f8545562700 10 osd.5 pg_epoch: 825 pg[1.1es3( v 785'235 (0'0,785'235] local-les=812 n=0 ec=11 les/c 812/808 822/822/818) [2,0,1,5] r=3 lpr=825 pi=730-821/8 crt=753'233 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: querying info from osd.2(0)
but when the pg_query goes out, it only has
2014-09-08 08:10:05.259864 7f8545562700 7 osd.5 825 do_queries querying osd.2 on 2 PGs
2014-09-08 08:10:05.259865 7f8545562700 1 -- 10.214.136.6:6811/49073 --> 10.214.133.10:6801/51165 -- pg_query(1.21s3,1.fds1 epoch 825) v3 -- ?+0 0x7b14d00 con 0x7137a20
ubuntu@teuthology:/a/teuthology-2014-09-08_02:32:01-rados-master-testing-basic-multi/472310
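For anyone scanning similar logs, a rough sketch of how the check above could be automated: pull the PG list out of a pg_query messenger line and see whether the stuck PG appears in it at all. The helper names (`pgids_in_query`, `strip_shard`, `query_covers_pg`) are made up for illustration and are not part of Ceph or teuthology; the regex only assumes the `pg_query(<pgids> epoch <n>)` shape seen in the log line quoted above.

```python
import re

# Matches the pgid list and epoch inside a messenger log entry such as
# "pg_query(1.21s3,1.fds1 epoch 825)". The format is assumed from the
# log line quoted in this report.
PG_QUERY_RE = re.compile(r"pg_query\(([^ )]+) epoch (\d+)\)")


def pgids_in_query(log_line):
    """Return (pgids, epoch) for the first pg_query in log_line, or None."""
    match = PG_QUERY_RE.search(log_line)
    if match is None:
        return None
    return match.group(1).split(","), int(match.group(2))


def strip_shard(pgid):
    """Drop a trailing EC shard suffix, e.g. '1.1es3' -> '1.1e'."""
    return re.sub(r"s\d+$", "", pgid)


def query_covers_pg(log_line, pgid):
    """True if the pg_query in log_line lists pgid, ignoring the shard."""
    parsed = pgids_in_query(log_line)
    if parsed is None:
        return False
    pgids, _epoch = parsed
    wanted = strip_shard(pgid)
    return any(strip_shard(p) == wanted for p in pgids)


if __name__ == "__main__":
    line = (
        "2014-09-08 08:10:05.259865 7f8545562700 1 -- 10.214.136.6:6811/49073 "
        "--> 10.214.133.10:6801/51165 -- pg_query(1.21s3,1.fds1 epoch 825) v3 "
        "-- ?+0 0x7b14d00 con 0x7137a20"
    )
    print(pgids_in_query(line))
    print(query_covers_pg(line, "1.1es3"))
```

Comparing with the shard stripped is a deliberate simplification: it only answers whether any shard of the PG was queried in that message.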
Updated by Samuel Just over 9 years ago
/a/samuelj-2014-09-20_19:00:23-rados-wip-sam-testing-firefly2-wip-testing-old-vanilla-basic-multi/501557
probably related, in GetMissing though.
Updated by Samuel Just over 9 years ago
At least on that one, looks like do_queries doesn't send the query. That can happen if the osd is down as of the osd epoch (almost certainly not the case here), or if up_from for the osd is after the peering_wq working map epoch (also probably not true in this case), or if messenger get_connection somehow returned NULL.
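The three cases can be written out as a small decision function. This is only an illustrative model of the conditions listed above, not the actual do_queries code: `OsdState`, `query_skip_reason`, and their fields are hypothetical names, and the real OSD map and messenger types are not represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OsdState:
    # Hypothetical stand-in for what the OSD map records about a peer.
    is_up: bool   # whether the peer is up as of the osd epoch
    up_from: int  # epoch at which the peer last came up


def query_skip_reason(
    peer: OsdState,
    working_map_epoch: int,
    connection: Optional[object],
) -> Optional[str]:
    """Return why a query would be dropped, or None if it would be sent."""
    if not peer.is_up:
        return "peer is down as of the osd epoch"
    if peer.up_from > working_map_epoch:
        return "peer up_from is after the peering_wq working map epoch"
    if connection is None:
        return "get_connection returned NULL"
    return None


if __name__ == "__main__":
    # The epoch values below are made up for the example, not taken from the logs.
    peer = OsdState(is_up=True, up_from=830)
    print(query_skip_reason(peer, working_map_epoch=825, connection=object()))
```

Logging which branch fired, rather than silently skipping, is the kind of debug output that would make the three cases distinguishable in a failing run.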
Updated by Sage Weil over 9 years ago
- Status changed from New to Need More Info
d851c3f2338e8d17dfd78d631b9f7977365356aa adds better debug output (and cleans up a bit)
Updated by Samuel Just over 9 years ago
- Status changed from Need More Info to Duplicate