Bug #9219
lost_unfound test got ENOENT: i don't have pgid 1.e
Status: Closed
Description
2014-08-24T10:31:01.379 INFO:tasks.rep_lost_unfound_delete.ceph_manager:active!
2014-08-24T10:31:01.380 INFO:teuthology.orchestra.run.burnupi33:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph tell osd.0 flush_pg_stats'
2014-08-24T10:31:01.486 INFO:teuthology.orchestra.run.burnupi33:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph tell osd.2 flush_pg_stats'
2014-08-24T10:31:01.594 INFO:teuthology.orchestra.run.burnupi33:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
2014-08-24T10:31:01.771 INFO:tasks.rep_lost_unfound_delete.ceph_manager:{u'election_epoch': 6, u'quorum': [0, 1, 2], u'mdsmap': {u'max': 0, u'epoch': 1, u'by_rank': [], u'up': 0, u'in': 0}, u'monmap': {u'epoch': 1, u'mons': [{u'name': u'a', u'rank': 0, u'addr': u'10.214.135.24:6789/0'}, {u'name': u'b', u'rank': 1, u'addr': u'10.214.135.24:6790/0'}, {u'name': u'c', u'rank': 2, u'addr': u'10.214.135.24:6791/0'}], u'modified': u'2014-08-24 10:28:28.102217', u'fsid': u'd780d717-3ced-48dc-be0e-0a0fe5bae1bd', u'created': u'2014-08-24 10:28:28.102217'}, u'health': {u'detail': [], u'timechecks': {u'round_status': u'finished', u'epoch': 6, u'round': 2, u'mons': [{u'latency': u'0.000000', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'a'}, {u'latency': u'0.001008', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'b'}, {u'latency': u'0.002810', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'c'}]}, u'health': {u'health_services': [{u'mons': [{u'last_updated': u'2014-08-24 10:30:32.223012', u'name': u'a', u'avail_percent': 94, u'kb_total': 944268724, u'kb_avail': 888579120, u'health': u'HEALTH_OK', u'kb_used': 7700408, u'store_stats': {u'bytes_total': 428989, u'bytes_log': 424183, u'last_updated': u'0.000000', u'bytes_misc': 4806, u'bytes_sst': 0}}, {u'last_updated': u'2014-08-24 10:30:32.227053', u'name': u'b', u'avail_percent': 94, u'kb_total': 944268724, u'kb_avail': 888579120, u'health': u'HEALTH_OK', u'kb_used': 7700408, u'store_stats': {u'bytes_total': 624031, u'bytes_log': 619225, u'last_updated': u'0.000000', u'bytes_misc': 4806, u'bytes_sst': 0}}, {u'last_updated': u'2014-08-24 10:30:32.436684', u'name': u'c', u'avail_percent': 94, u'kb_total': 944268724, u'kb_avail': 888579016, u'health': u'HEALTH_OK', u'kb_used': 7700512, u'store_stats': {u'bytes_total': 652174, u'bytes_log': 647368, u'last_updated': u'0.000000', u'bytes_misc': 4806, u'bytes_sst': 0}}]}]}, u'overall_status': u'HEALTH_WARN', u'summary': [{u'severity': u'HEALTH_WARN', u'summary': u'28 pgs degraded'}, {u'severity': u'HEALTH_WARN', u'summary': u'10 pgs recovering'}, {u'severity': u'HEALTH_WARN', u'summary': u'4 pgs recovery_wait'}, {u'severity': u'HEALTH_WARN', u'summary': u'1 pgs stuck unclean'}, {u'severity': u'HEALTH_WARN', u'summary': u'28 pgs undersized'}, {u'severity': u'HEALTH_WARN', u'summary': u'recovery 55/56 objects degraded (98.214%); 27/28 unfound (96.429%)'}]}, u'pgmap': {u'bytes_total': 1999307276288, u'degraded_objects': 55, u'num_pgs': 28, u'data_bytes': 16940, u'degraded_total': 56, u'bytes_used': 212553728, u'unfound_ratio': u'96.429', u'unfound_objects': 27, u'version': 31, u'pgs_by_state': [{u'count': 14, u'state_name': u'active+undersized+degraded'}, {u'count': 4, u'state_name': u'active+recovery_wait+undersized+degraded'}, {u'count': 10, u'state_name': u'active+recovering+undersized+degraded'}], u'degraded_ratio': u'98.214', u'bytes_avail': 1996923547648, u'unfound_total': 28}, u'quorum_names': [u'a', u'b', u'c'], u'osdmap': {u'osdmap': {u'full': False, u'nearfull': False, u'num_osds': 3, u'num_up_osds': 2, u'epoch': 18, u'num_in_osds': 2}}, u'fsid': u'd780d717-3ced-48dc-be0e-0a0fe5bae1bd'}
2014-08-24T10:31:01.772 INFO:tasks.rep_lost_unfound_delete:there are 27 unfound objects
2014-08-24T10:31:01.773 INFO:teuthology.orchestra.run.burnupi33:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format=json'
2014-08-24T10:31:01.943 INFO:teuthology.orchestra.run.burnupi33.stderr:dumped all in format json
2014-08-24T10:31:01.957 INFO:tasks.rep_lost_unfound_delete:listing missing/lost in 1.e state active+recovering+undersized+degraded
2014-08-24T10:31:01.958 INFO:teuthology.orchestra.run.burnupi33:Running: u"adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph -- pg 1.e list_missing '{}'"
2014-08-24T10:31:02.121 INFO:teuthology.orchestra.run.burnupi33.stderr:Error ENOENT: i don't have pgid 1.e
ubuntu@teuthology:/a/teuthology-2014-08-24_02:30:02-rados-next-testing-basic-multi/446542
Updated by Sage Weil over 9 years ago
- Priority changed from High to Urgent
ubuntu@teuthology:/a/teuthology-2014-09-07_02:30:03-rados-next-testing-basic-multi/470719
same pg even (1.e)
Updated by Greg Farnum over 9 years ago
- Category set to Monitor
- Status changed from New to 12
- Assignee set to Greg Farnum
The OSDs are getting the new OSDMap very late compared to the client
mon.c{2}:
2014-09-07 03:16:41.490103 7fc45b90b700 10 mon.c@2(peon).paxosservice(osdmap 1..19) refresh
osd.0:
2014-09-07 03:18:38.202154 7f4b0dc9c700 1 -- 10.214.131.11:6805/1124 <== mon.2 10.214.131.11:6791/0 5 ==== osd_map(19..19 src has 1..19) v3 ==== 214+0+0 (388528641 0 0) 0x33e4240 con 0x33e0c60
That's the first time any of the OSDs got osdmap 19! Meanwhile, the client had gotten it some time prior:
osd.2:
2014-09-07 03:16:44.401140 7f73ac6fa700 1 -- 10.214.131.11:6800/1555 <== client.4283 10.214.131.11:0/1002018 2 ==== command(tid 2: {"prefix": "pg", "cmd": "list_missing", "pgid": "1.e", "offset": "{}"}) v1 ==== 94+0+0 (2949332957 0 0) 0x2187c20 con 0x21ed080
...
2014-09-07 03:18:38.206897 7f73aaef7700 1 -- 10.214.131.11:6804/1555 <== osd.0 10.214.131.11:6809/1124 1 ==== osd_map(19..19 src has 1..19) v3 ==== 214+0+0 (388528641 0 0) 0x21f6640 con 0x22411e0
It looks like osdmap 19 was created via a client command:
mon.a{0}:
2014-09-07 03:16:40.507947 7f4321414700 7 mon.a@0(leader).osd e18 prepare_update mon_command({"prefix": "osd in", "ids": ["2"]} v 0) v1 from client.4271 10.214.131.11:0/1001673
Certainly that should have prompted the monitor to share the map with osd.2, but it didn't; perhaps because the leader wasn't already connected to it? Will need to explore the map-sharing code a little more thoroughly.
And it appears that the commands lack the map-synchronization infrastructure that normal client<->OSD communications have, so maybe we need a ticket about that too.
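To make the difference concrete, here is a small Python model (illustrative only; the names and return values are hypothetical, not Ceph code) of why an op that carries the sender's osdmap epoch behaves safely against a stale OSD while an epoch-less command does not:

```python
# Hypothetical model of epoch checking, not actual Ceph code.

class Daemon:
    """A toy OSD that knows its own osdmap epoch."""

    def __init__(self, epoch):
        self.epoch = epoch

    def handle_op(self, op_epoch, pg_mapped_at_epoch):
        # Normal op path: the request carries the client's epoch, so a
        # stale daemon knows to wait for the newer map before answering.
        if op_epoch > self.epoch:
            return "waiting_for_map"
        return "ok" if pg_mapped_at_epoch <= self.epoch else "ENOENT"

    def handle_command(self, has_pg):
        # Command path in this bug: no epoch is attached, so the stale
        # daemon answers ENOENT immediately from its old map.
        return "ok" if has_pg else "ENOENT"

osd2 = Daemon(epoch=18)                      # osd.2 never saw osdmap 19
print(osd2.handle_op(19, 19))                # op path would defer
print(osd2.handle_command(False))            # command path fails at once
```

Under this model, the same stale daemon defers a versioned op but immediately returns ENOENT for the unversioned command, which is exactly the failure mode in the log above.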
Updated by Greg Farnum over 9 years ago
Okay, so at the time osdmap 19 was created, we had two of three OSDs running (osd.1 was down and out, and teuthology had stopped it running). osd.0 and osd.2 were both connected to peon mon.2.
So monitors 1 and 2 did not share the map, because they had no connected sessions.
mon.2 chose to share the map with osd.2 (randomly, though appropriately, since the map change was marking osd.2 in).
osd.2 then injected a pipe failure, and so 1) didn't receive the message, and 2) reconnected to mon.1.
... time passed without anybody triggering a map propagation...
The client ran list_missing, got a fresh osdmap from the monitors, and talked to osd.2 (who didn't have the PG and so replied with the ENOENT response).
I think the way to solve this is to have OSDs share their osdmap epoch on every monitor connect, the same way they do on every boot?
But we might also need to (separately) have pg ops to the OSDs check for epochs.
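A minimal sketch of the first idea, as a Python model (names like `subscribe` and `on_mon_connect` are illustrative, not Ceph's actual API): on every monitor connect, not just at boot, the OSD asks for every epoch newer than the one it holds, so a map dropped by an injected connection failure gets backfilled on reconnect.

```python
# Hypothetical model of "re-subscribe to osdmaps on every mon reconnect".

class Mon:
    def __init__(self):
        self.maps = {}       # epoch -> map payload
        self.latest = 0

    def publish(self, epoch, payload):
        self.maps[epoch] = payload
        self.latest = epoch

    def subscribe(self, start_epoch):
        # Hand back every map from start_epoch through the latest.
        return [self.maps[e] for e in range(start_epoch, self.latest + 1)]

class Osd:
    def __init__(self, epoch=0):
        self.epoch = epoch

    def on_mon_connect(self, mon):
        # The fix: treat every connect like boot and request anything
        # newer than what we already have.
        for _ in mon.subscribe(self.epoch + 1):
            self.epoch += 1

mon1 = Mon()
for e in range(1, 20):
    mon1.publish(e, "map%d" % e)

osd2 = Osd(epoch=18)         # missed osdmap 19 due to the pipe failure
osd2.on_mon_connect(mon1)
print(osd2.epoch)            # -> 19
```

In this model the reconnecting OSD catches up to epoch 19 immediately, instead of waiting for some later event to trigger map propagation.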
Updated by Greg Farnum over 9 years ago
- Status changed from 12 to Fix Under Review
I created a few other tickets for the specific pg command issue, and created a PR so the OSD will subscribe to any osdmaps it missed when reconnecting to a monitor:
https://github.com/ceph/ceph/pull/2499
(That's for giant; I assume we can wait until giant merges into master to get this there.)
Updated by Greg Farnum over 9 years ago
- Status changed from Fix Under Review to 15
Merged into giant by 782848af596fdb0be57daa68481b3976b7119141.