Bug #19737
EAGAIN encountered during pg scrub (jewel)
Status: Resolved
Priority: Normal
Assignee: -
Category: Tests
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
test description: rados/singleton-nomsgr/{all/lfn-upgrade-infernalis.yaml rados.yaml}
- Infernalis is installed
- HEALTH_OK is reached
- test pool created with pg_num 1
- create_verify_lfn_objects task completes
- "sequential" block starts
- create_verify_lfn_objects task runs again (wtf?)
- cluster is upgraded to jewel
- all daemons except osd.2 are restarted (so osd.2 continues on infernalis)
- ceph_manager.wait_for_clean runs
- ceph_manager.do_pg_scrub runs
- sequential block ends
- ceph_manager.do_pg_scrub runs again (wtf?)
- create_verify_lfn_objects task runs on the mixed cluster
- osd.2 is restarted (becoming jewel)
- ceph osd set require_jewel_osds
- ceph_manager.do_pg_scrub runs
Reading the log, everything seems to work fine up to and including "ceph_manager.wait_for_clean".
At this point, all Ceph daemons except for osd.2 are running jewel; osd.2 is running infernalis.
The last step of the sequential block - do_pg_scrub task - starts, does some work, and fails with EAGAIN:
2017-04-21T06:47:16.670 INFO:tasks.ceph.ceph_manager.ceph:clean!
2017-04-21T06:47:16.670 INFO:teuthology.task.sequential:In sequential, running task ceph_manager.do_pg_scrub...
...
2017-04-21T06:47:17.486 INFO:teuthology.orchestra.run.smithi176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph pg scrub 1.0'
2017-04-21T06:47:17.633 INFO:teuthology.orchestra.run.smithi176.stderr:Error EAGAIN: pg 1.0 primary osd.1 not up
Immediately after that, we find ourselves in "create_verify_lfn_objects" instead of the expected second do_pg_scrub:
2017-04-21T06:47:17.641 INFO:tasks.create_verify_lfn_objects:ceph_verify_lfn_objects verifying...
That task completes, but does not appear to be relevant because right on its heels comes the Traceback from the EAGAIN:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/sequential.py", line 46, in task
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=confg)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-jewel-backports/qa/tasks/ceph_manager.py", line 2041, in task
    fn(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-jewel-backports/qa/tasks/ceph_manager.py", line 1469, in do_pg_scrub
    self.raw_cluster_cmd('pg', stype, self.get_pgid(pool, pgnum))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-jewel-backports/qa/tasks/ceph_manager.py", line 865, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 414, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 149, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 171, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi176 with status 11: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph pg scrub 1.0'
2017-04-21T06:47:33.266 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/?q=0f9ca46556a642158e873d093d39cd2c
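The exit status 11 in the CommandFailedError matches errno EAGAIN on Linux, i.e. the monitor refused the scrub because the PG's primary was not yet up. As a rough illustration only (not necessarily how this bug was resolved), a caller could tolerate that transient state by retrying. The helper names run_with_eagain_retry and scrub_pg below are hypothetical and are not part of ceph_manager.py; the retry count and delay are arbitrary assumptions.

```python
import errno
import subprocess
import time


def run_with_eagain_retry(run_cmd, attempts=5, delay=2.0, sleep=time.sleep):
    """Call run_cmd() until it returns a status other than EAGAIN.

    run_cmd is any zero-argument callable returning an integer exit status.
    Returns the last status seen (still EAGAIN if every attempt failed).
    """
    status = None
    for attempt in range(attempts):
        status = run_cmd()
        if status != errno.EAGAIN:
            break
        if attempt < attempts - 1:
            sleep(delay)  # give the primary OSD time to be marked up
    return status


def scrub_pg(pgid):
    # Hypothetical wrapper: runs the same CLI command that appears in the log
    # (without the adjust-ulimits / ceph-coverage prefixes) and returns its
    # exit status.
    return subprocess.call(["ceph", "--cluster", "ceph", "pg", "scrub", pgid])
```

A call such as run_with_eagain_retry(lambda: scrub_pg("1.0")) would then retry only on EAGAIN and pass any other status straight through to the caller.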