Bug #16847 (Closed)
hammer: ceph-qa-suite do_pg_scrub() does nothing due to scrub stamp change
Description
This bug is haunting the hammer-backports integration tests. Copied from #15679.
2016-05-01T02:38:57.946 INFO:tasks.ceph.ceph_manager:waiting for scrub type deep-scrub
2016-05-01T02:38:57.947 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph osd dump --format=json'
2016-05-01T02:38:58.199 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph pg deep-scrub 4.0'
2016-05-01T02:38:58.333 INFO:teuthology.orchestra.run.smithi067.stderr:instructing pg 4.0 on osd.4 to deep-scrub
2016-05-01T02:39:06.574 INFO:tasks.ceph.osd.4.smithi030.stderr:2016-05-01 09:39:06.575235 7fbe0baa2700 -1 log_channel(cluster) log [ERR] : 4.0 shard 0: soid 4:a0216fbc:::repair_test_obj:head candidate had a read error
2016-05-01T02:39:06.575 INFO:tasks.ceph.osd.4.smithi030.stderr:2016-05-01 09:39:06.575665 7fbe0baa2700 -1 log_channel(cluster) log [ERR] : 4.0 deep-scrub 0 missing, 1 inconsistent objects
2016-05-01T02:39:06.575 INFO:tasks.ceph.osd.4.smithi030.stderr:2016-05-01 09:39:06.575675 7fbe0baa2700 -1 log_channel(cluster) log [ERR] : 4.0 deep-scrub 1 errors
No repair happened below, possibly because a scheduled deep-scrub changed the scrub stamp, causing do_pg_scrub() to return before the requested scrub ran.
2016-05-01T02:39:09.072 INFO:tasks.repair_test:repairing
2016-05-01T02:39:09.072 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph osd dump --format=json'
2016-05-01T02:39:09.220 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph pg dump --format=json'
2016-05-01T02:39:09.357 INFO:teuthology.orchestra.run.smithi067.stderr:dumped all in format json
2016-05-01T02:39:09.370 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph osd dump --format=json'
2016-05-01T02:39:12.521 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph pg dump --format=json'
2016-05-01T02:39:12.667 INFO:teuthology.orchestra.run.smithi067.stderr:dumped all in format json
2016-05-01T02:39:12.679 INFO:tasks.repair_test:re-scrubbing
2016-05-01T02:39:13.585 INFO:teuthology.orchestra.run.smithi067.stderr:instructing pg 4.0 on osd.4 to deep-scrub
2016-05-01T02:39:16.576 INFO:tasks.ceph.osd.4.smithi030.stderr:2016-05-01 09:39:16.577236 7fbe0e2a7700 -1 log_channel(cluster) log [ERR] : 4.0 shard 0: soid 4:a0216fbc:::repair_test_obj:head candidate had a read error
2016-05-01T02:39:16.577 INFO:tasks.ceph.osd.4.smithi030.stderr:2016-05-01 09:39:16.577680 7fbe0e2a7700 -1 log_channel(cluster) log [ERR] : 4.0 deep-scrub 0 missing, 1 inconsistent objects
2016-05-01T02:39:16.578 INFO:tasks.ceph.osd.4.smithi030.stderr:2016-05-01 09:39:16.577690 7fbe0e2a7700 -1 log_channel(cluster) log [ERR] : 4.0 deep-scrub 1 errors
2016-05-01T02:39:23.596 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph osd dump --format=json'
2016-05-01T02:39:23.677 INFO:teuthology.orchestra.run.smithi067.stderr:2016-05-01 09:39:23.666872 7f7379ffb700 0 monclient: hunting for new mon
2016-05-01T02:39:23.753 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph pg dump --format=json'
2016-05-01T02:39:24.143 INFO:teuthology.orchestra.run.smithi067.stderr:dumped all in format json
2016-05-01T02:39:24.155 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph osd dump --format=json'
2016-05-01T02:39:24.307 INFO:teuthology.orchestra.run.smithi067:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph pg dump --format=json'
2016-05-01T02:39:24.446 INFO:teuthology.orchestra.run.smithi067.stderr:dumped all in format json
2016-05-01T02:39:24.458 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 66, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 45, in run_one_task
    return fn(**kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-8885/tasks/repair_test.py", line 304, in task
    repair_test_1(ctx, dataerr, choose_replica, "deep-scrub")
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-8885/tasks/repair_test.py", line 108, in repair_test_1
    assert not ctx.manager.pg_inconsistent(pool, 0)
AssertionError
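The failure above boils down to a race: the test waits for the PG's scrub stamp to change, but the stamp can be bumped by a periodically scheduled deep-scrub rather than by the scrub the test requested. A minimal simulation of that race (hypothetical names; the real wait logic lives in ceph-qa-suite's ceph_manager.py):

```python
import datetime

class FakePG:
    """Minimal stand-in for a placement group's scrub state (illustration only)."""
    def __init__(self):
        self.last_deep_scrub_stamp = datetime.datetime(2016, 5, 1, 2, 30)
        self.repaired = False

def stamp_changed(pg, old_stamp):
    # do_pg_scrub()-style check: consider the scrub done as soon as the
    # stamp advances. It cannot tell WHICH scrub moved the stamp.
    return pg.last_deep_scrub_stamp != old_stamp

pg = FakePG()
old_stamp = pg.last_deep_scrub_stamp

# A scheduled deep-scrub fires first and bumps the stamp...
pg.last_deep_scrub_stamp = datetime.datetime(2016, 5, 1, 2, 38)

# ...so the wait is satisfied before the manually requested repair scrub
# ever runs, and the test proceeds while the object is still inconsistent.
assert stamp_changed(pg, old_stamp)
assert not pg.repaired  # repair never happened
```

This is consistent with the log: the deep-scrubs keep reporting "1 errors" because the repair step was skipped.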
Updated by Nathan Cutler over 7 years ago
- Copied from Bug #15679: ceph-qa-suite do_pg_scrub() does nothing due to scrub stamp change added
Updated by Nathan Cutler over 7 years ago
Showed up in: http://158.69.78.47:8081/ubuntu-2016-07-27_15:27:10-rados-hammer-backports---basic-openstack/
Fixed by cherry-picking https://github.com/ceph/ceph-qa-suite/commit/89dcc0daf31eabc04810f759e0694dc8303fee4d and https://github.com/ceph/ceph-qa-suite/commit/5c0edbae3eb35654b4efb2e6eb097014953aea12 to testing branch (based on hammer).
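The fix, roughly, is to record the stamp at request time and keep re-issuing the scrub request while waiting, so a stamp bumped by an unrelated scheduled scrub cannot satisfy the wait on behalf of a request that was never acted on. A hedged sketch of that retry loop, with a fake manager standing in for ceph_manager.CephManager (the helper names here are assumptions, not the actual cherry-picked code):

```python
import time

class FakeManager:
    """Hypothetical stand-in for the teuthology cluster manager (illustration only)."""
    def __init__(self):
        self._stamp = 0
        self._pending = False

    def get_last_scrub_stamp(self, pool, pgnum):
        return self._stamp

    def raw_cluster_cmd(self, *args):
        # Receiving the scrub request marks it pending; the "OSD" tick
        # below completes it and bumps the stamp.
        self._pending = True

    def osd_tick(self):
        if self._pending:
            self._stamp += 1
            self._pending = False

def do_pg_scrub_with_retry(manager, pool, pgnum, stype,
                           timeout=30, retry_interval=0):
    # Record the stamp at request time, then keep re-issuing the scrub
    # until the stamp is seen to advance (or we time out). Re-sending is
    # harmless and guards against the request being dropped or raced.
    init = manager.get_last_scrub_stamp(pool, pgnum)
    deadline = time.time() + timeout
    while time.time() < deadline:
        manager.raw_cluster_cmd('pg', stype, '%s.%s' % (pool, pgnum))
        manager.osd_tick()  # simulate the OSD acting on the request
        if manager.get_last_scrub_stamp(pool, pgnum) != init:
            return True
        time.sleep(retry_interval)
    return False

m = FakeManager()
assert do_pg_scrub_with_retry(m, 4, 0, 'deep-scrub')
```

Because the command is re-sent on every iteration, a stamp change can only be observed after at least one request has actually been delivered during the wait.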
Relevant excerpt from http://tracker.ceph.com/issues/15895
- possible bug http://tracker.ceph.com/issues/15679
- http://158.69.78.47/ubuntu-2016-07-27_15:27:10-rados-hammer-backports---basic-openstack/74/
- running with --num 10 to see if it's reproducible: http://158.69.78.47:8081/ubuntu-2016-07-28_09:05:53-rados-hammer-backports---basic-openstack/ - result: reproducible
- cherry-picked https://github.com/ceph/ceph-qa-suite/commit/89dcc0daf31eabc04810f759e0694dc8303fee4d and https://github.com/ceph/ceph-qa-suite/commit/5c0edbae3eb35654b4efb2e6eb097014953aea12 to testing branch
- running again with --num 10: http://158.69.78.47:8081/ubuntu-2016-07-28_10:49:00-rados-hammer-backports---basic-openstack/ - green (fix works)
Updated by Nathan Cutler over 7 years ago
- Status changed from New to Fix Under Review
- Assignee set to Nathan Cutler
Updated by Nathan Cutler over 7 years ago
- Status changed from Fix Under Review to Resolved
- Source changed from other to Community (dev)