Bug #6633 (Closed)
osd: pgls vs osd restart/peering race misses objects
% Done:
0%
Source:
Q/A
Severity:
3 - minor
Description
The test saw a sequence like:
- create object
- start osd
- things peer
- pgls
-> pgls returns an empty result
I expect the problem is that pgls is not blocking until peering completes, and/or is not including the missing-set items in the result, and/or is not making a necessary osd.flush() call.
2013-10-24T01:58:31.150 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: [ RUN      ] LibRadosList.ListObjectsPP
2013-10-24T01:58:31.830 INFO:teuthology.task.thrashosds.thrasher:in_osds: [0, 4, 5, 2, 1, 3] out_osds: [] dead_osds: [3] live_osds: [1, 0, 5, 2, 4]
2013-10-24T01:58:31.830 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 2 min_out 0 min_live 2 min_dead 0
2013-10-24T01:58:31.830 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 3
2013-10-24T01:58:31.830 INFO:teuthology.task.ceph.osd.3:Restarting
2013-10-24T01:58:31.831 DEBUG:teuthology.orchestra.run:Running [10.214.132.35]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo /home/ubuntu/cephtest/daemon-helper kill ceph-osd -f -i 3'
2013-10-24T01:58:31.833 INFO:teuthology.task.ceph.osd.3:Started
2013-10-24T01:58:31.833 DEBUG:teuthology.orchestra.run:Running [10.214.132.35]: 'sudo /home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_ops_in_flight'
2013-10-24T01:58:31.887 INFO:teuthology.task.ceph.osd.3.out:[10.214.132.35]: starting osd.3 at :/0 osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
2013-10-24T01:58:31.930 INFO:teuthology.orchestra.run.err:[10.214.132.35]: no valid command found; 10 closest matches:
2013-10-24T01:58:31.930 INFO:teuthology.orchestra.run.err:[10.214.132.35]: config show
2013-10-24T01:58:31.930 INFO:teuthology.orchestra.run.err:[10.214.132.35]: help
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: log dump
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: get_command_descriptions
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: git_version
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: config set <var> <val> [<val>...]
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: version
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: 2
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: config get <var>
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: 0
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: admin_socket: invalid command
2013-10-24T01:58:31.937 INFO:teuthology.task.thrashosds.ceph_manager:waiting on admin_socket for 3, ['dump_ops_in_flight']
2013-10-24T01:58:32.332 INFO:teuthology.task.ceph.osd.3.err:[10.214.132.35]: 2013-10-24 01:58:32.332141 7fc3613e9780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: test/librados/list.cc:43: Failure
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: Value of: false
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: Expected: (iter == ioctx.objects_end())
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: Which is: true
2013-10-24T01:58:35.773 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: [  FAILED  ] LibRadosList.ListObjectsPP (4623 ms)
ubuntu@teuthology:/a/teuthology-2013-10-23_19:00:21-rados-dumpling-testing-basic-plana/65413
Updated by Samuel Just over 10 years ago
Logs on slider:/~samuelj/buglogs/13-10-27-15:50:48
Updated by Samuel Just over 10 years ago
- Assignee set to Samuel Just
list.cc actually has slightly flaky tests if they run concurrently with PG splitting. wip-6633 so far has fixes to tolerate splitting.
This does not explain the bug, however; still looking.