Bug #6633

closed

osd: pgls vs osd restart/peering race misses objects

Added by Sage Weil over 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The test saw a sequence like:
- create object
- start osd
- things peer
- pgls
-> pgls returns empty result

I expect the problem is that pgls is not blocking until peering completes, and/or is not including the missing set items in the result, and/or is not making a necessary osd.flush() call.
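The first two hypotheses above can be made concrete with a toy model. The C++ sketch below is purely illustrative. Every name in it (`PlacementGroup`, `PglsReply`, `handle_pgls`, `on_activate`) is hypothetical and does not exist in Ceph. It is neither the OSD code nor the fix that was eventually merged.

The sketch models two behaviors:

- A listing request that arrives while the PG is still peering is deferred. It is not answered from an incomplete local view.
- Once the PG is active, the reply includes objects in the missing set as well as objects already on local disk.

The osd.flush() hypothesis is not modeled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of a placement group; these are
// illustrative names only, not the real Ceph OSD classes.
enum class PGState { Peering, Active };

struct PlacementGroup {
  PGState state = PGState::Peering;
  std::set<std::string> local_objects;   // objects present in this OSD's store
  std::set<std::string> missing;         // objects known to exist but not yet recovered here
  std::vector<std::string> waiting_ops;  // listing requests deferred until peering completes
};

// Result of a listing request: either deferred, or answered with object names.
struct PglsReply {
  bool deferred = false;
  std::vector<std::string> objects;
};

// Handle a listing request from `client`. While the PG is peering, the
// request is queued instead of being answered with a possibly empty list.
PglsReply handle_pgls(PlacementGroup& pg, const std::string& client) {
  PglsReply reply;
  if (pg.state != PGState::Active) {
    pg.waiting_ops.push_back(client);
    reply.deferred = true;
    return reply;
  }
  // Union of local objects and missing-set entries: an object that exists
  // logically but has not yet been recovered to this OSD must still be listed.
  std::set<std::string> all = pg.local_objects;
  all.insert(pg.missing.begin(), pg.missing.end());
  reply.objects.assign(all.begin(), all.end());
  return reply;
}

// Called when peering finishes. Returns the clients whose deferred requests
// can now be retried, and clears the wait queue.
std::vector<std::string> on_activate(PlacementGroup& pg) {
  assert(pg.state == PGState::Peering);  // activation only happens after peering
  pg.state = PGState::Active;
  std::vector<std::string> ready;
  ready.swap(pg.waiting_ops);
  return ready;
}
```

In this model, a client that lists immediately after an OSD restart gets no reply until `on_activate` runs. This differs from the empty result seen in the failed LibRadosList.ListObjectsPP run. Merging the missing set means an object created before the restart is still reported even if it has not yet been recovered to the restarted OSD.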

2013-10-24T01:58:31.150 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: [ RUN      ] LibRadosList.ListObjectsPP
2013-10-24T01:58:31.830 INFO:teuthology.task.thrashosds.thrasher:in_osds:  [0, 4, 5, 2, 1, 3]  out_osds:  [] dead_osds:  [3] live_osds:  [1, 0, 5, 2, 4]
2013-10-24T01:58:31.830 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 2 min_out 0 min_live 2 min_dead 0
2013-10-24T01:58:31.830 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 3
2013-10-24T01:58:31.830 INFO:teuthology.task.ceph.osd.3:Restarting
2013-10-24T01:58:31.831 DEBUG:teuthology.orchestra.run:Running [10.214.132.35]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo /home/ubuntu/cephtest/daemon-helper kill ceph-osd -f -i 3'
2013-10-24T01:58:31.833 INFO:teuthology.task.ceph.osd.3:Started
2013-10-24T01:58:31.833 DEBUG:teuthology.orchestra.run:Running [10.214.132.35]: 'sudo /home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_ops_in_flight'
2013-10-24T01:58:31.887 INFO:teuthology.task.ceph.osd.3.out:[10.214.132.35]: starting osd.3 at :/0 osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
2013-10-24T01:58:31.930 INFO:teuthology.orchestra.run.err:[10.214.132.35]: no valid command found; 10 closest matches:
2013-10-24T01:58:31.930 INFO:teuthology.orchestra.run.err:[10.214.132.35]: config show
2013-10-24T01:58:31.930 INFO:teuthology.orchestra.run.err:[10.214.132.35]: help
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: log dump
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: get_command_descriptions
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: git_version
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: config set <var> <val> [<val>...]
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: version
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: 2
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: config get <var>
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: 0
2013-10-24T01:58:31.931 INFO:teuthology.orchestra.run.err:[10.214.132.35]: admin_socket: invalid command
2013-10-24T01:58:31.937 INFO:teuthology.task.thrashosds.ceph_manager:waiting on admin_socket for 3, ['dump_ops_in_flight']
2013-10-24T01:58:32.332 INFO:teuthology.task.ceph.osd.3.err:[10.214.132.35]: 2013-10-24 01:58:32.332141 7fc3613e9780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: test/librados/list.cc:43: Failure
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: Value of: false
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: Expected: (iter == ioctx.objects_end())
2013-10-24T01:58:35.771 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: Which is: true
2013-10-24T01:58:35.773 INFO:teuthology.task.workunit.client.0.out:[10.214.132.35]: [  FAILED  ] LibRadosList.ListObjectsPP (4623 ms)

ubuntu@teuthology:/a/teuthology-2013-10-23_19:00:21-rados-dumpling-testing-basic-plana/65413
#1

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High

#2

Updated by Samuel Just over 10 years ago

Logs on slider:/~samuelj/buglogs/13-10-27-15:50:48

#3

Updated by Samuel Just over 10 years ago

  • Assignee set to Samuel Just

The tests in list.cc are actually slightly flaky when they run concurrently with PG splitting. wip-6633 so far has fixes to tolerate splitting.

This does not explain the bug, however; still looking.

#4

Updated by Samuel Just over 10 years ago

  • Status changed from New to 7

wip-pgls testing

#5

Updated by Samuel Just about 10 years ago

  • Status changed from 7 to Resolved

#6

Updated by Sage Weil about 10 years ago

Backported to dumpling.
