Bug #15351
Status: Closed
librbd: resize timed out
Description
2016-03-31T16:41:40.793 INFO:teuthology.orchestra.run.smithi059.stdout:4034 trunc from 0x6e11185 to 0x63ae4fd
2016-03-31T16:41:42.362 INFO:tasks.thrashosds.thrasher:in_osds: [4, 0, 1, 5, 3] out_osds: [2] dead_osds: [] live_osds: [2, 5, 0, 1, 4, 3]
2016-03-31T16:41:42.362 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2016-03-31T16:41:42.362 INFO:tasks.thrashosds.thrasher:inject_pause on 3
2016-03-31T16:41:42.363 INFO:tasks.thrashosds.thrasher:Testing filestore_inject_stall pause injection for duration 3
2016-03-31T16:41:42.363 INFO:tasks.thrashosds.thrasher:Checking after 0, should_be_down=False
2016-03-31T16:41:42.363 INFO:teuthology.orchestra.run.smithi032:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config set filestore_inject_stall 3'
2016-03-31T16:41:45.845 INFO:teuthology.orchestra.run.smithi059.stdout:rbd_resize(104523005) failed
2016-03-31T16:41:45.845 INFO:teuthology.orchestra.run.smithi059.stdout:dotruncate: ops->resize: Connection timed out
2016-03-31T16:41:45.845 INFO:teuthology.orchestra.run.smithi059.stdout:LOG DUMP (4034 total operations):
Updated by Jason Dillaman about 8 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
Updated by Jason Dillaman about 8 years ago
Having trouble figuring out where the ETIMEDOUT error came from. In theory, this error shouldn't bubble up from the Objecter unless time-outs were configured (which they weren't). ERESTART would make more sense to me: the watch timed out, which would cause the resize to be canceled with ERESTART.
Updated by Josh Durgin about 8 years ago
Not sure where it's coming from either. It seems like it would have to come from one of the notifies or something watch-related.
Updated by Jason Dillaman about 8 years ago
- Priority changed from Urgent to High
Updated by Josh Durgin about 8 years ago
Updated by Jason Dillaman about 8 years ago
It's fairly consistent that it happens after a "Testing filestore_inject_stall pause injection for duration 3" thrasher operation.
Updated by Josh Durgin about 8 years ago
No injection here, but could be ovh VMs being slow: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-11_16:00:01-rbd-master---basic-openstack/33767/teuthology.log
Updated by Jason Dillaman about 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Jason Dillaman about 8 years ago
That was a subtle one -- couldn't reproduce w/ killing OSDs, hanging OSDs, etc. Needed a notify timeout during the final header update notification.
Updated by Jason Dillaman about 8 years ago
- Status changed from Fix Under Review to Resolved