Project

General

Profile

Actions

Bug #36074

closed

[rbd-mirror] sporadic deep-copy failure possible during fsx stress test

Added by Jason Dillaman over 5 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Jason Dillaman
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If an image is being synced by an rbd-mirror daemon and is then blacklisted from the local cluster while it's deep-copying an object (specifically after it's written the last snapshot version but before it updates the object map), it might silently fail without passing the -ESHUTDOWN error code back to "ImageCopyRequest<I>::handle_object_copy". Without the failure error code, the sync point object position can get improperly updated so that the now the object map remains incorrect yet not flagged as invalid (due to the blacklisting).

http://qa-proxy.ceph.com/teuthology/jdillaman-2018-09-18_08:50:35-rbd-wip-rbd-mirror-distro-basic-smithi/3038017/teuthology.log


Related issues 1 (0 open1 closed)

Copied to rbd - Backport #36122: mimic: [rbd-mirror] sporadic deep-copy failure possible during fsx stress testResolvedJason DillamanActions
Actions #1

Updated by Jason Dillaman over 5 years ago

2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 ObjectCopyRequest: dst_oid=rbd_data.106d34a0242b.0000000000000000
2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_src_object_extents: 2 src extents
2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_list_snaps: src_oid=rbd_data.10659609c337.0000000000000000
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_list_snaps: r=0
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: start_src_snap_id=0, end_src_snap_id=97, clone_end_snap_id=97, end_dst_snap_id=122, diff=[0~4194304], end_size=4194304, exists=1
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 0~4194304, dst_object_offset=0, read: [0~4194304]
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: read/write op: 0~4194304 dst: 0
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 4194304~4194304, dst_object_offset=4194304, read: []
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: start_src_snap_id=97, end_src_snap_id=171, clone_end_snap_id=0, end_dst_snap_id=125, diff=[0~4194304], end_size=0, exists=0
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: clearing truncate diff: [0~4194304]
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: extent 0~4194304 intersects truncation 0~4194304, inserting zero 0~4194304
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 0~4194304, dst_object_offset=0, read: []
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 4194304~4194304, dst_object_offset=4194304, read: []
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: src_snap_seq=97
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: read op: 0~4194304
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: read rbd_data.10659609c337.0000000000000000
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_read_object: r=0
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 merge_write_ops:
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_from_parent_ops: no parent
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_read_from_parent: r=0
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops:
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: src_snap_seq=97, end_size=4194304
2018-09-18 13:15:04.809 7febfdffb700  5 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: object DNE for snap_id: 125
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: dst_snap_seq=0, dst_snaps=[]
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: write op: 0~4194304
2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_write_object: r=0
2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: dst_snap_seq=122, dst_snaps=[122]
2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: remove op
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_write_object: r=0
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_update_object_map: dst_snap_id=122, object_state=1
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::ObjectMap: 0x7feb78050410 aio_update: start=0, end=1, ->1
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::object_map::UpdateRequest: 0x7febf00e6db0 update_object_map: ictx=0x7feb800236e0, oid=rbd_object_map.106d34a0242b.000000000000007a, [0,1) = ->1
2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::object_map::UpdateRequest: 0x7febf00e6db0 handle_update_object_map: r=-108
2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_update_object_map: r=0
2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 finish: r=0
Actions #2

Updated by Jason Dillaman over 5 years ago

2018-09-18 15:01:14.577 7f1cb37fe700 10 librbd::object_map::RefreshRequest: 0x7f1ca8009770 send_load: oid=rbd_object_map.106d34a0242b.000000000000007a
... snip ...
2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::io::ObjectDispatch: 0x2401390 read: rbd_data.106d34a0242b.0000000000000000 0~8388608
2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::io::ObjectRequest: 0x7f1ca800f8f0 send:
2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::ObjectMap: 0x7f1ca8009400 object_may_exist: object_no=0 r=0
Actions #3

Updated by Jason Dillaman over 5 years ago

  • Description updated (diff)
Actions #4

Updated by Jason Dillaman over 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #5

Updated by Mykola Golub over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36122: mimic: [rbd-mirror] sporadic deep-copy failure possible during fsx stress test added
Actions #7

Updated by Mykola Golub about 5 years ago

  • Pull request ID set to 24179

A small mock test fixup, which would be nice to backport too: https://github.com/ceph/ceph/pull/26322

Actions #8

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Also available in: Atom PDF