Bug #17831
osd: ENOENT on clone
Description
-1542> 2016-11-08 21:42:03.424076 7f6da6f54700 10 filestore(/var/lib/ceph/osd/ceph-1) clone_range 1.7a_head/#1:5fe289c3:::smithi06215392-46:127# -> 1.7a_head/#1:5fe289c3:::smithi06215392-46:123# 0~3663992 to 0 = -2
-1541> 2016-11-08 21:42:03.424081 7f6da6f54700 -1 filestore(/var/lib/ceph/osd/ceph-1) error (2) No such file or directory not handled on operation 0x7f6dbf9b9a30 (13361.0.6, or op 6, counting from 0)
-1540> 2016-11-08 21:42:03.424084 7f6da6f54700 0 filestore(/var/lib/ceph/osd/ceph-1) ENOENT on clone suggests osd bug
...
-1509> 2016-11-08 21:42:03.424086 7f6da6f54700 0 filestore(/var/lib/ceph/osd/ceph-1) transaction dump:
{
  "ops": [
    { "op_num": 0, "op_name": "remove", "collection": "1.7a_head", "oid": "#1:5fe289c3:::smithi06215392-46:123#" },
    { "op_num": 1, "op_name": "touch", "collection": "1.7a_head", "oid": "#1:5fe289c3:::smithi06215392-46:123#" },
    { "op_num": 2, "op_name": "truncate", "collection": "1.7a_head", "oid": "#1:5fe289c3:::smithi06215392-46:123#", "offset": 3663992 },
    { "op_num": 3, "op_name": "omap_setheader", "collection": "1.7a_head", "oid": "#1:5fe289c3:::smithi06215392-46:123#", "header_length": "0" },
    { "op_num": 4, "op_name": "op_setallochint", "collection": "1.7a_head", "oid": "#1:5fe289c3:::smithi06215392-46:123#", "expected_object_size": "0", "expected_write_size": "0" },
    { "op_num": 5, "op_name": "setattrs", "collection": "1.7a_head", "oid": "#1:5fe289c3:::smithi06215392-46:123#", "attr_lens": { "_": 289, "__header": 58 } },
    { "op_num": 6, "op_name": "clonerange2", "collection": "1.7a_head", "src_oid": "#1:5fe289c3:::smithi06215392-46:127#", "dst_oid": "#1:5fe289c3:::smithi06215392-46:123#", "src_offset": 0, "len": 3663992, "dst_offset": 0 },
    { "op_num": 7, "op_name": "omap_setkeys", "collection": "meta", "oid": "#-1:c0371625:::snapmapper:0#", "attr_lens": { "OBJ_0000000000000001.AF74193C.123.smithi06215392-46..": 106 } },
    { "op_num": 8, "op_name": "omap_setkeys", "collection": "meta", "oid": "#-1:c0371625:::snapmapper:0#", "attr_lens": {
        "MAP_0000000000000112_0000000000000001.AF74193C.123.smithi06215392-46..": 70,
        "MAP_0000000000000113_0000000000000001.AF74193C.123.smithi06215392-46..": 70,
        "MAP_0000000000000116_0000000000000001.AF74193C.123.smithi06215392-46..": 70,
        "MAP_000000000000011E_0000000000000001.AF74193C.123.smithi06215392-46..": 70,
        "MAP_0000000000000123_0000000000000001.AF74193C.123.smithi06215392-46..": 70 } },
    { "op_num": 9, "op_name": "omap_rmkeys", "collection": "1.7a_head", "oid": "#1:5e000000::::head#" },
    { "op_num": 10, "op_name": "omap_setkeys", "collection": "1.7a_head", "oid": "#1:5e000000::::head#", "attr_lens": { "_info": 855 } }
  ]
}
/a/sage-2016-11-08_20:40:20-rados:thrash-wip-sage-testing---basic-smithi/532700
Updated by Xinze Chi over 7 years ago
Recovery of an object may use a snap object, generated by a clone op, as its source. However, the remote peer may not yet have applied that snap object to its store, because the recovery op has a higher priority than the clone op.
So the remote peer may recover the object first and only then perform the clone op.
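The ordering problem described above can be illustrated with a toy model. This is only a sketch, not Ceph code: `ToyStore`, `run_replica`, the object names, and the priority numbers are all hypothetical, and the only assumption taken from the comment is that the replica dequeues the recovery op before the clone op that creates its source.

```python
import errno
import heapq


class ToyStore:
    """Toy object store (hypothetical): maps object name -> bytes."""

    def __init__(self):
        self.objects = {"head": b"data"}

    def clone(self, src, dst):
        # Mirrors the failure in the log: cloning from a missing source
        # returns -ENOENT instead of creating the destination.
        if src not in self.objects:
            return -errno.ENOENT
        self.objects[dst] = self.objects[src]
        return 0


def run_replica(ops):
    """Apply ops in priority order (higher first, ties broken by arrival seq).

    ops is a list of (priority, seq, name, fn) tuples; fn takes the store.
    Returns a list of (name, return_code) in the order the ops were applied.
    """
    store = ToyStore()
    heap = [(-prio, seq, name, fn) for prio, seq, name, fn in ops]
    heapq.heapify(heap)
    results = []
    while heap:
        _, _, name, fn = heapq.heappop(heap)
        results.append((name, fn(store)))
    return results


def repop_clone(store):
    # Client write that creates clone 127 from head.
    return store.clone("head", "127")


def recovery_push(store):
    # Recovery of clone 123 that wants to reuse data from clone 127.
    return store.clone("127", "123")


# Recovery has the higher (hypothetical) priority, so it runs first and fails.
racy = run_replica([(1, 0, "repop_clone", repop_clone),
                    (2, 1, "recovery_push", recovery_push)])

# With equal priorities the ops apply in arrival order and both succeed.
ordered = run_replica([(1, 0, "repop_clone", repop_clone),
                       (1, 1, "recovery_push", recovery_push)])
```

In the racy run the first applied op is `recovery_push`, and its return code is `-errno.ENOENT`, which corresponds to the `= -2` on the `clone_range` line in the description.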
Updated by Samuel Just over 7 years ago
- Related to Bug #15774: osd_op_queue_cut_off osd_op_queue debug_random generate assert failure. added
Updated by Samuel Just over 7 years ago
- Priority changed from Urgent to Immediate
Updated by Samuel Just over 7 years ago
- Assignee changed from David Zafman to Samuel Just
Updated by Samuel Just over 7 years ago
- Priority changed from Immediate to High
Xinze Chi's diagnosis is entirely correct. This is actually pretty simple: we just need to modify ReplicatedBackend to take RWState locks on clone sources. No need for fancy blocking either; we always have the option of simply not using that clone.
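A minimal sketch of that idea, for illustration only: the `RWState` class below is a simplified stand-in rather than the OSD's actual type, and `pick_clone_sources` and `release_clone_sources` are hypothetical helpers. The point it shows is the non-blocking choice: a clone whose read lock cannot be taken immediately is simply left out of the set of sources.

```python
class RWState:
    """Simplified reader/writer state (hypothetical stand-in, not Ceph's type)."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read_lock(self):
        # Never blocks: fails if a writer (e.g. an in-flight repop) holds it.
        if self.writer:
            return False
        self.readers += 1
        return True

    def put_read(self):
        assert self.readers > 0
        self.readers -= 1

    def try_write_lock(self):
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def put_write(self):
        self.writer = False


def pick_clone_sources(candidates, locks):
    """Return the candidate clones whose read lock was taken without blocking.

    Candidates that are currently write-locked are skipped, so recovery
    falls back to not using that clone instead of waiting for it.
    """
    return [name for name in candidates if locks[name].try_read_lock()]


def release_clone_sources(held, locks):
    """Drop the read locks once the push that used them has completed."""
    for name in held:
        locks[name].put_read()


# Hypothetical clone names; 127 is still being written by a repop.
locks = {"127": RWState(), "110": RWState()}
locks["127"].try_write_lock()

usable_during_write = pick_clone_sources(["127", "110"], locks)
release_clone_sources(usable_during_write, locks)

# Once the repop finishes, 127 becomes usable as a source again.
locks["127"].put_write()
usable_after_write = pick_clone_sources(["127", "110"], locks)
release_clone_sources(usable_after_write, locks)
```

While a source holds a read lock, `try_write_lock` on it fails in this model, so a writer cannot modify or remove the clone underneath the recovery that is reading from it.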
Updated by Samuel Just over 7 years ago
- Status changed from 7 to In Progress
https://github.com/athanatos/ceph/tree/wip-17831 -- it'll be a little bit before I have time to test and debug it.
Updated by Samuel Just over 7 years ago
- Has duplicate Bug #18373: osd: repop vs push race added
Updated by Samuel Just over 7 years ago
- Priority changed from High to Immediate
Sage saw this again, see http://tracker.ceph.com/issues/18373
Updated by Samuel Just over 7 years ago
- Status changed from 7 to Fix Under Review
Updated by Samuel Just over 7 years ago
Updated by Alexey Sheplyakov over 7 years ago
I guess the fix should be backported to jewel, since jewel does not lock the clone source.
Updated by Sage Weil over 7 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Alexey Sheplyakov over 7 years ago
- Copied to Backport #18581: jewel: osd: ENOENT on clone added
Updated by Samuel Just over 7 years ago
- Related to Bug #18583: osd: calc_clone_subsets misuses try_read_lock vs missing added
Updated by Nathan Cutler over 7 years ago
- Copied to Backport #18610: kraken: osd: ENOENT on clone added
Updated by Samuel Just about 7 years ago
- Related to Bug #18809: FAILED assert(object_contexts.empty()) (live on master only from Jan-Feb 2017, all other instances are different) added
Updated by Samuel Just about 7 years ago
- Status changed from Pending Backport to Resolved
- Backport deleted (kraken,jewel)
http://tracker.ceph.com/issues/18927 and http://tracker.ceph.com/issues/18809 were caused by this series, so I don't think we should backport it.