Bug #50395

closed

filestore: ENODATA error after directory split confuses transaction

Added by Mykola Golub about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus, octopus, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A customer reported a case in which a faulty disk returned an ENODATA error during a directory split. This left objects in an inconsistent state: the transaction operation that hit the split error was aborted, but the transaction as a whole was not, so the remaining operations were still executed.

The kernel log:

2020-12-16T07:02:36.736166+09:00 node5 kernel: [10270806.635341] sd 0:2:10:0: [sdk] tag#1 BRCM Debug mfi stat 0x2d, data len requested/completed 0x4000/0x0
2020-12-16T07:02:36.736180+09:00 node5 kernel: [10270806.635349] sd 0:2:10:0: [sdk] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2020-12-16T07:02:36.736181+09:00 node5 kernel: [10270806.635351] sd 0:2:10:0: [sdk] tag#1 Sense Key : Medium Error [current]
2020-12-16T07:02:36.736184+09:00 node5 kernel: [10270806.635353] sd 0:2:10:0: [sdk] tag#1 Add. Sense: Unrecovered read error
2020-12-16T07:02:36.736203+09:00 node5 kernel: [10270806.635355] sd 0:2:10:0: [sdk] tag#1 CDB: Read(16) 88 00 00 00 00 00 02 67 ec 00 00 00 00 20 00 00
2020-12-16T07:02:36.736234+09:00 node5 kernel: [10270806.635357] blk_update_request: critical medium error, dev sdk, sector 40365056
2020-12-16T07:02:36.736240+09:00 node5 kernel: [10270806.635379] XFS (sdk2): metadata I/O error: block 0x1edda00 ("xfs_trans_read_buf_map") error 61 numblks 32
2020-12-16T07:02:36.736241+09:00 node5 kernel: [10270806.635384] XFS (sdk2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
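
The numbers in these log lines can be cross-checked against each other. The following is a minimal sketch, assuming the standard READ(16) CDB layout (opcode 0x88, 8-byte big-endian LBA at bytes 2..9, 4-byte transfer length at bytes 10..13) and 512-byte logical blocks; the helper name `decode_read16` is made up for illustration. It also looks up error number 61, which maps to ENODATA on Linux (the value differs on other platforms).

```python
import errno
import os

# CDB bytes copied from the "CDB: Read(16)" kernel log line.
CDB = bytes.fromhex("88 00 00 00 00 00 02 67 ec 00 00 00 00 20 00 00")


def decode_read16(cdb):
    """Return (lba, blocks) from a 16-byte SCSI READ(16) CDB (hypothetical helper)."""
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


lba, blocks = decode_read16(CDB)
# Assumes 512-byte logical blocks when converting the length to bytes.
print(lba, blocks, hex(blocks * 512))  # 40365056 32 0x4000

# Platform-dependent lookup: on Linux, 61 is ENODATA.
print(errno.errorcode.get(61), os.strerror(61))
```

The decoded LBA matches the sector in the `blk_update_request` line, and the 32-block length matches both the requested data length of 0x4000 bytes and `numblks 32` in the XFS message.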

The OSD log:

2020-12-16 07:02:36.419810 7f283a43d700  1 _created [2,5,C,A,6] has 447 objects, starting split in pg 5.452s0_head.
2020-12-16 07:02:36.736125 7f283a43d700  1 _created [2,5,C,A,6] split completed in pg 5.452s0_head.
2020-12-16 07:02:36.736150 7f283a43d700 -1 filestore(/var/lib/ceph/osd/ceph-57) error creating 0#5:4a3568ed:::<CENSORED>:head# (/var/lib/ceph/osd/ceph-57/current/5.452s0_head/DIR_2/DIR_5/DIR_C/DIR_A/DIR_6/<CENSORED>__head_B716AC52__5_ffffffffffffffff_0) in index: (61) No data available

So a transaction operation created a new object file, detected that the directory needed splitting, attempted the split, and failed. The operation was aborted midway and returned the ENODATA error to FileStore::_do_transaction, which ignored it and continued with the rest of the transaction.
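
To illustrate the failure mode described above, here is a toy model in Python. It is not the FileStore code: `run_transaction`, `TOLERATED`, and the op names are hypothetical, and the idea that the executor keeps a set of tolerated error codes is an assumption made only to reproduce the observed behaviour (the error is ignored and later operations still run).

```python
import errno

# Hypothetical: error codes the toy executor treats as non-fatal.
TOLERATED = {errno.ENODATA}


def run_transaction(ops, tolerated=TOLERATED):
    """Run ops in order and return a list of (name, result) pairs.

    Each op is (name, callable); the callable returns 0 or a negative errno.
    An error that is not tolerated aborts the whole transaction by raising.
    """
    results = []
    for name, fn in ops:
        r = fn()
        results.append((name, r))
        if r < 0 and -r not in tolerated:
            raise RuntimeError(f"{name} failed with {r}, aborting transaction")
    return results


def touch_with_failed_split():
    # Stand-in for an op that was aborted midway by the split error.
    return -errno.ENODATA


ops = [
    ("touch", touch_with_failed_split),
    ("write", lambda: 0),
    ("setattrs", lambda: 0),
]

# The failed op is recorded, but "write" and "setattrs" still run.
results = run_transaction(ops)
print(results)
```

Calling `run_transaction(ops, tolerated=set())` instead raises on the first op, which models aborting the whole transaction rather than only the failing operation.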

We do not know exactly where the split failed. It does not appear to have caused data loss by itself, but it aborted a transaction operation midway, which could leave the object in an inconsistent state.

We saw at least two types of such "messy" transactions:

1) When rados writes a new object, one of the first transaction operations is OP_TOUCH. It creates the object file, tries to split the directory, aborts, and therefore skips creating the object's spill_out attribute.

2) When rados deletes an object, one of the transaction operations is OP_COLL_MOVE_RENAME. It creates a temporary link, which triggers the directory split and the error; the operation is aborted midway, leaving the original object file in place.
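
The two cases above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ToyCollection` and its methods are hypothetical and do not mirror the real FileStore or index classes, and the `spill_out` value is a placeholder. The point is the ordering: the step that can fail sits in the middle of each operation, so an early return leaves partial state behind.

```python
import errno


class ToyCollection:
    """In-memory stand-in for a collection directory (hypothetical, not Ceph code)."""

    def __init__(self, split_fails=False):
        self.files = set()
        self.attrs = {}
        self.split_fails = split_fails

    def _created(self, name):
        # The file (or link) exists first; the split is attempted afterwards.
        self.files.add(name)
        return -errno.ENODATA if self.split_fails else 0

    def touch(self, name):
        r = self._created(name)
        if r < 0:
            return r  # aborted here: the attribute below is never set
        self.attrs[name] = {"spill_out": "placeholder"}
        return 0

    def move_rename(self, old, new):
        if old not in self.files:
            return -errno.ENOENT
        r = self._created(new)  # the new link triggers the split
        if r < 0:
            return r  # aborted here: the old name is never removed
        self.attrs[new] = self.attrs.pop(old, {})
        self.files.discard(old)
        return 0


# Case 1: touch leaves a file without its spill_out attribute.
c1 = ToyCollection(split_fails=True)
r1 = c1.touch("obj")

# Case 2: move_rename leaves both the original file and the new link.
c2 = ToyCollection()
c2.touch("obj")
c2.split_fails = True
r2 = c2.move_rename("obj", "tmp_obj")
print(r1, sorted(c1.files), c1.attrs, r2, sorted(c2.files))
```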


Related issues 3 (0 open, 3 closed)

Copied to RADOS - Backport #50479: octopus: filestore: ENODATA error after directory split confuses transaction (Resolved, Mykola Golub)
Copied to RADOS - Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction (Resolved, Mykola Golub)
Copied to RADOS - Backport #50481: nautilus: filestore: ENODATA error after directory split confuses transaction (Resolved, Mykola Golub)
Actions #1

Updated by Kefu Chai about 3 years ago

  • Status changed from New to Pending Backport
  • Backport set to nautilus, octopus, pacific
  • Pull request ID set to 40916
Actions #2

Updated by Backport Bot about 3 years ago

  • Copied to Backport #50479: octopus: filestore: ENODATA error after directory split confuses transaction added
Actions #3

Updated by Backport Bot about 3 years ago

  • Copied to Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction added
Actions #4

Updated by Backport Bot about 3 years ago

  • Copied to Backport #50481: nautilus: filestore: ENODATA error after directory split confuses transaction added
Actions #5

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
