Bug #38395

luminous: write following remove might access previous onode

Added by Igor Fedotov 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 02/20/2019
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

So the sequence is as follows:
T1:
remove A

T2:
touch A
write A

In Luminous, the onode for A may be evicted from the cache (as no ref is taken) before T1 is committed to the DB.
The subsequent T2 might then read the outdated onode A from the DB and proceed with an invalid state for A.
This eventually results in, e.g., duplicate removal from the allocator.
The final symptoms resemble #38049 and #36541, but the root cause is different.
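
The race described above can be illustrated with a small toy model. This is only a sketch and not BlueStore code: `ToyStore`, `pin_onodes`, `trim_cache`, and the extent values are hypothetical names made up for illustration, "touch A" is folded into the write step, and the allocator problem is modelled simply as the same extent being released twice.

```python
class ToyStore:
    """Toy model (hypothetical, not BlueStore): a committed DB plus an onode cache."""

    def __init__(self, pin_onodes):
        self.db = {"A": [4096]}   # committed state: object name -> allocated extents
        self.cache = {}           # onode cache: name -> extents (None means removed)
        self.pinned = set()       # onodes referenced by in-flight transactions
        self.pending = []         # removals queued but not yet committed to the DB
        self.released = []        # extents handed back to the allocator
        self.pin_onodes = pin_onodes

    def get_onode(self, name):
        if name not in self.cache:
            # Cache miss: reload whatever the DB currently holds.
            self.cache[name] = list(self.db[name]) if name in self.db else None
        return self.cache[name]

    def remove(self, name):
        extents = self.get_onode(name) or []
        self.released.extend(extents)   # free the object's extents
        self.cache[name] = None         # in-memory onode now says "removed"
        self.pending.append(name)       # the DB still holds the old onode
        if self.pin_onodes:
            self.pinned.add(name)       # assumed fix: keep a ref until commit

    def write(self, name, new_extent):
        old = self.get_onode(name) or []
        self.released.extend(old)       # overwrite frees extents the onode claims
        self.cache[name] = [new_extent]

    def trim_cache(self):
        # Evict every onode that no in-flight transaction references.
        self.cache = {n: e for n, e in self.cache.items() if n in self.pinned}

    def commit(self):
        for name in self.pending:
            self.db.pop(name, None)
        self.pending.clear()
        self.pinned.clear()


def run(pin_onodes):
    store = ToyStore(pin_onodes)
    store.remove("A")        # T1: queued, not yet in the DB
    store.trim_cache()       # cache is trimmed before T1 commits
    store.write("A", 8192)   # T2: touch + write
    store.commit()           # T1 finally lands in the DB
    return store.released


print(run(pin_onodes=False))  # [4096, 4096]: extent 4096 released twice
print(run(pin_onodes=True))   # [4096]: the pinned onode stays in the cache
```

Without pinning, T2 reloads the stale onode from the DB and frees the extent that T1 already freed. With the onode kept in the cache until T1 commits, T2 sees the removed state and the extent is released only once.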

This has been fixed in Mimic+ by
https://github.com/ceph/ceph/pull/18196/commits/0347518d02eda4e0e2da5241f1d77bc7304d59fb
and follow-ups.

debug-osd.10.log.gz (240 KB) Igor Fedotov, 02/20/2019 10:25 AM

History

#1 Updated by Igor Fedotov 3 months ago

  • Status changed from New to Verified
  • Assignee set to Igor Fedotov

#2 Updated by Igor Fedotov 3 months ago

  • Status changed from Verified to Need Review
  • Pull request ID set to 26540

#4 Updated by Igor Fedotov 2 months ago

  • Status changed from Need Review to Resolved
