Bug #50670

closed

[pwl ssd] head / tail pointer corruption

Added by Ilya Dryomov almost 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After filling the ssd cache to capacity and maintaining the load (so that the cache is constantly retiring old and generating new log entries), I have observed that the head (m_first_valid_entry) and tail (m_first_free_entry) pointers can get corrupted. Since the cache is at capacity, they are expected to cycle through the log space very close to each other, with the invariant that head > tail (head - tail = 8-12K, the rest is allocated). However, the tail pointer sometimes "jumps" the head pointer, violating the invariant and making the full cache appear almost empty (only 8-12K used).
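The arithmetic behind "a full cache appears almost empty" can be sketched as follows. This is only an illustration under assumptions: used_bytes() is a hypothetical helper, not a function from the Ceph source, and the 1 GiB log size and byte offsets are made-up numbers. It treats the log as a ring in which valid data runs from the head (first valid entry) forward, wrapping around, up to the tail (first free entry).

```cpp
#include <cstdint>

// Hypothetical helper (not from the Ceph source): bytes in use in a
// ring-shaped log of log_size bytes, where valid entries start at
// first_valid (head) and the next write goes to first_free (tail).
constexpr std::uint64_t used_bytes(std::uint64_t first_valid,
                                   std::uint64_t first_free,
                                   std::uint64_t log_size) {
  return (first_free + log_size - first_valid) % log_size;
}

// Illustrative numbers only.
constexpr std::uint64_t kLogSize = 1ULL << 30;         // 1 GiB log
constexpr std::uint64_t kTail = 512ULL << 20;          // first free entry
constexpr std::uint64_t kHead = kTail + 8 * 1024;      // first valid entry, 8K ahead

// Healthy state at capacity: head > tail, only the 8K gap is free.
static_assert(used_bytes(kHead, kTail, kLogSize) == kLogSize - 8 * 1024,
              "almost the whole log is in use");

// Tail advanced by 16K and "jumped" the head: the same formula now
// reports only 8K in use, although the log was full a moment ago.
static_assert(used_bytes(kHead, kTail + 16 * 1024, kLogSize) == 8 * 1024,
              "full log looks almost empty");
```

Once the tail passes the head, nothing in the two pointers alone distinguishes the corrupted state from a nearly empty log, which is why the invariant has to be enforced at allocation time.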


Related issues 2 (0 open, 2 closed)

Related to rbd - Bug #50675: [pwl ssd] cache larger than 4G will corrupt itself (Resolved, CONGMIN YIN)
Copied to rbd - Backport #52310: pacific: [pwl ssd] head / tail pointer corruption (Resolved, Deepika Upadhyay)
Actions #1

Updated by Ilya Dryomov almost 3 years ago

  • Assignee set to Ilya Dryomov

I suspect this is related to AbstractWriteLog::check_allocation() logic and may be addressed by https://github.com/ceph/ceph/pull/41101, at least partially. Yet to test.

Actions #2

Updated by Ilya Dryomov almost 3 years ago

  • Related to Bug #50675: [pwl ssd] cache larger than 4G will corrupt itself added
Actions #3

Updated by CONGMIN YIN almost 3 years ago

In check_allocation(), the condition "if (m_bytes_allocated + bytes_allocated >= m_bytes_allocated_cap)" limits allocation in ssd mode. However, the lock is released after this check. Before "m_bytes_allocated += bytes_allocated;" runs, m_bytes_allocated may be increased again and end up exceeding m_bytes_allocated_cap, so the tail may pass the head. See https://github.com/ceph/ceph/pull/41968/commits/91b8862cd2b09b4555236dc06088f68339ff4f1c
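The window described above is a check-then-act race. A minimal sketch of the pattern, assuming a simplified stand-in for the accounting: AllocationBudget and its methods are hypothetical names, not the Ceph classes, and try_reserve() shows one common way to close such a window, not necessarily what the linked commit does.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the allocation accounting (illustrative only).
class AllocationBudget {
public:
  explicit AllocationBudget(std::uint64_t cap) : m_cap(cap) {}

  // Racy pattern, step 1: test against the cap, then drop the lock.
  bool check(std::uint64_t bytes) const {
    std::lock_guard<std::mutex> locker(m_lock);
    return m_allocated + bytes < m_cap;
  }

  // Racy pattern, step 2: account for the bytes in a separate critical
  // section. Another caller may have passed check() in between.
  void commit(std::uint64_t bytes) {
    std::lock_guard<std::mutex> locker(m_lock);
    m_allocated += bytes;
  }

  // Check and update inside one critical section, so no other caller
  // can slip in between the test and the increment.
  bool try_reserve(std::uint64_t bytes) {
    std::lock_guard<std::mutex> locker(m_lock);
    if (m_allocated + bytes >= m_cap) {
      return false;
    }
    m_allocated += bytes;
    return true;
  }

  std::uint64_t allocated() const {
    std::lock_guard<std::mutex> locker(m_lock);
    return m_allocated;
  }

private:
  mutable std::mutex m_lock;
  const std::uint64_t m_cap;
  std::uint64_t m_allocated = 0;
};
```

With a cap of 100, two callers that each pass check(60) before either calls commit(60) leave allocated() at 120, above the cap. With try_reserve(60), the second caller is refused and allocated() stays at 60.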

Actions #4

Updated by Ilya Dryomov almost 3 years ago

  • Project changed from rbd to Ceph
Actions #5

Updated by Ilya Dryomov almost 3 years ago

  • Project changed from Ceph to rbd
Actions #6

Updated by Ilya Dryomov almost 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Ilya Dryomov to CONGMIN YIN
  • Pull request ID set to 41968
Actions #7

Updated by Ilya Dryomov over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Ilya Dryomov over 2 years ago

  • Backport set to pacific
Actions #9

Updated by Backport Bot over 2 years ago

  • Copied to Backport #52310: pacific: [pwl ssd] head / tail pointer corruption added
Actions #10

Updated by Ilya Dryomov about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
