Bug #50669

closed

[pwl ssd] multiple crash / power failure recovery issues

Added by Ilya Dryomov almost 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

At the very least:

- m_first_valid_entry and m_first_free_entry aren't read from media (correct values are there but aren't used)
- write_data_pos isn't written to media (incorrect value is there and is used)
- m_free_log_entries logic is bogus
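
For readers unfamiliar with this class of bug, the first two items can be sketched in isolation. This is a minimal illustration, not the librbd code: every type, function, and constant below (MediaRoot, MediaEntry, LogState, kDataStart, load_*, persist_*) is hypothetical, and the "serialized too early" cause shown for write_data_pos is only an assumed example of how a stale value could end up on media.

```cpp
#include <cstdint>

// Hypothetical, simplified on-media root record (not the real librbd layout).
struct MediaRoot {
  uint64_t first_valid_entry;  // oldest entry still needed
  uint64_t first_free_entry;   // position where the next append goes
};

// Hypothetical, simplified on-media log entry header.
struct MediaEntry {
  uint64_t write_data_pos;     // where this entry's payload lives on media
};

// Hypothetical in-memory state rebuilt when an existing cache is opened.
struct LogState {
  uint64_t first_valid_entry;
  uint64_t first_free_entry;
};

constexpr uint64_t kDataStart = 4096;  // assumed start offset of an empty log

// Item 1 (read but not used): the root is read from media, but the values
// are discarded and the state is initialized as if the log were empty.
LogState load_buggy(const MediaRoot& root) {
  uint64_t valid = root.first_valid_entry;
  uint64_t free_pos = root.first_free_entry;
  (void)valid;
  (void)free_pos;
  return LogState{kDataStart, kDataStart};
}

// Intended behavior: the values read from media become the in-memory state.
LogState load_fixed(const MediaRoot& root) {
  return LogState{root.first_valid_entry, root.first_free_entry};
}

// Item 2 (not written to media), assumed cause: the header is copied to
// media before the payload position is assigned, so media keeps a stale value.
MediaEntry persist_buggy(uint64_t assigned_pos) {
  MediaEntry in_memory{0};
  MediaEntry on_media = in_memory;          // serialized too early
  in_memory.write_data_pos = assigned_pos;  // only the in-memory copy is updated
  (void)in_memory;
  return on_media;
}

// Intended behavior: assign the position first, then serialize.
MediaEntry persist_fixed(uint64_t assigned_pos) {
  MediaEntry in_memory{0};
  in_memory.write_data_pos = assigned_pos;
  return in_memory;
}
```

In both buggy variants nothing fails at write time. The damage shows up only when the cache is reopened after a crash or power failure and recovery trusts the wrong position.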


Related issues 2 (0 open, 2 closed)

Related to rbd - Bug #50668: [pwl] cache can't be opened after a crash or power failure (Resolved, Ilya Dryomov)

Copied to rbd - Backport #50856: pacific: [pwl ssd] multiple crash / power failure recovery issues (Resolved, Ilya Dryomov)
Actions #1

Updated by Ilya Dryomov almost 3 years ago

  • Related to Bug #50668: [pwl] cache can't be opened after a crash or power failure added
Actions #2

Updated by Ilya Dryomov almost 3 years ago

  • Description updated (diff)
Actions #3

Updated by Mahati Chamarthy almost 3 years ago

Ilya Dryomov wrote:

> At the very least:
>
> - m_first_valid_entry and m_first_free_entry aren't read from media (correct values are there but aren't used)

They are read from media when there's an existing cache with data [0]

> - write_data_pos isn't written to media (incorrect value is there and is used)
> - m_free_log_entries logic is bogus

https://github.com/ceph/ceph/pull/41101/ PR fixes the above two issues

[0] https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L238-L239

Actions #4

Updated by Ilya Dryomov almost 3 years ago

Mahati Chamarthy wrote:

> Ilya Dryomov wrote:
>
>> At the very least:
>>
>> - m_first_valid_entry and m_first_free_entry aren't read from media (correct values are there but aren't used)
>
> They are read from media when there's an existing cache with data [0]
>
> [0] https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L238-L239

If you trace that code carefully, you will see that they are read but not used. I already have the fix; I just haven't pushed it yet because I'm running into other issues.

>> - write_data_pos isn't written to media (incorrect value is there and is used)
>> - m_free_log_entries logic is bogus
>
> https://github.com/ceph/ceph/pull/41101/ PR fixes the above two issues

It does resolve some of the m_free_log_entries questions, but it doesn't fix the write_data_pos problem. Again, I already have the fix -- if I don't track down the other failures I'm seeing by tomorrow, I'll open a draft PR just to get us on the same page.

Actions #5

Updated by Ilya Dryomov almost 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 41354
Actions #6

Updated by Ilya Dryomov almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific
Actions #7

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50856: pacific: [pwl ssd] multiple crash / power failure recovery issues added
Actions #8

Updated by Ilya Dryomov about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
