Bug #50669
[pwl ssd] multiple crash / power failure recovery issues
Description
At the very least:
- m_first_valid_entry and m_first_free_entry aren't read from media (correct values are there but aren't used)
- write_data_pos isn't written to media (incorrect value is there and is used)
- m_free_log_entries logic is bogus
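For readers unfamiliar with this class of bug, the first item can be illustrated with a minimal sketch. This is not the Ceph code: the `Root` and `LogState` structs, the `recover_buggy`, `recover_fixed`, and `used_bytes` functions, and the ring layout are all hypothetical. Only the two field names echo the ticket. It assumes a ring-shaped log whose on-media root record stores the offsets of the oldest valid entry and the next free entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical on-media root record. Field names echo the ticket;
// the layout is invented for illustration.
struct Root {
  uint64_t first_valid_entry;  // offset of the oldest live entry in the ring
  uint64_t first_free_entry;   // offset where the next entry would be written
};

// Hypothetical in-memory state of the ring-shaped log.
struct LogState {
  uint64_t first_valid_entry = 0;
  uint64_t first_free_entry = 0;
};

// Buggy recovery: the root is available (correct values are on media),
// but nothing is copied into the in-memory state, so the log restarts
// from offset 0 as if the cache were empty.
LogState recover_buggy(const Root& root) {
  LogState state;
  (void)root;  // read but never used
  return state;
}

// Fixed recovery: the persisted offsets are carried into memory.
LogState recover_fixed(const Root& root) {
  LogState state;
  state.first_valid_entry = root.first_valid_entry;
  state.first_free_entry = root.first_free_entry;
  return state;
}

// Bytes currently occupied in a ring of `capacity` bytes, allowing for
// the free pointer having wrapped around past the end of the ring.
uint64_t used_bytes(const LogState& state, uint64_t capacity) {
  assert(capacity > 0);
  assert(state.first_valid_entry < capacity);
  assert(state.first_free_entry < capacity);
  return (state.first_free_entry + capacity - state.first_valid_entry) % capacity;
}
```

With a root of `{4096, 8192}`, `recover_fixed` reports 4096 bytes in use in a 16384-byte ring, while `recover_buggy` reports 0, which is how data that survived the crash would be silently ignored or overwritten.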
Updated by Ilya Dryomov almost 3 years ago
- Related to Bug #50668: [pwl] cache can't be opened after a crash or power failure added
Updated by Mahati Chamarthy almost 3 years ago
Ilya Dryomov wrote:
At the very least:
- m_first_valid_entry and m_first_free_entry aren't read from media (correct values are there but aren't used)
They are read from media when there's an existing cache with data [0]
- write_data_pos isn't written to media (incorrect value is there and is used)
- m_free_log_entries logic is bogus
https://github.com/ceph/ceph/pull/41101/ PR fixes the above two issues
[0] https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L238-L239
Updated by Ilya Dryomov almost 3 years ago
Mahati Chamarthy wrote:
Ilya Dryomov wrote:
At the very least:
- m_first_valid_entry and m_first_free_entry aren't read from media (correct values are there but aren't used)
They are read from media when there's an existing cache with data [0]
[0] https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L238-L239
If you trace that code carefully you will see that they are read but not used. I already have the fix, just haven't pushed yet because I'm running into other issues.
- write_data_pos isn't written to media (incorrect value is there and is used)
- m_free_log_entries logic is bogus
https://github.com/ceph/ceph/pull/41101/ PR fixes the above two issues
It does solve some of the m_free_log_entries questions, but it doesn't fix the write_data_pos problem. Again, I already have the fix -- if I don't track down the other failures I'm seeing by tomorrow, I'll open a draft PR just to get us on the same page.
Updated by Ilya Dryomov almost 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 41354
Updated by Ilya Dryomov almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to pacific
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50856: pacific: [pwl ssd] multiple crash / power failure recovery issues added
Updated by Ilya Dryomov about 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".