Bug #52323
[pwl ssd] incorrect first_valid_entry calculation in retire_entries()
Description
If we kill a running program that is writing data with the pwl/ssd cache, it cannot be restarted because it reads wrong data. There are two reasons:
1:
2021-08-19T15:37:02.026+0800 7f812e5fc700 0 librbd::cache::pwl::ssd::WriteLog: 0x7f836c0151b0 schedule_update_root: New root: pool_size=1073741824 first_valid_entry=228626432 first_free_entry=228044800 flushed_sync_gen=19536
2021-08-19T15:37:02.030+0800 7f8398bc5700 0 librbd::cache::pwl::ssd::WriteLog: 0x7f836c0151b0 schedule_update_root: New root: pool_size=1073741824 first_valid_entry=228626432 first_free_entry=228696064 flushed_sync_gen=19536
>> New data will overwrite log entries that have not been retired yet. This means the check for whether the cache is full is wrong, because allocated space is freed more than once.
2: The location of WriteLogCacheEntry is calculated incorrectly, which causes decoding to fail.
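The overwrite described in reason 1 can be checked against the two log lines above. The following is only a minimal sketch, not the Ceph implementation: it assumes a simplified ring-buffer model in which first_free_entry may advance up to, but not reach, first_valid_entry, and the function names free_bytes and overwrites_unretired are hypothetical.

```python
POOL_SIZE = 1073741824  # pool_size taken from the log lines above


def free_bytes(first_valid_entry, first_free_entry, pool_size=POOL_SIZE):
    """Bytes writable before the free pointer reaches the valid pointer.

    Simplified ring model (an assumption for illustration): equal pointers
    mean an empty log, and any reserved header area is ignored.
    """
    if first_free_entry >= first_valid_entry:
        # Free region wraps around the end of the pool.
        return pool_size - first_free_entry + first_valid_entry
    # Free region is the gap between the free and valid pointers.
    return first_valid_entry - first_free_entry


def overwrites_unretired(first_valid_entry, old_free, new_free,
                         pool_size=POOL_SIZE):
    """True if moving the free pointer from old_free to new_free reaches
    or passes first_valid_entry, i.e. touches entries not yet retired."""
    advance = (new_free - old_free) % pool_size
    return advance >= free_bytes(first_valid_entry, old_free, pool_size)


if __name__ == "__main__":
    valid = 228626432      # first_valid_entry in both log lines
    old_free = 228044800   # first_free_entry in the first log line
    new_free = 228696064   # first_free_entry in the second log line
    print(free_bytes(valid, old_free))
    print(overwrites_unretired(valid, old_free, new_free))
```

Under this model only 581632 bytes were free before the second root update, but first_free_entry advanced by 651264 bytes, so it moved past first_valid_entry and into entries that had not been retired.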
Updated by jianpeng ma over 2 years ago
PR https://github.com/ceph/ceph/pull/42843 fixes both problems.
Updated by Ilya Dryomov over 2 years ago
- Subject changed from Can't restart pwl/ssd after kill program to [pwl ssd] Can't restart after kill program
- Status changed from New to Fix Under Review
- Assignee set to jianpeng ma
- Pull request ID set to 42843
Updated by Ilya Dryomov over 2 years ago
The first problem (allocated space accounting) is tracked in https://tracker.ceph.com/issues/52341.
Updated by Ilya Dryomov over 2 years ago
- Related to Bug #52341: [pwl] m_bytes_allocated is calculated incorrectly on reopen added
Updated by Ilya Dryomov over 2 years ago
- Subject changed from [pwl ssd] Can't restart after kill program to [pwl ssd] incorrect first_valid_entry calculation in retire_entries()
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #52422: pacific: [pwl ssd] incorrect first_valid_entry calculation in retire_entries() added
Updated by Ilya Dryomov about 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".