Bug #54288
Status: closed
rocksdb: Corruption: missing start of fragmented record
Description
After a kernel panic one of our OSDs is unable to start and is showing:
rocksdb: [db/db_impl_open.cc:518] db/099134.log: dropping 1728 bytes; Corruption: missing start of fragmented record(2)
Updated by Igor Fedotov about 2 years ago
- Status changed from New to Triaged
- Assignee set to Igor Fedotov
Investigation revealed that the RocksDB version embedded in the Ceph Octopus release implicitly substitutes the default wal_recovery_mode value (kPointInTimeRecovery = 2) with the less reliable kTolerateCorruptedTailRecords (= 0) if recycle_log_file_num is non-zero. And a non-zero value is what we get by default in the bluestore_rocksdb_options parameter.
Relevant RocksDB code is in SanitizeOptions() function from db/db_impl_open.cc:
if (result.recycle_log_file_num &&
    (result.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery ||
     result.wal_recovery_mode == WALRecoveryMode::kAbsoluteConsistency)) {
// kPointInTimeRecovery is indistinguishable from
// kTolerateCorruptedTailRecords in recycle mode since we define
// the "end" of the log as the first corrupt record we encounter.
// kAbsoluteConsistency doesn't make sense because even a clean
// shutdown leaves old junk at the end of the log file.
result.wal_recovery_mode = WALRecoveryMode::kTolerateCorruptedTailRecords;
}
One can see the actual values of both wal_recovery_mode and recycle_log_file_num by inspecting the OSD log:
2022-02-14T11:58:58.846+0000 7f1fec229100 4 rocksdb: Options.wal_recovery_mode: 0
2022-02-14T11:58:58.846+0000 7f1fec229100 4 rocksdb: Options.recycle_log_file_num: 4
And resetting the latter to zero does the trick indeed: wal_recovery_mode remains at its default of 2, which permits the OSD to recover from the data corruption and proceed with startup:
2022-02-14T22:18:39.763+0000 7f8173b22100 4 rocksdb: Options.recycle_log_file_num: 0
2022-02-14T22:18:39.763+0000 7f8173b22100 4 rocksdb: Options.wal_recovery_mode: 2
...
2022-02-14T22:18:40.199+0000 7f8173b22100 3 rocksdb: [db/db_impl_open.cc:518] /persistent/db/099134.log: dropping 1728 bytes; Corruption: missing start of fragmented record(2)
2022-02-14T22:18:40.199+0000 7f8173b22100 4 rocksdb: [db/db_impl_open.cc:810] Point in time recovered to log #99134 seq #5023211213
Updated by Igor Fedotov about 2 years ago
Interestingly, RocksDB from Pacific onward deals with the wal_recovery_mode / recycle_log_file_num inconsistency a bit differently. This is how it looks in Quincy/master:
if (result.recycle_log_file_num &&
    (result.wal_recovery_mode ==
         WALRecoveryMode::kTolerateCorruptedTailRecords ||
     result.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery ||
     result.wal_recovery_mode == WALRecoveryMode::kAbsoluteConsistency)) {
// - kTolerateCorruptedTailRecords is inconsistent with recycle log file
// feature. WAL recycling expects recovery success upon encountering a
// corrupt record at the point where new data ends and recycled data
// remains at the tail. However, `kTolerateCorruptedTailRecords` must fail
// upon encountering any such corrupt record, as it cannot differentiate
// between this and a real corruption, which would cause committed updates
// to be truncated -- a violation of the recovery guarantee.
// - kPointInTimeRecovery and kAbsoluteConsistency are incompatible with
// recycle log file feature temporarily due to a bug found introducing a
// hole in the recovered data
// (https://github.com/facebook/rocksdb/pull/7252#issuecomment-673766236).
// Besides this bug, we believe the features are fundamentally compatible.
result.recycle_log_file_num = 0;
}
So we are implicitly getting recycle_log_file_num set to zero in fresh Ceph releases, which apparently confirms that we can use the same approach in Octopus.
Updated by Igor Fedotov about 2 years ago
- Status changed from Triaged to Fix Under Review
Updated by Igor Fedotov about 2 years ago
Just to note: I've just seen another cluster suffering from this issue; it was recoverable through the proposed approach.
Updated by Yuri Weinstein almost 2 years ago
Updated by Igor Fedotov almost 2 years ago
- Status changed from Fix Under Review to Resolved
Updated by Igor Fedotov almost 2 years ago
- Status changed from Resolved to Rejected
Reverted by https://github.com/ceph/ceph/pull/47053