Bug #49168
closed
Bluefs improperly handles huge (>4GB) writes which causes data corruption
Description
Here is the symptomatic log snippet. Note the length (0x9136e44b) in the _flush() call and the offset/length (0x19136e0f3~358) in the subsequent read:
2021-02-04T14:08:53.767+0100 7f79713e0100 10 bluefs open_for_write db/542258.sst
2021-02-04T14:08:53.767+0100 7f79713e0100 10 bluefs open_for_write h 0x5584358dc9a0 on file(ino 434330 size 0x0 mtime 2021-02-04T14:08:53.772038+0100 allocated 0 extents [])
2021-02-04T14:09:29.863+0100 7f79713e0100 10 bluefs _flush 0x5584358dc9a0 0x0~9136e44b to file(ino 434330 size 0x0 mtime 2021-02-04T14:08:53.772038+0100 allocated 0 extents [])
2021-02-04T14:09:29.863+0100 7f79713e0100 10 bluefs _flush_range 0x5584358dc9a0 pos 0x0 0x0~9136e44b to file(ino 434330 size 0x0 mtime 2021-02-04T14:08:53.772038+0100 allocated 0 extents [])
...
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs open_for_read db/542258.sst (random)
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs open_for_read h 0x558435cef800 on file(ino 434330 size 0x9136e44b mtime 2021-02-04T14:09:29.867004+0100 allocated 91370000 extents [1:0x6e1dec0000~8780000,1:0x72e1280000~86c0000,1:0x6ee5e50000~4d50000,1:0x6b34bf0000~4d10000,1:0x70eff30000~45b0000,1:0x72789d0000~4500000,1:0x7344210000~4500000,1:0x7332e00000~4490000,1:0x6b595b0000~4480000,1:0x71e4f40000~4470000,1:0x7205860000~4400000,1:0x70e76f0000~43f0000,1:0x717c1c0000~43e0000,1:0x73c6860000~43d0000,1:0x734cb80000~43b0000,1:0x70df050000~4380000,1:0x6c7f390000~4340000,1:0x6e2aa20000~4300000,1:0x4e74570000~3800000,1:0x6c4cfb0000~37f0000,1:0x72f44f0000~3750000,1:0x51379e0000~3730000,1:0x71ad660000~3730000,1:0x4af4640000~3720000,1:0x4d95920000~35e0000,1:0x6c87a40000~3400000,1:0x4edaf70000~3350000,1:0x721bba0000~32d0000,1:0x6bf42d0000~30f0000,1:0x6c622b0000~2fb0000,1:0x6cafea0000~2f50000,1:0x4c92b50000~2b00000,1:0x7095b80000~2850000,1:0x6f3f790000~26d0000,1:0x4d87380000~26a0000,1:0x504f930000~24c0000,1:0x510f760000~23c0000,1:0x4e18d80000~1aa0000])
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs _read h 0x558435cef800 0x19136e0f3~358 from file(ino 434330 size 0x9136e44b mtime 2021-02-04T14:09:29.867004+0100 allocated 91370000 extents [1:0x6e1dec0000~8780000,1:0x72e1280000~86c0000,1:0x6ee5e50000~4d50000,1:0x6b34bf0000~4d10000,1:0x70eff30000~45b0000,1:0x72789d0000~4500000,1:0x7344210000~4500000,1:0x7332e00000~4490000,1:0x6b595b0000~4480000,1:0x71e4f40000~4470000,1:0x7205860000~4400000,1:0x70e76f0000~43f0000,1:0x717c1c0000~43e0000,1:0x73c6860000~43d0000,1:0x734cb80000~43b0000,1:0x70df050000~4380000,1:0x6c7f390000~4340000,1:0x6e2aa20000~4300000,1:0x4e74570000~3800000,1:0x6c4cfb0000~37f0000,1:0x72f44f0000~3750000,1:0x51379e0000~3730000,1:0x71ad660000~3730000,1:0x4af4640000~3720000,1:0x4d95920000~35e0000,1:0x6c87a40000~3400000,1:0x4edaf70000~3350000,1:0x721bba0000~32d0000,1:0x6bf42d0000~30f0000,1:0x6c622b0000~2fb0000,1:0x6cafea0000~2f50000,1:0x4c92b50000~2b00000,1:0x7095b80000~2850000,1:0x6f3f790000~26d0000,1:0x4d87380000~26a0000,1:0x504f930000~24c0000,1:0x510f760000~23c0000,1:0x4e18d80000~1aa0000]) prefetch
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs _read_random h 0x558435cef800 0x19136e416~35 from file(ino 434330 size 0x9136e44b mtime 2021-02-04T14:09:29.867004+0100 allocated 91370000 extents [1:0x6e1dec0000~8780000,1:0x72e1280000~86c0000,1:0x6ee5e50000~4d50000,1:0x6b34bf0000~4d10000,1:0x70eff30000~45b0000,1:0x72789d0000~4500000,1:0x7344210000~4500000,1:0x7332e00000~4490000,1:0x6b595b0000~4480000,1:0x71e4f40000~4470000,1:0x7205860000~4400000,1:0x70e76f0000~43f0000,1:0x717c1c0000~43e0000,1:0x73c6860000~43d0000,1:0x734cb80000~43b0000,1:0x70df050000~4380000,1:0x6c7f390000~4340000,1:0x6e2aa20000~4300000,1:0x4e74570000~3800000,1:0x6c4cfb0000~37f0000,1:0x72f44f0000~3750000,1:0x51379e0000~3730000,1:0x71ad660000~3730000,1:0x4af4640000~3720000,1:0x4d95920000~35e0000,1:0x6c87a40000~3400000,1:0x4edaf70000~3350000,1:0x721bba0000~32d0000,1:0x6bf42d0000~30f0000,1:0x6c622b0000~2fb0000,1:0x6cafea0000~2f50000,1:0x4c92b50000~2b00000,1:0x7095b80000~2850000,1:0x6f3f790000~26d0000,1:0x4d87380000~26a0000,1:0x504f930000~24c0000,1:0x510f760000~23c0000,1:0x4e18d80000~1aa0000])
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs unlink db/542258.sst
2021-02-04T14:09:35.315+0100 7f79713e0100 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
2021-02-04T14:09:35.343+0100 7f79713e0100 10 bluefs unlock_file 0x55843199ce40 on file(ino 2 size 0x0 mtime 2019-08-08T16:51:54.229935+0200 allocated 0 extents [])
2021-02-04T14:09:35.343+0100 7f79713e0100 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2021-02-04T14:09:35.343+0100 7f79713e0100 -1 rocksdb: Corruption: file is too short (6731261003 bytes) to be an sstabledb/542258.sst
Updated by Igor Fedotov about 3 years ago
The above looks like a 32-bit value overflow, and indeed BlueFS::_flush() uses FileWriter::get_buffer_length(), which is a wrapper over bufferlist::length().
The latter uses 'unsigned' to track its length.
Hence data is corrupted if RocksDB issues a write/flush with length > 4GB (which unfortunately can happen; I am going to file another ticket for that).
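The numbers in the log are consistent with this explanation. Below is a minimal sketch of the arithmetic, assuming 'unsigned' is 32 bits wide on the affected platform. The helper length_as_unsigned() and the constant names are hypothetical and are used only for illustration; they are not BlueFS or bufferlist code. The values themselves are taken from the log lines in the description.

```cpp
#include <cstdint>

// Hypothetical stand-in for a length accessor that returns a 32-bit
// 'unsigned' value. This is not the real bufferlist::length().
constexpr std::uint32_t length_as_unsigned(std::uint64_t real_length) {
  return static_cast<std::uint32_t>(real_length);  // keeps only the low 32 bits
}

// File size RocksDB reports in its "file is too short" message.
constexpr std::uint64_t kExpectedSize = 6731261003ULL;  // 0x19136e44b

// File size BlueFS recorded, as shown in the open_for_read line.
constexpr std::uint64_t kRecordedSize = 0x9136e44bULL;

// Offset and length of the logged _read call (0x19136e0f3~358).
constexpr std::uint64_t kReadOffset = 0x19136e0f3ULL;
constexpr std::uint64_t kReadLength = 0x358ULL;

// Truncating the expected size to 32 bits yields the recorded size.
static_assert(length_as_unsigned(kExpectedSize) == kRecordedSize,
              "recorded size is the low 32 bits of the expected size");

// The difference between the two sizes is exactly 2^32 bytes (4 GiB).
static_assert(kExpectedSize - kRecordedSize == (1ULL << 32),
              "exactly one 32-bit wrap was lost");

// The read ends exactly at the expected end of file...
static_assert(kReadOffset + kReadLength == kExpectedSize,
              "read covers the tail of the expected file");

// ...which lies beyond the size BlueFS recorded.
static_assert(kReadOffset > kRecordedSize,
              "read starts past the recorded end of file");
```

If the assumptions hold, the translation unit compiles cleanly, which means the recorded size 0x9136e44b is the expected size with bit 32 dropped, and the read at 0x19136e0f3~358 targets the tail of the file as RocksDB expected it, past the end of what BlueFS recorded.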
Updated by Igor Fedotov about 3 years ago
- Subject changed from Bluefs improperly handles huge (>4GB) writes to Bluefs improperly handles huge (>4GB) writes which causes data corruption
Updated by Igor Fedotov about 3 years ago
- Related to Bug #49170: BlueFS might end-up with huge WAL files when upgrading OMAPs added
Updated by Igor Fedotov about 3 years ago
- Status changed from New to In Progress
- Pull request ID set to 39320
Updated by Igor Fedotov about 3 years ago
- Backport set to pacific, octopus, nautilus, mimic, luminous
Updated by Igor Fedotov about 3 years ago
- Priority changed from Normal to Urgent
Updated by Kefu Chai about 3 years ago
- Status changed from In Progress to Pending Backport
Updated by Backport Bot about 3 years ago
- Copied to Backport #49477: mimic: Bluefs improperly handles huge (>4GB) writes which causes data corruption added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49478: luminous: Bluefs improperly handles huge (>4GB) writes which causes data corruption added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49479: pacific: Bluefs improperly handles huge (>4GB) writes which causes data corruption added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49480: nautilus: Bluefs improperly handles huge (>4GB) writes which causes data corruption added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49481: octopus: Bluefs improperly handles huge (>4GB) writes which causes data corruption added
Updated by Igor Fedotov almost 3 years ago
- Status changed from Pending Backport to Resolved