Bug #49168

Bluefs improperly handles huge (>4GB) writes which causes data corruption

Added by Igor Fedotov 8 months ago. Updated 4 months ago.

Status: Resolved
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus, nautilus, mimic, luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Here is the symptomatic log snippet. Note the length (0x9136e44b) in the _flush() call, and the offset/length (0x19136e0f3~358) in the subsequent read, which lies beyond the recorded file size:

2021-02-04T14:08:53.767+0100 7f79713e0100 10 bluefs open_for_write db/542258.sst
2021-02-04T14:08:53.767+0100 7f79713e0100 10 bluefs open_for_write h 0x5584358dc9a0 on file(ino 434330 size 0x0 mtime 2021-02-04T14:08:53.772038+0100 allocated 0 extents [])
2021-02-04T14:09:29.863+0100 7f79713e0100 10 bluefs _flush 0x5584358dc9a0 0x0~9136e44b to file(ino 434330 size 0x0 mtime 2021-02-04T14:08:53.772038+0100 allocated 0 extents [])
2021-02-04T14:09:29.863+0100 7f79713e0100 10 bluefs _flush_range 0x5584358dc9a0 pos 0x0 0x0~9136e44b to file(ino 434330 size 0x0 mtime 2021-02-04T14:08:53.772038+0100 allocated 0 extents [])
...
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs open_for_read db/542258.sst (random)
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs open_for_read h 0x558435cef800 on file(ino 434330 size 0x9136e44b mtime 2021-02-04T14:09:29.867004+0100 allocated 91370000 extents [1:0x6e1dec0000~8780000,1:0x72e1280000~86c0000,1:0x6ee5e50000~4d50000,1:0x6b34bf0000~4d10000,1:0x70eff30000~45b0000,1:0x72789d0000~4500000,1:0x7344210000~4500000,1:0x7332e00000~4490000,1:0x6b595b0000~4480000,1:0x71e4f40000~4470000,1:0x7205860000~4400000,1:0x70e76f0000~43f0000,1:0x717c1c0000~43e0000,1:0x73c6860000~43d0000,1:0x734cb80000~43b0000,1:0x70df050000~4380000,1:0x6c7f390000~4340000,1:0x6e2aa20000~4300000,1:0x4e74570000~3800000,1:0x6c4cfb0000~37f0000,1:0x72f44f0000~3750000,1:0x51379e0000~3730000,1:0x71ad660000~3730000,1:0x4af4640000~3720000,1:0x4d95920000~35e0000,1:0x6c87a40000~3400000,1:0x4edaf70000~3350000,1:0x721bba0000~32d0000,1:0x6bf42d0000~30f0000,1:0x6c622b0000~2fb0000,1:0x6cafea0000~2f50000,1:0x4c92b50000~2b00000,1:0x7095b80000~2850000,1:0x6f3f790000~26d0000,1:0x4d87380000~26a0000,1:0x504f930000~24c0000,1:0x510f760000~23c0000,1:0x4e18d80000~1aa0000])
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs _read h 0x558435cef800 0x19136e0f3~358 from file(ino 434330 size 0x9136e44b mtime 2021-02-04T14:09:29.867004+0100 allocated 91370000 extents [1:0x6e1dec0000~8780000,1:0x72e1280000~86c0000,1:0x6ee5e50000~4d50000,1:0x6b34bf0000~4d10000,1:0x70eff30000~45b0000,1:0x72789d0000~4500000,1:0x7344210000~4500000,1:0x7332e00000~4490000,1:0x6b595b0000~4480000,1:0x71e4f40000~4470000,1:0x7205860000~4400000,1:0x70e76f0000~43f0000,1:0x717c1c0000~43e0000,1:0x73c6860000~43d0000,1:0x734cb80000~43b0000,1:0x70df050000~4380000,1:0x6c7f390000~4340000,1:0x6e2aa20000~4300000,1:0x4e74570000~3800000,1:0x6c4cfb0000~37f0000,1:0x72f44f0000~3750000,1:0x51379e0000~3730000,1:0x71ad660000~3730000,1:0x4af4640000~3720000,1:0x4d95920000~35e0000,1:0x6c87a40000~3400000,1:0x4edaf70000~3350000,1:0x721bba0000~32d0000,1:0x6bf42d0000~30f0000,1:0x6c622b0000~2fb0000,1:0x6cafea0000~2f50000,1:0x4c92b50000~2b00000,1:0x7095b80000~2850000,1:0x6f3f790000~26d0000,1:0x4d87380000~26a0000,1:0x504f930000~24c0000,1:0x510f760000~23c0000,1:0x4e18d80000~1aa0000]) prefetch
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs _read_random h 0x558435cef800 0x19136e416~35 from file(ino 434330 size 0x9136e44b mtime 2021-02-04T14:09:29.867004+0100 allocated 91370000 extents [1:0x6e1dec0000~8780000,1:0x72e1280000~86c0000,1:0x6ee5e50000~4d50000,1:0x6b34bf0000~4d10000,1:0x70eff30000~45b0000,1:0x72789d0000~4500000,1:0x7344210000~4500000,1:0x7332e00000~4490000,1:0x6b595b0000~4480000,1:0x71e4f40000~4470000,1:0x7205860000~4400000,1:0x70e76f0000~43f0000,1:0x717c1c0000~43e0000,1:0x73c6860000~43d0000,1:0x734cb80000~43b0000,1:0x70df050000~4380000,1:0x6c7f390000~4340000,1:0x6e2aa20000~4300000,1:0x4e74570000~3800000,1:0x6c4cfb0000~37f0000,1:0x72f44f0000~3750000,1:0x51379e0000~3730000,1:0x71ad660000~3730000,1:0x4af4640000~3720000,1:0x4d95920000~35e0000,1:0x6c87a40000~3400000,1:0x4edaf70000~3350000,1:0x721bba0000~32d0000,1:0x6bf42d0000~30f0000,1:0x6c622b0000~2fb0000,1:0x6cafea0000~2f50000,1:0x4c92b50000~2b00000,1:0x7095b80000~2850000,1:0x6f3f790000~26d0000,1:0x4d87380000~26a0000,1:0x504f930000~24c0000,1:0x510f760000~23c0000,1:0x4e18d80000~1aa0000])
2021-02-04T14:09:35.311+0100 7f79713e0100 10 bluefs unlink db/542258.sst
2021-02-04T14:09:35.315+0100 7f79713e0100 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
2021-02-04T14:09:35.343+0100 7f79713e0100 10 bluefs unlock_file 0x55843199ce40 on file(ino 2 size 0x0 mtime 2019-08-08T16:51:54.229935+0200 allocated 0 extents [])
2021-02-04T14:09:35.343+0100 7f79713e0100 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2021-02-04T14:09:35.343+0100 7f79713e0100 -1 rocksdb: Corruption: file is too short (6731261003 bytes) to be an sstabledb/542258.sst


Related issues

Related to bluestore - Bug #49170: BlueFS might end-up with huge WAL files when upgrading OMAPs Pending Backport
Copied to bluestore - Backport #49477: mimic: Bluefs improperly handles huge (>4GB) writes which causes data corruption Rejected
Copied to bluestore - Backport #49478: luminous: Bluefs improperly handles huge (>4GB) writes which causes data corruption Rejected
Copied to bluestore - Backport #49479: pacific: Bluefs improperly handles huge (>4GB) writes which causes data corruption Resolved
Copied to bluestore - Backport #49480: nautilus: Bluefs improperly handles huge (>4GB) writes which causes data corruption Resolved
Copied to bluestore - Backport #49481: octopus: Bluefs improperly handles huge (>4GB) writes which causes data corruption Resolved

History

#1 Updated by Igor Fedotov 8 months ago

The above looks like a 32-bit value overflow, and indeed BlueFS::_flush() uses FileWriter::get_buffer_length(), which is a wrapper over bufferlist::length().
The latter uses 'unsigned' to track its length...
Hence data is broken whenever RocksDB issues a write/flush with length > 4GB (which unfortunately might take place; going to file another ticket for that...)

#2 Updated by Igor Fedotov 8 months ago

  • Subject changed from Bluefs improperly handles huge (>4GB) writes to Bluefs improperly handles huge (>4GB) writes which causes data corruption

#3 Updated by Igor Fedotov 8 months ago

  • Related to Bug #49170: BlueFS might end-up with huge WAL files when upgrading OMAPs added

#4 Updated by Igor Fedotov 8 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 39320

#5 Updated by Igor Fedotov 8 months ago

  • Backport set to pacific, octopus, nautilus, mimic, luminous

#6 Updated by Igor Fedotov 8 months ago

  • Priority changed from Normal to Urgent

#7 Updated by Kefu Chai 7 months ago

  • Status changed from In Progress to Pending Backport

#8 Updated by Backport Bot 7 months ago

  • Copied to Backport #49477: mimic: Bluefs improperly handles huge (>4GB) writes which causes data corruption added

#9 Updated by Backport Bot 7 months ago

  • Copied to Backport #49478: luminous: Bluefs improperly handles huge (>4GB) writes which causes data corruption added

#10 Updated by Backport Bot 7 months ago

  • Copied to Backport #49479: pacific: Bluefs improperly handles huge (>4GB) writes which causes data corruption added

#11 Updated by Backport Bot 7 months ago

  • Copied to Backport #49480: nautilus: Bluefs improperly handles huge (>4GB) writes which causes data corruption added

#12 Updated by Backport Bot 7 months ago

  • Copied to Backport #49481: octopus: Bluefs improperly handles huge (>4GB) writes which causes data corruption added

#13 Updated by Igor Fedotov 4 months ago

  • Status changed from Pending Backport to Resolved
