Project

General

Profile

Bug #54409

OSD fails to start up with "rocksdb: Corruption: Bad table magic number" error

Added by Igor Fedotov 9 months ago. Updated 9 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-02-21T13:53:43.367+0100 7fc9582f4700 10 bluefs open_for_read h 0x55bd4301f000 on file(ino 1419 size 0x73c mtime 2022-02-21T13:40:05.150376+0100 allocated 10000 extents [1:0x630000~10000])
...
2022-02-21T13:53:43.379+0100 7fc9582f4700 10 bluefs _read_random h 0x55bd4301f000 0x707~35 from file(ino 1419 size 0x73c mtime 2022-02-21T13:40:05.150376+0100 allocated 10000 extents [1:0x630000~10000])
...
2022-02-21T13:53:43.419+0100 7fc9645f4f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/001426.sst


Related issues

Duplicates bluestore - Bug #54547: Deferred writes might cause "rocksdb: Corruption: Bad table magic number" Resolved

History

#1 Updated by Igor Fedotov 9 months ago

From the following log snippet it looks like SST file content (LBA: 0x630000) got overwritten by presumably deferred write resubmission happened after OSD restart:

1) Bunch of user content's writes including LBA in question followed by OSD shutdown
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x630000~d000 (direct)
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x630000~d000
2022-02-21T13:29:47.550+0100 7fb91422f700 5 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x630000~d000 aio 0x56075aed4e10
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7d0000~d000 (direct)
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x7d0000~d000
2022-02-21T13:29:47.550+0100 7fb91422f700 5 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7d0000~d000 aio 0x56075aed5c10
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_submit ioc 0x56075ae8fbd8 pending 2 running 0
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x4c5000~8000 (direct)
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x4c5000~8000
2022-02-21T13:29:47.550+0100 7fb91422f700 5 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x4c5000~8000 aio 0x56075aed5d10
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7dd000~1000 (direct)
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x7dd000~1000
2022-02-21T13:29:47.550+0100 7fb91422f700 5 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7dd000~1000 aio 0x56075cfff010
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_submit ioc 0x56075ae8fa98 pending 2 running 0
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x960000~c000 (direct)
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x960000~c000
2022-02-21T13:29:47.550+0100 7fb91422f700 5 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x960000~c000 aio 0x56075cffe310
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_submit ioc 0x56075ae8f958 pending 1 running 0
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7de000~1000 (direct)
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x7de000~1000
2022-02-21T13:29:47.550+0100 7fb91422f700 5 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7de000~1000 aio 0x56075cfff910
2022-02-21T13:29:47.550+0100 7fb91422f700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) aio_submit ioc 0x56075ae8f818 pending 1 running 0
2022-02-21T13:29:47.550+0100 7fb918a47700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x630000~d000
2022-02-21T13:29:47.550+0100 7fb918a47700 10 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_thread finished aio 0x56075aed4e10 r 53248 ioc 0x56075ae8fbd8 with 1 aios left
2022-02-21T13:29:47.550+0100 7fb918a47700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x7d0000~d000
2022-02-21T13:29:47.550+0100 7fb918a47700 10 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_thread finished aio 0x56075aed5c10 r 53248 ioc 0x56075ae8fbd8 with 0 aios left
2022-02-21T13:29:47.550+0100 7fb918a47700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x4c5000~8000
2022-02-21T13:29:47.550+0100 7fb918a47700 10 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_thread finished aio 0x56075aed5d10 r 32768 ioc 0x56075ae8fa98 with 1 aios left
2022-02-21T13:29:47.550+0100 7fb918a47700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x7dd000~1000
2022-02-21T13:29:47.550+0100 7fb918a47700 10 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_thread finished aio 0x56075cfff010 r 4096 ioc 0x56075ae8fa98 with 0 aios left
2022-02-21T13:29:47.550+0100 7fb918a47700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x960000~c000
2022-02-21T13:29:47.550+0100 7fb918a47700 10 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_thread finished aio 0x56075cffe310 r 49152 ioc 0x56075ae8f958 with 0 aios left
2022-02-21T13:29:47.550+0100 7fb918a47700 20 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x7de000~1000
2022-02-21T13:29:47.550+0100 7fb918a47700 10 bdev(0x56072a691400 /var/lib/ceph/osd/ceph-7/block) _aio_thread finished aio 0x56075cfff910 r 4096 ioc 0x56075ae8f818 with 0 aios left
2022-02-21T13:30:12.066+0100 7fb91d4ea700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2022-02-21T13:30:12.066+0100 7fb91d4ea700 -1 osd.7 4686 * Got signal Terminated
2022-02-21T13:30:12.066+0100 7fb91d4ea700 -1 osd.7 4686
Immediate shutdown (osd_fast_shutdown=true) *

2) OSD startup and SST file write+read at LBA=0x630000, length 0x1000
2022-02-21T13:39:37.711+0100 7f6fac289f00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1975
...
2022-02-21T13:40:05.148+0100 7f6fac289f00 10 bluefs open_for_write db/001426.sst
2022-02-21T13:40:05.148+0100 7f6fac289f00 20 bluefs open_for_write mapping db/001426.sst vsel_hint 0x1
2022-02-21T13:40:05.148+0100 7f6fac289f00 10 bluefs open_for_write h 0x55f087cfcf00 on file(ino 1419 size 0x0 mtime 2022-02-21T13:40:05.149695+0100 allocated 0 extents [])
...
2022-02-21T13:40:05.148+0100 7f6fac289f00 20 bdev(0x55f088a27800 /var/lib/ceph/osd/ceph-7/block) aio_write 0x630000~1000 (buffered)
2022-02-21T13:40:05.148+0100 7f6fac289f00 20 bdev(0x55f088a27800 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x630000~1000
2022-02-21T13:40:05.148+0100 7f6fac289f00 5 bdev(0x55f088a27800 /var/lib/ceph/osd/ceph-7/block) _sync_write 0x630000~1000 (buffered)
2022-02-21T13:40:05.148+0100 7f6fac289f00 20 bdev(0x55f088a27800 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0x630000~1000
...
2022-02-21T13:40:05.192+0100 7f6fac289f00 10 bluefs open_for_read h 0x55f088e0b000 on file(ino 1419 size 0x73c mtime 2022-02-21T13:40:05.150376+0100 allocated 10000 extents [1:0x630000~10000])
2022-02-21T13:40:05.192+0100 7f6fac289f00 5 bdev(0x55f088a27800 /var/lib/ceph/osd/ceph-7/block) read 0x630000~1000 (buffered)
...

3) The same bunch of user content(?) writes, presumably deferred writes resubmission. LBA=0x630000 is updated again

2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x4c5000~8000 (direct)
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x4c5000~8000
2022-02-21T13:40:05.540+0100 7f6fac289f00 5 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x4c5000~8000 aio 0x55f0afc31f10
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x630000~d000 (direct)
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x630000~d000
2022-02-21T13:40:05.540+0100 7f6fac289f00 5 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x630000~d000 aio 0x55f0afc31e10
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7d0000~f000 (direct)
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x7d0000~f000
2022-02-21T13:40:05.540+0100 7f6fac289f00 5 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x7d0000~f000 aio 0x55f0afc31d10
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x960000~c000 (direct)
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write rebuilding buffer to be aligned
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) _aio_log_start 0x960000~c000
2022-02-21T13:40:05.540+0100 7f6fac289f00 5 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_write 0x960000~c000 aio 0x55f0a7079010
2022-02-21T13:40:05.540+0100 7f6fac289f00 20 bdev(0x55f088a27400 /var/lib/ceph/osd/ceph-7/block) aio_submit ioc 0x55f08a1ca918 pending 4 running 0
2022-02-21T13:40:05.544+0100 7f6fa1f8d700 20 bdev(0x55f088a27800 /var/lib/ceph/osd/ceph-7/block) _aio_log_finish 1 0xe40000~31000

4) Another OSD restart followed by broken SST file reading

2022-02-21T13:49:28.452+0100 7f6fa8a34700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2022-02-21T13:49:28.452+0100 7f6fa8a34700 -1 osd.7 4711 * Got signal Terminated
2022-02-21T13:49:28.452+0100 7f6fa8a34700 -1 osd.7 4711
Immediate shutdown (osd_fast_shutdown=true) *
2022-02-21T13:53:40.455+0100 7fc9645f4f00 0 set uid:gid to 64045:64045 (ceph:ceph)
...
2022-02-21T13:53:40.455+0100 7fc9645f4f00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1967
...
2022-02-21T13:53:43.367+0100 7fc9582f4700 10 bluefs open_for_read h 0x55bd4301f000 on file(ino 1419 size 0x73c mtime 2022-02-21T13:40:05.150376+0100 allocated 10000 extents [1:0x630000~10000])
2022-02-21T13:53:43.379+0100 7fc9582f4700 10 bluefs _read_random h 0x55bd4301f000 0x707~35 from file(ino 1419 size 0x73c mtime 2022-02-21T13:40:05.150376+0100 allocated 10000 extents [1:0x630000~10000])
..
2022-02-21T13:53:43.419+0100 7fc9645f4f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/001426.sst

#2 Updated by Neha Ojha 9 months ago

  • Affected Versions v16.2.6 added

#3 Updated by Adam Kupczyk 9 months ago

  • Status changed from New to Need More Info
  • Affected Versions deleted (v16.2.6)

#4 Updated by Igor Fedotov 9 months ago

  • Duplicated by Bug #54547: Deferred writes might cause "rocksdb: Corruption: Bad table magic number" added

#5 Updated by Igor Fedotov 9 months ago

  • Status changed from Need More Info to Duplicate

#6 Updated by Igor Fedotov 9 months ago

  • Duplicated by deleted (Bug #54547: Deferred writes might cause "rocksdb: Corruption: Bad table magic number")

#7 Updated by Igor Fedotov 9 months ago

  • Status changed from Duplicate to New

#8 Updated by Igor Fedotov 9 months ago

  • Status changed from New to Triaged

#9 Updated by Igor Fedotov 9 months ago

  • Duplicates Bug #54547: Deferred writes might cause "rocksdb: Corruption: Bad table magic number" added

#10 Updated by Igor Fedotov 9 months ago

  • Status changed from Triaged to Duplicate

Also available in: Atom PDF