Project

General

Profile

Actions

Bug #48036

closed

bluefs corrupted in a OSD

Added by Satoru Takeuchi over 3 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
BlueStore
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I hit a problem that is very similar to the following issue/PR in v15.2.4.

upgrade/nautilus-x-master: bluefs mount failed to replay log: (14) Bad address during upgrade
https://tracker.ceph.com/issues/46886

os/bluestore: get rid of obsolete stuff in bluefs. #36745 (it's not backported to Octopus)
https://github.com/ceph/ceph/pull/36745

In addition, This PR is not backported to Octopus. If my problem is the same as issue#46886, please backport PR#36754 to Octopus.

Here is the detail.

"ceph-bluestore-tool bluefs-bdev-expand --path <osdDataPath>" failed with the following log (the full log is attached as "osd-9-bluefs.txt").

```
2020-10-28T02:07:55.499+0000 7f0aa7f1e0c0 -1 bluefs _check_new_allocations invalid extent 1: 0x7ae18a0000~10000: duplicate reference, ino 26
2020-10-28T02:07:55.500+0000 7f0aa7f1e0c0 -1 bluefs mount failed to replay log: (14) Bad address
2020-10-28T02:07:55.500+0000 7f0aa7f1e0c0 -1 bluestore(/var/lib/ceph/osd/ceph-9) _open_bluefs failed bluefs mount: (14) Bad address
```

The tail of the bluefs's journal log is as follows (the full log is attached as "bluefs-dump.zip").

```
...
2020-10-28T08:39:58.788+0000 7f79675c70c0 10 bluefs replay 0x263000: txn(seq 51924 len 0xc8 crc 0x4314d6b)
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_dir_unlink db/CURRENT
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_file_remove 25
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_dir_link db/CURRENT to 30
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 30 = 0x56375176dad0
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_dir_unlink db/000020.dbtmp
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_file_update file(ino 31 size 0x0 mtime 2020-10-27T23:10:42.718117+0000 allocated 0 extents [])
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 31 = 0x56375176c0d0 (new)
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_dir_link db.wal/000021.log to 31
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 31 = 0x56375176c0d0
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_dir_unlink db.wal/000015.log
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_file_remove 26 ... (1)
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_dir_unlink db/MANIFEST-000014
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x263000: op_file_remove 24
2020-10-28T08:39:58.788+0000 7f79675c70c0 10 bluefs _read h 0x563751752d80 0x264000~1000 from file(ino 1 size 0x264000 mtime 0.000000 allocated 410000 extents [1:0x7ae18b0000~10000,1:0x7ae1470000~400000])
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _read left 0xac000 len 0x1000
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _read result chunk (0x1000 bytes):
00000000 01 01 80 00 00 00 1c 67 9e 3f 31 cb 40 5b bc f4 |.......g.?1.@[..|
00000010 49 4f 77 68 e7 24 d5 ca 00 00 00 00 00 00 60 00 |IOwh.$........`.|
00000020 00 00 08 01 01 0f 00 00 00 20 00 f2 a8 98 5f 7c |......... ....
||
00000030 5b ec 36 00 00 00 00 00 04 02 00 00 00 64 62 14 |[.6..........db.|
00000040 00 00 00 4f 50 54 49 4f 4e 53 2d 30 30 30 30 32 |...OPTIONS-00002|
00000050 32 2e 64 62 74 6d 70 20 00 00 00 00 00 00 00 08 |2.dbtmp ........|
00000060 01 01 1c 00 00 00 20 d6 26 f2 a8 98 5f 30 83 f2 |...... .&..._0..|
00000070 3a 00 01 00 00 00 01 01 06 00 00 00 29 86 eb 01 |:...........)...|
00000080 43 01 29 e6 5d fa 00 00 00 00 00 00 00 00 00 00 |C.).]...........|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| *
00000ff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001000

2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _read got 4096
2020-10-28T08:39:58.788+0000 7f79675c70c0 10 bluefs _replay 0x264000: txn(seq 51925 len 0x60 crc 0xfa5de629)
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x264000: op_file_update file(ino 32 size 0x0 mtime 2020-10-27T23:10:42.921459+0000 allocated 0 extents [])
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 32 = 0x56375176cd00 (new)
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x264000: op_dir_link db/OPTIONS-000022.dbtmp to 32
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 32 = 0x56375176cd00
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x264000: op_file_update file(ino 32 size 0x1356 mtime 2020-10-27T23:10:42.988971+0000 allocated 10000 extents [1:0x7ae18a0000~10000]) ... (2)
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 32 = 0x56375176cd00
2020-10-28T08:39:58.788+0000 7f79675c70c0 10 bluefs _read h 0x563751752d80 0x265000~1000 from file(ino 1 size 0x265000 mtime 0.000000 allocated 410000 extents [1:0x7ae18b0000~10000,1:0x7ae1470000~400000])
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _read left 0xab000 len 0x1000
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _read result chunk (0x1000 bytes):
00000000 01 01 82 00 00 00 1c 67 9e 3f 31 cb 40 5b bc f4 |.......g.?1.@[..|
00000010 49 4f 77 68 e7 24 d6 ca 00 00 00 00 00 00 62 00 |IOwh.$........b.|
00000020 00 00 04 02 00 00 00 64 62 0e 00 00 00 4f 50 54 |.......db....OPT|
00000030 49 4f 4e 53 2d 30 30 30 30 32 33 20 00 00 00 00 |IONS-000023 ....|
00000040 00 00 00 05 02 00 00 00 64 62 14 00 00 00 4f 50 |........db....OP|
00000050 54 49 4f 4e 53 2d 30 30 30 30 32 32 2e 64 62 74 |TIONS-000022.dbt|
00000060 6d 70 05 02 00 00 00 64 62 0e 00 00 00 4f 50 54 |mp.....db....OPT|
00000070 49 4f 4e 53 2d 30 30 30 30 31 34 09 16 00 00 00 |IONS-000014.....|
00000080 00 00 00 00 69 19 3e 68 00 00 00 00 00 00 00 00 |....i.>h........|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| *
00000ff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001000

2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _read got 4096
2020-10-28T08:39:58.788+0000 7f79675c70c0 10 bluefs _replay 0x266000: txn(seq 51927 len 0x4b crc 0x9ae4c5e9)
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x266000: op_file_update file(ino 26 size 0x24e934 mtime 2020-10-23T08:29:44.981029+0000 allocated 250000 extents [1:0x7ae18a0000~10000,1:0x7ae18c0000~
10000,1:0x7ae1870000~20000,1:0x7ae1e10000~210000])
2020-10-28T08:39:58.788+0000 7f79675c70c0 30 bluefs _get_file ino 26 = 0x56375176cea0 (new)
2020-10-28T08:39:58.788+0000 7f79675c70c0 -1 bluefs _check_new_allocations invalid extent 1: 0x7ae18a0000~10000: duplicate reference, ino 26 ... (3)
2020-10-28T08:39:58.788+0000 7f79675c70c0 -1 bluefs mount failed to replay log: (14) Bad address
2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _stop_alloc
```

It looks this bluefs corrupted after the following steps.

(1) delete ino 26
(2) allocate "0x7ae18a0000~10000" to ino 32
(3) allocate the same extent to already-removed ino 26 and detect duplicate reference.

In addition, the mtime of the last op_file_update of ino 26 is "2020-10-23T08:29:44.981029+0000".

2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x266000: op_file_update file(ino 26 size 0x24e934 mtime 2020-10-23T08:29:44.981029+0000 allocated 250000 extents [1:0x7ae18a0000~10000,1:0x7ae18c0000~

However, it's far older than the previous ones.

2020-10-28T08:39:58.788+0000 7f79675c70c0 20 bluefs _replay 0x264000: op_file_update file(ino 32 size 0x1356 mtime 2020-10-27T23:10:42.988971+0000 allocated 10000 extents [1:0x7ae18a0000~10000])


Files

osd-9-bluefs.txt (6.86 KB) osd-9-bluefs.txt Satoru Takeuchi, 10/29/2020 05:23 AM
bluefs-dump.zip (81.1 KB) bluefs-dump.zip Satoru Takeuchi, 10/29/2020 05:57 AM
ceph-0.log (156 KB) ceph-0.log Satoru Takeuchi, 11/06/2020 11:06 AM
osd_tail.log.gz (179 KB) osd_tail.log.gz Satoru Takeuchi, 11/09/2020 06:09 AM
bluefs-log-dump.log.gz (193 KB) bluefs-log-dump.log.gz Satoru Takeuchi, 11/09/2020 06:10 AM
Actions

Also available in: Atom PDF