Bug #23840
Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
Status: Resolved
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We're using Ceph 12.2.4, with the DB and WAL on separate NVMe storage.
After a restart, some OSDs fail with the following error:
-3> 2018-04-24 16:18:33.940657 7f95ee81de00 10 bluefs _replay 0xffffe000: txn(seq 1550851 len 0xa6a crc 0x8cdd70c6)
-2> 2018-04-24 16:18:33.940663 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xfffff000~1000 from file(ino 1 size 0xfffff000 mtime 0.000000 bdev 0 extents
-1> 2018-04-24 16:18:33.940884 7f95ee81de00 10 bluefs _replay 0xfffff000: txn(seq 1550852 len 0xa6a crc 0x5e1c1b4f)
0> 2018-04-24 16:18:33.943750 7f95ee81de00 -1 /build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool)' thread 7f95ee81de00 time 2018-04-24 16:18:33.940909
/build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: 551: FAILED assert((log_reader->buf.pos & ~super.block_mask()) == 0)
Right now some of our PGs are inactive.
The full log is attached.
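For readers unfamiliar with the failing check, the assertion appears to require that the log reader's buffer position be a multiple of the BlueFS block size. The following is a minimal sketch of that check, not Ceph's code. It assumes a 4 KiB block size (suggested by the 0x1000-byte reads in the log) and assumes that block_mask() is the complement of block_size - 1. The names BLOCK_SIZE, block_mask, and is_block_aligned are illustrative only.

```python
U64 = 0xFFFFFFFFFFFFFFFF   # emulate 64-bit unsigned arithmetic
BLOCK_SIZE = 0x1000        # assumption: 4 KiB block, as the 0x1000 reads suggest


def block_mask(block_size=BLOCK_SIZE):
    """Assumed semantics: a mask that clears the offset-within-block bits."""
    return ~(block_size - 1) & U64


def is_block_aligned(pos, block_size=BLOCK_SIZE):
    """Mirror of the asserted condition: (pos & ~block_mask) == 0."""
    low_bits = ~block_mask(block_size) & U64   # equals block_size - 1
    return (pos & low_bits) == 0


if __name__ == "__main__":
    # Offsets taken from the log above; 0xa6a is the logged transaction length.
    for pos in (0xFFFFF000, 0xFFFFF000 + 0xA6A):
        print(hex(pos), "aligned" if is_block_aligned(pos) else "NOT aligned")
```

Under these assumptions, a position such as 0xfffff000 passes the check, while any position that lands partway into a block would trip the assert.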
Updated by Rafal Wadolowski almost 6 years ago
Interestingly, it looks like the maximum read offset in the log file is limited to 0xffffffff.
-11> 2018-04-24 16:18:33.939739 7f95ee81de00 10 bluefs _replay 0xffffa000: txn(seq 1550847 len 0xa6a crc 0x95281bbd)
-10> 2018-04-24 16:18:33.939745 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffb000~1000 from file(ino 1 size 0xffffb000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
-9> 2018-04-24 16:18:33.939970 7f95ee81de00 10 bluefs _replay 0xffffb000: txn(seq 1550848 len 0xa6a crc 0x86579e18)
-8> 2018-04-24 16:18:33.939976 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffc000~1000 from file(ino 1 size 0xffffc000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
-7> 2018-04-24 16:18:33.940196 7f95ee81de00 10 bluefs _replay 0xffffc000: txn(seq 1550849 len 0xa6a crc 0x61eaa8e2)
-6> 2018-04-24 16:18:33.940203 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffd000~1000 from file(ino 1 size 0xffffd000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
-5> 2018-04-24 16:18:33.940427 7f95ee81de00 10 bluefs _replay 0xffffd000: txn(seq 1550850 len 0xa6a crc 0xbcc8bda3)
-4> 2018-04-24 16:18:33.940433 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffe000~1000 from file(ino 1 size 0xffffe000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
-3> 2018-04-24 16:18:33.940657 7f95ee81de00 10 bluefs _replay 0xffffe000: txn(seq 1550851 len 0xa6a crc 0x8cdd70c6)
-2> 2018-04-24 16:18:33.940663 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xfffff000~1000 from file(ino 1 size 0xfffff000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
-1> 2018-04-24 16:18:33.940884 7f95ee81de00 10 bluefs _replay 0xfffff000: txn(seq 1550852 len 0xa6a crc 0x5e1c1b4f)
0> 2018-04-24 16:18:33.943750 7f95ee81de00 -1 /build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool)' thread 7f95ee81de00 time 2018-04-24 16:18:33.940909
/build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: 551: FAILED assert((log_reader->buf.pos & ~super.block_mask()) == 0)
Each bluefs _read line shows the log file offset advancing by 0x1000, and replay breaks at 0xfffff000~1000, the last block below the 4 GiB (32-bit) boundary.
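The arithmetic behind this observation can be checked with a short script. This is only a sketch of the hypothesis above, not a confirmed root cause: it pulls the offset~length pairs out of the _read lines and tests whether the end of each read still fits in 32 bits. The regular expression and the helper names read_ranges and end_in_32bit are illustrative, and the sample lines are shortened copies of the log entries.

```python
import re

# Matches the "offset~length" pair in a "bluefs _read" log line (hex, no 0x on length).
READ_RE = re.compile(r"bluefs _read h 0x[0-9a-f]+ 0x([0-9a-f]+)~([0-9a-f]+)")


def read_ranges(lines):
    """Return (offset, length) tuples for every _read line found."""
    ranges = []
    for line in lines:
        m = READ_RE.search(line)
        if m:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def end_in_32bit(offset, length):
    """Return (overflows, truncated_end) for offset + length in 32-bit arithmetic."""
    end = offset + length
    return end > 0xFFFFFFFF, end & 0xFFFFFFFF


SAMPLE = [
    "10 bluefs _read h 0x55a7876b6f00 0xffffe000~1000 from file(ino 1 size 0xffffe000",
    "10 bluefs _replay 0xffffe000: txn(seq 1550851 len 0xa6a crc 0x8cdd70c6)",
    "10 bluefs _read h 0x55a7876b6f00 0xfffff000~1000 from file(ino 1 size 0xfffff000",
]

if __name__ == "__main__":
    for offset, length in read_ranges(SAMPLE):
        overflows, truncated = end_in_32bit(offset, length)
        print(hex(offset), hex(length), "overflows 32 bits:", overflows,
              "truncated end:", hex(truncated))
```

The read at 0xfffff000~1000 ends at 0x100000000, which does not fit in 32 bits and would wrap to 0 if the position were held in a 32-bit variable. Whether that is what happens inside BlueFS is not established by this log alone.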
Updated by Rafal Wadolowski almost 6 years ago
- File shutdown.log added
Log from the moment of the planned shutdown.
Updated by Sage Weil almost 6 years ago
- Status changed from New to 7
Updated by Sage Weil almost 6 years ago
- Project changed from Ceph to bluestore
- Category deleted (OSD)
- Priority changed from Normal to High
- Backport set to luminous
Updated by Rafal Wadolowski almost 6 years ago
This change works; I think it could be merged into master.
Updated by Kefu Chai almost 6 years ago
- Status changed from 7 to Pending Backport
Updated by Enming Zhang almost 6 years ago
I have hit the same issue on Luminous.
Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #23881: luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0) added
Updated by Nathan Cutler almost 6 years ago
- Status changed from Pending Backport to Resolved