Bug #23840

Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)

Added by Rafal Wadolowski 11 months ago. Updated 11 months ago.

Status:
Resolved
Priority:
High
Assignee:
-
Target version:
-
Start date:
04/24/2018
Due date:
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

We're using Ceph 12.2.4, with the DB and WAL on separate NVMe devices.

After a restart, some OSDs fail with the following error:

    -3> 2018-04-24 16:18:33.940657 7f95ee81de00 10 bluefs _replay 0xffffe000: txn(seq 1550851 len 0xa6a crc 0x8cdd70c6)
    -2> 2018-04-24 16:18:33.940663 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xfffff000~1000 from file(ino 1 size 0xfffff000 mtime 0.000000 bdev 0 extents 
    -1> 2018-04-24 16:18:33.940884 7f95ee81de00 10 bluefs _replay 0xfffff000: txn(seq 1550852 len 0xa6a crc 0x5e1c1b4f)
     0> 2018-04-24 16:18:33.943750 7f95ee81de00 -1 /build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool)' thread 7f95ee81de00 time 2018-04-24 16:18:33.940909
/build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: 551: FAILED assert((log_reader->buf.pos & ~super.block_mask()) == 0)

As a result, some PGs are currently inactive.

The full log is attached.

osd.log View (19.1 KB) Rafal Wadolowski, 04/24/2018 04:33 PM

shutdown.log View (8.15 KB) Rafal Wadolowski, 04/24/2018 05:16 PM


Related issues

Copied to bluestore - Backport #23881: luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0) Resolved

History

#1 Updated by Rafal Wadolowski 11 months ago

Interestingly, it looks like the maximum read offset within the file is limited to 0xffffffff.

   -11> 2018-04-24 16:18:33.939739 7f95ee81de00 10 bluefs _replay 0xffffa000: txn(seq 1550847 len 0xa6a crc 0x95281bbd)
   -10> 2018-04-24 16:18:33.939745 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffb000~1000 from file(ino 1 size 0xffffb000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
    -9> 2018-04-24 16:18:33.939970 7f95ee81de00 10 bluefs _replay 0xffffb000: txn(seq 1550848 len 0xa6a crc 0x86579e18)
    -8> 2018-04-24 16:18:33.939976 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffc000~1000 from file(ino 1 size 0xffffc000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
    -7> 2018-04-24 16:18:33.940196 7f95ee81de00 10 bluefs _replay 0xffffc000: txn(seq 1550849 len 0xa6a crc 0x61eaa8e2)
    -6> 2018-04-24 16:18:33.940203 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffd000~1000 from file(ino 1 size 0xffffd000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
    -5> 2018-04-24 16:18:33.940427 7f95ee81de00 10 bluefs _replay 0xffffd000: txn(seq 1550850 len 0xa6a crc 0xbcc8bda3)
    -4> 2018-04-24 16:18:33.940433 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xffffe000~1000 from file(ino 1 size 0xffffe000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
    -3> 2018-04-24 16:18:33.940657 7f95ee81de00 10 bluefs _replay 0xffffe000: txn(seq 1550851 len 0xa6a crc 0x8cdd70c6)
    -2> 2018-04-24 16:18:33.940663 7f95ee81de00 10 bluefs _read h 0x55a7876b6f00 0xfffff000~1000 from file(ino 1 size 0xfffff000 mtime 0.000000 bdev 0 extents [1:0x15300000+100000,0:0x3d800000+c00000,
    -1> 2018-04-24 16:18:33.940884 7f95ee81de00 10 bluefs _replay 0xfffff000: txn(seq 1550852 len 0xa6a crc 0x5e1c1b4f)
     0> 2018-04-24 16:18:33.943750 7f95ee81de00 -1 /build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool)' thread 7f95ee81de00 time 2018-04-24 16:18:33.940909
/build/ceph-12.2.4/src/os/bluestore/BlueFS.cc: 551: FAILED assert((log_reader->buf.pos & ~super.block_mask()) == 0)

Each bluefs _read line shows the log file offset advancing by 0x1000, and replay breaks right after the read at 0xfffff000~1000.
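To make the arithmetic concrete, here is a minimal sketch of why 0xfffff000~1000 is the last read that still fits below 0xffffffff. It is not the BlueFS code: kBlockSize, next_offset_64, next_offset_32 and is_block_aligned are hypothetical names used only for illustration. The alignment helper assumes that block_mask() clears the low bits of an offset, so that the failed assertion amounts to "pos is a multiple of the block size". The sketch does not identify which variable, if any, is actually 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Read size seen in the log lines above (0x...~1000). Hypothetical constant.
constexpr std::uint64_t kBlockSize = 0x1000;

// Next read offset when the position is tracked in 64 bits.
constexpr std::uint64_t next_offset_64(std::uint64_t pos) {
    return pos + kBlockSize;
}

// Next read offset if the position were tracked in 32 bits (an assumption,
// not something the log confirms). Unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t next_offset_32(std::uint32_t pos) {
    return static_cast<std::uint32_t>(pos + static_cast<std::uint32_t>(kBlockSize));
}

// Assumed meaning of (pos & ~block_mask()) == 0: pos is block aligned.
constexpr bool is_block_aligned(std::uint64_t pos) {
    return (pos & (kBlockSize - 1)) == 0;
}

// Runtime checks using the offsets and the txn length (0xa6a) from the log.
inline void check_offsets() {
    // The last offset in the log is block aligned.
    assert(is_block_aligned(0xfffff000ULL));
    // The next block starts at 2^32, which no longer fits in 32 bits.
    assert(next_offset_64(0xfffff000ULL) == 0x100000000ULL);
    // A 32-bit position would wrap back to 0 instead.
    assert(next_offset_32(0xfffff000u) == 0u);
    // An offset in the middle of a block fails the alignment test.
    assert(!is_block_aligned(0xfffff000ULL + 0xa6a));
}
```

Calling check_offsets() passes on a platform where unsigned int is 32 bits. The point is only that the block following 0xfffff000 begins exactly at the 4 GiB boundary, which matches the observation that the reads stop there.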

#2 Updated by Rafal Wadolowski 11 months ago

Attached the log from the moment of the planned shutdown (shutdown.log).

#3 Updated by Sage Weil 11 months ago

  • Status changed from New to Testing

#4 Updated by Sage Weil 11 months ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)
  • Priority changed from Normal to High
  • Backport set to luminous

#5 Updated by Rafal Wadolowski 11 months ago

This change works; I think it can be merged into master.

#6 Updated by Kefu Chai 11 months ago

  • Status changed from Testing to Pending Backport

#7 Updated by Enming Zhang 11 months ago

I have hit the same issue in Luminous.

#8 Updated by Nathan Cutler 11 months ago

  • Copied to Backport #23881: luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0) added

#9 Updated by Nathan Cutler 11 months ago

  • Status changed from Pending Backport to Resolved
