Project

General

Profile

Actions

Bug #55328

closed

OSD crashed due to checksum error

Added by Shinya Hayashi about 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSD.14 crashed and produced the following logs.

2022-04-09T02:18:07Z {} debug    -42> 2022-04-09T02:18:07.078+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x0: 0x0 reading 0x0~4000
2022-04-09T02:18:07Z {} debug    -41> 2022-04-09T02:18:07.078+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio
2022-04-09T02:18:07Z {} debug    -40> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum c
rc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug    -39> 2022-04-09T02:18:07.103+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expecte
d 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0#
2022-04-09T02:18:07Z {} debug    -38> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read 0x0~433e size 0x433e (17214)
2022-04-09T02:18:07Z {} debug    -37> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read will do buffered read
2022-04-09T02:18:07Z {} debug    -36> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache  blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tr
acker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need 0x0~4000 cache has 0x[]
2022-04-09T02:18:07Z {} debug    -35> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000)
 use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x4000~33e cache has 0x[]
2022-04-09T02:18:07Z {} debug    -34> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0
x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need {<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug    -33> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x4000: 0x4000 reading 0x4000~1000
2022-04-09T02:18:07Z {} debug    -32> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc  blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000)
use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need {<0x0, 0x4000> : [0x0:0~4000]}
2022-04-09T02:18:07Z {} debug    -31> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x0: 0x0 reading 0x0~4000
2022-04-09T02:18:07Z {} debug    -30> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio
2022-04-09T02:18:07Z {} debug    -29> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum c
rc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug    -28> 2022-04-09T02:18:07.103+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expecte
d 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0#
2022-04-09T02:18:07Z {} debug    -27> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read 0x0~433e size 0x433e (17214)
2022-04-09T02:18:07Z {} debug    -26> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read will do buffered read
2022-04-09T02:18:07Z {} debug    -25> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache  blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need 0x0~4000 cache has 0x[]
2022-04-09T02:18:07Z {} debug    -24> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x4000~33e cache has 0x[]
2022-04-09T02:18:07Z {} debug    -23> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need {<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug    -22> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x4000: 0x4000 reading 0x4000~1000
2022-04-09T02:18:07Z {} debug    -21> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc  blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need {<0x0, 0x4000> : [0x0:0~4000]}
2022-04-09T02:18:07Z {} debug    -20> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x0: 0x0 reading 0x0~4000
2022-04-09T02:18:07Z {} debug    -19> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio
2022-04-09T02:18:07Z {} debug    -18> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug    -17> 2022-04-09T02:18:07.103+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expected 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0#
2022-04-09T02:18:07Z {} debug    -16> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read 0x0~433e size 0x433e (17214)
2022-04-09T02:18:07Z {} debug    -15> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read will do buffered read
2022-04-09T02:18:07Z {} debug    -14> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache  blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need 0x0~4000 cache has 0x[]
2022-04-09T02:18:07Z {} debug    -13> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x4000~33e cache has 0x[]
2022-04-09T02:18:07Z {} debug    -12> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need {<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug    -11> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x4000: 0x4000 reading 0x4000~1000
2022-04-09T02:18:07Z {} debug    -10> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc  blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need {<0x0, 0x4000> : [0x0:0~4000]}
2022-04-09T02:18:07Z {} debug     -9> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc    region 0x0: 0x0 reading 0x0~4000
2022-04-09T02:18:07Z {} debug     -8> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio
2022-04-09T02:18:07Z {} debug     -7> 2022-04-09T02:18:07.104+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl  blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]}
2022-04-09T02:18:07Z {} debug     -6> 2022-04-09T02:18:07.104+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expected 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0#
2022-04-09T02:18:07Z {} debug     -5> 2022-04-09T02:18:07.104+0000 7f488b599700 20 _unpin0x55f7dc238000   #-1:a806e935:::osdmap.146:0# unpinned
2022-04-09T02:18:07Z {} debug     -4> 2022-04-09T02:18:07.104+0000 7f488b599700 10 bluestore(/var/lib/ceph/osd/ceph-14) read meta #-1:a806e935:::osdmap.146:0# 0x0~433e = -5
2022-04-09T02:18:07Z {} debug     -3> 2022-04-09T02:18:07.108+0000 7f488b599700 -1 /root/project/src/ceph/src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 7f488b599700 time 2022-04-09T02:18:07.105274+0000
2022-04-09T02:18:07Z {} /root/project/src/ceph/src/osd/OSD.cc: 8061: FAILED ceph_assert(p != added_maps_bl.end())
2022-04-09T02:18:07Z {}
2022-04-09T02:18:07Z {} ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
2022-04-09T02:18:07Z {} 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55f7d0e4ee61]
2022-04-09T02:18:07Z {} 2: ceph-osd(+0xac6069) [0x55f7d0e4f069]
2022-04-09T02:18:07Z {} 3: (OSD::handle_osd_map(MOSDMap*)+0x15e2) [0x55f7d0f2fe22]
2022-04-09T02:18:07Z {} 4: (OSD::_dispatch(Message*)+0x18b) [0x55f7d0f52ccb]
2022-04-09T02:18:07Z {} 5: (OSD::ms_dispatch(Message*)+0x84) [0x55f7d0f53014]
2022-04-09T02:18:07Z {} 6: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x55f7d19f8719]
2022-04-09T02:18:07Z {} 7: (DispatchQueue::entry()+0x58f) [0x55f7d19f73cf]
2022-04-09T02:18:07Z {} 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x55f7d1810e11]
2022-04-09T02:18:07Z {} 9: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f489d75c609]
2022-04-09T02:18:07Z {} 10: clone()
2022-04-09T02:18:07Z {}
2022-04-09T02:18:07Z {} debug     -2> 2022-04-09T02:18:07.114+0000 7f488b599700 -1 *** Caught signal (Aborted) **
2022-04-09T02:18:07Z {} in thread 7f488b599700 thread_name:ms_dispatch
2022-04-09T02:18:07Z {}
2022-04-09T02:18:07Z {} ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
2022-04-09T02:18:07Z {} 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f489d7683c0]
2022-04-09T02:18:07Z {} 2: gsignal()
2022-04-09T02:18:07Z {} 3: abort()
2022-04-09T02:18:07Z {} 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ad) [0x55f7d0e4eebc]
2022-04-09T02:18:07Z {} 5: ceph-osd(+0xac6069) [0x55f7d0e4f069]
2022-04-09T02:18:07Z {} 6: (OSD::handle_osd_map(MOSDMap*)+0x15e2) [0x55f7d0f2fe22]
2022-04-09T02:18:07Z {} 7: (OSD::_dispatch(Message*)+0x18b) [0x55f7d0f52ccb]
2022-04-09T02:18:07Z {} 8: (OSD::ms_dispatch(Message*)+0x84) [0x55f7d0f53014]
2022-04-09T02:18:07Z {} 9: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x55f7d19f8719]
2022-04-09T02:18:07Z {} 10: (DispatchQueue::entry()+0x58f) [0x55f7d19f73cf]
2022-04-09T02:18:07Z {} 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x55f7d1810e11]
2022-04-09T02:18:07Z {} 12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f489d75c609]
2022-04-09T02:18:07Z {} 13: clone()
2022-04-09T02:18:07Z {} NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This OSD worked for about a day until it crashed, and no restart occurred.

We rebooted other node during the OSD was runnning, but we are not sure if it is related to this issue.

We collected the log of OSD.14. Because it is huge and cannot be attached to this ticket, we share the link to the log.
https://cybozu-my.sharepoint.com/:u:/g/personal/shayashi_cybozu_onmicrosoft_com/EfIMgrTgsdxPnvu9D2d9AcIBpKx_ssP0tZtvLfA6yLtl1w


Files

dd.bin (1000 KB) dd.bin Shinya Hayashi, 05/16/2022 02:42 AM
dd_to_end.zip (383 KB) dd_to_end.zip Shinya Hayashi, 05/17/2022 01:25 AM
Actions

Also available in: Atom PDF