Actions
Bug #55328
closedOSD crashed due to checksum error
Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
OSD.14 crashed and produced the following logs.
2022-04-09T02:18:07Z {} debug -42> 2022-04-09T02:18:07.078+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x0: 0x0 reading 0x0~4000 2022-04-09T02:18:07Z {} debug -41> 2022-04-09T02:18:07.078+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio 2022-04-09T02:18:07Z {} debug -40> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum c rc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -39> 2022-04-09T02:18:07.103+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expecte d 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0# 2022-04-09T02:18:07Z {} debug -38> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read 0x0~433e size 0x433e (17214) 2022-04-09T02:18:07Z {} debug -37> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read will do buffered read 2022-04-09T02:18:07Z {} debug -36> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tr acker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need 0x0~4000 cache has 0x[] 2022-04-09T02:18:07Z {} debug -35> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x4000~33e cache has 0x[] 2022-04-09T02:18:07Z {} debug -34> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0 x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need {<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -33> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x4000: 0x4000 reading 0x4000~1000 2022-04-09T02:18:07Z {} debug -32> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need {<0x0, 0x4000> : [0x0:0~4000]} 2022-04-09T02:18:07Z {} debug -31> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x0: 0x0 reading 0x0~4000 2022-04-09T02:18:07Z {} debug -30> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio 2022-04-09T02:18:07Z {} debug -29> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum c rc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -28> 2022-04-09T02:18:07.103+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expecte d 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0# 2022-04-09T02:18:07Z {} debug -27> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read 0x0~433e size 0x433e (17214) 2022-04-09T02:18:07Z {} debug -26> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read will do buffered read 2022-04-09T02:18:07Z {} debug -25> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need 0x0~4000 cache has 0x[] 2022-04-09T02:18:07Z {} debug -24> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x4000~33e cache has 0x[] 2022-04-09T02:18:07Z {} debug -23> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need {<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -22> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x4000: 0x4000 reading 0x4000~1000 2022-04-09T02:18:07Z {} debug -21> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need {<0x0, 0x4000> : [0x0:0~4000]} 2022-04-09T02:18:07Z {} debug -20> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x0: 0x0 reading 0x0~4000 2022-04-09T02:18:07Z {} debug -19> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio 2022-04-09T02:18:07Z {} debug -18> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -17> 2022-04-09T02:18:07.103+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expected 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0# 2022-04-09T02:18:07Z {} debug -16> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read 0x0~433e size 0x433e (17214) 2022-04-09T02:18:07Z {} debug -15> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read will do buffered read 2022-04-09T02:18:07Z {} debug -14> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need 0x0~4000 cache has 0x[] 2022-04-09T02:18:07Z {} debug -13> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _read_cache blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x4000~33e cache has 0x[] 2022-04-09T02:18:07Z {} debug -12> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need {<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -11> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x4000: 0x4000 reading 0x4000~1000 2022-04-09T02:18:07Z {} debug -10> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc blob Blob(0x55f7eaf0df80 blob([0xa5b0000~4000] csum crc32c/0x1000) use_tracker(0x4*0x1000 0x[1000,1000,1000,1000]) SharedBlob(0x55f7de65de30 sbid 0x0)) need {<0x0, 0x4000> : [0x0:0~4000]} 2022-04-09T02:18:07Z {} debug -9> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _prepare_read_ioc region 0x0: 0x0 reading 0x0~4000 2022-04-09T02:18:07Z {} debug -8> 2022-04-09T02:18:07.103+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _do_read waiting for aio 2022-04-09T02:18:07Z {} debug -7> 2022-04-09T02:18:07.104+0000 7f488b599700 20 bluestore(/var/lib/ceph/osd/ceph-14) _generate_read_result_bl blob Blob(0x55f7eaa36d20 blob([!~4000,0xa5b4000~1000] csum crc32c/0x1000) use_tracker(0x5*0x1000 0x[0,0,0,0,33e]) SharedBlob(0x55f7de2f1b90 sbid 0x0)) need 0x{<0x4000, 0x1000> : [0x4000:4000~33e]} 2022-04-09T02:18:07Z {} debug -6> 2022-04-09T02:18:07.104+0000 7f488b599700 -1 bluestore(/var/lib/ceph/osd/ceph-14) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4000, got 0x24dc4dde, expected 0x4c69e4bd, device location [0xa5b4000~1000], logical extent 0x4000~1000, object #-1:a806e935:::osdmap.146:0# 2022-04-09T02:18:07Z {} debug -5> 2022-04-09T02:18:07.104+0000 7f488b599700 20 _unpin0x55f7dc238000 #-1:a806e935:::osdmap.146:0# unpinned 2022-04-09T02:18:07Z {} debug -4> 2022-04-09T02:18:07.104+0000 7f488b599700 10 bluestore(/var/lib/ceph/osd/ceph-14) read meta #-1:a806e935:::osdmap.146:0# 0x0~433e = -5 2022-04-09T02:18:07Z {} debug -3> 2022-04-09T02:18:07.108+0000 7f488b599700 -1 /root/project/src/ceph/src/osd/OSD.cc: In function 'void OSD::handle_osd_map(MOSDMap*)' thread 7f488b599700 time 2022-04-09T02:18:07.105274+0000 2022-04-09T02:18:07Z {} /root/project/src/ceph/src/osd/OSD.cc: 8061: FAILED ceph_assert(p != added_maps_bl.end()) 2022-04-09T02:18:07Z {} 2022-04-09T02:18:07Z {} ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable) 2022-04-09T02:18:07Z {} 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55f7d0e4ee61] 2022-04-09T02:18:07Z {} 2: ceph-osd(+0xac6069) [0x55f7d0e4f069] 2022-04-09T02:18:07Z {} 3: (OSD::handle_osd_map(MOSDMap*)+0x15e2) [0x55f7d0f2fe22] 2022-04-09T02:18:07Z {} 4: (OSD::_dispatch(Message*)+0x18b) [0x55f7d0f52ccb] 2022-04-09T02:18:07Z {} 5: (OSD::ms_dispatch(Message*)+0x84) [0x55f7d0f53014] 2022-04-09T02:18:07Z {} 6: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x55f7d19f8719] 2022-04-09T02:18:07Z {} 7: (DispatchQueue::entry()+0x58f) [0x55f7d19f73cf] 2022-04-09T02:18:07Z {} 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x55f7d1810e11] 2022-04-09T02:18:07Z {} 9: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f489d75c609] 2022-04-09T02:18:07Z {} 10: clone() 2022-04-09T02:18:07Z {} 2022-04-09T02:18:07Z {} debug -2> 2022-04-09T02:18:07.114+0000 7f488b599700 -1 *** Caught signal (Aborted) ** 2022-04-09T02:18:07Z {} in thread 7f488b599700 thread_name:ms_dispatch 2022-04-09T02:18:07Z {} 2022-04-09T02:18:07Z {} ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable) 2022-04-09T02:18:07Z {} 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f489d7683c0] 2022-04-09T02:18:07Z {} 2: gsignal() 2022-04-09T02:18:07Z {} 3: abort() 2022-04-09T02:18:07Z {} 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ad) [0x55f7d0e4eebc] 2022-04-09T02:18:07Z {} 5: ceph-osd(+0xac6069) [0x55f7d0e4f069] 2022-04-09T02:18:07Z {} 6: (OSD::handle_osd_map(MOSDMap*)+0x15e2) [0x55f7d0f2fe22] 2022-04-09T02:18:07Z {} 7: (OSD::_dispatch(Message*)+0x18b) [0x55f7d0f52ccb] 2022-04-09T02:18:07Z {} 8: (OSD::ms_dispatch(Message*)+0x84) [0x55f7d0f53014] 2022-04-09T02:18:07Z {} 9: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xb9) [0x55f7d19f8719] 2022-04-09T02:18:07Z {} 10: (DispatchQueue::entry()+0x58f) [0x55f7d19f73cf] 2022-04-09T02:18:07Z {} 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x55f7d1810e11] 2022-04-09T02:18:07Z {} 12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f489d75c609] 2022-04-09T02:18:07Z {} 13: clone() 2022-04-09T02:18:07Z {} NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
This OSD worked for about a day until it crashed, and no restart occurred.
We rebooted other node during the OSD was runnning, but we are not sure if it is related to this issue.
We collected the log of OSD.14. Because it is huge and cannot be attached to this ticket, we share the link to the log.
https://cybozu-my.sharepoint.com/:u:/g/personal/shayashi_cybozu_onmicrosoft_com/EfIMgrTgsdxPnvu9D2d9AcIBpKx_ssP0tZtvLfA6yLtl1w
Files
Actions