Bug #47453

open

checksum failures lead to assert on OSD shutdown in lab tests

Added by Greg Farnum over 3 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-09-14T11:50:01.150 INFO:tasks.ceph.osd.0.smithi186.stderr:2020-09-14T11:50:01.151+0000 7f68f74b6700 -1 received  signal: Hangup from /usr/bin/python3 /usr/bin/daemon-helper kill ceph-osd -f --cluster ceph -i 0  (PID: 13568) UID: 0
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr:/build/ceph-16.0.0-5071-gd026253/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fd8452fdb80 time 2020-09-14T11:50:01.157742+0000
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr:/build/ceph-16.0.0-5071-gd026253/src/kv/RocksDBStore.cc: 1616: ceph_abort_msg("block checksum mismatch: expected 4204593633, got 976185286  in db/000073.sst offset 155593 size 4231")
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr: ceph version 16.0.0-5071-gd026253 (d02625331c4e06ca213d9720d98137d83a87cb90) pacific (dev)
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7fd83b35ba27]
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr: 2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3ec) [0x563d20f485ec]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 3: (()+0xdd3fb9) [0x563d20da7fb9]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 4: (()+0xdbf8c1) [0x563d20d938c1]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 5: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x27b) [0x563d20daeaeb]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 6: (BlueStore::fsck_check_objects_shallow(BlueStore::FSCKDepth, long, boost::intrusive_ptr<BlueStore::Collection>, ghobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list const&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mempool::pool_allocator<(mempool::pool_index_t)11, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::map<boost::intrusive_ptr<BlueStore::Blob>, unsigned short, std::less<boost::intrusive_ptr<BlueStore::Blob> >, std::allocator<std::pair<boost::intrusive_ptr<BlueStore::Blob> const, unsigned short> > >*, BlueStore::FSCK_ObjectCtx const&)+0x36b) [0x563d20e07ceb]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 7: (BlueStore::_fsck_check_objects(BlueStore::FSCKDepth, BlueStore::FSCK_ObjectCtx&)+0x18a2) [0x563d20e0bac2]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 8: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x1626) [0x563d20e0ffc6]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 9: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c2) [0x563d20e29252]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 10: (BlueStore::_mount(bool, bool)+0x504) [0x563d20e29da4]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 11: (main()+0x2cb1) [0x563d2084c231]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 12: (__libc_start_main()+0xe7) [0x7fd839af6b97]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 13: (_start()+0x2a) [0x563d2086109a]

This showed up twice, on two different machines:
https://pulpito.ceph.com/gregf-2020-09-14_05:25:36-rados-wip-stretch-mode-distro-basic-smithi/5433145
https://pulpito.ceph.com/gregf-2020-09-14_05:25:36-rados-wip-stretch-mode-distro-basic-smithi/5433164

Actions #1

Updated by Neha Ojha over 3 years ago

  • Status changed from New to Need More Info

These jobs don't have any logs.

Actions #2

Updated by Neha Ojha over 3 years ago

  • Status changed from Need More Info to Can't reproduce
Actions #3

Updated by Neha Ojha about 3 years ago

  • Status changed from Can't reproduce to New
  • Backport set to pacific

2021-04-07T06:57:31.293 INFO:teuthology.orchestra.run.smithi078.stderr:/build/ceph-16.2.0-59-g93436ead/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fc2ac5d8d00 time 2021-04-07T06:57:31.288497+0000
2021-04-07T06:57:31.294 INFO:teuthology.orchestra.run.smithi078.stderr:/build/ceph-16.2.0-59-g93436ead/src/kv/RocksDBStore.cc: 1840: ceph_abort_msg("block checksum mismatch: expected 3306748615, got 1891494403  in db/000039.sst offset 155438 size 4078")
2021-04-07T06:57:31.294 INFO:teuthology.orchestra.run.smithi078.stderr: ceph version 16.2.0-59-g93436ead (93436ead83e199c64855708dad0c6fce75d6f04a) pacific (stable)
2021-04-07T06:57:31.294 INFO:teuthology.orchestra.run.smithi078.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7fc2a25424db]
2021-04-07T06:57:31.295 INFO:teuthology.orchestra.run.smithi078.stderr: 2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3ec) [0x558c926a3e3c]
2021-04-07T06:57:31.295 INFO:teuthology.orchestra.run.smithi078.stderr: 3: ceph-objectstore-tool(+0xe33aa9) [0x558c92501aa9]
2021-04-07T06:57:31.295 INFO:teuthology.orchestra.run.smithi078.stderr: 4: ceph-objectstore-tool(+0xe1dae1) [0x558c924ebae1]
2021-04-07T06:57:31.295 INFO:teuthology.orchestra.run.smithi078.stderr: 5: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x27b) [0x558c9250855b]
2021-04-07T06:57:31.296 INFO:teuthology.orchestra.run.smithi078.stderr: 6: (BlueStore::fsck_check_objects_shallow(BlueStore::FSCKDepth, long, boost::intrusive_ptr<BlueStore::Collection>, ghobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list const&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mempool::pool_allocator<(mempool::pool_index_t)11, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::map<boost::intrusive_ptr<BlueStore::Blob>, unsigned short, std::less<boost::intrusive_ptr<BlueStore::Blob> >, std::allocator<std::pair<boost::intrusive_ptr<BlueStore::Blob> const, unsigned short> > >*, BlueStore::FSCK_ObjectCtx const&)+0x360) [0x558c92564080]
2021-04-07T06:57:31.296 INFO:teuthology.orchestra.run.smithi078.stderr: 7: (BlueStore::_fsck_check_objects(BlueStore::FSCKDepth, BlueStore::FSCK_ObjectCtx&)+0x18a2) [0x558c925682a2]
2021-04-07T06:57:31.297 INFO:teuthology.orchestra.run.smithi078.stderr: 8: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x1626) [0x558c9256c706]
2021-04-07T06:57:31.297 INFO:teuthology.orchestra.run.smithi078.stderr: 9: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x259) [0x558c92585a59]
2021-04-07T06:57:31.297 INFO:teuthology.orchestra.run.smithi078.stderr: 10: (BlueStore::_mount()+0x1e8) [0x558c925863a8]
2021-04-07T06:57:31.297 INFO:teuthology.orchestra.run.smithi078.stderr: 11: main()
2021-04-07T06:57:31.297 INFO:teuthology.orchestra.run.smithi078.stderr: 12: __libc_start_main()
2021-04-07T06:57:31.298 INFO:teuthology.orchestra.run.smithi078.stderr: 13: _start()

rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{more-active-recovery} backoff/peering_and_degraded ceph clusters/{fixed-2 openstack} crc-failures/default d-balancer/crush-compat mon_election/classic msgr-failures/fastclose msgr/async-v1only objectstore/bluestore-bitmap rados supported-random-distro$/{ubuntu_latest} thrashers/default thrashosds-health workloads/cache}

/a/teuthology-2021-04-07_03:31:02-rados-pacific-distro-basic-smithi/6026187 - no logs

Actions #4

Updated by Sridhar Seshasayee over 2 years ago

/a/yuriw-2021-12-07_16:02:55-rados-wip-yuri11-testing-2021-12-06-1619-distro-default-smithi/6550960


    -1> 2021-12-08T01:49:27.984+0000 7fe1e365a0c0 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-9459-g999737c1/rpm/el8/BUILD/ceph-17.0.0-9459-g999737c1/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fe1e365a0c0 time 2021-12-08T01:49:27.981168+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-9459-g999737c1/rpm/el8/BUILD/ceph-17.0.0-9459-g999737c1/src/kv/RocksDBStore.cc: 1863: ceph_abort_msg("block checksum mismatch: stored = 632344828, computed = 928675071  in db/000047.sst offset 264241 size 3745")

 ceph version 17.0.0-9459-g999737c1 (999737c17bfe912f6f7f669bcefe6a7cb2078721) quincy (dev)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x5609c72ebd63]
 2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3cc) [0x5609c7f3580c]
 3: ceph-osd(+0xbedb0d) [0x5609c7908b0d]
 4: ceph-osd(+0xbd8311) [0x5609c78f3311]
 5: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x479) [0x5609c790ea39]
 6: (BlueStore::fsck_check_objects_shallow(BlueStore::FSCKDepth, long, boost::intrusive_ptr<BlueStore::Collection>, ghobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list const&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mempool::pool_allocator<(mempool::pool_index_t)11, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::map<boost::intrusive_ptr<BlueStore::Blob>, unsigned short, std::less<boost::intrusive_ptr<BlueStore::Blob> >, std::allocator<std::pair<boost::intrusive_ptr<BlueStore::Blob> const, unsigned short> > >*, BlueStore::FSCK_ObjectCtx const&)+0x28e) [0x5609c79764de]
 7: (BlueStore::_fsck_check_objects(BlueStore::FSCKDepth, BlueStore::FSCK_ObjectCtx&)+0x15f7) [0x5609c797b487]
 8: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x14c0) [0x5609c797f170]
 9: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0xcb) [0x5609c7998c8b]
 10: (BlueStore::_mount()+0x7b) [0x5609c799998b]
 11: (OSD::init()+0x403) [0x5609c7426a93]
 12: main()
 13: __libc_start_main()
 14: _start()
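
For comparing occurrences across runs, the abort messages in the traces above can be reduced to their fields (checksums, .sst file, offset, size). The following is only a rough sketch: `parse_mismatch` and the regular expression are hypothetical helpers written for this ticket, not part of Ceph or teuthology, and mapping the newer "stored = / computed =" wording onto the older "expected / got" names is an assumption.

```python
import re

# Matches both wordings that appear in the abort messages in this ticket:
#   "expected A, got B"   and   "stored = A, computed = B"
MISMATCH_RE = re.compile(
    r"block checksum mismatch: "
    r"(?:expected (?P<exp>\d+), got (?P<got>\d+)"
    r"|stored = (?P<stored>\d+), computed = (?P<computed>\d+))"
    r"\s+in (?P<file>\S+) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_mismatch(line):
    """Return the mismatch fields from one log line, or None if absent."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return {
        # Assumption: "stored" corresponds to "expected", "computed" to "got".
        "expected": int(m.group("exp") or m.group("stored")),
        "got": int(m.group("got") or m.group("computed")),
        "file": m.group("file"),
        "offset": int(m.group("offset")),
        "size": int(m.group("size")),
    }
```

Feeding each `ceph_abort_msg(...)` line from a teuthology log through `parse_mismatch` gives one record per crash, which makes it easier to see whether the affected .sst files, offsets, or sizes have anything in common.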

Actions #5

Updated by Neha Ojha over 2 years ago

  • Priority changed from Normal to High
Actions #6

Updated by Neha Ojha over 2 years ago

Sridhar Seshasayee wrote:

/a/yuriw-2021-12-07_16:02:55-rados-wip-yuri11-testing-2021-12-06-1619-distro-default-smithi/6550960

[...]

This has logs and a coredump.

Actions #7

Updated by Adam Kupczyk about 2 years ago

  • Priority changed from High to Normal