Bug #49064

closed

test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder makes osds crash

Added by Neha Ojha about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-01-27T01:27:45.028 INFO:tasks.workunit.client.0.smithi003.stdout:[ RUN      ] EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder
2021-01-27T01:27:47.095 INFO:tasks.workunit.client.0.smithi003.stdout:Test size : loop(64); bulk_size(32768)
2021-01-27T01:31:31.136 INFO:tasks.ceph.osd.1.smithi003.stderr:*** Caught signal (Aborted) **
2021-01-27T01:31:31.193 INFO:tasks.ceph.osd.1.smithi003.stderr: in thread 7f31a1356700 thread_name:tp_osd_tp
2021-01-27T01:31:31.194 INFO:tasks.ceph.osd.2.smithi003.stderr:*** Caught signal (Aborted) **
2021-01-27T01:31:31.194 INFO:tasks.ceph.osd.2.smithi003.stderr: in thread 7fd431786700 thread_name:tp_osd_tp
2021-01-27T01:31:35.287 INFO:tasks.ceph.osd.0.smithi003.stderr:2021-01-27T01:31:16.871+0000 7fb13ded4700 -1 osd.0 67 heartbeat_check: no reply from 172.21.15.3:6816 osd.1 since back 2021-01-27T01:27:57.654292+0000 front 2021-01-27T01:27:57.654556+0000 (oldest deadline 2021-01-27T01:28:24.986702+0000)
2021-01-27T01:31:35.288 INFO:tasks.ceph.osd.0.smithi003.stderr:*** Caught signal (Aborted) **
2021-01-27T01:31:35.288 INFO:tasks.ceph.osd.0.smithi003.stderr: in thread 7fb1236eb700 thread_name:tp_osd_tp
2021-01-27T01:31:50.280 INFO:tasks.ceph.osd.0.smithi003.stderr:2021-01-27T01:31:25.975+0000 7fb13ded4700 -1 osd.0 67 heartbeat_check: no reply from 172.21.15.3:6817 osd.2 since back 2021-01-27T01:27:57.654657+0000 front 2021-01-27T01:27:57.655005+0000 (oldest deadline 2021-01-27T01:28:24.986702+0000)
2021-01-27T01:31:52.024 INFO:tasks.ceph.osd.1.smithi003.stderr:daemon-helper: command crashed with signal 6
2021-01-27T01:31:52.713 INFO:tasks.ceph.osd.2.smithi003.stderr: ceph version 16.1.0-18-g6ae6c340 (6ae6c340188bb4cda209cbc795db104d877b4516) pacific (rc)
2021-01-27T01:31:52.714 INFO:tasks.ceph.osd.2.smithi003.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7fd4555d1dc0]
2021-01-27T01:31:52.714 INFO:tasks.ceph.osd.2.smithi003.stderr: 2: pthread_kill()
2021-01-27T01:31:52.715 INFO:tasks.ceph.osd.2.smithi003.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48c) [0x560fe0029e1c]
2021-01-27T01:31:52.715 INFO:tasks.ceph.osd.2.smithi003.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x560fe002a20e]
2021-01-27T01:31:52.716 INFO:tasks.ceph.osd.2.smithi003.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x560fe004a820]
2021-01-27T01:31:52.716 INFO:tasks.ceph.osd.2.smithi003.stderr: 6: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560fe004d4d4]
2021-01-27T01:31:52.716 INFO:tasks.ceph.osd.2.smithi003.stderr: 7: /lib64/libpthread.so.0(+0x82de) [0x7fd4555c72de]
2021-01-27T01:31:52.717 INFO:tasks.ceph.osd.2.smithi003.stderr: 8: clone()
2021-01-27T01:31:55.231 INFO:tasks.ceph.osd.2.smithi003.stderr:2021-01-27T01:31:54.059+0000 7fd431786700 -1 *** Caught signal (Aborted) **
2021-01-27T01:31:55.231 INFO:tasks.ceph.osd.2.smithi003.stderr: in thread 7fd431786700 thread_name:tp_osd_tp
2021-01-27T01:31:55.232 INFO:tasks.ceph.osd.2.smithi003.stderr:
2021-01-27T01:31:55.232 INFO:tasks.ceph.osd.2.smithi003.stderr: ceph version 16.1.0-18-g6ae6c340 (6ae6c340188bb4cda209cbc795db104d877b4516) pacific (rc)
2021-01-27T01:31:55.233 INFO:tasks.ceph.osd.2.smithi003.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7fd4555d1dc0]
2021-01-27T01:31:55.233 INFO:tasks.ceph.osd.2.smithi003.stderr: 2: pthread_kill()
2021-01-27T01:31:55.233 INFO:tasks.ceph.osd.2.smithi003.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48c) [0x560fe0029e1c]
2021-01-27T01:31:55.234 INFO:tasks.ceph.osd.2.smithi003.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x560fe002a20e]
2021-01-27T01:31:55.234 INFO:tasks.ceph.osd.2.smithi003.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x560fe004a820]
2021-01-27T01:31:55.235 INFO:tasks.ceph.osd.2.smithi003.stderr: 6: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560fe004d4d4]
2021-01-27T01:31:55.235 INFO:tasks.ceph.osd.2.smithi003.stderr: 7: /lib64/libpthread.so.0(+0x82de) [0x7fd4555c72de]
2021-01-27T01:31:55.236 INFO:tasks.ceph.osd.2.smithi003.stderr: 8: clone()

/a/yuriw-2021-01-26_18:26:10-rados-wip-yuri7-testing-2021-01-26-0840-pacific-distro-basic-smithi/5831177

Seems to be failing intermittently - https://sentry.ceph.com/organizations/ceph/issues/2221/events/5848d46b186e47d2991569eada895318/events/?project=2
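For triage, the first abort of each OSD can be pulled out of the teuthology log lines quoted above. The following is only a minimal sketch: the regular expression is an assumption based on the line prefix visible in this report, and `first_aborts` is a hypothetical helper name, not part of teuthology or Ceph.

```python
import re

# Assumed prefix shape, taken from the lines quoted in this report, e.g.
# "2021-01-27T01:31:31.136 INFO:tasks.ceph.osd.1.smithi003.stderr:<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) INFO:tasks\.ceph\.osd\.(?P<osd>\d+)\.(?P<host>[^.]+)\.stderr:(?P<msg>.*)$"
)


def first_aborts(lines):
    """Return {osd_id: teuthology timestamp} for the first abort seen per OSD."""
    aborts = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Caught signal (Aborted)" in m.group("msg"):
            # setdefault keeps the earliest occurrence and ignores repeats
            aborts.setdefault(int(m.group("osd")), m.group("ts"))
    return aborts


# Lines copied from the description above.
sample = [
    "2021-01-27T01:27:45.028 INFO:tasks.workunit.client.0.smithi003.stdout:[ RUN      ] EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder",
    "2021-01-27T01:31:31.136 INFO:tasks.ceph.osd.1.smithi003.stderr:*** Caught signal (Aborted) **",
    "2021-01-27T01:31:31.194 INFO:tasks.ceph.osd.2.smithi003.stderr:*** Caught signal (Aborted) **",
    "2021-01-27T01:31:35.288 INFO:tasks.ceph.osd.0.smithi003.stderr:*** Caught signal (Aborted) **",
    "2021-01-27T01:31:55.231 INFO:tasks.ceph.osd.2.smithi003.stderr:2021-01-27T01:31:54.059+0000 7fd431786700 -1 *** Caught signal (Aborted) **",
]

for osd, ts in sorted(first_aborts(sample).items()):
    print(f"osd.{osd} first aborted at {ts}")
```

On the sample lines this reports all three OSDs (0, 1 and 2) aborting within a few seconds of each other, roughly four minutes after the test started.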


Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #49134: pacific: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder makes osds crash (Resolved)
Actions #1

Updated by Neha Ojha about 3 years ago

   -3013> 2021-01-27T01:31:19.264+0000 7fd431786700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd431786700' had timed out after 15.000000954s
   -2988> 2021-01-27T01:31:21.594+0000 7fd431786700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd431786700' had suicide timed out after 150.000000000s
   -69> 2021-01-27T01:31:54.059+0000 7fd431786700 -1 *** Caught signal (Aborted) **
   in thread 7fd431786700 thread_name:tp_osd_tp

 ceph version 16.1.0-18-g6ae6c340 (6ae6c340188bb4cda209cbc795db104d877b4516) pacific (rc)
 1: /lib64/libpthread.so.0(+0x12dc0) [0x7fd4555d1dc0]
 2: pthread_kill()
 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48c) [0x560fe0029e1c]
 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x560fe002a20e]
 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x560fe004a820]
 6: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560fe004d4d4]
 7: /lib64/libpthread.so.0(+0x82de) [0x7fd4555c72de]
 8: clone()
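
The log above shows two thresholds on the same worker thread (a 15 s grace and a 150 s suicide timeout), and the backtrace shows the abort being raised from `reset_timeout` via `_check`. The following is a simplified Python model of that two-level check, written only to illustrate the behaviour visible in the log. It is not Ceph's `HeartbeatMap` code; `Handle`, `check`, `reset_timeout` and `SuicideTimeout` are hypothetical names, and the real daemon sends SIGABRT rather than raising an exception.

```python
from dataclasses import dataclass


class SuicideTimeout(Exception):
    """Stands in for the abort seen in the backtrace."""


@dataclass
class Handle:
    name: str
    grace: float                  # seconds before "had timed out" is logged
    suicide_grace: float          # seconds before the daemon gives up
    timeout: float = 0.0          # absolute deadline; 0 means not armed
    suicide_timeout: float = 0.0  # absolute deadline; 0 means not armed


def check(h, now, log):
    """Log a missed grace deadline; raise if the suicide deadline has passed."""
    healthy = True
    if h.timeout and h.timeout < now:
        log.append(f"'{h.name}' had timed out after {h.grace}s")
        healthy = False
    if h.suicide_timeout and h.suicide_timeout < now:
        log.append(f"'{h.name}' had suicide timed out after {h.suicide_grace}s")
        raise SuicideTimeout(h.name)
    return healthy


def reset_timeout(h, now, log):
    """Called by a worker between work items: check old deadlines, then re-arm."""
    check(h, now, log)
    h.timeout = now + h.grace
    h.suicide_timeout = now + h.suicide_grace


# Usage: a worker that is blocked in one operation for longer than
# suicide_grace only notices when it next calls reset_timeout.
log = []
h = Handle("OSD::osd_op_tp thread", grace=15.0, suicide_grace=150.0)
reset_timeout(h, 0.0, log)        # arms deadlines at t=15 and t=150
try:
    reset_timeout(h, 200.0, log)  # op took 200 s: both deadlines missed
except SuicideTimeout:
    log.append("aborted")
print("\n".join(log))
```

In this model a single operation that blocks the thread for more than 150 s produces both messages back to back on the next `reset_timeout` call, which is consistent with the two log entries appearing about two seconds apart.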
Actions #2

Updated by Neha Ojha about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Neha Ojha
Actions #3

Updated by Neha Ojha about 3 years ago

Trying to reproduce this in https://pulpito.ceph.com/nojha-2021-02-01_22:48:44-rados:singleton-master-distro-basic-smithi/ - we at least see slow requests, if not heartbeat timeouts.

Actions #4

Updated by Neha Ojha about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 39234
Actions #5

Updated by Neha Ojha about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49134: pacific: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder makes osds crash added
Actions #8

Updated by Neha Ojha about 3 years ago

  • Status changed from Pending Backport to Resolved