Bug #49064
[closed] test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder makes osds crash
% Done: 0%
Backport: pacific
Regression: No
Severity: 3 - minor
Description
2021-01-27T01:27:45.028 INFO:tasks.workunit.client.0.smithi003.stdout:[ RUN ] EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder
2021-01-27T01:27:47.095 INFO:tasks.workunit.client.0.smithi003.stdout:Test size : loop(64); bulk_size(32768)
2021-01-27T01:31:31.136 INFO:tasks.ceph.osd.1.smithi003.stderr:*** Caught signal (Aborted) **
2021-01-27T01:31:31.193 INFO:tasks.ceph.osd.1.smithi003.stderr: in thread 7f31a1356700 thread_name:tp_osd_tp
2021-01-27T01:31:31.194 INFO:tasks.ceph.osd.2.smithi003.stderr:*** Caught signal (Aborted) **
2021-01-27T01:31:31.194 INFO:tasks.ceph.osd.2.smithi003.stderr: in thread 7fd431786700 thread_name:tp_osd_tp
2021-01-27T01:31:35.287 INFO:tasks.ceph.osd.0.smithi003.stderr:2021-01-27T01:31:16.871+0000 7fb13ded4700 -1 osd.0 67 heartbeat_check: no reply from 172.21.15.3:6816 osd.1 since back 2021-01-27T01:27:57.654292+0000 front 2021-01-27T01:27:57.654556+0000 (oldest deadline 2021-01-27T01:28:24.986702+0000)
2021-01-27T01:31:35.288 INFO:tasks.ceph.osd.0.smithi003.stderr:*** Caught signal (Aborted) **
2021-01-27T01:31:35.288 INFO:tasks.ceph.osd.0.smithi003.stderr: in thread 7fb1236eb700 thread_name:tp_osd_tp
2021-01-27T01:31:50.280 INFO:tasks.ceph.osd.0.smithi003.stderr:2021-01-27T01:31:25.975+0000 7fb13ded4700 -1 osd.0 67 heartbeat_check: no reply from 172.21.15.3:6817 osd.2 since back 2021-01-27T01:27:57.654657+0000 front 2021-01-27T01:27:57.655005+0000 (oldest deadline 2021-01-27T01:28:24.986702+0000)
2021-01-27T01:31:52.024 INFO:tasks.ceph.osd.1.smithi003.stderr:daemon-helper: command crashed with signal 6
2021-01-27T01:31:52.713 INFO:tasks.ceph.osd.2.smithi003.stderr: ceph version 16.1.0-18-g6ae6c340 (6ae6c340188bb4cda209cbc795db104d877b4516) pacific (rc)
2021-01-27T01:31:52.714 INFO:tasks.ceph.osd.2.smithi003.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7fd4555d1dc0]
2021-01-27T01:31:52.714 INFO:tasks.ceph.osd.2.smithi003.stderr: 2: pthread_kill()
2021-01-27T01:31:52.715 INFO:tasks.ceph.osd.2.smithi003.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48c) [0x560fe0029e1c]
2021-01-27T01:31:52.715 INFO:tasks.ceph.osd.2.smithi003.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x560fe002a20e]
2021-01-27T01:31:52.716 INFO:tasks.ceph.osd.2.smithi003.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x560fe004a820]
2021-01-27T01:31:52.716 INFO:tasks.ceph.osd.2.smithi003.stderr: 6: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560fe004d4d4]
2021-01-27T01:31:52.716 INFO:tasks.ceph.osd.2.smithi003.stderr: 7: /lib64/libpthread.so.0(+0x82de) [0x7fd4555c72de]
2021-01-27T01:31:52.717 INFO:tasks.ceph.osd.2.smithi003.stderr: 8: clone()
2021-01-27T01:31:55.231 INFO:tasks.ceph.osd.2.smithi003.stderr:2021-01-27T01:31:54.059+0000 7fd431786700 -1 *** Caught signal (Aborted) **
2021-01-27T01:31:55.231 INFO:tasks.ceph.osd.2.smithi003.stderr: in thread 7fd431786700 thread_name:tp_osd_tp
2021-01-27T01:31:55.232 INFO:tasks.ceph.osd.2.smithi003.stderr:
2021-01-27T01:31:55.232 INFO:tasks.ceph.osd.2.smithi003.stderr: ceph version 16.1.0-18-g6ae6c340 (6ae6c340188bb4cda209cbc795db104d877b4516) pacific (rc)
2021-01-27T01:31:55.233 INFO:tasks.ceph.osd.2.smithi003.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7fd4555d1dc0]
2021-01-27T01:31:55.233 INFO:tasks.ceph.osd.2.smithi003.stderr: 2: pthread_kill()
2021-01-27T01:31:55.233 INFO:tasks.ceph.osd.2.smithi003.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48c) [0x560fe0029e1c]
2021-01-27T01:31:55.234 INFO:tasks.ceph.osd.2.smithi003.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x560fe002a20e]
2021-01-27T01:31:55.234 INFO:tasks.ceph.osd.2.smithi003.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x560fe004a820]
2021-01-27T01:31:55.235 INFO:tasks.ceph.osd.2.smithi003.stderr: 6: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560fe004d4d4]
2021-01-27T01:31:55.235 INFO:tasks.ceph.osd.2.smithi003.stderr: 7: /lib64/libpthread.so.0(+0x82de) [0x7fd4555c72de]
2021-01-27T01:31:55.236 INFO:tasks.ceph.osd.2.smithi003.stderr: 8: clone()
/a/yuriw-2021-01-26_18:26:10-rados-wip-yuri7-testing-2021-01-26-0840-pacific-distro-basic-smithi/5831177
Seems to be failing intermittently - https://sentry.ceph.com/organizations/ceph/issues/2221/events/5848d46b186e47d2991569eada895318/events/?project=2
Updated by Neha Ojha about 3 years ago
-3013> 2021-01-27T01:31:19.264+0000 7fd431786700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd431786700' had timed out after 15.000000954s
-2988> 2021-01-27T01:31:21.594+0000 7fd431786700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd431786700' had suicide timed out after 150.000000000s
-69> 2021-01-27T01:31:54.059+0000 7fd431786700 -1 *** Caught signal (Aborted) **
in thread 7fd431786700 thread_name:tp_osd_tp
ceph version 16.1.0-18-g6ae6c340 (6ae6c340188bb4cda209cbc795db104d877b4516) pacific (rc)
1: /lib64/libpthread.so.0(+0x12dc0) [0x7fd4555d1dc0]
2: pthread_kill()
3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48c) [0x560fe0029e1c]
4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x560fe002a20e]
5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x560fe004a820]
6: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560fe004d4d4]
7: /lib64/libpthread.so.0(+0x82de) [0x7fd4555c72de]
8: clone()
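For triaging similar runs, a rough sketch of pulling the heartbeat timeout events out of an OSD log excerpt like the one above. The function name `find_heartbeat_timeouts` and the "grace"/"suicide" labels are made up for illustration; the pattern is only matched against the two `heartbeat_map reset_timeout` lines quoted here and may need adjusting for other log formats. The 15 s and 150 s values look like the `osd_op_thread_timeout` and `osd_op_thread_suicide_timeout` settings, but that mapping is an assumption, not something the log states.

```python
import re

# Matches lines such as:
#   ... heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd431786700'
#       had suicide timed out after 150.000000000s
# "suicide timed out" is listed first so it is not mistaken for a plain timeout.
HEARTBEAT_RE = re.compile(
    r"heartbeat_map reset_timeout '(?P<name>[^']+)' had "
    r"(?P<kind>suicide timed out|timed out) after (?P<secs>[0-9.]+)s"
)


def find_heartbeat_timeouts(lines):
    """Return (worker name, kind, seconds) for each heartbeat timeout line.

    kind is "grace" for an ordinary timeout and "suicide" for the timeout
    that makes the daemon abort (labels chosen for this sketch only).
    """
    events = []
    for line in lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            kind = "suicide" if m.group("kind").startswith("suicide") else "grace"
            events.append((m.group("name"), kind, float(m.group("secs"))))
    return events


# The two heartbeat_map lines from the osd.2 log quoted in this comment.
sample = [
    "-3013> 2021-01-27T01:31:19.264+0000 7fd431786700 1 heartbeat_map reset_timeout "
    "'OSD::osd_op_tp thread 0x7fd431786700' had timed out after 15.000000954s",
    "-2988> 2021-01-27T01:31:21.594+0000 7fd431786700 1 heartbeat_map reset_timeout "
    "'OSD::osd_op_tp thread 0x7fd431786700' had suicide timed out after 150.000000000s",
]

for name, kind, secs in find_heartbeat_timeouts(sample):
    print(f"{name}: {kind} timeout after {secs:.0f}s")
```

Any event labelled "suicide" here would be expected to line up with a later `Caught signal (Aborted)` on the same thread, as it does for thread 7fd431786700 in this log.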
Updated by Neha Ojha about 3 years ago
- Status changed from New to In Progress
- Assignee set to Neha Ojha
Updated by Neha Ojha about 3 years ago
Trying to reproduce this in https://pulpito.ceph.com/nojha-2021-02-01_22:48:44-rados:singleton-master-distro-basic-smithi/ - we at least see slow requests, if not heartbeat timeouts.
Updated by Neha Ojha about 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 39234
Updated by Neha Ojha about 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 3 years ago
- Copied to Backport #49134: pacific: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder makes osds crash added
Updated by Yuri Weinstein about 3 years ago
Updated by Neha Ojha about 3 years ago
- Status changed from Pending Backport to Resolved