Bug #64670

open

LibRadosAioEC.RoundTrip2 hang and pkill

Added by Laura Flores 2 months ago. Updated about 1 month ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576303

2024-02-29T00:17:48.953 INFO:tasks.workunit.client.0.smithi003.stdout:              api_tier_pp: checking for 188pleWrite
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stdout:                  api_aio: [       OK ] LibRadosAioEC.SimpleWrite (68572 ms)
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stdout:                  api_aio: [ RUN      ] LibRadosAioEC.WaitForComplete
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stdout:                  api_aio: [       OK ] LibRadosAioEC.WaitForComplete (7219 ms)
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stdout:                  api_aio: [ RUN      ] LibRadosAioEC.RoundTrip
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stdout:                  api_aio: [       OK ] LibRadosAioEC.RoundTrip (17817 ms)
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stdout:                  api_aio: [ RUN      ] LibRadosAioEC.RoundTrip2
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stderr:bash: line 1: 40233 Alarm clock             ceph_test_rados_api_aio 2>&1
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stderr:     40235 Done                    | tee ceph_test_rados_api_aio.log
2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stderr:     40236 Done                    | sed "s/^/                  api_aio: /" 
2024-02-29T05:30:03.276 INFO:tasks.workunit.client.0.smithi003.stderr:++ cleanup
2024-02-29T05:30:03.298 INFO:tasks.workunit.client.0.smithi003.stderr:++ pkill -P 40162
2024-02-29T05:30:03.299 DEBUG:teuthology.orchestra.run:got remote process result: 124
2024-02-29T05:30:03.299 INFO:tasks.workunit.client.0.smithi003.stderr:++ true
2024-02-29T05:30:03.300 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0...
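
The timestamps in the excerpt above jump from 00:17:48 (the `Alarm clock` message) to 05:30:03 (the `cleanup` step). A minimal Python sketch for locating such gaps in a log, assuming every relevant line starts with an ISO-8601 timestamp with millisecond precision as in this excerpt; `parse_ts` and `largest_gap` are hypothetical helper names, not part of teuthology:

```python
from datetime import datetime

def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, "%Y-%m-%dT%H:%M:%S.%f")
    except ValueError:
        return None

def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the largest time gap."""
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a timestamp prefix
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best

# Three lines copied from the excerpt above.
sample = [
    '2024-02-29T00:17:48.954 INFO:tasks.workunit.client.0.smithi003.stderr:     40236 Done                    | sed "s/^/                  api_aio: /"',
    "2024-02-29T05:30:03.276 INFO:tasks.workunit.client.0.smithi003.stderr:++ cleanup",
    "2024-02-29T05:30:03.298 INFO:tasks.workunit.client.0.smithi003.stderr:++ pkill -P 40162",
]
seconds, before, after = largest_gap(sample)
print(f"largest gap: {seconds:.3f} s, ending at: {after}")
```

On the sample lines this reports a gap of roughly 5 hours 12 minutes between the last test output and the start of cleanup.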


Related issues 2 (2 open, 0 closed)

Related to RADOS - Bug #58130: LibRadosAio.SimpleWrite hang and pkill (In Progress, Nitzan Mordechai)
Related to RADOS - Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd (New)

Actions #1

Updated by Laura Flores 2 months ago

  • Related to Bug #58130: LibRadosAio.SimpleWrite hang and pkill added
Actions #2

Updated by Laura Flores 2 months ago

  • Backport set to reef
Actions #3

Updated by Radoslaw Zarzynski 2 months ago

Might be something new. Bump up and observe.

Actions #4

Updated by Radoslaw Zarzynski about 2 months ago

Bump up.

Actions #5

Updated by Radoslaw Zarzynski about 2 months ago

Nothing new but still observing. Bump up.

Actions #6

Updated by Nitzan Mordechai about 1 month ago

  • Related to Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd added
Actions #7

Updated by Nitzan Mordechai about 1 month ago

2024-02-28T23:40:28.274 INFO:tasks.ceph.osd.7.smithi114.stderr: ceph version 18.2.1-605-g6483f82d (6483f82dab01025088079e7d73a9a77cf43321b2) reef (stable)
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 1: /lib64/libc.so.6(+0x54db0) [0x549fdb0]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 2: /lib64/libc.so.6(+0xa154c) [0x54ec54c]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x226) [0xb9b716]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x70) [0xb9b810]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x248) [0xbabdd8]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 6: ceph-osd(+0xaa4354) [0xbac354]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 7: /lib64/libc.so.6(+0x9f802) [0x54ea802]
2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 8: clone()

osd.7 terminated, which caused the test to get stuck until it was killed by the alarm (the "Alarm clock" message in the description).
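
To see quickly which frames of the backtrace above carry symbol names, the function name can be pulled out of each frame line. This is only a sketch: the regular expression is an assumption based on the frame layout in this excerpt, and `frame_symbol` is a hypothetical helper, not a Ceph or teuthology tool.

```python
import re

# Assumed frame layout: "<prefix> N: (symbol(args)+0xOFF) [0xADDR]"
FRAME_RE = re.compile(r"\s(\d+): \((.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")

def frame_symbol(line):
    """Return (frame_number, function_name), or None for unsymbolized frames."""
    m = FRAME_RE.search(line.rstrip())
    if not m:
        return None
    # Drop the argument list: keep everything before the first "(".
    name = m.group(2).split("(", 1)[0]
    return int(m.group(1)), name

# Frames copied from the backtrace above.
frames = [
    "2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 1: /lib64/libc.so.6(+0x54db0) [0x549fdb0]",
    "2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x248) [0xbabdd8]",
    "2024-02-28T23:40:28.275 INFO:tasks.ceph.osd.7.smithi114.stderr: 8: clone()",
]
symbols = [frame_symbol(f) for f in frames]
for entry in symbols:
    print(entry)
```

Frames that show only a module and offset, such as the `libc.so.6` entries, yield `None` and would need separate symbolization.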

Actions #8

Updated by Radoslaw Zarzynski about 1 month ago

Looks like starvation?
