Bug #57751
LibRadosAio.SimpleWritePP hang and pkill
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/nmordech-2022-10-02_08:27:55-rados:verify-wip-nm-51282-distro-default-smithi/7051967/
2022-10-02T11:07:47.250 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [==========] Running 54 tests from 4 test suites.
2022-10-02T11:07:47.251 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [----------] Global test environment set-up.
2022-10-02T11:07:47.251 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [----------] 29 tests from LibRadosAio
2022-10-02T11:07:47.252 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [ RUN ] LibRadosAio.TooBigPP
2022-10-02T11:07:47.252 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [ OK ] LibRadosAio.TooBigPP (4688 ms)
2022-10-02T11:07:47.253 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [ RUN ] LibRadosAio.PoolQuotaPP
2022-10-02T11:07:47.253 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [ OK ] LibRadosAio.PoolQuotaPP (500615 ms)
2022-10-02T11:07:47.254 INFO:tasks.workunit.client.0.smithi088.stdout: api_aio_pp: [ RUN ] LibRadosAio.SimpleWritePP
2022-10-02T11:07:47.255 INFO:tasks.workunit.client.0.smithi088.stderr:bash: line 1: 116814 Alarm clock ceph_test_rados_api_aio_pp 2>&1
2022-10-02T11:07:47.255 INFO:tasks.workunit.client.0.smithi088.stderr: 116815 Done | tee ceph_test_rados_api_aio_pp.log
2022-10-02T11:07:47.256 INFO:tasks.workunit.client.0.smithi088.stderr: 116816 Done | sed "s/^/ api_aio_pp: /"
The watcher is never released. We are trying to write an object (one with a namespace), but the acting OSD is calculated incorrectly.
2022-10-02T10:27:47.246+0000 7f2283fff700 10 client.5271.objecter ms_dispatch 0x555e17e6d640 osd_map(419..419 src has 1..419) v4
2022-10-02T10:27:47.247+0000 7f2283fff700 3 client.5271.objecter handle_osd_map got epochs [419,419] > 418
2022-10-02T10:27:47.247+0000 7f2283fff700 3 client.5271.objecter handle_osd_map decoding incremental epoch 419
2022-10-02T10:27:47.247+0000 7f2283fff700 20 client.5271.objecter dump_active .. 0 homeless
2022-10-02T10:27:47.247+0000 7f2295b48880 20 librados: aio_write SimpleWritePP 0~128 snapc=0=[] snap_seq=head
2022-10-02T10:27:47.247+0000 7f2295b48880 20 librados: queue_aio_write 0x7f228c00c910 completion 0x7f226c086a50 write_seq 1
2022-10-02T10:27:47.247+0000 7f2295b48880 10 client.5271.objecter _op_submit op 0x555e18003e40
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter _calc_target epoch 419 base SimpleWritePP @48;nspace precalc_pgid 0 pgid 0.0 is_write
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter _calc_target target SimpleWritePP @48;nspace -> pgid 48.e8089f1a
2022-10-02T10:27:47.247+0000 7f2295b48880 10 client.5271.objecter _calc_target raw pgid 48.e8089f1a -> actual 48.1a acting [0] primary 0
2022-10-02T10:27:47.247+0000 7f2295b48880 1 --2- 172.21.15.88:0/1632739727 >> v2:172.21.15.88:6801/106736 conn(0x555e18004840 0x555e17f621f0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter _get_session s=0x7f228400a130 osd=0 3
2022-10-02T10:27:47.247+0000 7f2295b48880 10 client.5271.objecter _op_submit oid SimpleWritePP '@48;nspace' '@48;nspace' [write 0~128 in=128b] tid 2 osd.0
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter get_session s=0x7f228400a130 osd=0 3
2022-10-02T10:27:47.247+0000 7f2295b48880 15 client.5271.objecter _session_op_assign 0 2
2022-10-02T10:27:47.247+0000 7f2295b48880 15 client.5271.objecter _send_op 2 to 48.1a on osd.0
2022-10-02T10:27:47.247+0000 7f2295b48880 1 -- 172.21.15.88:0/1632739727 --> v2:172.21.15.88:6801/106736 -- osd_op(unknown.0.0:2 48.1a 48:58f91017:nspace::SimpleWritePP:head [write 0~128 in=128b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e419) v8 -- 0x555e17f62730 con 0x555e18004840
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter put_session s=0x7f228400a130 osd=0 4
2022-10-02T10:27:47.247+0000 7f2295b48880 5 client.5271.objecter 1 in flight
2022-10-02T10:27:48.183+0000 7f2282ffd700 10 monclient: tick
It is supposed to be osd.3 according to the pgid. See last year's changes: https://github.com/ceph/ceph/commit/552e707c4ba13ba8a29aeee5f39de86357984a75
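For anyone reading the `_calc_target` lines in the objecter log, here is a rough sketch of how the raw pgid 48.e8089f1a folds down to the actual pgid 48.1a. It mirrors the stable-mod step Ceph applies to the raw placement seed (`ceph_stable_mod`). The pool's pg_num is not shown in the log, so the value 32 below is an assumption, and the Python function names are illustrative only. The sketch covers the pgid folding only; it does not reproduce the CRUSH step that picks the acting set.

```python
def pg_num_mask(pg_num: int) -> int:
    """Smallest (2^n - 1) mask that covers pg_num - 1."""
    return (1 << (pg_num - 1).bit_length()) - 1


def stable_mod(x: int, b: int, bmask: int) -> int:
    """Python rendering of Ceph's ceph_stable_mod(x, b, bmask)."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def raw_pg_to_pg(raw_seed: int, pg_num: int) -> int:
    """Fold a raw placement seed into the range [0, pg_num)."""
    return stable_mod(raw_seed, pg_num, pg_num_mask(pg_num))


if __name__ == "__main__":
    ASSUMED_PG_NUM = 32  # assumption: pg_num of pool 48 is not in the log
    pg = raw_pg_to_pg(0xE8089F1A, ASSUMED_PG_NUM)
    print(f"48.{pg:x}")  # prints 48.1a
```

With the assumed pg_num, the result matches the "actual 48.1a" in the log, which suggests the pgid folding itself is behaving as expected and the interesting part is the acting set chosen for that pg.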
Related issues
History
#1 Updated by Nitzan Mordechai about 1 year ago
This is not an issue with the test: not all the OSDs are up, and we are waiting (valgrind reports a memory leak from rocksdb; see https://tracker.ceph.com/issues/52136 and https://tracker.ceph.com/issues/53575).
#2 Updated by Nitzan Mordechai about 1 year ago
- Related to Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. added
#3 Updated by Nitzan Mordechai about 1 year ago
- Related to Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 added
#4 Updated by Nitzan Mordechai about 1 year ago
- Status changed from New to In Progress
#5 Updated by Nitzan Mordechai about 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 48641
#6 Updated by Nitzan Mordechai about 1 year ago
- Related to Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) added
#7 Updated by Brad Hubbard 12 months ago
Possibly #58130 is related.
#8 Updated by Laura Flores 12 months ago
- Related to Bug #58130: LibRadosAio.SimpleWrite hang and pkill added
#9 Updated by Yuri Weinstein 12 months ago
#10 Updated by Radoslaw Zarzynski 12 months ago
- Status changed from Fix Under Review to Resolved