Bug #57751

closed

LibRadosAio.SimpleWritePP hang and pkill

Added by Nitzan Mordechai over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/nmordech-2022-10-02_08:27:55-rados:verify-wip-nm-51282-distro-default-smithi/7051967/

2022-10-02T11:07:47.250 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [==========] Running 54 tests from 4 test suites.
2022-10-02T11:07:47.251 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [----------] Global test environment set-up.
2022-10-02T11:07:47.251 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [----------] 29 tests from LibRadosAio
2022-10-02T11:07:47.252 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [ RUN      ] LibRadosAio.TooBigPP
2022-10-02T11:07:47.252 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [       OK ] LibRadosAio.TooBigPP (4688 ms)
2022-10-02T11:07:47.253 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [ RUN      ] LibRadosAio.PoolQuotaPP
2022-10-02T11:07:47.253 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [       OK ] LibRadosAio.PoolQuotaPP (500615 ms)
2022-10-02T11:07:47.254 INFO:tasks.workunit.client.0.smithi088.stdout:               api_aio_pp: [ RUN      ] LibRadosAio.SimpleWritePP
2022-10-02T11:07:47.255 INFO:tasks.workunit.client.0.smithi088.stderr:bash: line 1: 116814 Alarm clock             ceph_test_rados_api_aio_pp 2>&1
2022-10-02T11:07:47.255 INFO:tasks.workunit.client.0.smithi088.stderr:     116815 Done                    | tee ceph_test_rados_api_aio_pp.log
2022-10-02T11:07:47.256 INFO:tasks.workunit.client.0.smithi088.stderr:     116816 Done                    | sed "s/^/               api_aio_pp: /" 

The watcher is never released: we are trying to write an object (in a namespace), but we calculate the acting OSD wrongly.

2022-10-02T10:27:47.246+0000 7f2283fff700 10 client.5271.objecter ms_dispatch 0x555e17e6d640 osd_map(419..419 src has 1..419) v4
2022-10-02T10:27:47.247+0000 7f2283fff700  3 client.5271.objecter handle_osd_map got epochs [419,419] > 418
2022-10-02T10:27:47.247+0000 7f2283fff700  3 client.5271.objecter handle_osd_map decoding incremental epoch 419
2022-10-02T10:27:47.247+0000 7f2283fff700 20 client.5271.objecter dump_active .. 0 homeless
2022-10-02T10:27:47.247+0000 7f2295b48880 20 librados: aio_write SimpleWritePP 0~128 snapc=0=[] snap_seq=head
2022-10-02T10:27:47.247+0000 7f2295b48880 20 librados: queue_aio_write 0x7f228c00c910 completion 0x7f226c086a50 write_seq 1
2022-10-02T10:27:47.247+0000 7f2295b48880 10 client.5271.objecter _op_submit op 0x555e18003e40
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter _calc_target epoch 419 base SimpleWritePP @48;nspace precalc_pgid 0 pgid 0.0 is_write
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter _calc_target target SimpleWritePP @48;nspace -> pgid 48.e8089f1a
2022-10-02T10:27:47.247+0000 7f2295b48880 10 client.5271.objecter _calc_target  raw pgid 48.e8089f1a -> actual 48.1a acting [0] primary 0
2022-10-02T10:27:47.247+0000 7f2295b48880  1 --2- 172.21.15.88:0/1632739727 >> v2:172.21.15.88:6801/106736 conn(0x555e18004840 0x555e17f621f0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter _get_session s=0x7f228400a130 osd=0 3
2022-10-02T10:27:47.247+0000 7f2295b48880 10 client.5271.objecter _op_submit oid SimpleWritePP '@48;nspace' '@48;nspace' [write 0~128 in=128b] tid 2 osd.0
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter get_session s=0x7f228400a130 osd=0 3
2022-10-02T10:27:47.247+0000 7f2295b48880 15 client.5271.objecter _session_op_assign 0 2
2022-10-02T10:27:47.247+0000 7f2295b48880 15 client.5271.objecter _send_op 2 to 48.1a on osd.0
2022-10-02T10:27:47.247+0000 7f2295b48880  1 -- 172.21.15.88:0/1632739727 --> v2:172.21.15.88:6801/106736 -- osd_op(unknown.0.0:2 48.1a 48:58f91017:nspace::SimpleWritePP:head [write 0~128 in=128b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e419) v8 -- 0x555e17f62730 con 0x555e18004840
2022-10-02T10:27:47.247+0000 7f2295b48880 20 client.5271.objecter put_session s=0x7f228400a130 osd=0 4
2022-10-02T10:27:47.247+0000 7f2295b48880  5 client.5271.objecter 1 in flight
2022-10-02T10:27:48.183+0000 7f2282ffd700 10 monclient: tick

It is supposed to be osd.3 according to the pgid. See last year's changes: https://github.com/ceph/ceph/commit/552e707c4ba13ba8a29aeee5f39de86357984a75


Related issues: 4 (1 open, 3 closed)

Related to RADOS - Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. (Resolved, Nitzan Mordechai)
Related to RADOS - Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 (Resolved, Nitzan Mordechai)
Related to RADOS - Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) (Resolved, Nitzan Mordechai)
Related to RADOS - Bug #58130: LibRadosAio.SimpleWrite hang and pkill (In Progress, Nitzan Mordechai)
#1

Updated by Nitzan Mordechai over 1 year ago

This is not an issue with the test: not all the OSDs are up, and we are waiting for them (Valgrind reports memory leaks from RocksDB; see https://tracker.ceph.com/issues/52136 and https://tracker.ceph.com/issues/53575).

#2

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. added

#3

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 added

#4

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from New to In Progress

#5

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 48641

#6

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) added

#7

Updated by Brad Hubbard over 1 year ago

Possibly related to #58130.

#8

Updated by Laura Flores over 1 year ago

  • Related to Bug #58130: LibRadosAio.SimpleWrite hang and pkill added

#10

Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from Fix Under Review to Resolved