Bug #64866
open
rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 failed
Added by Sridhar Seshasayee 2 months ago.
Updated 4 days ago.
Description
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587349
There was a similar failure reported in https://tracker.ceph.com/issues/47025. The PR associated
with it is merged and therefore this new tracker is raised for the new failure.
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: List watches
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: notify2
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: handle_notify cookie 94576533816384 notify_id 25035364368385 notifier_gid 45581
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: handle_error cookie 94576533816384 err -107
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: notify2 done
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-1875-g831bc188/rpm/el9/BUILD/ceph-19.0.0-1875-g831bc188/src/test/librados/watch_notify_cxx.cc:400: Failure
2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: Expected equality of these values:
2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: 1u
2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: Which is: 1
2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: reply_map.size()
2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: Which is: 0
2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1, where GetParam() = "cache" (16749 ms)
- Assignee set to Nitzan Mordechai
Hi Nitzan! Would you mind taking a look?
- Status changed from New to In Progress
the client log shows cookie 94576533816384
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d
2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking linger op 26
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_read
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d
2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 78
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_read
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d
2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 79
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d
2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 80
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d
2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 81
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter dump_active .. 0 homeless
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 78 149.f0bcca4d osd.6 foo [notify-ack in=25b]
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 79 149.f0bcca4d osd.6 foo [watch ping cookie 94576533816384]
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 80 149.f0bcca4d osd.6 foo [watch ping cookie 94576533816384]
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 81 149.f0bcca4d osd.6 foo [watch ping cookie 94576533816384]
2024-03-09T20:29:21.268+0000 7f4370296640 1 -- 172.21.15.5:0/1247768562 <== osd.6 v2:172.21.15.191:6824/1548572424 17 ==== watch-notify(disconnect (3) cookie 94576533816384 notify 0 ret 0) v3 ==== 42+0+0 (crc 0 0 0) 0x7f4344005530 con 0x5604509ea950
2024-03-09T20:29:21.268+0000 7f4370296640 10 client.45581.objecter ms_dispatch 0x5604509cba00 watch-notify(disconnect (3) cookie 94576533816384 notify 0 ret 0) v3
2024-03-09T20:29:21.268+0000 7f4370296640 1 -- 172.21.15.5:0/1247768562 <== osd.6 v2:172.21.15.191:6824/1548572424 18 ==== watch-notify(notify_complete (2) cookie 94576533836096 notify 25035364368385 ret 0) v3 ==== 42+0+8 (crc 0 0 0) 0x7f4344005530 con 0x5604509ea950
2024-03-09T20:29:21.268+0000 7f436d030640 10 librados: handle_error cookie 94576533816384 err -107
checking osd.6 shows the reason of -107 - Timeout of the watch:
2024-03-09T20:29:18.473+0000 7fc83955c640 10 osd.6 pg_epoch: 5836 pg[149.1(unlocked)] -- Watch(94576533816384,client.45581) HandleWatchTimeout
recreate it easily using notify_sleep in the unit-tes, we are not getting any error that say the watch was timeout. i wonder if that's the correct behavior.
after checking deeply, only watch_check will give us the timeout return code.
the client log also shows that we waited more than 13 seconds which caused the watch to time out and the following failure happened, the only fix we can do here is to increase the timeout.
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56334
Seen on Squid:
/a/yuriw-2024-05-14_00:32:08-rados-wip-yuri4-testing-2024-04-29-0642-distro-default-smithi/7705408
Also available in: Atom
PDF