Bug #64866
openrados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 failed
0%
Description
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587349
There was a similar failure reported in https://tracker.ceph.com/issues/47025. The PR associated
with it is merged and therefore this new tracker is raised for the new failure.
2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: List watches 2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: notify2 2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: handle_notify cookie 94576533816384 notify_id 25035364368385 notifier_gid 45581 2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: handle_error cookie 94576533816384 err -107 2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: notify2 done 2024-03-09T20:29:24.634 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-1875-g831bc188/rpm/el9/BUILD/ceph-19.0.0-1875-g831bc188/src/test/librados/watch_notify_cxx.cc:400: Failure 2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: Expected equality of these values: 2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: 1u 2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: Which is: 1 2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: reply_map.size() 2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: Which is: 0 2024-03-09T20:29:24.635 INFO:tasks.workunit.client.0.smithi005.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1, where GetParam() = "cache" (16749 ms)
Updated by Radoslaw Zarzynski 2 months ago
- Assignee set to Nitzan Mordechai
Hi Nitzan! Would you mind taking a look?
Updated by Nitzan Mordechai 2 months ago
- Status changed from New to In Progress
the client log shows cookie 94576533816384
2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d 2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking linger op 26 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_read 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d 2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 78 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_read 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d 2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 79 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d 2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 80 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d 2024-03-09T20:29:21.122+0000 7f435f7fe640 10 client.45581.objecter checking op 81 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target epoch 5837 base foo @148;test-rados-api-smithi005-48191-19 precalc_pgid 0 pgid 0.0 is_write 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter _calc_target target foo @149;test-rados-api-smithi005-48191-19 -> pgid 149.f0bcca4d 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter dump_active .. 0 homeless 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 78 149.f0bcca4d osd.6 foo [notify-ack in=25b] 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 79 149.f0bcca4d osd.6 foo [watch ping cookie 94576533816384] 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 80 149.f0bcca4d osd.6 foo [watch ping cookie 94576533816384] 2024-03-09T20:29:21.122+0000 7f435f7fe640 20 client.45581.objecter 81 149.f0bcca4d osd.6 foo [watch ping cookie 94576533816384] 2024-03-09T20:29:21.268+0000 7f4370296640 1 -- 172.21.15.5:0/1247768562 <== osd.6 v2:172.21.15.191:6824/1548572424 17 ==== watch-notify(disconnect (3) cookie 94576533816384 notify 0 ret 0) v3 ==== 42+0+0 (crc 0 0 0) 0x7f4344005530 con 0x5604509ea950 2024-03-09T20:29:21.268+0000 7f4370296640 10 client.45581.objecter ms_dispatch 0x5604509cba00 watch-notify(disconnect (3) cookie 94576533816384 notify 0 ret 0) v3 2024-03-09T20:29:21.268+0000 7f4370296640 1 -- 172.21.15.5:0/1247768562 <== osd.6 v2:172.21.15.191:6824/1548572424 18 ==== watch-notify(notify_complete (2) cookie 94576533836096 notify 25035364368385 ret 0) v3 ==== 42+0+8 (crc 0 0 0) 0x7f4344005530 con 0x5604509ea950 2024-03-09T20:29:21.268+0000 7f436d030640 10 librados: handle_error cookie 94576533816384 err -107
checking osd.6 shows the reason of -107 - Timeout of the watch:
2024-03-09T20:29:18.473+0000 7fc83955c640 10 osd.6 pg_epoch: 5836 pg[149.1(unlocked)] -- Watch(94576533816384,client.45581) HandleWatchTimeout
recreate it easily using notify_sleep in the unit-tes, we are not getting any error that say the watch was timeout. i wonder if that's the correct behavior.
Updated by Nitzan Mordechai 2 months ago
after checking deeply, only watch_check will give us the timeout return code.
the client log also shows that we waited more than 13 seconds which caused the watch to time out and the following failure happened, the only fix we can do here is to increase the timeout.
Updated by Nitzan Mordechai 2 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56334
Updated by Sridhar Seshasayee 4 days ago
Seen on Squid:
/a/yuriw-2024-05-14_00:32:08-rados-wip-yuri4-testing-2024-04-29-0642-distro-default-smithi/7705408