Bug #47025
rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
0%
Description
2020-08-14T23:23:58.787 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyECPP.WatchNotify 2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-4375-g4fea9a8be3/rpm/el8/BUILD/ceph-16.0.0-4375-g4fea9a8be3/src/test/librados/watch_notify_cxx.cc:124: Failure 2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: Expected equality of these values: 2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: 1u 2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: Which is: 1 2020-08-14T23:23:58.789 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: watches.size() 2020-08-14T23:23:58.789 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: Which is: 0 2020-08-14T23:23:58.789 INFO:tasks.workunit.client.0.smithi112.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify (159389 ms)
/a/teuthology-2020-08-14_07:01:01-rados-master-distro-basic-smithi/5346903
Related issues
History
#1 Updated by Brad Hubbard over 2 years ago
- Duplicated by Bug #45424: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify watch_notify_cxx.cc:124 added
#2 Updated by Brad Hubbard over 2 years ago
- Related to Bug #50042: rados/test.sh: api_watch_notify failures added
#3 Updated by Brad Hubbard over 2 years ago
There are, at this time, three different versions of this problem as seen in https://tracker.ceph.com/issues/50042#note-17 and https://tracker.ceph.com/issues/50042#note-18 where the tests are failing due to watches timing out.
The timeouts seen so far are:
src/test/librados/watch_notify_cxx.cc:124 TEST_F(LibRadosWatchNotifyECPP, WatchNotify)
src/test/librados/watch_notify_cxx.cc:234 TEST_P(LibRadosWatchNotifyPP, AioWatchNotify2)
src/test/librados/watch_notify_cxx.cc:345 TEST_P(LibRadosWatchNotifyPP, WatchNotify3)
#4 Updated by Sridhar Seshasayee over 2 years ago
Observed on master:
/a/sseshasa-2021-07-14_10:37:09-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6270035
#5 Updated by Brad Hubbard over 2 years ago
- Assignee set to Brad Hubbard
#6 Updated by Brad Hubbard over 2 years ago
Sridhar Seshasayee wrote:
Observed on master:
/a/sseshasa-2021-07-14_10:37:09-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6270035
$ find . name ceph-osd.\*.log.gz -print -exec zgrep HandleWatchTimeout {} \; Watch(94061549223856,client.4914) HandleWatchTimeout
./remote/smithi002/log/ceph-osd.3.log.gz
./remote/smithi002/log/ceph-osd.2.log.gz
2021-07-14T12:01:33.748+0000 2d379700 10 osd.2 pg_epoch: 99 pg[29.1(unlocked)] -
./remote/smithi002/log/ceph-osd.1.log.gz
./remote/smithi002/log/ceph-osd.0.log.gz
./remote/smithi188/log/ceph-osd.4.log.gz
./remote/smithi188/log/ceph-osd.5.log.gz
./remote/smithi188/log/ceph-osd.6.log.gz
./remote/smithi188/log/ceph-osd.7.log.gz
2021-07-14T12:02:37.032+0000 2cdb0700 10 osd.7 pg_epoch: 149 pg[17.18(unlocked)] -- Watch(94588445876080,client.4902) HandleWatchTimeout
Same as we've seen previously. Looking further into the cause of the timeouts.
#7 Updated by Laura Flores over 1 year ago
Instance of LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3:
/a/yuriw-2022-04-01_17:44:32-rados-wip-yuri3-testing-2022-04-01-0659-distro-default-smithi/6772767
2022-04-01T22:27:22.371 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1
2022-04-01T22:27:22.371 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: List watches
2022-04-01T22:27:22.371 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: notify2
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: handle_notify cookie 94783737222992 notify_id 3238405341191 notifier_gid 14126
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: handle_error cookie 94783737222992 err -107
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: notify2 done
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11422-g15f77bd3/rpm/el8/BUILD/ceph-17.0.0-11422-g15f77bd3/src/test/librados/watch_notify_cxx.cc:362: Failure
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: Expected equality of these values:
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: 1u
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: Which is: 1
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: reply_map.size()
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: Which is: 0
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1, where GetParam() = "cache" (18873 ms)
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [----------] 14 tests from LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP (864178 ms total)
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp:
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [----------] Global test environment tear-down
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [==========] 16 tests from 2 test suites ran. (1136377 ms total)
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [ PASSED ] 15 tests.
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [ FAILED ] 1 test, listed below:
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1, where GetParam() = "cache"
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp:
2022-04-01T22:27:22.376 INFO:tasks.workunit.client.0.smithi092.stdout: api_watch_notify_pp: 1 FAILED TEST
2022-04-01T22:27:22.376 INFO:teuthology.orchestra.run.smithi092.stderr:2022-04-01T22:27:18.728+0000 7f9d24da3700 1 -- 172.21.15.92:0/348300405 shutdown_connections
2022-04-01T22:27:22.377 INFO:teuthology.orchestra.run.smithi092.stderr:2022-04-01T22:27:18.728+0000 7f9d24da3700 1 -- 172.21.15.92:0/348300405 wait complete.
2022-04-01T22:27:22.377 INFO:teuthology.orchestra.run.smithi092.stderr:2022-04-01T22:27:18.728+0000 7f9d24da3700 1 librados: shutdown
#8 Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2022-04-07_20:00:39-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6781441
2022-04-07T21:31:24.706 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyECPP.WatchNotify 2022-04-07T21:31:24.706 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/test/librados/watch_notify_cxx.cc:124: Failure 2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: Expected equality of these values: 2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: 1u 2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: Which is: 1 2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: watches.size() 2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: Which is: 0 2022-04-07T21:31:24.708 INFO:tasks.workunit.client.0.smithi081.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify (3247087 ms)
#9 Updated by Nitzan Mordechai over 1 year ago
Laura Flores wrote:
Instance of LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3:
/a/yuriw-2022-04-01_17:44:32-rados-wip-yuri3-testing-2022-04-01-0659-distro-default-smithi/6772767
[...]
we can fix that by modifying the error callback handler and reconnect the session
#10 Updated by Nitzan Mordechai over 1 year ago
the error in LibRadosWatchNotifyECPP.WatchNotify happened after calling watch, and i think that's the one that we are calling:
// old, deprecated versions
int watch(const std::string& o, uint64_t ver, uint64_t *cookie,
librados::WatchCtx *ctx) attribute ((deprecated));
can we use the newer version with a callback?
WatchCtx doesn't have an error callback, and the callback happened only after we use Notify, we cant check the watch before that and that is the reason watches size is 0
#11 Updated by Radoslaw Zarzynski over 1 year ago
Laura, @Nitzan: the assertion failure in the comment #7 is about @reply_map.size()
while other occurrences mention watches,size()
. Do you think they are familiar enough or maybe we should dissect #7 into a dedicated ticket?
UPDATE: hmm, it looks https://tracker.ceph.com/issues/45868 might the one for reply_map
issues (and it's under review already).
#12 Updated by Laura Flores over 1 year ago
- Backport set to pacific
This is from the 16.2.8 run.
/a/yuriw-2022-05-04_20:09:21-rados-pacific-distro-default-smithi/6821705
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyECPP.WatchNotify
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.7-981-g73636a1b/rpm/el8/BUILD/ceph-16.2.7-981-g73636a1b/src/test/librados/watch_notify_cxx.cc:124: Failure
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: Expected equality of these values:
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: 1u
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: Which is: 1
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: watches.size()
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: Which is: 0
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify (243732 ms)
#13 Updated by Radoslaw Zarzynski over 1 year ago
- Backport changed from pacific to pacific,quincy
#14 Updated by Brad Hubbard 12 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 49109