Project

General

Profile

Actions

Bug #47025

open

rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed

Added by Neha Ojha over 3 years ago. Updated over 1 year ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-08-14T23:23:58.787 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp: [ RUN      ] LibRadosWatchNotifyECPP.WatchNotify
2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-4375-g4fea9a8be3/rpm/el8/BUILD/ceph-16.0.0-4375-g4fea9a8be3/src/test/librados/watch_notify_cxx.cc:124: Failure
2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp: Expected equality of these values:
2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp:   1u
2020-08-14T23:23:58.788 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp:     Which is: 1
2020-08-14T23:23:58.789 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp:   watches.size()
2020-08-14T23:23:58.789 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp:     Which is: 0
2020-08-14T23:23:58.789 INFO:tasks.workunit.client.0.smithi112.stdout:      api_watch_notify_pp: [  FAILED  ] LibRadosWatchNotifyECPP.WatchNotify (159389 ms)

/a/teuthology-2020-08-14_07:01:01-rados-master-distro-basic-smithi/5346903


Related issues 2 (1 open1 closed)

Related to RADOS - Bug #50042: rados/test.sh: api_watch_notify failuresResolvedNitzan Mordechai

Actions
Has duplicate RADOS - Bug #45424: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify watch_notify_cxx.cc:124New

Actions
Actions #1

Updated by Brad Hubbard almost 3 years ago

  • Has duplicate Bug #45424: api_watch_notify_pp: [ FAILED ] LibRadosWatchNotifyECPP.WatchNotify watch_notify_cxx.cc:124 added
Actions #2

Updated by Brad Hubbard almost 3 years ago

  • Related to Bug #50042: rados/test.sh: api_watch_notify failures added
Actions #3

Updated by Brad Hubbard almost 3 years ago

There are, at this time, three different versions of this problem as seen in https://tracker.ceph.com/issues/50042#note-17 and https://tracker.ceph.com/issues/50042#note-18 where the tests are failing due to watches timing out.

The timeouts seen so far are:

src/test/librados/watch_notify_cxx.cc:124 TEST_F(LibRadosWatchNotifyECPP, WatchNotify)

src/test/librados/watch_notify_cxx.cc:234 TEST_P(LibRadosWatchNotifyPP, AioWatchNotify2)

src/test/librados/watch_notify_cxx.cc:345 TEST_P(LibRadosWatchNotifyPP, WatchNotify3)

Actions #4

Updated by Sridhar Seshasayee almost 3 years ago

Observed on master:
/a/sseshasa-2021-07-14_10:37:09-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6270035

Actions #5

Updated by Brad Hubbard over 2 years ago

  • Assignee set to Brad Hubbard
Actions #6

Updated by Brad Hubbard over 2 years ago

Sridhar Seshasayee wrote:

Observed on master:
/a/sseshasa-2021-07-14_10:37:09-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6270035

$ find . name ceph-osd.\*.log.gz -print -exec zgrep HandleWatchTimeout {} \;
./remote/smithi002/log/ceph-osd.3.log.gz
./remote/smithi002/log/ceph-osd.2.log.gz
2021-07-14T12:01:33.748+0000 2d379700 10 osd.2 pg_epoch: 99 pg[29.1(unlocked)] -
Watch(94061549223856,client.4914) HandleWatchTimeout
./remote/smithi002/log/ceph-osd.1.log.gz
./remote/smithi002/log/ceph-osd.0.log.gz
./remote/smithi188/log/ceph-osd.4.log.gz
./remote/smithi188/log/ceph-osd.5.log.gz
./remote/smithi188/log/ceph-osd.6.log.gz
./remote/smithi188/log/ceph-osd.7.log.gz
2021-07-14T12:02:37.032+0000 2cdb0700 10 osd.7 pg_epoch: 149 pg[17.18(unlocked)] -- Watch(94588445876080,client.4902) HandleWatchTimeout

Same as we've seen previously. Looking further into the cause of the timeouts.

Actions #7

Updated by Laura Flores about 2 years ago

Instance of LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3:

/a/yuriw-2022-04-01_17:44:32-rados-wip-yuri3-testing-2022-04-01-0659-distro-default-smithi/6772767

2022-04-01T22:27:22.371 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [ RUN      ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1
2022-04-01T22:27:22.371 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: List watches
2022-04-01T22:27:22.371 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: notify2
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: handle_notify cookie 94783737222992 notify_id 3238405341191 notifier_gid 14126
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: handle_error cookie 94783737222992 err -107
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: notify2 done
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11422-g15f77bd3/rpm/el8/BUILD/ceph-17.0.0-11422-g15f77bd3/src/test/librados/watch_notify_cxx.cc:362: Failure
2022-04-01T22:27:22.372 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: Expected equality of these values:
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:   1u
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:     Which is: 1
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:   reply_map.size()
2022-04-01T22:27:22.373 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:     Which is: 0
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [  FAILED  ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1, where GetParam() = "cache" (18873 ms)
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [----------] 14 tests from LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP (864178 ms total)
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [----------] Global test environment tear-down
2022-04-01T22:27:22.374 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [==========] 16 tests from 2 test suites ran. (1136377 ms total)
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [  PASSED  ] 15 tests.
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [  FAILED  ] 1 test, listed below:
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp: [  FAILED  ] LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1, where GetParam() = "cache" 
2022-04-01T22:27:22.375 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:
2022-04-01T22:27:22.376 INFO:tasks.workunit.client.0.smithi092.stdout:      api_watch_notify_pp:  1 FAILED TEST
2022-04-01T22:27:22.376 INFO:teuthology.orchestra.run.smithi092.stderr:2022-04-01T22:27:18.728+0000 7f9d24da3700  1 -- 172.21.15.92:0/348300405 shutdown_connections
2022-04-01T22:27:22.377 INFO:teuthology.orchestra.run.smithi092.stderr:2022-04-01T22:27:18.728+0000 7f9d24da3700  1 -- 172.21.15.92:0/348300405 wait complete.
2022-04-01T22:27:22.377 INFO:teuthology.orchestra.run.smithi092.stderr:2022-04-01T22:27:18.728+0000 7f9d24da3700  1 librados: shutdown

Actions #8

Updated by Aishwarya Mathuria about 2 years ago

/a/yuriw-2022-04-07_20:00:39-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6781441

2022-04-07T21:31:24.706 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp: [ RUN      ] LibRadosWatchNotifyECPP.WatchNotify
2022-04-07T21:31:24.706 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/test/librados/watch_notify_cxx.cc:124: Failure
2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp: Expected equality of these values:
2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp:   1u
2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp:     Which is: 1
2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp:   watches.size()
2022-04-07T21:31:24.707 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp:     Which is: 0
2022-04-07T21:31:24.708 INFO:tasks.workunit.client.0.smithi081.stdout:      api_watch_notify_pp: [  FAILED  ] LibRadosWatchNotifyECPP.WatchNotify (3247087 ms)
Actions #9

Updated by Nitzan Mordechai about 2 years ago

Laura Flores wrote:

Instance of LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3:

/a/yuriw-2022-04-01_17:44:32-rados-wip-yuri3-testing-2022-04-01-0659-distro-default-smithi/6772767
[...]

we can fix that by modifying the error callback handler and reconnect the session

Actions #10

Updated by Nitzan Mordechai about 2 years ago

the error in LibRadosWatchNotifyECPP.WatchNotify happened after calling watch, and i think that's the one that we are calling:

// old, deprecated versions
int watch(const std::string& o, uint64_t ver, uint64_t *cookie,
librados::WatchCtx *ctx) attribute ((deprecated));

can we use the newer version with a callback?
WatchCtx doesn't have an error callback, and the callback happened only after we use Notify, we cant check the watch before that and that is the reason watches size is 0

Actions #11

Updated by Radoslaw Zarzynski about 2 years ago

Laura, @Nitzan: the assertion failure in the comment #7 is about @reply_map.size() while other occurrences mention watches,size(). Do you think they are familiar enough or maybe we should dissect #7 into a dedicated ticket?

UPDATE: hmm, it looks https://tracker.ceph.com/issues/45868 might the one for reply_map issues (and it's under review already).

Actions #12

Updated by Laura Flores almost 2 years ago

  • Backport set to pacific

This is from the 16.2.8 run.

/a/yuriw-2022-05-04_20:09:21-rados-pacific-distro-default-smithi/6821705

2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp: [ RUN      ] LibRadosWatchNotifyECPP.WatchNotify
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.7-981-g73636a1b/rpm/el8/BUILD/ceph-16.2.7-981-g73636a1b/src/test/librados/watch_notify_cxx.cc:124: Failure
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp: Expected equality of these values:
2022-05-05T00:23:33.440 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp:   1u
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp:     Which is: 1
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp:   watches.size()
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp:     Which is: 0
2022-05-05T00:23:33.441 INFO:tasks.workunit.client.0.smithi136.stdout:      api_watch_notify_pp: [  FAILED  ] LibRadosWatchNotifyECPP.WatchNotify (243732 ms)

Actions #13

Updated by Radoslaw Zarzynski almost 2 years ago

  • Backport changed from pacific to pacific,quincy
Actions #14

Updated by Brad Hubbard over 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 49109
Actions

Also available in: Atom PDF