Bug #51307

closed

LibRadosWatchNotify.Watch2Delete fails

Added by Sage Weil almost 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-06-20T20:11:53.203 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify: [ RUN      ] LibRadosWatchNotify.Watch2Delete
2021-06-20T20:11:53.204 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify: waiting up to 300 for disconnect notification ...
2021-06-20T20:11:53.204 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify: watch_notify2_test_errcb cookie 94538226300544 err -107
2021-06-20T20:11:53.204 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-5376-ge1b58156/rpm/el8/BUILD/ceph-17.0.0-5376-ge1b58156/src/test/librados/watch_notify.cc:165: Failure
2021-06-20T20:11:53.204 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify: Expected equality of these values:
2021-06-20T20:11:53.204 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify:   -107
2021-06-20T20:11:53.205 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify:   rados_watch_check(ioctx, handle)
2021-06-20T20:11:53.205 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify:     Which is: -2
2021-06-20T20:11:53.205 INFO:tasks.workunit.client.0.smithi079.stdout:         api_watch_notify: [  FAILED  ] LibRadosWatchNotify.Watch2Delete (69145 ms)

/a/sage-2021-06-20_15:30:30-rados-wip-sage-testing-2021-06-20-1007-distro-basic-smithi/6181813

Related issues 2 (0 open, 2 closed)

Related to RADOS - Bug #50042: rados/test.sh: api_watch_notify failures (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #55021: quincy: LibRadosWatchNotify.Watch2Delete fails (Resolved, Nitzan Mordechai)
Actions #1

Updated by Neha Ojha almost 3 years ago

  • Related to Bug #50042: rados/test.sh: api_watch_notify failures added
Actions #3

Updated by Neha Ojha over 2 years ago

/a/yuriw-2021-07-27_17:19:39-rados-wip-yuri-testing-2021-07-27-0830-pacific-distro-basic-smithi/6297201

Actions #4

Updated by Sridhar Seshasayee over 2 years ago

/a/yuriw-2021-12-03_15:27:18-rados-wip-yuri11-testing-2021-12-02-1451-distro-default-smithi/6542889

2021-12-03T19:18:27.907 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify: [ RUN      ] LibRadosWatchNotify.Watch2Delete
2021-12-03T19:18:27.907 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify: waiting up to 300 for disconnect notification ...
2021-12-03T19:18:27.907 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify: watch_notify2_test_errcb cookie 94744431035984 err -107
2021-12-03T19:18:27.907 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-9426-ge65eeb2b/rpm/el8/BUILD/ceph-17.0.0-9426-ge65eeb2b/src/test/librados/watch_notify.cc:165: Failure
2021-12-03T19:18:27.908 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify: Expected equality of these values:
2021-12-03T19:18:27.908 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify:   -107
2021-12-03T19:18:27.908 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify:   rados_watch_check(ioctx, handle)
2021-12-03T19:18:27.908 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify:     Which is: -2
2021-12-03T19:18:27.908 INFO:tasks.workunit.client.0.smithi058.stdout:         api_watch_notify: [  FAILED  ] LibRadosWatchNotify.Watch2Delete (143755 ms)
Actions #5

Updated by Laura Flores about 2 years ago

/a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687338

Actions #6

Updated by Nitzan Mordechai about 2 years ago

Laura Flores wrote:

/a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687338

I think we need the following change in watch_notify.cc:
int rados_watch_check_err = rados_watch_check(ioctx, handle);
// We may hit ENOENT due to socket failure injection and a forced reconnect
EXPECT_TRUE(rados_watch_check_err == -ENOTCONN || rados_watch_check_err == -ENOENT);

I couldn't find any socket failure injection, but a reconnect is present around the time of the error.
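
A minimal standalone sketch of the relaxed check proposed above, written so it compiles without librados or GoogleTest. The helper names `is_acceptable_watch_check_error` and `expect_acceptable` are hypothetical and only illustrate the idea of accepting either error code; the change that was actually merged may look different.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper (not part of watch_notify.cc): returns true when the
// value returned by rados_watch_check() after the watched object is deleted
// is one of the two error codes discussed in this ticket.
constexpr bool is_acceptable_watch_check_error(int err) {
  return err == -ENOTCONN || err == -ENOENT;
}

// Compile-time checks of the intended behaviour.
static_assert(is_acceptable_watch_check_error(-ENOTCONN),
              "-ENOTCONN is the originally expected result");
static_assert(is_acceptable_watch_check_error(-ENOENT),
              "-ENOENT is the result seen in the failing runs");
static_assert(!is_acceptable_watch_check_error(0),
              "success is not acceptable once the object is gone");

// Runtime variant for use outside GoogleTest (assumption: plain assert is
// an adequate stand-in for EXPECT_TRUE in this sketch).
inline void expect_acceptable(int err) {
  assert(is_acceptable_watch_check_error(err));
}
```

Inside the test itself, the same predicate would presumably be wrapped as `EXPECT_TRUE(is_acceptable_watch_check_error(rados_watch_check(ioctx, handle)));`, which keeps both comparisons explicit and avoids accidentally writing a subtraction in place of `==`.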

Actions #7

Updated by Radoslaw Zarzynski about 2 years ago

  • Status changed from New to In Progress
  • Assignee set to Nitzan Mordechai
Actions #8

Updated by Nitzan Mordechai about 2 years ago

In that case it was not an injected socket failure; it was:

2022-02-16T09:56:22.598+0000 15af4700 1 -- [v2:172.21.15.124:3300/0,v1:172.21.15.124:6789/0] >> conn(0x1aa6a930 msgr2=0x18b77000 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe

That will cause the reconnect and the return of -102.
I'll add my suggestion to the code.

Actions #9

Updated by Nitzan Mordechai about 2 years ago

  • Status changed from In Progress to Fix Under Review
Actions #10

Updated by Neha Ojha about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to quincy
Actions #11

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55021: quincy: LibRadosWatchNotify.Watch2Delete fails added
Actions #12

Updated by Nitzan Mordechai about 2 years ago

  • Pull request ID set to 45366
Actions #13

Updated by Neha Ojha about 2 years ago

  • Status changed from Pending Backport to Resolved