Bug #41196

open

osd: there is no client app running but a watcher remains in OSD

Added by yu feng over 4 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are running Ceph 13.2.5 on CentOS Linux 7.5.1804 and have hit an issue with a stale watcher.

For the image fAEYXC in pool D6Z1QvgX (a replicated pool with size set to one and only one OSD), we can see a watcher registered by client.85217. None of the client apps that may have opened the image has been running for more than a day, yet the watcher still remains.

[root@b01 3233621]# rbd status D6Z1QvgX/fAEYXC
Watchers:
watcher=10.0.100.181:0/2636644228 client.85217 cookie=93848830076800

We know that the OSD removes a watcher if it does not receive a CEPH_OSD_WATCH_OP_PING message within 30 s. We also tried adding the client to the blacklist, but that did not work.

[root@b01 3233621]# ceph osd blacklist add "10.0.100.181:0/2636644228" 20;
blacklisting 10.0.100.181:0/2636644228 until 2019-08-12 14:38:03.437125 (20 sec)
[root@b01 3233621]# ceph osd blacklist ls
listed 1 entries
10.0.100.181:0/2636644228 2019-08-12 14:38:03.437125

[root@b01 3233621]# rbd status D6Z1QvgX/fAEYXC
Watchers:
watcher=10.0.100.181:0/2636644228 client.85217 cookie=93848830076800

The watcher information returned by the 'rbd status' command comes from obs.oi.watchers, while the PrimaryLogPG::check_blacklisted_obc_watchers function checks the watchers in struct ObjectContext.watchers. I raised 'debug_osd' to 30 and found that there is no watcher in ObjectContext.watchers.
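To illustrate the mismatch described above, here is a minimal toy model in Python. It is not Ceph code: ToyObject, persisted_watchers, live_watchers and check_blacklisted are hypothetical names that only stand in for obs.oi.watchers, ObjectContext.watchers and the blacklist check. It assumes the check walks only the in-memory set, which is what the report suggests.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    # Stand-in for the persisted watcher list (what 'rbd status' reports).
    persisted_watchers: set = field(default_factory=set)
    # Stand-in for the in-memory ObjectContext watcher set.
    live_watchers: set = field(default_factory=set)


def check_blacklisted(obj, blacklist):
    """Remove blacklisted watchers, looking only at the in-memory set."""
    removed = {w for w in obj.live_watchers if w in blacklist}
    obj.live_watchers -= removed
    # The persisted list is only cleaned for watchers found in memory.
    obj.persisted_watchers -= removed
    return removed


addr = "10.0.100.181:0/2636644228"
# In-memory set is empty, as seen with debug_osd raised; persisted entry exists.
obj = ToyObject(persisted_watchers={addr})
removed = check_blacklisted(obj, {addr})
print("removed:", removed)
print("still persisted:", obj.persisted_watchers)
```

In this model the blacklist entry has nothing to act on, so the persisted watcher keeps showing up, which would be consistent with the 'rbd status' output above.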

Does anyone have any idea?

Actions #1

Updated by Josh Durgin over 4 years ago

Hmm, could we be failing to update the oi or the cache of it when removing the watcher?

Actions #2

Updated by yu feng over 4 years ago

Josh Durgin wrote:

Hmm, could we be failing to update the oi or the cache of it when removing the watcher?

I restarted the OSD, and the log file shows that the watcher was populated into the ObjectContext and then removed after 30 s. So I think the problem is that it failed to update the oi.
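The sequence observed after the restart can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not the OSD implementation: ToyPG, load, tick and the persist_removal flag are hypothetical, and the 30 s timeout is the value quoted in this report. The flag models the suspected failure, where the in-memory watcher expires but the stored object info is not updated.

```python
WATCH_TIMEOUT = 30  # seconds, the timeout quoted in this report


class ToyPG:
    def __init__(self, persisted):
        self.persisted = dict(persisted)  # cookie -> client address (stored info)
        self.live = {}                    # cookie -> time of last ping (in memory)

    def load(self, now):
        """On start, populate in-memory watchers from the persisted list."""
        for cookie in self.persisted:
            self.live[cookie] = now

    def tick(self, now, persist_removal=True):
        """Expire watchers that have not pinged within WATCH_TIMEOUT."""
        expired = [c for c, last in self.live.items()
                   if now - last >= WATCH_TIMEOUT]
        for cookie in expired:
            del self.live[cookie]
            if persist_removal:
                # Healthy path: the stored watcher list is updated as well.
                self.persisted.pop(cookie, None)
        return expired


cookie = 93848830076800
pg = ToyPG({cookie: "10.0.100.181:0/2636644228"})
pg.load(now=0)
# No ping arrives; model the suspected bug by skipping the persisted update.
expired = pg.tick(now=30, persist_removal=False)
print("expired:", expired)
print("in memory:", pg.live)
print("persisted:", pg.persisted)
```

With persist_removal=False the watcher disappears from memory but stays in the persisted list, matching the behaviour seen in the log and in 'rbd status'.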
