Bug #41196
openosd: there is no client app running but a watcher remains in OSD
Description
We are running Ceph 13.2.5 on CentOS Linux 7.5.1804 and have run into an issue with a stale watcher.
For the image fAEYXC in pool D6Z1QvgX (a replicated pool with only one OSD and the pool size set to one), we can see a watcher registered by client.85217. All the client apps that might have opened the image have been stopped for more than one day, yet the watcher still remains.
[root@b01 3233621]# rbd status D6Z1QvgX/fAEYXC
Watchers:
watcher=10.0.100.181:0/2636644228 client.85217 cookie=93848830076800
We know that the OSD will remove a watcher if it does not receive a CEPH_OSD_WATCH_OP_PING message within 30s. We also tried adding the client to the blacklist, but that did not work either.
[root@b01 3233621]# ceph osd blacklist add "10.0.100.181:0/2636644228" 20;
blacklisting 10.0.100.181:0/2636644228 until 2019-08-12 14:38:03.437125 (20 sec)
[root@b01 3233621]# ceph osd blacklist ls
listed 1 entries
10.0.100.181:0/2636644228 2019-08-12 14:38:03.437125
[root@b01 3233621]# rbd status D6Z1QvgX/fAEYXC
Watchers:
watcher=10.0.100.181:0/2636644228 client.85217 cookie=93848830076800
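For anyone scripting this check, here is a minimal sketch that pulls the watcher address, client id, and cookie out of 'rbd status' output. It assumes the output looks exactly like the transcript above; the names Watcher and parse_watchers are made up for illustration and are not part of any Ceph tool or library.

```python
import re
from typing import List, NamedTuple


class Watcher(NamedTuple):
    """One watcher line from 'rbd status' (hypothetical helper type)."""
    address: str
    client: str
    cookie: int


# Assumed line format, taken from the transcript in this report:
#   watcher=<addr> client.<id> cookie=<number>
_WATCHER_RE = re.compile(r"watcher=(\S+)\s+(client\.\d+)\s+cookie=(\d+)")


def parse_watchers(status_output: str) -> List[Watcher]:
    """Return every watcher found in the given 'rbd status' text."""
    watchers = []
    for line in status_output.splitlines():
        match = _WATCHER_RE.search(line)
        if match:
            watchers.append(
                Watcher(match.group(1), match.group(2), int(match.group(3)))
            )
    return watchers


sample = """Watchers:
watcher=10.0.100.181:0/2636644228 client.85217 cookie=93848830076800
"""

for w in parse_watchers(sample):
    print(w.address, w.client, w.cookie)
```

The address field is the string that was passed to 'ceph osd blacklist add' above, so the parsed value can be reused directly when repeating the test.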
The watcher information returned by the 'rbd status' command comes from obs.oi.watchers, while the PrimaryLogPG::check_blacklisted_obc_watchers function checks the watchers in struct ObjectContext.watchers. I raised 'debug_osd' to 30 and found that there is no watcher in ObjectContext.watchers, which would explain why the blacklist entry had no effect.
Does anyone have any idea?
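To make the mismatch between the two watcher views concrete, here is a toy model. It is not the OSD code: ToyObject, toy_status, and toy_check_blacklisted are hypothetical stand-ins, and the eviction logic is a deliberate simplification. It only assumes what is described above, namely that 'rbd status' reads the persisted list while the blacklist check walks the in-memory one.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyObject:
    # Stands in for obs.oi.watchers (what 'rbd status' reports).
    persisted_watchers: Set[str] = field(default_factory=set)
    # Stands in for ObjectContext.watchers (what the blacklist check walks).
    in_memory_watchers: Set[str] = field(default_factory=set)


def toy_status(obj: ToyObject) -> List[str]:
    """Report watchers the way 'rbd status' would: from the persisted list."""
    return sorted(obj.persisted_watchers)


def toy_check_blacklisted(obj: ToyObject, blacklist: Set[str]) -> Set[str]:
    """Evict blacklisted watchers, but only those found in memory."""
    evicted = {w for w in obj.in_memory_watchers if w in blacklist}
    obj.in_memory_watchers -= evicted
    obj.persisted_watchers -= evicted  # simplification of the real cleanup
    return evicted


addr = "10.0.100.181:0/2636644228"
# State seen in this report: persisted entry present, in-memory list empty.
obj = ToyObject(persisted_watchers={addr})

print(toy_check_blacklisted(obj, {addr}))  # nothing to evict
print(toy_status(obj))                     # the stale watcher is still listed
```

In this model the blacklist check never sees the stale entry, so nothing ever removes it from the persisted list.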
Updated by Josh Durgin over 4 years ago
Hmm, could we be failing to update the oi or the cache of it when removing the watcher?
Updated by yu feng over 4 years ago
Josh Durgin wrote:
Hmm, could we be failing to update the oi or the cache of it when removing the watcher?
I restarted the OSD, and the log file shows that the watcher was populated into ObjectContext and then removed after 30s. So I think the problem is that it failed to update the oi.
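The sequence seen in the log can be sketched as a toy model. This only illustrates the hypothesis and is not the actual OSD logic: ToyOsdObject, restart, tick, and the persist_ok flag are invented for the example, and the 30s value is the timeout mentioned in the description.

```python
from typing import Dict, Iterable, List, Set

WATCH_TIMEOUT = 30.0  # seconds, as stated in the description


class ToyOsdObject:
    """Hypothetical model of one object's watcher state on an OSD."""

    def __init__(self, persisted: Iterable[str]) -> None:
        self.persisted: Set[str] = set(persisted)  # stands in for the oi
        self.in_memory: Dict[str, float] = {}      # watcher -> last ping time

    def restart(self, now: float) -> None:
        # Assumption: after a restart the in-memory watchers are
        # repopulated from the persisted object info.
        self.in_memory = {w: now for w in self.persisted}

    def tick(self, now: float, persist_ok: bool = True) -> List[str]:
        """Expire watchers that have not pinged within WATCH_TIMEOUT."""
        expired = [
            w for w, last in self.in_memory.items()
            if now - last >= WATCH_TIMEOUT
        ]
        for w in expired:
            del self.in_memory[w]
            if persist_ok:  # the suspected failure is modelled as False here
                self.persisted.discard(w)
        return expired


obj = ToyOsdObject({"client.85217"})
obj.restart(now=0.0)
obj.tick(now=31.0, persist_ok=False)
print(obj.in_memory)  # empty: the watcher timed out in memory
print(obj.persisted)  # still holds client.85217, matching 'rbd status'
```

With persist_ok=True the same run clears both views, which is the behaviour one would expect from a healthy OSD.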