Bug #12254
Closed
krbd image watch
Description
There are 3 OSDs. We create an rbd image named "image_xxx" in the rbd pool.
After mapping it with "rbd map image_xxx -p rbd", "rados listwatchers image_xxx -p rbd" reports a watcher for image_xxx.
Next we stop the primary OSD for the object image_xxx.rbd, say osd.a.
"rados listwatchers image_xxx -p rbd" still returns a watcher, as above.
Then we restart osd.a and wait about 30s, after which "rados listwatchers image_xxx -p rbd" returns empty output.
Summary: if we stop the primary osd.a and start it again a few seconds later, the image watcher is deleted by osd.a.
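The check in the last step can be scripted. Below is a minimal sketch, assuming the `rados` CLI is on the PATH and reusing the command exactly as quoted in the description. The helper names (`list_watchers`, `wait_until_no_watchers`) are hypothetical and not part of Ceph, and stopping and restarting osd.a is left to the operator because that command depends on the deployment.

```python
import subprocess
import time


def list_watchers(image="image_xxx", pool="rbd", run=subprocess.run):
    """Return the non-empty output lines of the listwatchers command.

    The arguments mirror the command quoted in the description.
    `run` is injectable so the helper can be exercised without a cluster.
    """
    result = run(
        ["rados", "listwatchers", image, "-p", pool],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def wait_until_no_watchers(poll, timeout=60.0, interval=5.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll until `poll()` returns an empty list or `timeout` seconds pass.

    Returns True if the watcher list became empty (the behaviour reported
    here), False if a watcher was still listed when the timeout expired.
    """
    deadline = clock() + timeout
    while True:
        if not poll():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

After restarting osd.a, `wait_until_no_watchers(list_watchers)` returning True would correspond to the empty listwatchers output described above.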
Updated by Zheng Yan almost 9 years ago
- Assignee set to Ilya Dryomov
__kick_linger_request() is triggered by osd_reset(). Let's assume the rbd header is mapped to [osd0, osd1].
When osd0 is stopped, the kclient calls osd_reset(), which calls __kick_linger_request() to re-send the watch request to osd1.
When osd0 is restarted, the connection between the kclient and osd1 is not reset, so the kclient does not send the watch to osd0. This causes the corresponding watcher on osd0 to time out.
Updated by Ilya Dryomov almost 9 years ago
- Status changed from New to Closed
Yeah, that's because we are not subscribing to new osdmaps in all the cases we should. If you do any I/O after you bring osd0 back up we will get a new osdmap and reestablish the watch on osd0. If your client is idle, the same will happen after userspace resets the connection after idle timeout. I'm working on #9779 which will resolve this - linking this ticket as another test case.
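The sequence discussed in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel or OSD code: the classes `Cluster` and `Client`, their methods, and the single shared timer are hypothetical simplifications, and the 30-second timeout is taken from the "about 30s" in the report. It shows the watch being lost when the client stays idle after osd0 returns, and kept when I/O makes the client notice the newer osdmap.

```python
from dataclasses import dataclass
from typing import Dict, Optional

WATCH_TIMEOUT = 30  # seconds; assumed from the "about 30s" in the report


@dataclass
class Cluster:
    up: Dict[str, bool]               # OSD name -> is it running?
    epoch: int = 1                    # osdmap epoch, bumped on every change
    watcher_registered: bool = False  # watch state stored with the object
    idle: int = 0                     # seconds the primary has not heard from the watcher

    def primary(self) -> Optional[str]:
        # The object maps to [osd0, osd1]; the first running OSD is primary.
        for name in ("osd0", "osd1"):
            if self.up[name]:
                return name
        return None

    def set_up(self, name: str, is_up: bool) -> None:
        self.up[name] = is_up
        self.epoch += 1
        self.idle = 0  # a new primary starts its own timeout

    def tick(self, seconds: int, client: "Client") -> None:
        if not self.watcher_registered:
            return
        if client.watch_target == self.primary():
            self.idle = 0  # the watcher is talking to the primary
            return
        self.idle += seconds
        if self.idle >= WATCH_TIMEOUT:
            self.watcher_registered = False  # primary drops the watch


@dataclass
class Client:
    epoch: int = 0
    watch_target: Optional[str] = None

    def send_watch(self, cluster: Cluster) -> None:
        self.epoch = cluster.epoch
        self.watch_target = cluster.primary()
        cluster.watcher_registered = True
        cluster.idle = 0

    def on_osd_reset(self, cluster: Cluster, osd: str) -> None:
        # A dropped session to the watched OSD re-sends the watch.
        if osd == self.watch_target:
            self.send_watch(cluster)

    def do_io(self, cluster: Cluster) -> None:
        # Any I/O lets the client learn about a newer osdmap.
        if cluster.epoch > self.epoch:
            self.send_watch(cluster)


def run_scenario(io_after_restart: bool) -> bool:
    cluster = Cluster(up={"osd0": True, "osd1": True})
    client = Client()
    client.send_watch(cluster)            # rbd map: watch goes to osd0
    cluster.set_up("osd0", False)         # stop the primary
    client.on_osd_reset(cluster, "osd0")  # watch is re-sent to osd1
    cluster.set_up("osd0", True)          # restart; the osd1 session is not reset
    if io_after_restart:
        client.do_io(cluster)             # new osdmap seen, watch moves to osd0
    cluster.tick(WATCH_TIMEOUT, client)
    return cluster.watcher_registered


if __name__ == "__main__":
    print("idle client keeps watch:", run_scenario(io_after_restart=False))
    print("client doing I/O keeps watch:", run_scenario(io_after_restart=True))
```

In this model the idle client loses its watch because nothing prompts it to compare its osdmap epoch with the cluster's, which is the gap that subscribing to new osdmaps is meant to close.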