Bug #51820 (open): rbd-nbd: handle SIGTERM gracefully

Added by Prasanna Kumar Kalever over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

rbd-nbd handles SIGTERM, but we have noticed that the mount point becomes read-only when the rbd-nbd process is terminated.

The use case is a bit special with ceph-csi here, where rbd-nbd runs inside a container. On pod deletion, Kubernetes terminates the container gracefully by sending SIGTERM to the rbd-nbd process, waiting for a grace period of ~30 seconds, and then sending a follow-up SIGKILL as part of the termination sequence.
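On the Kubernetes side, the 30-second grace period described above is configurable per pod via `terminationGracePeriodSeconds`. As a possible workaround (an assumption about the deployment, not something rbd-nbd itself controls), the CSI plugin pod could request a longer window before SIGKILL; all names in this fragment are illustrative:

```yaml
# Hypothetical pod spec fragment: extends the default 30 s grace
# period so rbd-nbd has more time to flush before SIGKILL arrives.
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbdplugin-example        # illustrative name
spec:
  terminationGracePeriodSeconds: 120 # default is 30
  containers:
  - name: rbd-nbd
    image: example/rbd-nbd:latest    # illustrative image
```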

Here is a request to:
1. Make sure we handle SIGTERM gracefully in rbd-nbd (double-check that we flush all the data correctly) to avoid any filesystem corruption.
2. As mentioned above, rbd-nbd gets ~30 seconds to handle pending IO after SIGTERM is sent to the pod, and then receives a follow-up SIGKILL. Can we give this more thought? How do we align with the lifecycle of the pod so that data is not lost? How can we ensure that the IO is handled within the grace period so that we don't have to go through ungraceful kills?

Related issues: https://github.com/ceph/ceph-csi/issues/2204
Related discussions (where we had good initial discussions and workarounds): https://github.com/ceph/ceph-csi/pull/2298 and https://github.com/ceph/ceph-csi/pull/2313
