Bug #47562


RBD image is stuck after iSCSI disconnection

Added by Fabio Durieux Lopes over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a problem whose cause I'm having a hard time tracking down. I have an RBD image being used by a Windows Server 2019 machine. In production, after connecting and writing successfully, SOMETIMES this image gets stuck right after disconnecting that server from the iSCSI interface. By stuck I mean the image cannot be used inside Ceph at all: I can't remove it from my iSCSI target, I can't resize it, I can't enable/disable features, and I can't even map/mount it from a Linux box using rbd.
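
For anyone triaging: these are the kinds of commands that could be used to check whether the image is held by a stale watcher or exclusive lock; the pool/image names below are placeholders, not my actual setup.

# show the image's current watchers (a dead iSCSI gateway may still be listed)
rbd status rbd/my-iscsi-image

# list any exclusive locks still held on the image
rbd lock list rbd/my-iscsi-image

# check whether any client addresses have been blacklisted by the OSDs
ceph osd blacklist ls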

Three times I left robocopy syncing some data (writes to Ceph), and after a few days it caused my Windows Server to freeze and forced me to reboot it.

I was not able to reproduce this with an "identical" test VM (also Windows Server 2019), but I managed to reproduce it multiple times on my production server.

"ceph -s" says HEALTH_OK. Nothing really happening in my cluster as it is still in testing phase, so I have only one client at a time.

The OS is CentOS 7:
Linux ceph02 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Ceph version:
ceph -v
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)

I have some tcmu-runner logs here:
https://pastebin.com/ShLmDdE2


Files

ceph.log-20200918.gz (891 KB) ceph.log-20200918.gz Fabio Durieux Lopes, 09/23/2020 07:58 PM
ceph.log-20200917.gz (808 KB) ceph.log-20200917.gz Fabio Durieux Lopes, 09/23/2020 07:58 PM