Bug #2316

closed

rbd: restart of OSD leads to stale qemu VMs with "ceph version 0.45-207-g3053e47"

Added by Oliver Francke about 12 years ago. Updated almost 12 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
librbd
Target version:
% Done:

0%

Source:
Community (user)

Description

Hi,

in my current test setup, all four VMs are started with the rbd_cache parameter. After all VMs have started and begun their stress testing, some of them go stale: no ping, no VNC, no communication with the qemu process at all. So I think the new caching needs some more robustness in relation to qemu?
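For reference, enabling the RBD client-side cache on a QEMU drive in that era looked roughly like the sketch below; the pool name, image name, and memory size are placeholders, not taken from the report:

```shell
# Sketch only: attach an RBD image to QEMU with rbd_cache enabled.
# "rbd" pool and "vm1-disk" image are hypothetical names.
qemu-system-x86_64 \
  -m 2048 \
  -drive format=rbd,file=rbd:rbd/vm1-disk:rbd_cache=true,cache=writeback,if=virtio
```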
All I do on the node is send "kill -SIGSTOP, -9 ..." to the ceph-osd process, and then leave enough time for things to settle down again after the restart.
What I noticed is that the time from "stale+... degraded (50%)" to clean takes far longer than it did with ceph-0.44.
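The failure injection described above can be sketched as follows; the OSD id, the init-script path, and the sleep intervals are assumptions for illustration, not details from the report:

```shell
# Sketch of the failure injection, assuming osd.0 runs on this node.
OSD_PID=$(pgrep -f 'ceph-osd.*osd\.0' | head -n1)

kill -STOP "$OSD_PID"   # freeze the OSD so peers mark it down
sleep 60
kill -KILL "$OSD_PID"   # then kill it outright

/etc/init.d/ceph start osd.0   # restart it (init-script path is an assumption)

# wait until the cluster reports all PGs active+clean again
until ceph -s | grep -q 'active+clean'; do sleep 5; done
```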

Even when all OSDs are "active+clean" again, there is no further response from the VMs; they have to be killed, and the next start leads to fsck runs with errors and so on...
Logfiles attached.

Regards,

Oliver.


Files

ceph.osd.0-stale.log.bz2 (24 MB) ceph.osd.0-stale.log.bz2 Oliver Francke, 04/19/2012 03:19 AM
ceph.osd.1-stale.log.bz2 (24.1 MB) ceph.osd.1-stale.log.bz2 Oliver Francke, 04/19/2012 03:19 AM
hung_vms.tar (56.6 MB) hung_vms.tar Josh Durgin, 05/07/2012 06:05 PM
906_test.log (16.5 MB) 906_test.log Josh Durgin, 05/11/2012 09:03 PM

Related issues: 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #2393: objecter: dropping messages (old connection being used) (Duplicate, 05/09/2012)

