Bug #8495

osd: bad state machine event on backfill request

Added by Dmitry Smirnov almost 10 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Three OSDs crashed together for no apparent reason during routine backfilling/remapping.
Situation: 12 OSDs on 5 hosts; 1 OSD down/in; 1 OSD up/out.

I decided to replace one OSD, so I used the command 'ceph osd out 3'.
Some hours later, three OSDs crashed all at once.
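
For context, a typical out-and-replace sequence on a cluster of this vintage looks roughly like the following; only the first step had actually been run when the crashes happened, and the later steps are just the standard continuation, sketched here for reference:

ceph osd out 3                # stop mapping PGs to this OSD; data backfills elsewhere
# wait for rebalancing to finish and the cluster to return to HEALTH_OK
service ceph stop osd.3       # stop the daemon on its host
ceph osd crush remove osd.3   # remove it from the CRUSH map
ceph auth del osd.3           # delete its authentication key
ceph osd rm 3                 # remove the OSD id from the cluster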

See attached logs.

ceph-osd.0.log.xz (133 KB) Dmitry Smirnov, 05/31/2014 02:52 AM

ceph-osd.1.log.xz (137 KB) Dmitry Smirnov, 05/31/2014 02:52 AM

ceph-osd.6.log.xz (140 KB) Dmitry Smirnov, 05/31/2014 02:52 AM

crushmap (972 Bytes) Dmitry Smirnov, 05/31/2014 02:52 AM


Related issues

Related to Ceph - Bug #7922: osd: multi-backfill reservation does not release on reject Resolved 03/31/2014

History

#1 Updated by Samuel Just almost 10 years ago

Can you reproduce this with
debug osd = 20
debug ms = 1
debug filestore = 20

on all osds and attach all of the logs?
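
One way to apply those levels is via the [osd] section of ceph.conf (restarting the OSDs afterwards), or by injecting them into the running daemons; a rough sketch, assuming a default /etc/ceph/ceph.conf layout:

# /etc/ceph/ceph.conf -- takes effect after an OSD restart
[osd]
    debug osd = 20
    debug ms = 1
    debug filestore = 20

# or inject into all running OSDs without a restart
ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'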

#2 Updated by Sage Weil almost 10 years ago

  • Subject changed from 0.80.1: multiple simultaneous OSD crashes to osd: bad state machine event on backfill request

#3 Updated by Samuel Just almost 10 years ago

Also, is there any chance that this is a mixed cluster?

#4 Updated by Dmitry Smirnov almost 10 years ago

Samuel Just wrote:

Also, is there any chance that this is a mixed cluster?

I don't know what a "mixed cluster" is. My cluster configuration is very straightforward -- there are only three replicated pools, all hosts are in one subnet, so the CRUSH map is flat and simple, etc.

#5 Updated by Sage Weil almost 10 years ago

Dmitry Smirnov wrote:

Samuel Just wrote:

Also, is there any chance that this is a mixed cluster?

I don't know what a "mixed cluster" is. My cluster configuration is very straightforward -- there are only three replicated pools, all hosts are in one subnet, so the CRUSH map is flat and simple, etc.

Sam is referring to mixed versions, i.e. different versions of the ceph-osd daemon participating in the same cluster, as opposed to every daemon (and client) running the same release.

You can check this with

for f in `ceph osd ls`; do ceph osd metadata $f ; done | grep ceph_version
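
If every daemon is on the same release, that should print one identical line per OSD; illustrative output (commit hash omitted):

    "ceph_version": "ceph version 0.80.1 (...)",
    "ceph_version": "ceph version 0.80.1 (...)",
    ...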

#6 Updated by Dmitry Smirnov almost 10 years ago

I see... Nice command by the way, thanks.

All cluster components are v0.80.1.

#7 Updated by Sage Weil over 9 years ago

  • Status changed from New to Duplicate
