Bug #4159

closed

after setting pool size to zero, osd's segv and apparently can't recover

Added by Dan Mick about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While the various tools should probably disallow setting pool size to 0, it is currently possible, and doing so breaks the OSDs. I think it caused both OSDs to die; at minimum, it prevents them from starting up, with a segv in

0x000000000123d782 in pg_interval_t::check_new_interval (
    old_acting=std::vector of length 0, capacity 0, 
    new_acting=std::vector of length 2, capacity 2 = {...}, 
    old_up=std::vector of length 0, capacity 0, 
    new_up=std::vector of length 2, capacity 2 = {...}, 
    same_interval_since=377, last_epoch_clean=375, osdmap=
    std::tr1::shared_ptr (count 13, weak 1) 0x20d2b60, 
    lastmap=std::tr1::shared_ptr (count 1033, weak 1) 0x20d2680, pool_id=2, 
    pgid=..., past_intervals=0x2976130, out=0x0) at osd/osd_types.cc:1602

I suspect there are some places where an "if (!v.empty())" check should be added.
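As a rough illustration of the kind of guard meant here, the sketch below compares the primaries of two acting sets without indexing into an empty vector. The helper names primary_of and primary_changed, and the use of -1 as a "no primary" value, are hypothetical and not taken from the Ceph source; the actual fix may look quite different.

```cpp
#include <vector>

// Hypothetical helper (not the actual Ceph code): return the first OSD id in
// an acting set, or -1 when the set is empty, instead of reading acting[0]
// unconditionally.
static int primary_of(const std::vector<int>& acting) {
  return acting.empty() ? -1 : acting.front();
}

// Hypothetical comparison: did the primary change between two acting sets?
// Both arguments may safely be empty vectors.
static bool primary_changed(const std::vector<int>& old_acting,
                            const std::vector<int>& new_acting) {
  return primary_of(old_acting) != primary_of(new_acting);
}
```

With the vector lengths from the backtrace (old_acting of length 0, new_acting of length 2), primary_changed reports a change rather than dereferencing an element that does not exist.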


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #4160: it should be illegal to set pool size to 0 (Resolved, Joao Eduardo Luis, 02/15/2013)

Actions #1

Updated by Sage Weil about 11 years ago

I wasn't able to reproduce this. Dan, what was the sequence to reproduce?

Actions #2

Updated by Ian Colle about 11 years ago

  • Assignee set to Dan Mick
Actions #3

Updated by Sage Weil about 11 years ago

  • Priority changed from Immediate to High
Actions #4

Updated by Dan Mick about 11 years ago

Reproduction: ceph osd pool set rbd size 0 didn't actually kill the OSDs; ./rados -p rbd ls hung, unsurprisingly, but the OSDs were still alive. Then I set the size back to 2, and that seemed to kill them; after setting min_size to 1 and attempting a restart, they die in check_new_interval.

Actions #5

Updated by Sage Weil about 11 years ago

  • Status changed from New to Resolved