Bug #1145

closed

monitor assert fails due to ENOSPC

Added by Sam Lang almost 13 years ago. Updated almost 13 years ago.

Status: Won't Fix
Priority: Low
Assignee: -
Category: Monitor
Target version: -
% Done: 0%

Description

I put the monitor storage on a boot disk that was already 100% full. That is user error on my part, but I'm not sure the monitor should fail an assert on this error, since it seems to hang the rest of the cluster. I'm using 3 monitor nodes in this case.

2011-06-07 14:18:55.928703 7ff6e6942700 store(/data//mon.beta) MonitorStore::put_int: failed to write to '/data//mon.beta/election_epoch.new': error 28: No space left on device
../../src/mon/MonitorStore.cc: In function 'void MonitorStore::put_int(version_t, const char*, const char*, bool)', in thread '0x7ff6e6942700'
../../src/mon/MonitorStore.cc: 210: FAILED assert(0)
ceph version (commit:)
1: (MonitorStore::put_int(unsigned long, char const*, char const*, bool)+0x445) [0x4e8b55]
2: (Elector::bump_epoch(unsigned int)+0x50) [0x4e3c90]
3: (Elector::handle_victory(MMonElection*)+0x148) [0x4e4258]
4: (Elector::dispatch(Message*)+0x6bc) [0x4e59bc]
5: (Monitor::_ms_dispatch(Message*)+0xc12) [0x46ffe2]
6: (Monitor::ms_dispatch(Message*)+0x7d) [0x479efd]
7: (SimpleMessenger::dispatch_entry()+0x7ea) [0x45193a]
8: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x446d4c]
9: (()+0x6d8c) [0x7ff6e95f4d8c]
10: (clone()+0x6d) [0x7ff6e8b9f04d]

Actions #1

Updated by Sage Weil almost 13 years ago

  • Status changed from New to Won't Fix

Hi Sam-

This is deliberate, so that the system doesn't carry on thinking it wrote something when it didn't. Someday we will hopefully fail more gracefully (a clear message in the error log, no core dump), but beyond that there isn't much we can do about a full disk without getting too crazy!
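
The fail-stop behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual `MonitorStore::put_int` code: the helper names `write_int_atomically` and `put_int_or_die` are hypothetical, and the write-to-`.new`-then-rename step is an assumption inferred only from the `election_epoch.new` path in the log.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not Ceph code): write `value` to "<path>.new", flush it
// to disk, then rename it over `path`. Returns 0 on success, or the errno
// value of the first step that failed (e.g. ENOSPC, 28 on Linux).
int write_int_atomically(const std::string& path, unsigned long value) {
    assert(!path.empty());
    const std::string tmp = path + ".new";

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;

    const std::string buf = std::to_string(value) + "\n";
    int err = 0;
    ssize_t n = ::write(fd, buf.data(), buf.size());
    if (n < 0) {
        err = errno;
    } else if (static_cast<std::size_t>(n) != buf.size()) {
        err = EIO;  // short write: treat as a failure rather than retrying
    }
    if (err == 0 && ::fsync(fd) < 0) err = errno;
    if (::close(fd) < 0 && err == 0) err = errno;
    if (err == 0 && ::rename(tmp.c_str(), path.c_str()) < 0) err = errno;

    if (err != 0) ::unlink(tmp.c_str());  // do not leave a partial file behind
    return err;
}

// Hypothetical fail-stop wrapper: if the value could not be persisted, log
// the reason and abort instead of continuing as if the write had succeeded.
void put_int_or_die(const std::string& path, unsigned long value) {
    const int err = write_int_atomically(path, value);
    if (err != 0) {
        std::fprintf(stderr, "failed to write to '%s.new': error %d: %s\n",
                     path.c_str(), err, std::strerror(err));
        std::abort();
    }
}
```

Keeping the error-returning helper separate from the aborting wrapper is one way the "more graceful" failure mentioned above could later be added: the caller that decides how to die can change without touching the write path.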

If you no longer have a majority of cmons running, the system should hang (by design). If you're seeing a hang with a majority still running, that's something else!
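
To make the majority rule concrete for the 3-monitor setup in the report, here is a small sketch of a strict-majority check. The function names `has_majority` and `tolerable_failures` are hypothetical and are not taken from the Ceph source; they only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when strictly more than half of the configured
// monitors are running. Integer division makes `total / 2` round down, so
// 2 of 3 passes while 2 of 4 does not.
bool has_majority(std::size_t running, std::size_t total) {
    assert(total > 0 && running <= total);
    return running > total / 2;
}

// Hypothetical helper: how many monitors can be lost while a strict majority
// of the configured set is still running.
std::size_t tolerable_failures(std::size_t total) {
    assert(total > 0);
    return (total - 1) / 2;
}
```

With 3 monitors, `tolerable_failures(3)` is 1: losing the one monitor on the full disk still leaves a majority (`has_majority(2, 3)` is true), while losing a second one does not (`has_majority(1, 3)` is false), which is the case where the cluster is expected to hang.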
