Bug #1145
monitor assert fails due to ENOSPC
Status: closed
Description
I put the monitor storage on a boot disk that was already 100% full. That is a user error on my part, but I'm not sure the monitor should fail an assert on this error, since doing so seems to hang the rest of the cluster. I'm using 3 monitor nodes in this case.
2011-06-07 14:18:55.928703 7ff6e6942700 store(/data//mon.beta) MonitorStore::put_int: failed to write to '/data//mon.beta/election_epoch.new': error 28: No space left on device
../../src/mon/MonitorStore.cc: In function 'void MonitorStore::put_int(version_t, const char*, const char*, bool)', in thread '0x7ff6e6942700'
../../src/mon/MonitorStore.cc: 210: FAILED assert(0)
ceph version (commit:)
1: (MonitorStore::put_int(unsigned long, char const*, char const*, bool)+0x445) [0x4e8b55]
2: (Elector::bump_epoch(unsigned int)+0x50) [0x4e3c90]
3: (Elector::handle_victory(MMonElection*)+0x148) [0x4e4258]
4: (Elector::dispatch(Message*)+0x6bc) [0x4e59bc]
5: (Monitor::_ms_dispatch(Message*)+0xc12) [0x46ffe2]
6: (Monitor::ms_dispatch(Message*)+0x7d) [0x479efd]
7: (SimpleMessenger::dispatch_entry()+0x7ea) [0x45193a]
8: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x446d4c]
9: (()+0x6d8c) [0x7ff6e95f4d8c]
10: (clone()+0x6d) [0x7ff6e8b9f04d]
ceph version (commit:)
Updated by Sage Weil almost 13 years ago
- Status changed from New to Won't Fix
Hi Sam-
This is done deliberately so that the system doesn't continue thinking it wrote something when it didn't. Someday we will hopefully fail in a more graceful way (nice message to error log, no core dump) but beyond that there isn't much we can do if the disk is full without getting too crazy!
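To illustrate the "fail rather than pretend the write succeeded" behaviour described above, here is a minimal C++ sketch. It is not the MonitorStore code: the function names `write_int_atomically` and `write_int_or_abort` are hypothetical, and the write-to-`.new`-then-rename pattern is an assumption inferred only from the `election_epoch.new` path in the log.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not the Ceph implementation): writes `value` to
// "<path>.new", flushes it to disk, then renames it over `path`.
// Returns 0 on success or a positive errno value (e.g. ENOSPC) on failure.
int write_int_atomically(const std::string& path, std::uint64_t value) {
    const std::string tmp = path + ".new";
    const std::string text = std::to_string(value) + "\n";

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;

    const char* p = text.data();
    std::size_t left = text.size();
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            int err = errno;               // e.g. ENOSPC on a full disk
            ::close(fd);
            return err;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    if (::fsync(fd) < 0) { int err = errno; ::close(fd); return err; }
    if (::close(fd) < 0) return errno;
    if (::rename(tmp.c_str(), path.c_str()) < 0) return errno;
    return 0;
}

// Hypothetical wrapper: if the value cannot be persisted, report the error
// and stop the process instead of carrying on with unwritten state.
void write_int_or_abort(const std::string& path, std::uint64_t value) {
    int err = write_int_atomically(path, value);
    if (err != 0) {
        std::fprintf(stderr, "failed to write '%s.new': error %d: %s\n",
                     path.c_str(), err, std::strerror(err));
        std::abort();
    }
}
```

A friendlier failure mode, as the comment suggests, would keep the same stop-on-error decision but replace `std::abort()` with a clean exit and a log message.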
If you no longer have a majority of cmons running, the system should hang (by design). (If you're seeing a hang with a majority still running, that's something else!)
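The majority rule can be made concrete with a small C++ sketch. The helper names `has_quorum` and `tolerated_failures` are hypothetical and are not taken from the Ceph source; the sketch only assumes that a quorum means a strict majority of the configured monitors.

```cpp
#include <cstddef>

// Hypothetical helper: true when `running` monitors form a strict
// majority of the `total` configured monitors.
constexpr bool has_quorum(std::size_t running, std::size_t total) {
    return total > 0 && running > total / 2;
}

// Hypothetical helper: how many monitors can fail while a strict
// majority of `total` is still running.
constexpr std::size_t tolerated_failures(std::size_t total) {
    return total == 0 ? 0 : (total - 1) / 2;
}

// With the 3 monitors from the report, losing one still leaves a majority.
static_assert(has_quorum(2, 3), "2 of 3 is a majority");
static_assert(!has_quorum(1, 3), "1 of 3 is not a majority");
static_assert(tolerated_failures(3) == 1, "3 monitors tolerate 1 failure");
```

Under this assumption, a single monitor dying from ENOSPC in a 3-monitor cluster should not by itself stall the cluster. A hang would be expected only if a second monitor is also down.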