Bug #623: MDS: MDSTable::load_2
Status: Closed
% Done: 0%
Description
On a small test machine I have a Ceph RC cluster running (which was previously running an old unstable version). After the upgrade I saw an MDS crash.
In the log I saw:

2010-12-02 13:57:08.952173 7f9d92cbe710 mds0.8 MDS::ms_get_authorizer type=osd
2010-12-02 13:57:08.952242 7f9d94fc5710 mds0.8 ms_handle_connect on [2a00:f10:113:1:230:48ff:fe8d:a21f]:6804/2045
2010-12-02 13:57:08.952494 7f9d94fc5710 mds0.8 ms_handle_connect on [2a00:f10:113:1:230:48ff:fe8d:a21f]:6807/2128
2010-12-02 13:57:08.953187 7f9d94fc5710 -- [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/2831 <== osd2 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6807/2128 1 ==== osd_op_reply(5 200.00000000 [read 0~0] = -23 (Too many open files in system)) v1 ==== 98+0+0 (2432835435 0 0) 0x153b1c0
2010-12-02 13:57:09.325890 7f9d94fc5710 -- [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/2831 <== osd0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6801/1975 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] = -23 (Too many open files in system)) v1 ==== 99+0+0 (1732481521 0 0) 0x153bc40
2010-12-02 13:57:09.325971 7f9d94fc5710 mds0.inotable: load_2 found no table
mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)':
mds/MDSTable.cc:148: FAILED assert(0)
ceph version 0.24~rc (commit:78a14622438addcd5c337c4924cce1f67d053ee9)
 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x5be) [0x61582e]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x674) [0x665bc4]
 3: (MDS::_dispatch(Message*)+0x20b4) [0x4ab924]
 4: (MDS::ms_dispatch(Message*)+0x6d) [0x4abefd]
 5: (SimpleMessenger::dispatch_entry()+0x759) [0x4812c9]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4790bc]
 7: (Thread::_entry_func(void*)+0xa) [0x48d96a]
 8: (()+0x69ca) [0x7f9d977299ca]
 9: (clone()+0x6d) [0x7f9d966e170d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
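For context, the backtrace shows the assert firing inside MDSTable::load_2 when the table read completes with an error. A rough sketch of that completion pattern, with hypothetical names and not the actual Ceph source, is:

```python
import errno

def load_2(result, data):
    """Sketch of a read-completion handler that treats any I/O error as
    fatal (hypothetical; modeled on the FAILED assert(0) at
    mds/MDSTable.cc:148 in the backtrace)."""
    if result < 0:
        # The osd_op_reply delivered -23 (-ENFILE) here, so the handler
        # aborts rather than distinguishing "table object missing" from a
        # transient system-wide error.
        raise AssertionError(
            f"load_2 got error {result} ({errno.errorcode[-result]})")
    return data

# Replaying the error code seen in the log:
try:
    load_2(-23, b"")
except AssertionError as e:
    print(e)  # load_2 got error -23 (ENFILE)
```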
The -23 (Too many open files in system) caught my attention, but raising the open files limit to 64000 didn't help.
root@noisy:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 64000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
root@noisy:~#
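Worth noting (my own reading of the error code, not something stated in the log): on Linux, errno 23 is ENFILE, the system-wide open file table limit, which is governed by the fs.file-max sysctl. ulimit -n only raises the per-process limit, whose exhaustion reports EMFILE (24) instead, which would explain why raising it to 64000 made no difference. A quick check:

```python
import errno

# errno 23 is ENFILE: "Too many open files in system" -- the kernel-wide
# file table (tuned via /proc/sys/fs/file-max) is exhausted.
print(errno.errorcode[23])  # ENFILE

# errno 24 is EMFILE: the per-process limit that `ulimit -n` controls.
# The OSDs returned -23, not -24, so the per-process limit was not the cap.
print(errno.errorcode[24])  # EMFILE
```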
The cluster isn't busy at all, and there isn't much data or many objects on it:
root@noisy:~# ceph -s
2010-12-02 14:19:04.984055    pg v1008: 792 pgs: 792 active+clean; 5672 MB data, 10782 MB used, 283 GB / 300 GB avail
2010-12-02 14:19:04.986335   mds e29: 1/1/1 up {0=up:replay(laggy or crashed)}
2010-12-02 14:19:04.986376   osd e48: 3 osds: 3 up, 3 in
2010-12-02 14:19:04.986444   log 2010-12-02 14:17:39.411756 osd1 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6804/2045 50 : [INF] 3.1p1 scrub ok
2010-12-02 14:19:04.986555 class rbd (v1.3 [x86-64])
2010-12-02 14:19:04.986578   mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21f]:6789/0}
root@noisy:~#
Is this due to the number of open files?