Bug #623

closed

MDS: MDSTable::load_2

Added by Wido den Hollander over 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

On a small test machine I have a Ceph RC cluster running (which was previously running an old unstable version); after the upgrade I saw an MDS crash.

I saw:

2010-12-02 13:57:08.952173 7f9d92cbe710 mds0.8 MDS::ms_get_authorizer type=osd
2010-12-02 13:57:08.952242 7f9d94fc5710 mds0.8 ms_handle_connect on [2a00:f10:113:1:230:48ff:fe8d:a21f]:6804/2045
2010-12-02 13:57:08.952494 7f9d94fc5710 mds0.8 ms_handle_connect on [2a00:f10:113:1:230:48ff:fe8d:a21f]:6807/2128
2010-12-02 13:57:08.953187 7f9d94fc5710 -- [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/2831 <== osd2 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6807/2128 1 ==== osd_op_reply(5 200.00000000 [read 0~0] = -23 (Too many open files in system)) v1 ==== 98+0+0 (2432835435 0 0) 0x153b1c0
2010-12-02 13:57:09.325890 7f9d94fc5710 -- [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/2831 <== osd0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6801/1975 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] = -23 (Too many open files in system)) v1 ==== 99+0+0 (1732481521 0 0) 0x153bc40
2010-12-02 13:57:09.325971 7f9d94fc5710 mds0.inotable: load_2 found no table
mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)':
mds/MDSTable.cc:148: FAILED assert(0)
 ceph version 0.24~rc (commit:78a14622438addcd5c337c4924cce1f67d053ee9)
 1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x5be) [0x61582e]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x674) [0x665bc4]
 3: (MDS::_dispatch(Message*)+0x20b4) [0x4ab924]
 4: (MDS::ms_dispatch(Message*)+0x6d) [0x4abefd]
 5: (SimpleMessenger::dispatch_entry()+0x759) [0x4812c9]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4790bc]
 7: (Thread::_entry_func(void*)+0xa) [0x48d96a]
 8: (()+0x69ca) [0x7f9d977299ca]
 9: (clone()+0x6d) [0x7f9d966e170d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
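The backtrace suggests that load_2 received the -23 read result from the OSD and fell through to an unconditional assert. A rough Python sketch of that failure pattern (hypothetical; the real MDSTable.cc is C++ and its error handling may differ):

```python
import errno

# Hypothetical sketch of the MDSTable::load_2 failure mode (NOT the actual
# Ceph source): a read error other than "object missing" is not handled
# and falls through to an unconditional assert, crashing the MDS.
def load_2(r, bl):
    if r < 0:
        print("load_2 found no table")  # matches the log line above
        if r != -errno.ENOENT:
            # Unexpected error such as -ENFILE (-23) from the OSD read
            raise AssertionError("mds/MDSTable.cc:148: FAILED assert(0)")
        return None                     # no table object yet: start fresh
    return bl                           # table data would be decoded here
```

In other words, whatever transient condition made the OSD read return -23 turns into a fatal MDS assert rather than a retry.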

The -23 (Too many open files in system) error caught my attention, but raising the open files limit to 64,000 didn't help.

root@noisy:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 64000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
root@noisy:~# 
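One hedged guess as to why raising `ulimit -n` didn't help: errno 23 is ENFILE, the *system-wide* open-file table limit (controlled by the kernel's fs.file-max), which is a different knob from the per-process limit that `ulimit -n` sets (exceeding that one gives EMFILE, errno 24, "Too many open files" without the "in system"). A small Python check of the two values, assuming a Linux host:

```python
import errno
import os

# ENFILE (23): the kernel-wide open-file table is full -> tune fs.file-max.
# EMFILE (24): this process hit its own limit -> tune ulimit -n / RLIMIT_NOFILE.
print(errno.ENFILE, os.strerror(errno.ENFILE))
print(errno.EMFILE, os.strerror(errno.EMFILE))
```

If that reading is right, `sysctl fs.file-max` (and `/proc/sys/fs/file-nr` for current usage) would be the relevant thing to inspect, not the ulimit shown above.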

The cluster isn't busy at all, and there isn't much data or many objects on it:

root@noisy:~# ceph -s
2010-12-02 14:19:04.984055    pg v1008: 792 pgs: 792 active+clean; 5672 MB data, 10782 MB used, 283 GB / 300 GB avail
2010-12-02 14:19:04.986335   mds e29: 1/1/1 up {0=up:replay(laggy or crashed)}
2010-12-02 14:19:04.986376   osd e48: 3 osds: 3 up, 3 in
2010-12-02 14:19:04.986444   log 2010-12-02 14:17:39.411756 osd1 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6804/2045 50 : [INF] 3.1p1 scrub ok
2010-12-02 14:19:04.986555   class rbd (v1.3 [x86-64])
2010-12-02 14:19:04.986578   mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21f]:6789/0}
root@noisy:~# 

Is this due to the number of open files?

