Tasks #11847
OSD crashes under cached cluster benchmark
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:
Description
I set up a cluster with around 204 OSDs. During continuous benchmarking (set up a cache tier, move hosts around in the crushmap, wait for HEALTH_OK, tear down the cache, loop), several OSDs go down. I checked one of the logs on the OSD hosts:
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
1: /usr/bin/ceph-osd() [0xac51f2]
2: (()+0xf130) [0x7fe049204130]
3: (gsignal()+0x37) [0x7fe047c1e5d7]
4: (abort()+0x148) [0x7fe047c1fcc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fe0485229b5]
6: (()+0x5e926) [0x7fe048520926]
7: (()+0x5e953) [0x7fe048520953]
8: (()+0x5eb73) [0x7fe048520b73]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xbc53ea]
10: (Thread::create(unsigned long)+0x8a) [0xba93ba]
11: (Pipe::accept()+0x3883) [0xca6663]
12: (Pipe::reader()+0x1a1f) [0xcaa11f]
13: (Pipe::Reader::entry()+0xd) [0xcacd5d]
14: (()+0x7df5) [0x7fe0491fcdf5]
15: (clone()+0x6d) [0x7fe047cdf1ad]
The system:
# lsb_release --all
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.1 (Maipo)
Release: 7.1
Codename: Maipo
# rpm -qa | grep ^ceph-
ceph-0.94.1-0.el7.x86_64
ceph-common-0.94.1-0.el7.x86_64
ceph-radosgw-0.94.1-0.el7.x86_64
ObjDump: https://drive.google.com/open?id=0B93VwrIsrOpHZzZMdGt4WTFCY2s&authuser=0
Updated by Loïc Dachary almost 9 years ago
- Project changed from Stable releases to Ceph
Updated by Haomai Wang almost 9 years ago
I guess this is the OS thread limit. You need to increase the thread limit for the OSDs.
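One way to check this guess is to compare the threads held by the ceph-osd processes on a host against the kernel-wide limits. The following is only a rough sketch: it assumes a Linux host with the standard /proc layout, and the helper names thread_count and show_limits are made up for illustration. They are not part of Ceph.

```shell
#!/bin/bash
# Sketch: compare ceph-osd thread usage against kernel thread limits.
# Assumes Linux /proc; thread_count and show_limits are hypothetical helpers.

# Print the number of threads of one process, read from /proc/<pid>/status.
thread_count() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Print the system-wide limits that cap thread creation.
show_limits() {
    echo "kernel.pid_max     = $(cat /proc/sys/kernel/pid_max 2>/dev/null)"
    echo "kernel.threads-max = $(cat /proc/sys/kernel/threads-max 2>/dev/null)"
}

show_limits

# Sum the threads of all running ceph-osd processes (0 if none are running).
total=0
for pid in $(pgrep -x ceph-osd 2>/dev/null); do
    n=$(thread_count "$pid")
    total=$((total + ${n:-0}))
done
echo "ceph-osd threads   = $total"

# If the total is close to a limit, the limit can be raised as root, e.g.:
#   sysctl -w kernel.pid_max=4194303   # example value, an assumption; tune per host
```

Run it on an OSD host while the benchmark loop is active. A ceph-osd thread total that approaches kernel.pid_max or kernel.threads-max would be consistent with Thread::create failing in the trace above, though it does not prove it.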
Updated by Kefu Chai almost 9 years ago
Yeah, I have the same guess as Haomai. Mark, it looks like a dup of #10988. We should probably find a way to throttle the number of threads used.