Bug #16118

Thread::try_create(): pthread_create failed with error 11 (common/Thread.cc)

Added by chih wei yu almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We built an OpenStack environment with two controller nodes, two network nodes and two Ceph nodes.
Each Ceph node runs 12 OSDs, and one of the network nodes serves as the third monitor.
One day, the error below appeared in syslog when we tried to delete 10 volumes at the same time.
Since then, we can only send one request to Ceph at a time; otherwise, the requests fail.
How can I fix this problem?

The ceph.conf:
[global]
fsid = 8964eed8-0e19-4a49-804b-af77af5f7243
mon_initial_members = network14, ceph-113, ceph-114
mon_host = 10.2.2.104,10.2.2.113,10.2.2.114
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
log_file = /var/log/ceph/$cluster-$name.log
log_to_stderr = false
op_thread = 10

The error message:
May 24 09:51:52 ceph-113 cinder-volume[13855]: Thread::try_create(): pthread_create failed with error 112016-05-24 09:51:52.020 13888 DEBUG cinder.volume.drivers.rbd [req-dda2ad7b-be14-47bc-b59c-68763e699ed2 OpenStack-admin f787ce94530445deb691961ac581902c - - -] deleting rbd volume volume-5d21cf93-d024-4247-97d1-b0f827a08e89 delete_volume /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:718
May 24 09:51:52 ceph-113 cinder-volume[13855]: 2016-05-24 09:51:52.022 13888 DEBUG cinder.volume.drivers.rbd [req-48d5c8cb-0c20-4d4d-8715-f1c2db6d955e OpenStack-admin f787ce94530445deb691961ac581902c - - -] opening connection to ceph cluster (timeout=-1). connect_to_rados /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:322
May 24 09:51:52 ceph-113 cinder-volume[13855]: Thread::try_create(): pthread_create failed with error 11Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7fb365313700 time 2016-05-24 09:51:52.015839
May 24 09:51:52 ceph-113 cinder-volume[13855]: common/Thread.cc: 160: FAILED assert(ret == 0)
May 24 09:51:52 ceph-113 cinder-volume[13855]: ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
May 24 09:51:52 ceph-113 cinder-volume[13855]: 1: (()+0x16ad90) [0x7fb6d9efdd90]
May 24 09:51:52 ceph-113 cinder-volume[13855]: 2: (()+0x18e31a) [0x7fb6d9f2131a]
May 24 09:51:52 ceph-113 cinder-volume[13855]: 3: (()+0x333432) [0x7fb6da0c6432]
May 24 09:51:52 ceph-113 cinder-volume[13855]: 4: (()+0x3352c4) [0x7fb6da0c82c4]
May 24 09:51:52 ceph-113 cinder-volume[13855]: 5: (()+0x33893d) [0x7fb6da0cb93d]
May 24 09:51:52 ceph-113 cinder-volume[13855]: 6: (()+0x76fa) [0x7fb6f04746fa]
May 24 09:51:52 ceph-113 cinder-volume[13855]: 7: (clone()+0x6d) [0x7fb6f01aab5d]
May 24 09:51:52 ceph-113 cinder-volume[13855]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
===============================
May 24 09:54:17 ceph-113 cinder-volume[13855]: Thread::try_create(): pthread_create failed with error 11Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7fb367fff700 time 2016-05-24 09:54:17.780670
May 24 09:54:17 ceph-113 cinder-volume[13855]: common/Thread.cc: 160: FAILED assert(ret == 0)
May 24 09:54:17 ceph-113 cinder-volume[13855]: ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
May 24 09:54:17 ceph-113 cinder-volume[13855]: 1: (()+0x16ad90) [0x7fb6d9efdd90]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 2: (()+0x18e31a) [0x7fb6d9f2131a]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 3: (()+0x33dc2f) [0x7fb6da0d0c2f]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 4: (()+0x33e39b) [0x7fb6da0d139b]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 5: (()+0xccfd4) [0x7fb6d9e5ffd4]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 6: (()+0xcdc87) [0x7fb6d9e60c87]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 7: (()+0xd84df) [0x7fb6d9e6b4df]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 8: (()+0xd875e) [0x7fb6d9e6b75e]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 9: (()+0xa184b) [0x7fb6d9e3484b]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 10: (librados::IoCtx::aio_remove(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::AioCompletion*)+0x4e) [0x7fb6d9df0b5e]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 11: (()+0x140cd8) [0x7fb6ce4e9cd8]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 12: (()+0x7cfe9) [0x7fb6ce425fe9]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 13: (()+0x7d18f) [0x7fb6ce42618f]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 14: (()+0x140888) [0x7fb6ce4e9888]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 15: (()+0x95b3d) [0x7fb6d9e28b3d]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 16: (()+0x80779) [0x7fb6d9e13779]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 17: (()+0x16a10e) [0x7fb6d9efd10e]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 18: (()+0x76fa) [0x7fb6f04746fa]
May 24 09:54:17 ceph-113 cinder-volume[13855]: 19: (clone()+0x6d) [0x7fb6f01aab5d]
May 24 09:54:17 ceph-113 cinder-volume[13855]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 24 09:54:17 ceph-113 cinder-volume[13855]: common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7fb37cff9700 time 2016-05-24 09:54:17.782000
May 24 09:54:17 ceph-113 cinder-volume[13855]: common/Thread.cc: 160: FAILED assert(ret == 0)
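
The "error 11" in these lines is the value pthread_create() returned (it returns an error number directly rather than setting errno). On Linux, 11 is EAGAIN ("Resource temporarily unavailable"), which pthread_create() reports when a thread/process limit is reached, and Thread::create() asserts that the return value is 0, which is the failed assert shown above. As a minimal sketch, assuming Python 3 on Linux, the snippet below pulls these codes out of log text and decodes them with the standard errno module. The function name decode_pthread_errors is hypothetical, and the regular expression also copes with the first log line, where the message runs straight into the next timestamp.

```python
import errno
import os
import re

# Captures N in "pthread_create failed with error N". The lazy \d+? plus the
# lookahead stops before a fused "YYYY-MM-DD" timestamp, as in the first log
# line above ("error 112016-05-24 ...").
PATTERN = re.compile(
    r"pthread_create failed with error (\d+?)(?=\d{4}-\d{2}-\d{2}|\D|$)")


def decode_pthread_errors(log_text):
    """Return (code, errno name, message) for each pthread_create failure."""
    results = []
    for match in PATTERN.finditer(log_text):
        code = int(match.group(1))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((code, name, os.strerror(code)))
    return results


if __name__ == "__main__":
    line = ("Thread::try_create(): pthread_create failed with error 11"
            "Thread::try_create(): pthread_create failed with error 11"
            "common/Thread.cc: In function ...")
    for code, name, message in decode_pthread_errors(line):
        print(code, name, message)
```

Feeding it whole syslog lines is enough; the symbolic name comes from the platform's errno table, so the decoding is only meaningful on the host type that produced the log.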

#1

Updated by Brad Hubbard almost 8 years ago

Try bumping up kernel.pid_max, and use https://access.redhat.com/labs/cephpgc/ to check that you don't have too many PGs.
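
Before changing kernel.pid_max, it can help to see which ceiling is actually close. A minimal sketch, assuming a Linux host with /proc mounted: it reports the thread count of a given pid (for example the cinder-volume process) from /proc/<pid>/status, that process's RLIMIT_NPROC ("Max processes") from /proc/<pid>/limits, and the system-wide kernel.pid_max and kernel.threads-max sysctls. The helper names are hypothetical; the script only reads values and changes nothing.

```python
import os
import sys


def read_sysctl(name):
    """Read an integer sysctl such as kernel.pid_max via /proc/sys."""
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path) as f:
        return int(f.read().strip())


def thread_count(status_text):
    """Extract the 'Threads:' value from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line in status text")


def max_processes(limits_text):
    """Return (soft, hard) RLIMIT_NPROC strings from /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            soft, hard = line.split()[2:4]
            return soft, hard
    raise ValueError("no 'Max processes' line in limits text")


def report(pid):
    with open(f"/proc/{pid}/status") as f:
        print(f"pid {pid} threads:", thread_count(f.read()))
    with open(f"/proc/{pid}/limits") as f:
        print(f"pid {pid} RLIMIT_NPROC (soft, hard):", max_processes(f.read()))
    for name in ("kernel.pid_max", "kernel.threads-max"):
        print(name, "=", read_sysctl(name))


if __name__ == "__main__":
    # Pass the cinder-volume pid as the first argument; defaults to this script.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    try:
        report(pid)
    except OSError as exc:
        print("could not read /proc:", exc)
```

Raising pid_max itself is a plain sysctl change (sysctl -w kernel.pid_max=<value>); the sketch is only meant to show whether that limit, or a per-process one, is the one being hit.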

At first glance, this client appears to be running out of threads, and this looks like a duplicate of other such issues.
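
A back-of-the-envelope estimate shows why ten simultaneous deletes could exhaust a limit of a few hundred threads. It rests on assumptions, not measurements: that each concurrent delete opens its own librados cluster handle (the connect_to_rados log line suggests the RBD driver opens a connection per operation), that the Jewel-era SimpleMessenger uses a reader and a writer thread per connection, that a handle deleting a volume ends up talking to all 24 OSDs (12 per node on two nodes) plus one monitor, and a hypothetical fixed_per_client placeholder for other librados helper threads. The function name is hypothetical, and real counts depend on the Ceph version and messenger type.

```python
def estimate_client_threads(num_clients, num_osds, num_mons=1,
                            threads_per_conn=2, fixed_per_client=20):
    """Rough thread estimate for num_clients independent librados handles.

    Assumptions (not measured): threads_per_conn threads per messenger
    connection (reader + writer in SimpleMessenger), each handle connected
    to every OSD plus num_mons monitors, and a hypothetical fixed_per_client
    for other helper threads.
    """
    per_client = threads_per_conn * (num_osds + num_mons) + fixed_per_client
    return num_clients * per_client


if __name__ == "__main__":
    TASKS_MAX = 512  # systemd default reported later in this thread
    OSDS = 2 * 12    # two Ceph nodes with 12 OSDs each
    for clients in (1, 5, 10):
        total = estimate_client_threads(clients, OSDS)
        verdict = "exceeds" if total > TASKS_MAX else "fits under"
        print(f"{clients:2d} concurrent deletes: ~{total} threads, {verdict} {TASKS_MAX}")
```

The point is the scaling, not the exact figure: thread demand grows with (concurrent clients) × (OSD count), so a fixed per-service task cap is reached quickly on a cluster with many OSDs.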

#2

Updated by chih wei yu almost 8 years ago

The number of tasks was limited by systemd (the default TasksMax is 512). I increased TasksMax to infinity, and the issue is fixed. Thank you.
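
systemd enforces TasksMax through the cgroup pids controller: once a unit's pids.current reaches pids.max, further fork()/clone() calls in that unit fail with EAGAIN, which pthread_create() then reports as error 11. The change described above corresponds to a drop-in for the service (for example created with systemctl edit) containing TasksMax=infinity under [Service], and systemctl show -p TasksMax <unit> displays the effective value. To check how much headroom a unit has, here is a minimal sketch that reads pids.max and pids.current from the unit's cgroup. The unit name cinder-volume.service and the cgroup v1 path are assumptions; on a cgroup v2 host the directory is typically /sys/fs/cgroup/system.slice/<unit>.

```python
from pathlib import Path


def parse_pids_value(text):
    """Parse a cgroup pids.max / pids.current value; 'max' means no limit."""
    text = text.strip()
    return None if text == "max" else int(text)


def headroom(pids_max, pids_current):
    """Tasks that can still be created before clone() fails (None = unlimited)."""
    if pids_max is None:
        return None
    return max(pids_max - pids_current, 0)


def check_unit(cgroup_dir):
    d = Path(cgroup_dir)
    limit = parse_pids_value((d / "pids.max").read_text())
    current = parse_pids_value((d / "pids.current").read_text())
    return limit, current, headroom(limit, current)


if __name__ == "__main__":
    # Hypothetical unit name and cgroup v1 layout; adjust for your host.
    path = "/sys/fs/cgroup/pids/system.slice/cinder-volume.service"
    try:
        print(check_unit(path))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

A headroom close to zero while concurrent deletes are running would point at TasksMax rather than at kernel.pid_max.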

#3

Updated by Xiaoxi Chen almost 8 years ago

  • Status changed from New to Closed