Bug #38498

closed

common/Thread.cc: 160: FAILED assert(ret == 0)--10.2.10

Added by lin zhou about 5 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi guys,

So far, 10 OSD services have exited because of this error.
The error messages are all the same.

2019-02-27 17:14:59.757146 7f89925ff700 0 -- 10.191.175.15:6886/192803 >> 10.191.175.49:6833/188731 pipe(0x55ebba819400 sd=741 :6886 s=0 pgs=0 cs=0 l=0 c=0x55ebbb8ba900).accept connect_seq 3912 vs existing 3911 state standby
2019-02-27 17:15:05.858802 7f89d9856700 -1 common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7f89d9856700 time 2019-02-27 17:15:05.806607
common/Thread.cc: 160: FAILED assert(ret == 0)

ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x55eb7a849e12]
2: (Thread::create(char const*, unsigned long)+0xba) [0x55eb7a82c14a]
3: (SimpleMessenger::add_accept_pipe(int)+0x6f) [0x55eb7a8203ef]
4: (Accepter::entry()+0x379) [0x55eb7a8f3ee9]
5: (()+0x8064) [0x7f89ecf76064]
6: (clone()+0x6d) [0x7f89eb07762d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2019-02-27 17:14:50.999276 7f893e811700 1 -- 10.191.175.15:0/192803 <== osd.850 10.191.175.46:6837/190855 6953447 ==== osd_ping(ping_reply e17846 stamp 2019-02-27 17:14:50.995043) v3 ==== 2004+0+0 (3980167553 0 0) 0x55eba12b7400 con 0x55eb96ada600

Detailed logs are in the attachment.

After I restarted these OSD services, they appear to work well, but I do not know whether the same failure will happen on other OSDs.
I cannot find any errors in the system logs except the following dmesg output:

[Wed Jan 30 08:14:11 2019] megasas: Command pool (fusion) empty!
[Wed Jan 30 08:14:11 2019] Couldn't build MFI pass thru cmd
[Wed Jan 30 08:14:11 2019] Couldn't issue MFI pass thru cmd
[Wed Jan 30 08:14:11 2019] megasas: Command pool empty!
[Wed Jan 30 08:14:11 2019] megasas: Failed to get a cmd packet
[Wed Jan 30 08:14:11 2019] megasas: Command pool empty!
[Wed Jan 30 08:14:11 2019] megasas: Failed to get a cmd packet
[Wed Jan 30 08:14:11 2019] megasas: Command pool empty!
[Wed Jan 30 08:14:11 2019] megasas: Failed to get a cmd packet
[Wed Jan 30 08:14:11 2019] megasas: Command pool (fusion) empty!
[Wed Jan 30 08:14:11 2019] megasas: Err returned from build_and_issue_cmd
[Wed Jan 30 08:14:11 2019] megasas: Command pool (fusion) empty!

This cluster is used only as an RBD cluster. The ceph status is below:
root@cld-osd5-44:~# ceph -s
cluster 2bec9425-ea5f-4a48-b56a-fe88e126bced
health HEALTH_WARN
noout flag(s) set
monmap e1: 3 mons at {a=10.191.175.249:6789/0,b=10.191.175.250:6789/0,c=10.191.175.251:6789/0}
election epoch 26, quorum 0,1,2 a,b,c
osdmap e17856: 1080 osds: 1080 up, 1080 in
flags noout,sortbitwise,require_jewel_osds
pgmap v25160475: 90112 pgs, 3 pools, 43911 GB data, 17618 kobjects
139 TB used, 1579 TB / 1718 TB avail
90108 active+clean
3 active+clean+scrubbing+deep
1 active+clean+scrubbing
client io 107 MB/s rd, 212 MB/s wr, 1621 op/s rd, 7555 op/s wr


Files

ceph-osd.97.log.tar.xz (270 KB) lin zhou, 02/27/2019 10:25 AM
Actions #1

Updated by lin zhou about 5 years ago

Sorry, this was caused by my system reaching its threads-max limit.
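
For anyone who hits the same assert: the backtrace shows Thread::create failing while SimpleMessenger was adding a pipe for a new connection, which fits a host that cannot start any more threads. Below is a minimal sketch, not part of Ceph, for checking how close a Linux host is to kernel.threads-max. The function names and the 80% warning threshold are my own assumptions. It reads the total count of scheduling entities from the fourth field of /proc/loadavg, so inside a container the number may not reflect the whole host.

```python
from pathlib import Path


def parse_loadavg_total_tasks(text: str) -> int:
    """Return the total number of kernel scheduling entities (threads)
    from the contents of /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'.
    The fourth field has the form 'runnable/total'."""
    runnable_total = text.split()[3]
    return int(runnable_total.split("/")[1])


def thread_usage_ratio(current: int, threads_max: int) -> float:
    """Fraction of kernel.threads-max that is currently in use."""
    return current / threads_max


def read_system_thread_usage(proc: Path = Path("/proc")) -> tuple[int, int]:
    """Read (current thread count, kernel.threads-max) from procfs."""
    threads_max = int((proc / "sys/kernel/threads-max").read_text())
    current = parse_loadavg_total_tasks((proc / "loadavg").read_text())
    return current, threads_max


if __name__ == "__main__":
    WARN_RATIO = 0.8  # hypothetical threshold, tune for your environment
    try:
        current, threads_max = read_system_thread_usage()
    except OSError as exc:
        # procfs is missing or unreadable (for example, not a Linux host)
        print(f"could not read procfs: {exc}")
    else:
        ratio = thread_usage_ratio(current, threads_max)
        print(f"threads: {current} / {threads_max} ({ratio:.1%} used)")
        if ratio >= WARN_RATIO:
            print("warning: close to kernel.threads-max")
```

Running this periodically on each OSD host should show whether other OSDs are at risk before they hit the limit. Note that other limits (for example kernel.pid_max or per-user process limits) can also make thread creation fail, and this sketch does not check those.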

Actions #2

Updated by Zheng Yan about 5 years ago

  • Status changed from New to Closed