Bug #1981

closed

pthread_create failed with error 11: common/Thread.cc: 140: FAILED assert(ret == 0)

Added by James Oakley about 12 years ago. Updated about 12 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just upgraded one of my boxes from 0.39 to 0.40, and I am getting this shortly after starting the osds:

Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(size_t)', in thread '7fa4e985d700'
common/Thread.cc: 140: FAILED assert(ret == 0)
 ceph version 0.40 (commit:7eea40ea37fb3a68a2042a2218c9b8c9c40a843e)
 1: (Thread::create(unsigned long)+0x8a) [0x5c69ca]
 2: (SimpleMessenger::Pipe::connect()+0x2aa0) [0x5bdce0]
 3: (SimpleMessenger::Pipe::writer()+0x456) [0x5be7d6]
 4: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x4b171d]
 5: (()+0x7f05) [0x7fa685db4f05]
 6: (clone()+0x6d) [0x7fa684d6f63d]
 ceph version 0.40 (commit:7eea40ea37fb3a68a2042a2218c9b8c9c40a843e)
 1: (Thread::create(unsigned long)+0x8a) [0x5c69ca]
 2: (SimpleMessenger::Pipe::connect()+0x2aa0) [0x5bdce0]
 3: (SimpleMessenger::Pipe::writer()+0x456) [0x5be7d6]
 4: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x4b171d]
 5: (()+0x7f05) [0x7fa685db4f05]
 6: (clone()+0x6d) [0x7fa684d6f63d]
*** Caught signal (Aborted) **
 in thread 7fa4e985d700
 ceph version 0.40 (commit:7eea40ea37fb3a68a2042a2218c9b8c9c40a843e)
 1: /usr/bin/ceph-osd() [0x5fab26]
 2: (()+0xfd00) [0x7fa685dbcd00]
 3: (gsignal()+0x35) [0x7fa684ccad95]
 4: (abort()+0x17b) [0x7fa684ccc2ab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fa685550f2d]
 6: (()+0xbd0d6) [0x7fa68554f0d6]
 7: (()+0xbd103) [0x7fa68554f103]
 8: (()+0xbd1fe) [0x7fa68554f1fe]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x193) [0x5cd003]
 10: (Thread::create(unsigned long)+0x8a) [0x5c69ca]
 11: (SimpleMessenger::Pipe::connect()+0x2aa0) [0x5bdce0]
 12: (SimpleMessenger::Pipe::writer()+0x456) [0x5be7d6]
 13: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x4b171d]
 14: (()+0x7f05) [0x7fa685db4f05]
 15: (clone()+0x6d) [0x7fa684d6f63d]

A 33G core file is left in / and the system only has 16G of RAM.
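
For reference, error 11 on Linux is EAGAIN ("Resource temporarily unavailable"), which pthread_create() returns when it cannot get resources for a new thread or the caller hits a thread limit. The backtrace shows Thread::create() asserting as soon as pthread_create() fails; the following is a minimal sketch of that create-or-assert pattern (an illustration only, not the actual common/Thread.cc code; create_thread_or_die is a hypothetical name):

// Sketch of a create-or-assert thread wrapper, assuming the same
// "abort on any pthread_create failure" behaviour seen in the backtrace.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <pthread.h>

static void* entry(void*) { return nullptr; }

void create_thread_or_die(pthread_t* tid, size_t stacksize) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    if (stacksize)
        pthread_attr_setstacksize(&attr, stacksize);
    int ret = pthread_create(tid, &attr, entry, nullptr);
    if (ret != 0)
        std::fprintf(stderr, "pthread_create failed with error %d\n", ret);  // 11 == EAGAIN
    assert(ret == 0);  // aborts under resource exhaustion, as in the backtrace above
    pthread_attr_destroy(&attr);
}

int main() {
    pthread_t tid;
    create_thread_or_die(&tid, 0);   // 0 = default stack size
    pthread_join(tid, nullptr);
    return 0;
}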


Files

osd.0.log-2012-01-24.gz (6.74 MB), OSD log, James Oakley, 01/24/2012 11:44 AM
osd.4.log-2012-01-24.gz (2.4 MB), OSD 4 log, James Oakley, 01/24/2012 04:59 PM
osd.5.log-2012-01-24.gz (1.96 MB), OSD 5 log, James Oakley, 01/24/2012 04:59 PM
Actions #1

Updated by Greg Farnum about 12 years ago

A 33GB core file is...very large! Did you have any logging enabled that might let us see what happened?

When you say you upgraded one of your boxes, does that mean that you left the rest of the cluster on 0.39?

Actions #2

Updated by Dirk Meister about 12 years ago

Greg Farnum wrote:

A 33GB core file is...very large! Did you have any logging enabled that might let us see what happened?

When you say you upgraded one of your boxes, does that mean that you left the rest of the cluster on 0.39?

Hello, there are not many reasons for pthread_create to fail. One is that there is not enough memory. Given the size of the core file, maybe even the swap space ran out. The question would then be where all the memory went. However, this should be visible in the core file.
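
To see how resource exhaustion produces exactly this error, the small standalone program below (not Ceph code; run it with a lowered ulimit or in a throwaway VM) keeps creating parked threads until pthread_create refuses, which on Linux comes back as 11/EAGAIN once memory for thread stacks or the task limit runs out:

// Demonstration under assumed resource limits, not Ceph code: create
// parked threads until pthread_create fails, then print the error.
#include <cstdio>
#include <cstring>
#include <pthread.h>
#include <unistd.h>

static void* park(void*) { pause(); return nullptr; }  // keep each thread alive

int main() {
    int count = 0;
    for (;;) {
        pthread_t tid;
        int ret = pthread_create(&tid, nullptr, park, nullptr);
        if (ret != 0) {
            // pthread_create returns the error code directly; typically 11 (EAGAIN).
            std::printf("pthread_create failed after %d threads: error %d (%s)\n",
                        count, ret, std::strerror(ret));
            return 0;
        }
        ++count;
    }
}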

Actions #3

Updated by Greg Farnum about 12 years ago

Dirk Meister wrote:

Greg Farnum wrote:

A 33GB core file is...very large! Did you have any logging enabled that might let us see what happened?

When you say you upgraded one of your boxes, does that mean that you left the rest of the cluster on 0.39?

Hello, there are not many reasons for pthread_create to fail. One is that there is not enough memory. Given the size of the core file, maybe even the swap space ran out. The question would then be where all the memory went. However, this should be visible in the core file.

Yes indeed. Probably the OSD somehow got stuck "looping" and that will be easier to pull out of logs than a core, though. :)

Actions #4

Updated by James Oakley about 12 years ago

I am attaching the logs for today. I did shut this box down yesterday to upgrade it, but due to build system delays, I upgraded and brought it back up today.

The rest of the cluster is still on 0.39. Is that not an appropriate upgrade strategy? The release notes didn't indicate that this would be problematic, and I couldn't find anything describing the "right" way to upgrade a cluster in the documentation or mailing lists.

Actions #5

Updated by Greg Farnum about 12 years ago

That log contains enough to get started on; yesterday's won't matter. If you could make sure that your logs from the other daemons will be available for a couple of days, they might help too (not sure yet). :)

In terms of upgrading, our goal is that you can do what you like, but we don't test it very well right now (we're actually doing some work on setting that up). The safest bet is to turn off the daemons, upgrade everybody at once, and turn the monitors on first.

Actions #6

Updated by Greg Farnum about 12 years ago

Actually James, could you tell me what actions you took that resulted in this log? It looks like:
started OSD at 10:00:38, log entries end at 10:01:14
started OSD again at 10:17, it stopped for some reason (there's just about no logging from this run)
started OSD again at 10:23:30, logging ends at 10:24:01.

My guess is that each of these crashed and the logging got lost due to the extremely high RAM usage, but I'd prefer to guess about as little as possible. ;)

Actions #7

Updated by James Oakley about 12 years ago

Greg Farnum wrote:

Actually James, could you tell me what actions you took that resulted in this log? It looks like:
started OSD at 10:00:38, log entries end at 10:01:14

The machine was rebooted after the upgrade; this is the first run on startup.

started OSD again at 10:17, it stopped for some reason (there's just about no logging from this run)

The OSDs on this particular machine had gone down and I found this in the mon log:

2012-01-24 10:06:42.048445 7fb1128a2700 log [INF] : osd.0 out (down for 304.654334)                                                                                                
2012-01-24 10:06:42.048500 7fb1128a2700 log [INF] : osd.1 out (down for 304.654331)                                                                                                

I restarted via the init script at 10:17 and waited a bit, then checked with ceph -s:

2012-01-24 10:22:05.473085    pg v107096: 1188 pgs: 1188 active+clean; 26711 MB data, 67702 MB used, 5514 GB / 5589 GB avail
2012-01-24 10:22:05.477203   mds e52: 1/1/1 up {0=1=up:active}, 1 up:standby
2012-01-24 10:22:05.477259   osd e156: 6 osds: 4 up, 4 in
2012-01-24 10:22:05.477399   log 2012-01-24 10:17:55.676184 mon.1 192.168.41.51:6789/0 48 : [INF] mon.1@1 won leader election with quorum 1,2
2012-01-24 10:22:05.477536   mon e1: 3 mons at {0=192.168.41.50:6789/0,1=192.168.41.51:6789/0,2=192.168.41.52:6789/0}

Then I checked with the init script:

[/etc/ceph/fetch_config /tmp/fetched.ceph.conf.27114]
=== mon.0 === 
mon.0: not running.
=== mds.0 === 
mds.0: not running.
=== osd.0 === 
osd.0: not running.
=== osd.1 === 
osd.1: not running.

started OSD again at 10:23:30, logging ends at 10:24:01.

I started again and watched with ceph -w:

2012-01-24 10:23:50.851870    pg v107105: 1188 pgs: 129 active, 1028 active+clean, 31 peering; 26711 MB data, 91055 MB used, 8281 GB / 8383 GB avail; 4707/85014 degraded (5.537%); 1/42507 unfound (0.002%)
2012-01-24 10:23:50.855026   mds e53: 1/1/1 up {0=1=up:active}, 2 up:standby
2012-01-24 10:23:50.855062   osd e161: 6 osds: 6 up, 6 in
2012-01-24 10:23:50.855158   log 2012-01-24 10:23:34.192750 mon.0 192.168.41.50:6789/0 6 : [INF] mds.? 192.168.41.50:6800/27496 up:boot
2012-01-24 10:23:50.855248   mon e1: 3 mons at {0=192.168.41.50:6789/0,1=192.168.41.51:6789/0,2=192.168.41.52:6789/0}
2012-01-24 10:23:51.206285    pg v107106: 1188 pgs: 129 active, 1028 active+clean, 31 peering; 26711 MB data, 91056 MB used, 8281 GB / 8383 GB avail; 4707/85014 degraded (5.537%); 1/42507 unfound (0.002%)
2012-01-24 10:23:53.460226    pg v107107: 1188 pgs: 129 active, 1028 active+clean, 31 peering; 26711 MB data, 91051 MB used, 8281 GB / 8383 GB avail; 4707/85014 degraded (5.537%); 1/42507 unfound (0.002%)
2012-01-24 10:23:56.206886    pg v107108: 1188 pgs: 129 active, 1028 active+clean, 31 peering; 26711 MB data, 91046 MB used, 8281 GB / 8383 GB avail; 4707/85014 degraded (5.537%); 1/42507 unfound (0.002%)
2012-01-24 10:24:23.619394   log 2012-01-24 10:24:23.562049 mon.0 192.168.41.50:6789/0 7 : [INF] osd.0 192.168.41.50:6801/27594 failed (by osd.4 192.168.41.52:6801/3175)
2012-01-24 10:24:23.619394   log 2012-01-24 10:24:23.562279 mon.0 192.168.41.50:6789/0 8 : [INF] osd.1 192.168.41.50:6804/27661 failed (by osd.4 192.168.41.52:6801/3175)
2012-01-24 10:24:23.619394   log 2012-01-24 10:24:23.595168 mon.0 192.168.41.50:6789/0 9 : [INF] osd.0 192.168.41.50:6801/27594 failed (by osd.3 192.168.41.51:6804/3146)
2012-01-24 10:24:23.619394   log 2012-01-24 10:24:23.595244 mon.0 192.168.41.50:6789/0 10 : [INF] osd.1 192.168.41.50:6804/27661 failed (by osd.3 192.168.41.51:6804/3146)
2012-01-24 10:24:23.745746   osd e162: 6 osds: 4 up, 6 in
2012-01-24 10:24:23.763239    pg v107109: 1188 pgs: 129 active, 1028 active+clean, 31 peering; 26711 MB data, 91046 MB used, 8281 GB / 8383 GB avail; 4707/85014 degraded (5.537%); 1/42507 unfound (0.002%)
2012-01-24 10:24:24.624234   log 2012-01-24 10:24:23.655294 mon.0 192.168.41.50:6789/0 11 : [INF] osd.0 192.168.41.50:6801/27594 failed (by osd.2 192.168.41.51:6801/3062)
2012-01-24 10:24:24.624234   log 2012-01-24 10:24:23.655396 mon.0 192.168.41.50:6789/0 12 : [INF] osd.1 192.168.41.50:6804/27661 failed (by osd.2 192.168.41.51:6801/3062)
2012-01-24 10:24:24.747043   osd e163: 6 osds: 4 up, 6 in

My guess is that each of these crashed and the logging got lost due to the extremely high RAM usage, but I'd prefer to guess about as little as possible. ;)

I have the default logging for OSDs at this time; I was running with debug settings before, but it was filling my /var too quickly.

Actions #8

Updated by Greg Farnum about 12 years ago

Okay, I see the bug that is the immediate cause of the OOM [1], but I haven't yet tracked down the actual triggering bug. Could you post the corresponding log for your OSD on host 192.168.41.52?

[1]: Another OSD is repeatedly trying to connect but failing to do so (our seqs are matched but I win the tiebreaker, so he gets a wait), and we don't clean up the freshly created Pipe. Creating several thousand of these (>6k for one OSD, and more than one OSD is doing this) uses up a lot of memory that never gets cleaned up.
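
To make the leak pattern in [1] concrete, here is a minimal sketch; it is an illustration of the behaviour described above, not SimpleMessenger's actual code, and Pipe, accept_connect_attempt, and we_win_tiebreaker are stand-ins for the real connection-race bookkeeping:

// Illustration only, not SimpleMessenger code.
#include <cstdio>

struct Pipe {
    char buffers[64 * 1024];   // stand-in for per-connection buffers, queues, thread state
};

// Each incoming connect attempt allocates a fresh Pipe. If we win the
// tiebreaker, the peer is told to wait and will retry, but the Pipe we just
// created for that attempt is neither registered nor deleted.
Pipe* accept_connect_attempt(bool we_win_tiebreaker) {
    Pipe* p = new Pipe();
    if (we_win_tiebreaker)
        return nullptr;        // 'p' is abandoned here: this is the leak
    return p;                  // otherwise the pipe is kept and used
}

int main() {
    // >6k retries from one peer OSD (and more than one peer doing this,
    // per the comment above) make the abandoned Pipes add up quickly.
    for (int i = 0; i < 6000; ++i)
        accept_connect_attempt(/*we_win_tiebreaker=*/true);
    std::puts("simulated 6000 leaked connection attempts");
    return 0;
}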

Actions #9

Updated by James Oakley about 12 years ago

Greg Farnum wrote:

Okay, I see the bug that is the immediate cause of the OOM [1], but I haven't yet tracked down the actual triggering bug. Could you post the corresponding log for your OSD on host 192.168.41.52?

I have 2 OSDs on that host. I'm attaching both logs.

Actions #10

Updated by Greg Farnum about 12 years ago

  • Status changed from New to 4

Okay, that's a cross-version message incompatibility. You should be able to resolve your issue by just upgrading the rest of your daemons, and the OOM bug is in the queue. Let me know if upgrading doesn't fix it!

Actions #11

Updated by James Oakley about 12 years ago

Greg Farnum wrote:

Okay, that's a cross-version message incompatibility. You should be able to resolve your issue by just upgrading the rest of your daemons, and the OOM bug is in the queue. Let me know if upgrading doesn't fix it!

I just upgraded the other nodes and everything is running perfectly now. Thanks!

Actions #12

Updated by Greg Farnum about 12 years ago

  • Status changed from 4 to Won't Fix

Glad to hear it!
