Bug #673

cmon: SimpleMessenger::Pipe::discard_queue

Added by Wido den Hollander over 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On noisy I saw my monitor go down with:

2010-12-26 17:02:48.022439 7f73b827b700 cephx keyserverdata: get_caps: name=client.admin
2010-12-26 17:02:48.022450 7f73b827b700 cephx keyserverdata: get_secret: num of caps=3
2010-12-26 17:02:49.624874 7f73b5f66700 -- [2a00:f10:113:1:230:48ff:fe8d:a21e]:6789/0 >> [2a00:f10:113:1:230:48ff:fe8d:a21e]:0/16670 pipe(0x7f73b003e4c0 sd=14 pgs=0 cs=0 l=1).accept replacing existing (lossy) channel (new one lossy=1)
2010-12-26 17:02:49.625393 7f73b827b700 cephx server client.admin: start_session server_challenge f5e52511e0fcade1
2010-12-26 17:02:49.625660 7f73b5e65700 -- [2a00:f10:113:1:230:48ff:fe8d:a21e]:6789/0 >> [2a00:f10:113:1:230:48ff:fe8d:a21e]:0/16670 pipe(0x7f73b003fa20 sd=9 pgs=0 cs=0 l=1).accept replacing existing (lossy) channel (new one lossy=1)
2010-12-26 17:02:51.063371 7f73b7a7a700 cephx keyserver: _check_rotating_secrets
./common/Mutex.h: In function 'void Mutex::Lock(bool)':
./common/Mutex.h:118: FAILED assert(r == 0)
 ceph version 0.24 (commit:180a4176035521940390f4ce24ee3eb7aa290632)
 1: /usr/bin/cmon() [0x456cbd]
 2: (SimpleMessenger::Pipe::discard_queue()+0x53) [0x45aaf3]
 3: (SimpleMessenger::Pipe::fail()+0x48) [0x45b448]
 4: (SimpleMessenger::Pipe::fault(bool, bool)+0x14d) [0x45b6cd]
 5: (SimpleMessenger::Pipe::reader()+0x206) [0x467b46]
 6: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4519bd]
 7: (Thread::_entry_func(void*)+0xa) [0x46885a]
 8: (()+0x69ca) [0x7f73ba4719ca]
 9: (clone()+0x6d) [0x7f73b936470d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

When the monitor went down, a test daemon using phprados had already been running for a few days. It is just a simple PHP script with a while(true) loop that performs some operations on a RADOS pool.

I forgot to call close_pool($pool); once my operations were done, so after some time my script crashed with:

2010-12-27 00:49:54.356977 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:49:54.358626 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:49:54.360105 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:49:54.361630 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:49:54.363124 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:49:54.364653 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:49:54.366149 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:49:54.377674 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:49:57.025346 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:49:57.039557 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:49:57.048030 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:49:58.131340 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:50:00.023876 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:50:00.038083 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:50:00.048088 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:50:03.023996 7f7b93fff700 monclient: hunting for new mon
2010-12-27 00:50:03.037463 7f7b9a242700 monclient: hunting for new mon
2010-12-27 00:50:03.037560 7f7b16d31700 -- [2a00:f10:113:1:230:48ff:fe8d:a21e]:0/19791 >> [2a00:f10:113:1:230:48ff:fe8d:a21e]:6789/0 pipe(0x7f7b94348a70 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::connect()':
msg/SimpleMessenger.cc:999: FAILED assert(0)
 ceph version 0.24 (commit:180a4176035521940390f4ce24ee3eb7aa290632)
 1: (SimpleMessenger::Pipe::connect()+0x671) [0x7f7b9b2b6f51]
 2: (SimpleMessenger::Pipe::writer()+0x5dc) [0x7f7b9b2b8cbc]
 3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x7f7b9b2a1c1d]
 4: (Thread::_entry_func(void*)+0xa) [0x7f7b9b2bd32a]
 5: (()+0x69ca) [0x7f7b9dee09ca]
 6: (clone()+0x6d) [0x7f7b9e99070d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted
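
For reference, "Too many open files" is the error string for errno EMFILE, which socket() returns once a process reaches its file descriptor limit. The following is a minimal sketch of that failure mode, assuming a POSIX system. The helper name leak_sockets_until_failure and the limit of 32 are illustrative and are not taken from Ceph or phprados; AF_UNIX is used only because the socket domain does not matter for the descriptor limit.

```cpp
#include <cerrno>
#include <cstddef>
#include <vector>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: temporarily lowers the soft descriptor limit, opens
// sockets without closing them until socket() fails, then cleans up.
// Returns the errno reported by the failing socket() call.
int leak_sockets_until_failure(rlim_t soft_limit, std::size_t* opened) {
    struct rlimit old_limit;
    if (getrlimit(RLIMIT_NOFILE, &old_limit) != 0) return errno;

    // Only ever lower the soft limit, never raise it.
    struct rlimit low = old_limit;
    if (soft_limit < low.rlim_cur) low.rlim_cur = soft_limit;
    if (setrlimit(RLIMIT_NOFILE, &low) != 0) return errno;

    std::vector<int> fds;
    int err = 0;
    for (;;) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {          // descriptor table is full
            err = errno;
            break;
        }
        fds.push_back(fd);     // deliberately kept open, like a forgotten close
    }

    // Release everything and restore the original limit.
    for (int fd : fds) close(fd);
    setrlimit(RLIMIT_NOFILE, &old_limit);
    if (opened) *opened = fds.size();
    return err;
}
```

Calling leak_sockets_until_failure(32, &n) should report EMFILE after a few dozen sockets. A long-running process that never releases its handles reaches the same state, only more slowly.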

I'm not sure whether the two crashes are related, but this is what I found.

The log level on my monitor was low, so what I posted above is everything I have.
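
One way to avoid forgetting the close call in a long-running loop is to tie it to scope exit. Below is a minimal C++ sketch of that idea, assuming the handle is opened once per loop iteration (the report does not say so). ScopeGuard, FakePools, and run_loop are hypothetical names; FakePools only counts handles and does not represent the phprados or librados API.

```cpp
#include <functional>
#include <utility>

// Generic scope guard: runs the cleanup callable when it goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { if (cleanup_) cleanup_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> cleanup_;
};

// Hypothetical stand-in for a pool open/close pair; it only counts handles.
struct FakePools {
    int open_handles = 0;
    int open_pool() { return ++open_handles; }
    void close_pool(int /*handle*/) { --open_handles; }
};

// Runs `iterations` loop bodies and returns the peak number of open handles.
int run_loop(FakePools& pools, int iterations) {
    int peak = 0;
    for (int i = 0; i < iterations; ++i) {
        int handle = pools.open_pool();
        // The guard closes the handle at the end of every iteration,
        // including early exits via continue, break, or an exception.
        ScopeGuard guard([&pools, handle] { pools.close_pool(handle); });
        if (pools.open_handles > peak) peak = pools.open_handles;
        // ... work with the pool here ...
    }
    return peak;
}
```

With the guard in place, the number of open handles stays at one per iteration instead of growing until the descriptor limit is hit.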

History

#1 Updated by Sage Weil about 13 years ago

  • Target version set to v0.24.1

#2 Updated by Sage Weil about 13 years ago

  • Assignee set to Greg Farnum

#3 Updated by Greg Farnum about 13 years ago

  • Status changed from New to 7

I believe this should be fixed by 20593b0d38d5357c89b93fac8c06e2083fa56df9.

#4 Updated by Sage Weil about 13 years ago

  • Status changed from 7 to Resolved
