Bug #2110

osdc/Journaler.cc: 360: FAILED assert(r >= 0)

Added by Matthew Roy almost 12 years ago. Updated over 7 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)

Description

The MDS hit an assert. This cluster was running a CephFS home-directory workload with one active MDS and one MDS in standby replay. mds.b is designated as the standby, but it may have been the active MDS at the time, since both MDSes had recently been restarted individually.

./mds.a.log-osdc/Journaler.cc: In function 'void Journaler::_finish_write_head(int, Journaler::Header&, Context*)' thread 7f299df90700 time 2012-02-26 14:39:40.295642
./mds.a.log:osdc/Journaler.cc: 360: FAILED assert(r >= 0)
./mds.a.log- ceph version 0.42.2 (commit:732f3ec94e39d458230b7728b2a936d431e19322)
./mds.a.log- 1: (Journaler::_finish_write_head(int, Journaler::Header&, Context*)+0x1e1) [0x6a2271]
./mds.a.log- 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11b7) [0x688587]
./mds.a.log- 3: (MDS::handle_core_message(Message*)+0x987) [0x4c49a7]
./mds.a.log- 4: (MDS::_dispatch(Message*)+0x2f) [0x4c4b3f]
./mds.a.log- 5: (MDS::ms_dispatch(Message*)+0x70) [0x4c6280]
./mds.a.log- 6: (SimpleMessenger::dispatch_entry()+0x783) [0x720d83]
./mds.a.log- 7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4a409c]
./mds.a.log- 8: (()+0x7efc) [0x7f29a1a5aefc]
./mds.a.log- 9: (clone()+0x6d) [0x7f29a028f89d]
./mds.a.log- ceph version 0.42.2 (commit:732f3ec94e39d458230b7728b2a936d431e19322)
./mds.a.log- 1: (Journaler::_finish_write_head(int, Journaler::Header&, Context*)+0x1e1) [0x6a2271]
./mds.a.log- 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11b7) [0x688587]
./mds.a.log- 3: (MDS::handle_core_message(Message*)+0x987) [0x4c49a7]
./mds.a.log- 4: (MDS::_dispatch(Message*)+0x2f) [0x4c4b3f]
./mds.a.log- 5: (MDS::ms_dispatch(Message*)+0x70) [0x4c6280]
./mds.a.log- 6: (SimpleMessenger::dispatch_entry()+0x783) [0x720d83]
./mds.a.log- 7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4a409c]
./mds.a.log- 8: (()+0x7efc) [0x7f29a1a5aefc]
./mds.a.log- 9: (clone()+0x6d) [0x7f29a028f89d]
./mds.a.log-*** Caught signal (Aborted) **
./mds.a.log- in thread 7f299df90700
./mds.a.log- ceph version 0.42.2 (commit:732f3ec94e39d458230b7728b2a936d431e19322)
./mds.a.log- 1: /usr/bin/ceph-mds() [0x79b0d6]
./mds.a.log- 2: (()+0x10060) [0x7f29a1a63060]
./mds.a.log- 3: (gsignal()+0x35) [0x7f29a01e43a5]
./mds.a.log- 4: (abort()+0x17b) [0x7f29a01e7b0b]
./mds.a.log- 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f29a0aa2d7d]
./mds.a.log- 6: (()+0xb9f26) [0x7f29a0aa0f26]
./mds.a.log- 7: (()+0xb9f53) [0x7f29a0aa0f53]
./mds.a.log- 8: (()+0xba04e) [0x7f29a0aa104e]
./mds.a.log: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x200) [0x7384f0]
./mds.a.log- 10: (Journaler::_finish_write_head(int, Journaler::Header&, Context*)+0x1e1) [0x6a2271]
./mds.a.log- 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11b7) [0x688587]
./mds.a.log- 12: (MDS::handle_core_message(Message*)+0x987) [0x4c49a7]
./mds.a.log- 13: (MDS::_dispatch(Message*)+0x2f) [0x4c4b3f]
./mds.a.log- 14: (MDS::ms_dispatch(Message*)+0x70) [0x4c6280]
./mds.a.log- 15: (SimpleMessenger::dispatch_entry()+0x783) [0x720d83]
./mds.a.log- 16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4a409c]
./mds.a.log- 17: (()+0x7efc) [0x7f29a1a5aefc]
./mds.a.log- 18: (clone()+0x6d) [0x7f29a028f89d]

mds.a.log.grep - MDS log for 300 lines on either side of the assert (75.6 KB) Matthew Roy, 02/27/2012 11:40 AM

logsNearMDSAAssert1439.grep - Lines from other log files around the same time (1.04 MB) Matthew Roy, 02/27/2012 11:40 AM

core.mdsAssert1439.gz - Core dump for crash. (49.7 MB) Matthew Roy, 02/27/2012 12:00 PM

History

#1 Updated by Sage Weil almost 12 years ago

Do you have a core file? I'm curious what the value of 'r' is.

#2 Updated by Matthew Roy almost 12 years ago

Sage Weil wrote:

Do you have a core file? I'm curious what the value of 'r' is.

Attached, probably. (The timestamp matches; I haven't made the naming change suggested on the wiki yet.)

#3 Updated by Sage Weil almost 12 years ago

Can you attach ceph-mds too? Or better yet, fire up gdb ceph-mds core and print out the value of r from that frame. (I've had poor luck making gdb give me anything useful in a mismatched environment.) We can help with that in the #ceph IRC channel...
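
Once r has been printed from the Journaler::_finish_write_head frame, it still has to be interpreted. The sketch below assumes the usual convention that such completion callbacks receive 0 (or a positive value) on success and a negative errno on failure, which is what assert(r >= 0) guards against. The helper describe_result is hypothetical and not part of Ceph; it only illustrates how a negative r maps back to an errno description.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Ceph): render a completion result code
// as text, assuming the "0 or negative errno" convention described above.
std::string describe_result(int r) {
  if (r >= 0) {
    // This is the case that assert(r >= 0) expects.
    return "ok (" + std::to_string(r) + ")";
  }
  // Negate r to recover the positive errno value before looking it up.
  // The message text comes from the C library and varies by platform.
  return "error " + std::to_string(r) + ": " + std::strerror(-r);
}
```

For example, if gdb reported a value equal to -ENOSPC, describe_result(-ENOSPC) would return a string beginning with "error" followed by the numeric value and the platform's description of that errno. The actual value of r in this crash is not recorded in this report.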

#4 Updated by Sage Weil almost 12 years ago

  • Status changed from New to Duplicate

#5 Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
