Bug #4753

closed

mds/Locker.cc: 4167: FAILED assert(0)

Added by Denis kaganovich about 11 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Every MDS crashes after some startup checks with the following assertion failure:

mds/Locker.cc: 4167: FAILED assert(0)

ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: (Locker::scatter_mix(ScatterLock*, bool*)+0x1cb) [0x6303fb]
2: (Locker::file_eval(ScatterLock*, bool*)+0x6e5) [0x6346b5]
3: (Locker::eval(CInode*, int, bool)+0x8d5) [0x63cd75]
4: (MDCache::reissue_all_caps()+0x333) [0x58b8b3]
5: (MDS::recovery_done()+0xca) [0x4c1a0a]
6: (MDS::handle_mds_map(MMDSMap*)+0x29ab) [0x4d372b]
7: (MDS::handle_core_message(Message*)+0xb93) [0x4d4873]
8: (MDS::_dispatch(Message*)+0x36) [0x4d4a46]
9: (MDS::ms_dispatch(Message*)+0x19b) [0x4d6a3b]
10: (DispatchQueue::entry()+0x319) [0x8b56b9]
11: (DispatchQueue::DispatchThread::entry()+0xd) [0x80fd1d]
12: (()+0x84d8) [0x7fc0dcacd4d8]
13: (clone()+0x6d) [0x7fc0dafc0e8d]

Full level 10 log here: http://mahatma.bspu.unibel.by/download/transit/ceph-mds.4.log.gz

Actions #1

Updated by Greg Farnum about 11 years ago

file_eval is trying to move ifile from "scan" to "mixed" in order to serve up the client caps, and scatter_mix doesn't think that's a valid transition. scatter_mix is probably correct to reject it, as we need to determine inode state before doing other things with the inodes?

Actions #2

Updated by Sage Weil about 11 years ago

Yeah, that transition doesn't make sense. I think it should do nothing in the scan state.

Actions #3

Updated by Greg Farnum about 11 years ago

  • Priority changed from Normal to High

You mean file_eval should just short-circuit if it's scanning? That seems like the most sensible place for it, but I'm not quite sure, and there are a few other states and transitions around it so I'd need to spend some time diving in to figure it out.

However, this is a single-MDS issue on recovery which doesn't go away with a reboot, so High it is!
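
For readers following the thread, the short-circuit idea from comments #2 and #3 can be sketched roughly as below. This is a toy model under stated assumptions, not the Ceph code and not the eventual fix: `LockState`, `ToyLock`, `toy_scatter_mix`, and `toy_file_eval` are hypothetical names, the real lock has many more states, and the real scatter_mix accepts more source states than the single one modeled here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, heavily simplified set of lock states (assumption:
// the real MDS lock state machine is much larger).
enum class LockState { Sync, Scan, Mix };

struct ToyLock {
    LockState state;
};

// Toy stand-in for a transition function that rejects unexpected
// source states, analogous to the assert(0) reported in this bug.
void toy_scatter_mix(ToyLock& lock) {
    if (lock.state != LockState::Sync) {
        throw std::logic_error("invalid transition to Mix");
    }
    lock.state = LockState::Mix;
}

// Toy stand-in for the evaluation step. Returns true if a transition
// was made. The early return is the "do nothing in the scan state"
// idea: while scanning, leave the lock alone instead of attempting
// a transition that the transition function considers invalid.
bool toy_file_eval(ToyLock& lock, bool wants_mix) {
    if (lock.state == LockState::Scan) {
        return false;  // short-circuit: state not yet determined
    }
    if (wants_mix && lock.state == LockState::Sync) {
        toy_scatter_mix(lock);
        return true;
    }
    return false;
}
```

With this guard, evaluating a lock that is still in the scan state is a no-op, and the transition is only attempted once the lock has reached a state the transition function accepts. Whether the guard belongs in the evaluation step or elsewhere is exactly the open question raised above.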

Actions #4

Updated by Denis kaganovich about 11 years ago

Additional information: I worked around it at runtime by changing assert(0) to some lock (IMHO the first one in this case) on one node, and found a forgotten ctdbd process (trying to lock its own file) on one of the machines. I killed it and restarted all MDSes with the original assert(0) in place again, and now all is good.

Actions #5

Updated by Greg Farnum about 11 years ago

  • Project changed from Ceph to CephFS
  • Category set to 47

Actions #6

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Resolved

fixed this in commit:482733e9603e47a3a427b17bfb9b9189dacd5109

Actions #7

Updated by Greg Farnum almost 8 years ago

  • Component(FS) MDS added