Bug #23518

mds: crash when failover

Added by wei jin 12 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
Start date: 03/30/2018
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash, multimds
Pull request ID:

Description

2018-03-29 10:25:04.719502 7f5ae5ad2700 -1 /build/ceph-12.2.4/src/mds/MDCache.cc: In function 'void MDCache::handle_cache_rejoin_ack(MMDSCacheRejoin*)' thread 7f5ae5ad2700 time 2018-03-29 10:25:04.716917
/build/ceph-12.2.4/src/mds/MDCache.cc: 5087: FAILED assert(session)

ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55ba1428d8d2]
2: (MDCache::handle_cache_rejoin_ack(MMDSCacheRejoin*)+0x2422) [0x55ba14071542]
3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x233) [0x55ba1407def3]
4: (MDCache::dispatch(Message*)+0xa5) [0x55ba1407e045]
5: (MDSRank::handle_deferrable_message(Message*)+0x5bc) [0x55ba13f6aecc]
6: (MDSRank::_dispatch(Message*, bool)+0x1db) [0x55ba13f7858b]
7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55ba13f79355]
8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55ba13f62b13]
9: (DispatchQueue::entry()+0x7ca) [0x55ba1458ceda]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55ba143125ad]
11: (()+0x8064) [0x7f5aea8aa064]
12: (clone()+0x6d) [0x7f5ae999562d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Related issues

Related to fs - Bug #23503: mds: crash during pressure test (Duplicate, 03/29/2018)
Copied to fs - Backport #23946: luminous: mds: crash when failover (Resolved)

History

#1 Updated by Patrick Donnelly 12 months ago

  • Status changed from New to Need More Info

Did you evict the client session during this time?

#2 Updated by wei jin 12 months ago

No. I did nothing.
During a pressure test I ran into two crashes; the other one is #23503.

#3 Updated by Patrick Donnelly 12 months ago

Are you still hitting the issue, or has it gone away? If you are still hitting it, logs captured with `debug mds = 20` would be helpful.
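The requested debug level can be set in the `[mds]` section of `ceph.conf`. A minimal sketch, assuming the default `/etc/ceph/ceph.conf` location and that the MDS daemons are restarted afterwards so the setting takes effect. Level 20 is very verbose, so it is worth lowering it again once the crash has been captured.

```ini
[mds]
# Maximum MDS debug verbosity, as requested above; expect large log files.
debug mds = 20
```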

#4 Updated by Patrick Donnelly 12 months ago

  • Category set to Correctness/Safety
  • Target version set to v13.0.0
  • Source set to Community (user)
  • Backport set to luminous
  • Severity changed from 3 - minor to 2 - major
  • Component(FS) MDS added

#5 Updated by Patrick Donnelly 12 months ago

  • Tags set to crash

#6 Updated by Zheng Yan 11 months ago

  • Related to Bug #23503: mds: crash during pressure test added

#7 Updated by Zheng Yan 11 months ago

This one is related to http://tracker.ceph.com/issues/23503; #23503 can explain why the session was evicted.

#8 Updated by Zheng Yan 11 months ago

  • Status changed from Need More Info to In Progress

#9 Updated by Zheng Yan 11 months ago

  • Assignee set to Zheng Yan

#10 Updated by Zheng Yan 11 months ago

  • Status changed from In Progress to Need Review

#11 Updated by Patrick Donnelly 11 months ago

  • Status changed from Need Review to Pending Backport
  • Tags deleted (crash)
  • Labels (FS) crash, multimds added

#12 Updated by Nathan Cutler 11 months ago

#13 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved
