Bug #910

Multi-MDS Ceph does not pass fsstress

Added by Greg Farnum over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 100%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Following on from #859, which handled the single-MDS bugs.

Hopefully we'll just have the one new bug; otherwise, this bug will serve as an umbrella for any new ones we pick up.

mds/Server.cc: In function 'void Server::handle_slave_rename_prep(MDRequest*)', in thread '0x7f07bbe41710'
mds/Server.cc: 5187: FAILED assert(r == 0)
 ceph version 0.25-298-g157e6bd (commit:157e6bdc52aac6aa00a84edbd596ff49fe60e086)
 1: (Server::handle_slave_rename_prep(MDRequest*)+0x1b9c) [0x51fe2c]
 2: (Server::dispatch_slave_request(MDRequest*)+0x32b) [0x5252eb]
 3: (Server::handle_slave_request(MMDSSlaveRequest*)+0x148) [0x525718]
 4: (MDS::_dispatch(Message*)+0x2508) [0x4b9b28]
 5: (MDS::ms_dispatch(Message*)+0x5b) [0x4ba3fb]
 6: (SimpleMessenger::dispatch_entry()+0x89a) [0x48955a]
 7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48222c]
 8: (()+0x68ba) [0x7f07be4968ba]
 9: (clone()+0x6d) [0x7f07bd12b02d]
*** Caught signal (Aborted) **
 in thread 0x7f07bbe41710
 ceph version 0.25-298-g157e6bd (commit:157e6bdc52aac6aa00a84edbd596ff49fe60e086)
 1: ./cmds() [0x7180ac]
 2: (()+0xef60) [0x7f07be49ef60]
 3: (gsignal()+0x35) [0x7f07bd08e165]
 4: (abort()+0x180) [0x7f07bd090f70]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f07bd921dc5]
 6: (()+0xcb166) [0x7f07bd920166]
 7: (()+0xcb193) [0x7f07bd920193]
 8: (()+0xcb28e) [0x7f07bd92028e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x6fed43]
 10: (Server::handle_slave_rename_prep(MDRequest*)+0x1b9c) [0x51fe2c]
 11: (Server::dispatch_slave_request(MDRequest*)+0x32b) [0x5252eb]
 12: (Server::handle_slave_request(MMDSSlaveRequest*)+0x148) [0x525718]
 13: (MDS::_dispatch(Message*)+0x2508) [0x4b9b28]
 14: (MDS::ms_dispatch(Message*)+0x5b) [0x4ba3fb]
 15: (SimpleMessenger::dispatch_entry()+0x89a) [0x48955a]
 16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48222c]
 17: (()+0x68ba) [0x7f07be4968ba]
 18: (clone()+0x6d) [0x7f07bd12b02d]

Judging by the log output we're (not) getting, it looks like path_traverse is bailing out through one of its "return -ESTALE" paths. :/
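To make the suspected failure mode concrete, here is a toy Python model. It is hypothetical and is not Ceph's code: the names `toy_path_traverse` and `toy_rename_prep`, and the dict standing in for the MDS cache, are invented for illustration. It assumes, as the hypothesis above suggests, that the traversal takes an early -ESTALE exit when it cannot resolve a path component locally. The caller then sees r != 0, which is exactly the condition that trips `assert(r == 0)` at mds/Server.cc:5187.

```python
import errno

# Hypothetical toy model, NOT Ceph's path_traverse.
# `cache` maps each path component to True if this MDS currently holds it;
# a False or missing entry means the component cannot be resolved locally.


def toy_path_traverse(cache, path):
    """Return 0 if every component resolves locally, else -ESTALE."""
    for component in path:
        if not cache.get(component, False):
            # Stand-in for one of the early "return -ESTALE" exits.
            return -errno.ESTALE
    return 0


def toy_rename_prep(cache, path):
    """Mimic a caller that expects the traversal to succeed.

    Returns (r, assert_holds) instead of aborting, so the outcome can be
    inspected; in the report, the equivalent assert(r == 0) aborted cmds.
    """
    r = toy_path_traverse(cache, path)
    return r, r == 0


cache = {"dir": True, "file": False}
print(toy_rename_prep(cache, ["dir"]))          # (0, True)
print(toy_rename_prep(cache, ["dir", "file"]))  # r is -errno.ESTALE, assert_holds is False
```

Returning the outcome rather than aborting is only a convenience for inspecting the toy; it is not a claim about how the actual fix handles the error.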


Subtasks

Tasks #916: fsstress results in failed CDentry assert (Resolved, Greg Farnum)

Tasks #921: Snaprealm issue (Resolved, Greg Farnum)

fs - Tasks #922: fsstress: Request ping-pongs when dentry and inode auth are separate (Resolved, Greg Farnum)

fs - Tasks #923: Waiter does not get woken in fsstress (Resolved, Greg Farnum)

fs - Tasks #928: Assert failure on replica: has auth pins in _logged_slave_rename (Resolved, Greg Farnum)

fs - Tasks #934: auth_unpin assert fail in Locker::xlock_finish (Resolved, Greg Farnum)

fs - Tasks #973: Dir failing to freeze (Resolved, Greg Farnum)

fs - Tasks #1002: Assert failure in Locker::handle_file_lock (Resolved, Greg Farnum)

Tasks #1005: xlock is not unpinning during rename across MDSes (Resolved, Greg Farnum)

fs - Tasks #1039: cfuse: requests max_size from non-auth MDS (Resolved, Greg Farnum)


Related issues

Related to Ceph - Bug #859: Ceph does not pass fsstress (Resolved, 03/04/2011)

History

#1 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.25.2 to v0.25.3

#2 Updated by Greg Farnum over 9 years ago

  • Status changed from New to In Progress

I think commit:4ced40f227ce818bdcd99ad0017c7e1ee864688d fixes the current issue, but this is obviously turning up other bugs too. :(

#3 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.25.3 to v0.27
  • Position set to 326

#4 Updated by Sage Weil over 9 years ago

  • Story points changed from 2 to 5
  • Position deleted (326)
  • Position set to 326

#5 Updated by Sage Weil over 9 years ago

  • Story points changed from 5 to 8
  • Position deleted (326)
  • Position set to 326

#6 Updated by Sage Weil over 9 years ago

  • Position deleted (327)
  • Position set to 339

#7 Updated by Sage Weil over 9 years ago

  • Position deleted (337)
  • Position set to 338

#8 Updated by Sage Weil over 9 years ago

  • Position deleted (337)
  • Position set to 1
  • Position changed from 1 to 607

#9 Updated by Sage Weil over 9 years ago

  • Position deleted (609)
  • Position set to 601

#10 Updated by Greg Farnum over 9 years ago

I've pushed a bunch of my work on this, but I'm setting it aside for a bit to look at the rstat issue.

#11 Updated by Greg Farnum over 9 years ago

Okay, I've been pecking away at this, and I think I'm now down to two bugs: one that Sage says he's fixed in his branch, and #1002. At least, those are the only bugs I've seen that I haven't fixed yet.

#12 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.27 to v0.28

#13 Updated by Sage Weil over 9 years ago

  • Position deleted (609)
  • Position set to 48

#14 Updated by Greg Farnum over 9 years ago

  • Position deleted (45)
  • Position set to 641

#15 Updated by Greg Farnum over 9 years ago

  • Position deleted (641)
  • Position set to 24

#16 Updated by Greg Farnum over 9 years ago

  • Status changed from In Progress to Resolved

I've been unable to break this under cfuse in master all day.

#17 Updated by Greg Farnum over 9 years ago

  • Status changed from Resolved to In Progress

Ran across at least one issue again...

#18 Updated by Greg Farnum over 9 years ago

  • Status changed from In Progress to Resolved

Haven't seen any new issues!
