Bug #910

closed

Multi-MDS Ceph does not pass fsstress

Added by Greg Farnum about 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Category: -
% Done: 100%

Description

Following on from #859, which handled the single-MDS bugs.

Hopefully we'll just have the one new bug; otherwise this ticket will serve as an umbrella for any new ones we pick up.

mds/Server.cc: In function 'void Server::handle_slave_rename_prep(MDRequest*)', in thread '0x7f07bbe41710'
mds/Server.cc: 5187: FAILED assert(r == 0)
 ceph version 0.25-298-g157e6bd (commit:157e6bdc52aac6aa00a84edbd596ff49fe60e086)
 1: (Server::handle_slave_rename_prep(MDRequest*)+0x1b9c) [0x51fe2c]
 2: (Server::dispatch_slave_request(MDRequest*)+0x32b) [0x5252eb]
 3: (Server::handle_slave_request(MMDSSlaveRequest*)+0x148) [0x525718]
 4: (MDS::_dispatch(Message*)+0x2508) [0x4b9b28]
 5: (MDS::ms_dispatch(Message*)+0x5b) [0x4ba3fb]
 6: (SimpleMessenger::dispatch_entry()+0x89a) [0x48955a]
 7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48222c]
 8: (()+0x68ba) [0x7f07be4968ba]
 9: (clone()+0x6d) [0x7f07bd12b02d]
*** Caught signal (Aborted) **
 in thread 0x7f07bbe41710
 ceph version 0.25-298-g157e6bd (commit:157e6bdc52aac6aa00a84edbd596ff49fe60e086)
 1: ./cmds() [0x7180ac]
 2: (()+0xef60) [0x7f07be49ef60]
 3: (gsignal()+0x35) [0x7f07bd08e165]
 4: (abort()+0x180) [0x7f07bd090f70]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f07bd921dc5]
 6: (()+0xcb166) [0x7f07bd920166]
 7: (()+0xcb193) [0x7f07bd920193]
 8: (()+0xcb28e) [0x7f07bd92028e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x6fed43]
 10: (Server::handle_slave_rename_prep(MDRequest*)+0x1b9c) [0x51fe2c]
 11: (Server::dispatch_slave_request(MDRequest*)+0x32b) [0x5252eb]
 12: (Server::handle_slave_request(MMDSSlaveRequest*)+0x148) [0x525718]
 13: (MDS::_dispatch(Message*)+0x2508) [0x4b9b28]
 14: (MDS::ms_dispatch(Message*)+0x5b) [0x4ba3fb]
 15: (SimpleMessenger::dispatch_entry()+0x89a) [0x48955a]
 16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48222c]
 17: (()+0x68ba) [0x7f07be4968ba]
 18: (clone()+0x6d) [0x7f07bd12b02d]

Based on the log output we're (not) getting, it looks like path_traverse is failing out in one of the "return -ESTALE" paths. :/
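
To illustrate the failure mode described above, here is a minimal sketch, not the actual Ceph code and not the fix that was eventually committed. `ToyCache`, `toy_rename_prep`, and `PrepResult` are hypothetical names invented for this example; the only thing taken from the report is the idea that a traversal can return -ESTALE and that an unconditional assert(r == 0) then aborts the daemon. The sketch checks for the stale result explicitly before asserting on anything else.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for an MDS cache: the set of paths this node can
// currently resolve. This is NOT Ceph's real data structure.
struct ToyCache {
    std::unordered_set<std::string> known_paths;

    // Mimics a traversal that returns 0 on success and -ESTALE when the
    // path cannot be resolved locally (an assumed failure mode).
    int path_traverse(const std::string& path) const {
        return known_paths.count(path) ? 0 : -ESTALE;
    }
};

// Outcome of the toy "slave rename prep" step.
enum class PrepResult { Ok, StaleRetry };

// Rather than asserting r == 0 unconditionally, report a stale lookup so
// the caller can retry or forward the request. Any other nonzero result
// is still treated as a programming error.
PrepResult toy_rename_prep(const ToyCache& cache, const std::string& src) {
    int r = cache.path_traverse(src);
    if (r == -ESTALE) {
        return PrepResult::StaleRetry;
    }
    assert(r == 0);
    return PrepResult::Ok;
}
```

With a cache that knows only "/a/b", calling toy_rename_prep on "/a/b" yields PrepResult::Ok, while "/a/c" yields PrepResult::StaleRetry instead of tripping the assert.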


Subtasks 10 (0 open, 10 closed)

Tasks #916: fsstress results in failed CDentry assert (Resolved, Greg Farnum, 03/22/2011)
Tasks #921: Snaprealm issue (Resolved, Greg Farnum, 03/23/2011)
CephFS - Tasks #922: fsstress: Request ping-pongs when dentry and inode auth are separate (Resolved, Greg Farnum, 03/23/2011)
CephFS - Tasks #923: Waiter does not get woken in fsstress (Resolved, Greg Farnum, 03/23/2011)
CephFS - Tasks #928: Assert failure on replica: has auth pins in _logged_slave_rename (Resolved, Greg Farnum, 03/23/2011)
CephFS - Tasks #934: auth_unpin assert fail in Locker::xlock_finish (Resolved, Greg Farnum, 03/24/2011)
CephFS - Tasks #973: Dir failing to freeze (Resolved, Greg Farnum, 04/04/2011)
CephFS - Tasks #1002: Assert failure in Locker::handle_file_lock (Resolved, Greg Farnum, 04/13/2011)
Tasks #1005: xlock is not unpinning during rename across MDSes (Resolved, Greg Farnum, 04/15/2011)
CephFS - Tasks #1039: cfuse: requests max_size from non-auth MDS (Resolved, Greg Farnum, 05/02/2011)

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #859: Ceph does not pass fsstress (Resolved, 03/04/2011)

Actions #1

Updated by Sage Weil about 13 years ago

  • Target version changed from v0.25.2 to v0.25.3
Actions #2

Updated by Greg Farnum about 13 years ago

  • Status changed from New to In Progress

Think we got the current issue in commit:4ced40f227ce818bdcd99ad0017c7e1ee864688d but this is obviously turning up other bugs too. :(

Actions #3

Updated by Sage Weil about 13 years ago

  • Target version changed from v0.25.3 to v0.27
Actions #4

Updated by Sage Weil about 13 years ago

  • Story points changed from 2 to 5
Actions #5

Updated by Sage Weil about 13 years ago

  • Story points changed from 5 to 8
Actions #10

Updated by Greg Farnum about 13 years ago

I pushed a bunch of my work on this in, but am dropping work on this for a bit to look at the rstat issue.

Actions #11

Updated by Greg Farnum about 13 years ago

Okay, been pecking away at this and I think I'm now down to one bug that Sage says he's fixed in his branch, and #1002. Or at least those are the only bugs I've seen that I haven't fixed yet.

Actions #12

Updated by Sage Weil about 13 years ago

  • Target version changed from v0.27 to v0.28
Actions #13

Updated by Sage Weil about 13 years ago

  • Translation missing: en.field_position deleted (609)
  • Translation missing: en.field_position set to 48
Actions #14

Updated by Greg Farnum about 13 years ago

  • Translation missing: en.field_position deleted (45)
  • Translation missing: en.field_position set to 641
Actions #15

Updated by Greg Farnum about 13 years ago

  • Translation missing: en.field_position deleted (641)
  • Translation missing: en.field_position set to 24
Actions #16

Updated by Greg Farnum about 13 years ago

  • Status changed from In Progress to Resolved

I've been unable to break this under cfuse in master all day.

Actions #17

Updated by Greg Farnum about 13 years ago

  • Status changed from Resolved to In Progress

Ran across at least one issue again....

Actions #18

Updated by Greg Farnum almost 13 years ago

  • Status changed from In Progress to Resolved

Haven't seen any new issues!
