Bug #434

mds: clustered mds pjd failures

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 10/16/2010
Due date: 10/16/2010
% Done: 0%
Estimated time: 1.00 h
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:

Description


../pjd-fstest-20080816/tests/chmod/03.......ok                               
../pjd-fstest-20080816/tests/chmod/04.......ok                               
../pjd-fstest-20080816/tests/chmod/05.......FAILED tests 5-6, 10-11          
        Failed 4/14 tests, 71.43% okay
../pjd-fstest-20080816/tests/chmod/06.......ok                               
../pjd-fstest-20080816/tests/chmod/07.......FAILED tests 5-6, 8, 11          
        Failed 4/14 tests, 71.43% okay
../pjd-fstest-20080816/tests/chmod/08.......ok                               
../pjd-fstest-20080816/tests/chmod/09.......ok                               
../pjd-fstest-20080816/tests/chmod/10.......ok                               
../pjd-fstest-20080816/tests/chmod/11.......ok                               
../pjd-fstest-20080816/tests/chown/00.......FAILED test 96                   
        Failed 1/171 tests, 99.42% okay
../pjd-fstest-20080816/tests/chown/01.......ok                               
../pjd-fstest-20080816/tests/chown/02.......ok                               
../pjd-fstest-20080816/tests/chown/03.......ok                               
../pjd-fstest-20080816/tests/chown/04.......ok                               
../pjd-fstest-20080816/tests/chown/05.......FAILED tests 5-6, 10-12          
        Failed 5/15 tests, 66.67% okay
../pjd-fstest-20080816/tests/chown/06.......ok                               
../pjd-fstest-20080816/tests/chown/07.......ok                               
../pjd-fstest-20080816/tests/chown/08.......ok                               

And from another run:
../pjd-fstest-20080816/tests/chmod/04.......ok                               
../pjd-fstest-20080816/tests/chmod/05.......FAILED tests 5-6, 10-11          
        Failed 4/14 tests, 71.43% okay
../pjd-fstest-20080816/tests/chmod/06.......ok                               
../pjd-fstest-20080816/tests/chmod/07.......FAILED tests 5-6, 8, 11          
        Failed 4/14 tests, 71.43% okay
../pjd-fstest-20080816/tests/chmod/08.......ok                               
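
When comparing runs like the two above, it can help to reduce the harness output to a map of failing test files and test numbers. The following is a minimal sketch, not part of the ticket: the names `RESULT_RE`, `expand`, and `failed_tests` are hypothetical, and the line format is assumed to match the output pasted here.

```python
import re

# Matches a result line such as
# "../pjd-fstest-20080816/tests/chmod/05.......FAILED tests 5-6, 10-11"
RESULT_RE = re.compile(r"tests/(\w+/\d+)\.+(ok|FAILED tests? (.+))\s*$")


def expand(spec):
    """Expand a spec like '5-6, 10-11' into [5, 6, 10, 11]."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            numbers.extend(range(int(lo), int(hi) + 1))
        else:
            numbers.append(int(part))
    return numbers


def failed_tests(output):
    """Map each failing test file (e.g. 'chmod/05') to its failed test numbers."""
    failures = {}
    for line in output.splitlines():
        match = RESULT_RE.search(line)
        # group(3) is only set for FAILED lines; "ok" lines are skipped.
        if match and match.group(3):
            failures[match.group(1)] = expand(match.group(3))
    return failures
```

Running `failed_tests` on each run's output and comparing the resulting dictionaries shows which failures repeat and which vary between runs.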

History

#1 Updated by Greg Farnum over 8 years ago

  • Status changed from New to In Progress
  • Assignee changed from Sage Weil to Greg Farnum

Looking at this now.

#2 Updated by Greg Farnum over 8 years ago

To reproduce, you need to turn on MDS thrashing (mds thrash exports = 1 in ceph.conf).
However, I haven't reproduced these errors yet, because so far I've been hitting different crashes and assert failures, which I've been fixing as I go.
I think I'm nearing the end, though: #472 is popping up a lot, and it's getting harder to observe other crashes.
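
For reference, the setting mentioned above might look like this in ceph.conf. Placing it under an [mds] section is an assumption; the comment only gives the option line itself.

```ini
[mds]
    ; assumption: the option is placed in the [mds] section
    mds thrash exports = 1
```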

#3 Updated by Sage Weil over 8 years ago

  • Target version changed from v0.22 to v0.23

#4 Updated by Greg Farnum over 8 years ago

  • Assignee changed from Greg Farnum to Sage Weil

Sage has taken over the clustered MDS stuff for now, so here's the bug!

#5 Updated by Sage Weil over 8 years ago

Just saw this again:

../pjd-fstest-20080816/tests/chmod/06.......ok                               
../pjd-fstest-20080816/tests/chmod/07.......FAILED tests 5-6, 8, 11          
        Failed 4/14 tests, 71.43% okay
../pjd-fstest-20080816/tests/chmod/08.......ok                               
../pjd-fstest-20080816/tests/chmod/09.......ok                               
../pjd-fstest-20080816/tests/chmod/10.......ok                               
../pjd-fstest-20080816/tests/chmod/11.......ok                               
../pjd-fstest-20080816/tests/chown/00.......FAILED test 97                   
        Failed 1/171 tests, 99.42% okay
../pjd-fstest-20080816/tests/chown/01.......ok                               
../pjd-fstest-20080816/tests/chown/02.......ok                               

And a bit later:
../pjd-fstest-20080816/tests/chown/05.......FAILED tests 5-6, 10-12          
        Failed 5/15 tests, 66.67% okay

#6 Updated by Sage Weil over 8 years ago

  • Status changed from In Progress to Resolved

This was a kclient (kernel client) problem caused by bad uid/gid values in resent requests. Fixed by commit:cb4276cca4695670916a82e359f2e3776f0a9138.
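
To illustrate the class of bug, here is a small Python sketch of one way resent requests can end up with the wrong uid/gid. It is an assumption for illustration only: the ticket does not say how the bad values arose, and `Caller`, `Request`, and the functions below are hypothetical names, not kernel client code. The sketch assumes the identity is re-read from whatever context performs the resend instead of being stored with the request.

```python
from dataclasses import dataclass


@dataclass
class Caller:
    """Hypothetical stand-in for the context a piece of code runs in."""
    uid: int
    gid: int


@dataclass
class Request:
    """Hypothetical request record; not the kernel client's real structure."""
    op: str
    uid: int
    gid: int


def create_request(op, caller):
    # Record the issuing caller's identity once, when the request is created.
    return Request(op=op, uid=caller.uid, gid=caller.gid)


def build_message_buggy(req, current):
    # Assumed failure mode: identity is re-read from whoever runs the send path.
    return {"op": req.op, "uid": current.uid, "gid": current.gid}


def build_message_fixed(req, current):
    # The identity stored in the request is reused; `current` is ignored.
    return {"op": req.op, "uid": req.uid, "gid": req.gid}


user = Caller(uid=1000, gid=1000)    # process that issued the chmod
worker = Caller(uid=0, gid=0)        # background context doing the resend
req = create_request("chmod", user)
resent_buggy = build_message_buggy(req, worker)
resent_fixed = build_message_fixed(req, worker)
```

In this sketch, `resent_buggy` carries the worker's uid/gid, so a server-side permission check would see a different identity on the resend than on the first attempt. That would be consistent with permission-related chmod/chown tests failing only when requests are resent.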

#7 Updated by Sage Weil over 8 years ago

  • Project changed from Ceph to Linux kernel client
  • Category deleted (1)
  • Target version deleted (v0.23)

#8 Updated by Sage Weil over 8 years ago

  • Target version set to v2.6.37
  • Estimated time set to 1.00 h

#9 Updated by Sage Weil over 8 years ago

A few more fixes here, for the version check on inode updates and for mtime.
