Bug #1158

closed

Unfinished freeze hangs fsstress

Added by Greg Farnum almost 13 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

I've got a freeze that never finishes, and it's blocking fsstress. Logs are in kai:~gregf/logs/fsstress/freeze_not_finishing.

Haven't diagnosed it any further than that.

Actions #1

Updated by Greg Farnum almost 13 years ago

Although, actually, based on how long fsstress is taking on this disk, maybe nothing was blocked and it was just running slowly. Ugh.

Actions #2

Updated by Greg Farnum almost 13 years ago

  • Assignee set to Greg Farnum
Actions #3

Updated by Greg Farnum almost 13 years ago

I managed to reproduce this on my mds_rename branch.

Actions #4

Updated by Greg Farnum almost 13 years ago

Well, it's a nested auth pin.

gregf@kai:~/logs/fsstress/freeze_not_finishing2$ grep -o "dir(10000000002) adjust_nested_auth_pins [-0-9]*/[-0-9]*" out/mds.as | sort | uniq -c
    658 dir(10000000002) adjust_nested_auth_pins -1/-1
    200 dir(10000000002) adjust_nested_auth_pins -1/0
    201 dir(10000000002) adjust_nested_auth_pins 1/0
    658 dir(10000000002) adjust_nested_auth_pins 1/1

Further analysis forthcoming, when I figure out a good way to do it...
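
The imbalance in the counts above (201 adjustments of 1/0 against 200 of -1/0) can also be totalled mechanically. Below is a minimal Python sketch, assuming log lines carry the same "dir(<ino>) adjust_nested_auth_pins <a>/<b>" text as the grep output. The function name net_nested_auth_pins is hypothetical, and since the ticket does not say what the two fields mean, the code just sums each field separately.

```python
import re
from collections import defaultdict

# Pattern assumed from the grep output in this ticket, e.g.
#   dir(10000000002) adjust_nested_auth_pins -1/0
PIN_RE = re.compile(r"dir\((\w+)\) adjust_nested_auth_pins (-?\d+)/(-?\d+)")

def net_nested_auth_pins(lines):
    """Sum both adjustment fields per directory (hypothetical helper).

    A nonzero total for a directory means its adjustments did not cancel.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = PIN_RE.search(line)
        if m:
            totals[m.group(1)][0] += int(m.group(2))
            totals[m.group(1)][1] += int(m.group(3))
    return {d: tuple(t) for d, t in totals.items()}

# Rebuild the tallies reported by `uniq -c` in this comment.
counts = {"-1/-1": 658, "-1/0": 200, "1/0": 201, "1/1": 658}
sample = [
    f"dir(10000000002) adjust_nested_auth_pins {delta}"
    for delta, n in counts.items()
    for _ in range(n)
]
print(net_nested_auth_pins(sample))  # {'10000000002': (1, 0)}
```

With the counts from this comment, the first field nets to +1 and the second to 0, which matches a single adjustment that was never undone.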

Actions #5

Updated by Sage Weil almost 13 years ago

If you can reproduce, you can enable the auth pin set define in mdstypes.h, which tracks who the pinners are.

//#define MDS_AUTHPIN_SET // define me for debugging auth pin leaks

Actions #6

Updated by Greg Farnum almost 13 years ago

Unfortunately, adjust_nested_auth_pins never sees who actually grabbed the pin. The others print out the grabbing pointer, so it's easy enough to grep | sort | uniq to figure out who's taking and not dropping, even without the set being kept in memory (assuming you've got debugging cranked up).
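
The grep | sort | uniq approach described here amounts to counting takes against drops per grabbing pointer. Here is a rough Python sketch of that idea. The log line shapes "auth_pin by <ptr>" and "auth_unpin by <ptr>" are an assumption (this ticket does not quote them), and unbalanced_pinners is a hypothetical helper name.

```python
import re
from collections import Counter

# ASSUMED line shapes (not quoted anywhere in this ticket):
#   ... auth_pin by 0x1234 on ...
#   ... auth_unpin by 0x1234 on ...
EVENT_RE = re.compile(r"\bauth_(pin|unpin) by (0x[0-9a-fA-F]+)")

def unbalanced_pinners(lines):
    """Return {pointer: pins - unpins} for every pointer whose balance is nonzero."""
    balance = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            balance[m.group(2)] += 1 if m.group(1) == "pin" else -1
    return {ptr: n for ptr, n in balance.items() if n != 0}
```

Any pointer left with a positive balance took a pin it never dropped, which is the same answer the shell pipeline gives without keeping the pinner set in memory.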

Actions #7

Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.30 to v0.31
Actions #8

Updated by Sage Weil almost 13 years ago

  • Story points set to 5
  • Position set to 1
  • Position changed from 1 to 695
Actions #9

Updated by Sage Weil almost 13 years ago

  • Position deleted (695)
  • Position set to 1
  • Position changed from 1 to 700
Actions #10

Updated by Sage Weil almost 13 years ago

  • Position deleted (704)
  • Position set to 714
Actions #11

Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.31 to v0.32
Actions #12

Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.32 to v0.33
  • Position deleted (739)
  • Position set to 2
Actions #13

Updated by Sage Weil almost 13 years ago

  • Status changed from New to Can't reproduce

FWIW, I've hit several of these over the past two weeks, and they've all boiled down to unstable locks, usually due to issues with client revocation or cap migration.

Enough has changed that I think it makes sense to close this out unless/until we see it on current master.

Actions #14

Updated by Sage Weil almost 13 years ago

  • Target version deleted (v0.33)
Actions #15

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
