Bug #9264 (closed)

mds: occasionally log segments can't trim

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This happened with the latest lab mds restart yesterday; we have the logs (for another 6 days or so).

root@mira040:~# ceph mds dump 618
dumped mdsmap epoch 618
epoch   618
flags   0
created 2013-08-14 03:19:58.297184
modified        2014-08-27 10:05:56.352173
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
last_failure    467
last_failure_osd_epoch  354013
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
max_mds 1
in      0
up      {0=1716410}
failed
stopped
data_pools      0
metadata_pool   1
inline_data     disabled
1716410:        10.214.134.10:6800/33114 'burnupi21' mds.0.38 up:replay seq 1

Related issues: 1 (0 open, 1 closed)

Related to CephFS - Bug #8962: kcephfs: client does not release revoked cap (Resolved, 07/29/2014)

#1 - Updated by Sage Weil over 9 years ago

2014-08-27 00:54:32.894038 7f0a35232700  6 mds.0.journal LogSegment(1566182478170).try_to_expire


This is the segment that gets stuck:
2014-08-27 02:34:08.364557 7f0a35232700 10 mds.0.log trim 172 / 30 segments, 215078 / -1 events, 1 (1046) expiring, 140 (175941) expired
2014-08-27 02:34:08.364561 7f0a35232700  5 mds.0.log trim already expiring segment 1566182478170, 1046 events
2014-08-27 02:34:08.364565 7f0a35232700  5 mds.0.log trim already expired segment 1566186672593, 1109 events
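
For re-checking before the logs rotate out: a minimal sketch (hypothetical script, not part of Ceph) that scans an MDS debug log for the "mds.N.log trim" lines in the format above and flags segments that keep showing up as "already expiring" across trim passes:

#!/usr/bin/env python3
# Hypothetical helper: count how often each segment appears as
# "already expiring" in mds.N.log trim output; a segment that keeps
# reappearing is one that never finishes expiring.
import re
import sys
from collections import Counter

EXPIRING_RE = re.compile(
    r"mds\.\d+\.log trim already expiring segment (\d+), (\d+) events")

def find_stuck_segments(path, min_hits=3):
    """Return {segment_offset: hit_count} for segments seen expiring repeatedly."""
    hits = Counter()
    with open(path) as f:
        for line in f:
            m = EXPIRING_RE.search(line)
            if m:
                hits[int(m.group(1))] += 1
    return {seg: n for seg, n in hits.items() if n >= min_hits}

if __name__ == "__main__":
    for seg, n in sorted(find_stuck_segments(sys.argv[1]).items()):
        print(f"segment {seg} still expiring after {n} trim passes")

On the log above it should only report segment 1566182478170.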
#2 - Updated by Sage Weil over 9 years ago

  • Status changed from New to 12
2014-08-27 00:54:32.901022 7f0a35232700 10 mds.0.journal try_to_expire waiting for nest flush on [inode 605 [...2,head] ~mds0/stray5/ auth v25125019 ap=3+1 f(v13 m2014-08-27 00:38:56.778628 443=418+25) n(v566 rc2014-08-27 00:38:56.778628 b395656374 a1 444=418+26) (inest lock->sync w=1 dirty) (ifile lock w=2) (iversion lock) | dirtyscattered=1 lock=2 dirfrag=1 stickydirs=1 stray=1 dirtyrstat=0 dirty=1 waiter=1 authpin=1 0x10c98e00]
2014-08-27 00:54:32.901039 7f0a35232700 10 mds.0.locker scatter_nudge auth, waiting for stable (inest lock->sync w=1 dirty) on [inode 605 [...2,head] ~mds0/stray5/ auth v25125019 ap=3+1 f(v13 m2014-08-27 00:38:56.778628 443=418+25) n(v566 rc2014-08-27 00:38:56.778628 b395656374 a1 444=418+26) (inest lock->sync w=1 dirty) (ifile lock w=2) (iversion lock) | dirtyscattered=1 lock=2 dirfrag=1 stickydirs=1 stray=1 dirtyrstat=0 dirty=1 waiter=1 authpin=1 0x10c98e00]
2014-08-27 00:54:32.901054 7f0a35232700 10 mds.0.cache.ino(605) add_waiter tag 400000000000 0x1e6f3230 !ambig 1 !frozen 1 !freezing 1

It's waiting for the wrlock to be released; the wrlock is held by a request that is stuck revoking caps due to bug #8962.
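
To spell out the dependency chain, a toy model (hypothetical names, not Ceph's actual classes): the segment's expiry is parked as a waiter on the inest scatter lock, the scatter lock can't go stable while a wrlock is held, and the wrlock holder never finishes because its cap revocation is stuck:

# Toy model of the stall (made-up classes, for illustration only):
# segment expiry -> needs the inest scatter lock to flush/stabilize
#                -> needs the wrlock to be dropped
#                -> wrlock is held by a request stuck revoking caps (#8962)

class ScatterLock:
    def __init__(self):
        self.wrlocks = 0      # write holders keeping the lock unstable
        self.waiters = []     # callbacks waiting for the lock to go stable

    def is_stable(self):
        return self.wrlocks == 0

    def nudge(self, callback):
        # Flush immediately if stable; otherwise park the waiter,
        # like the "waiting for stable ... on [inode 605 ...]" line above.
        if self.is_stable():
            callback()
        else:
            self.waiters.append(callback)

    def put_wrlock(self):
        self.wrlocks -= 1
        if self.is_stable():
            waiters, self.waiters = self.waiters, []
            for cb in waiters:
                cb()


class LogSegment:
    def __init__(self, offset):
        self.offset = offset
        self.expired = False

    def try_to_expire(self, inest_lock):
        # Expiry has to wait for the dirty rstats to flush, which in turn
        # needs the inest lock to stabilize.
        inest_lock.nudge(lambda: setattr(self, "expired", True))


inest = ScatterLock()
inest.wrlocks = 1                  # held by the request stuck on cap revocation
seg = LogSegment(1566182478170)

seg.try_to_expire(inest)
print(seg.expired)                 # False: trim never makes progress

inest.put_wrlock()                 # would only happen once #8962 is resolved
print(seg.expired)                 # True: segment expires once the lock drops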

#3 - Updated by Sage Weil over 9 years ago

  • Status changed from 12 to Duplicate