Project

General

Profile

Actions

Bug #8962

closed

kcephfs: client does not release revoked cap

Added by Sage Weil over 9 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

several instances where the mds tries to revoke a cap (Ls and Fs have been observed so far) and the client doesn't release. this causes requests to hang.

restarting the mds clears it up.


Related issues 2 (0 open2 closed)

Related to CephFS - Bug #8964: kcephfs: client does not resend requests on mds restartResolved07/29/2014

Actions
Related to CephFS - Bug #9264: mds: occasionally log segments can't trimDuplicate08/28/2014

Actions
Actions #1

Updated by Zheng Yan over 9 years ago

I saw similar hang a few weeks ago. In that case, all OSDs were down, the MDS couldn't submit log event.

Actions #2

Updated by Sage Weil over 9 years ago

  • Priority changed from Immediate to Urgent
Actions #3

Updated by Sage Weil over 9 years ago

here is the final state of the directory:

ubuntu@teuthology:/a/zack-2014-07-29_14:05:07-ceph-deploy-firefly-testing-basic-multi/384761$ find remote/plana62/log
remote/plana62/log
remote/plana62/log/ceph-client.admin.10109.log.gz
remote/plana62/log/ceph-client.admin.11178.log.gz
remote/plana62/log/ceph-client.admin.8373.log.gz
remote/plana62/log/ceph-client.admin.8422.log.gz
remote/plana62/log/ceph-client.admin.9230.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.10560.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.10589.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.10648.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.8969.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.8998.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.9056.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.9617.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.9646.log.gz
remote/plana62/log/ceph-client.bootstrap-osd.9707.log.gz
remote/plana62/log/ceph-mon.plana62.log.gz
remote/plana62/log/ceph-osd.0.log.gz
remote/plana62/log/ceph-osd.1.log.gz
remote/plana62/log/ceph-osd.2.log.gz
remote/plana62/log/ceph.log.gz
remote/plana62/log/tmpMXKzyM

and the request that hanged is
30884112        mds0    unlink   #100017f166c/tmpMXKzyM (teuthology-archive/zack-2014-07-29_14:05:07-ceph-deploy-firefly-testing-basic-multi/384761/remote/plana62/log/tmpMXKzyM) 

Actions #4

Updated by Sage Weil over 9 years ago

and the code that did it is in teuthology.git/teuthology/misc.py:

def pull_directory(remote, remotedir, localdir):
    """ 
    Copy a remote directory to a local directory.
    """ 
    log.debug('Transferring archived files from %s:%s to %s',
              remote.shortname, remotedir, localdir)
    if not os.path.exists(localdir):
        os.mkdir(localdir)
    _, local_tarfile = tempfile.mkstemp(dir=localdir)
    remote.get_tar(remotedir, local_tarfile, sudo=True)
    with open(local_tarfile, 'r+') as fb1:
        tar = tarfile.open(mode='r|gz', fileobj=fb1)
        while True:
            ti = tar.next()
            if ti is None:
                break

            if ti.isdir():
                # ignore silently; easier to just create leading dirs below
                pass
            elif ti.isfile():
                sub = safepath.munge(ti.name)
                safepath.makedirs(root=localdir, path=os.path.dirname(sub))
                tar.makefile(ti, targetpath=os.path.join(localdir, sub))
            else:
                if ti.isdev():
                    type_ = 'device'
                elif ti.issym():
                    type_ = 'symlink'
                elif ti.islnk():
                    type_ = 'hard link'
                else:
                    type_ = 'unknown'
                    log.info('Ignoring tar entry: %r type %r', ti.name, type_)
                    continue
    os.remove(local_tarfile)

where the final os.remove() is what hanged

Actions #5

Updated by Zheng Yan over 9 years ago

no clue what happened. please dump the mds cache when it happens next time

Actions #6

Updated by Sage Weil over 9 years ago

Zheng Yan wrote:

no clue what happened. please dump the mds cache when it happens next time

We have a dump, actually... see ~ubuntu/mds.cache.dump

Actions #7

Updated by Zheng Yan over 9 years ago

Sage Weil wrote:

Zheng Yan wrote:

no clue what happened. please dump the mds cache when it happens next time

We have a dump, actually... see ~ubuntu/mds.cache.dump

can't find it on machine teuthology

Actions #8

Updated by Sage Weil over 9 years ago

Zheng Yan wrote:

Sage Weil wrote:

Zheng Yan wrote:

no clue what happened. please dump the mds cache when it happens next time

We have a dump, actually... see ~ubuntu/mds.cache.dump

can't find it on machine teuthology

sorry, mira060:~ubuntu/mds.cache.dump

Actions #9

Updated by Zheng Yan over 9 years ago

[inode 100012f9a68 [2,head] /teuthology-archive/sage-2014-07-17_16:03:19-rados-master-testing-basic-plana/367941/remote/plana41/log/ceph-client.admin.31671.log.gz auth v18862 ap=2+0 s=48 n(v0 b48 1=1+0) (ilink sync->lock) (iversion lock w=1 last_client=317444) caps={204886=pAsXsFscr/pAsLsXsFscr/-@3,317444=p/-@1} | lock=2 caps=1 waiter=1 authpin=1 0x5f623500]

looks suspicious. But no idea why client didn't release Ls cap

Actions #10

Updated by Sage Weil over 9 years ago

another similar hang:

# cat /sys/kernel/debug/ceph/28f7427e-5558-4ffd-ae1a-51ec3042759a.client333715/mdsc
13216547        mds0    create   #100019c52b7/.teuthology.log.swp (teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/.teuthology.log.swp)
13218990        mds0    getattr  #100019c52b7
13220035        mds0    getattr  #100019c52b7
2014-08-03 18:04:55.910700 7f9ad82fe700 10 mds.0.locker wrlock_start (ifile sync) on [inode 100019c52b7 [...2,head] /teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/ auth v35450 ap=1+1 f(v0 m2014-08-01 23:18:03.469804 10=8+2) n(v6 rc2014-08-01 23:18:03.469804 b3646332 299=294+5) (isnap sync r=1) (inest lock) (iversion lock) caps={191425=pAsLsXsFs/-@2,333715=pAsLsXs/p@5} | request=0 lock=1 dirfrag=1 caps=1 authpin=1 0x3b9e5c00]
2014-08-03 18:04:55.910711 7f9ad82fe700  7 mds.0.locker simple_lock on (ifile sync) on [inode 100019c52b7 [...2,head] /teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/ auth v35450 ap=1+1 f(v0 m2014-08-01 23:18:03.469804 10=8+2) n(v6 rc2014-08-01 23:18:03.469804 b3646332 299=294+5) (isnap sync r=1) (inest lock) (iversion lock) caps={191425=pAsLsXsFs/-@2,333715=pAsLsXs/p@5} | request=0 lock=1 dirfrag=1 caps=1 authpin=1 0x3b9e5c00]
2014-08-03 18:04:55.910727 7f9ad82fe700  7 mds.0.locker issue_caps allowed=pAsLsXsFc, xlocker allowed=pAsLsXsFc on [inode 100019c52b7 [...2,head] /teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/ auth v35450 ap=1+1 f(v0 m2014-08-01 23:18:03.469804 10=8+2) n(v6 rc2014-08-01 23:18:03.469804 b3646332 299=294+5) (isnap sync r=1) (inest lock) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXsFs/-@2,333715=pAsLsXs/p@5} | request=0 lock=1 dirfrag=1 caps=1 authpin=1 0x3b9e5c00]
2014-08-03 18:04:55.910742 7f9ad82fe700  7 mds.0.locker    sending MClientCaps to client.191425 seq 3 new pending pAsLsXs was pAsLsXsFs
2014-08-03 18:04:55.910748 7f9ad82fe700 10 mds.0.30 send_message_client_counted client.191425 seq 295 client_caps(revoke ino 100019c52b7 408287541 seq 3 caps=pAsLsXs dirty=- wanted=- follows 0 size 0/0 ts 1 mtime 2014-08-01 23:18:03.469804) v4
2014-08-03 18:04:55.910755 7f9ad82fe700  1 -- 10.214.134.10:6800/9063 --> 10.214.137.25:0/3163831945 -- client_caps(revoke ino 100019c52b7 408287541 seq 3 caps=pAsLsXs dirty=- wanted=- follows 0 size 0/0 ts 1 mtime 2014-08-01 23:18:03.469804) v4 -- ?+0 0xcd43c00 con 0x50a5b00
2014-08-03 18:04:55.910768 7f9ad82fe700 10 mds.0.cache.ino(100019c52b7) auth_pin by 0x3b9e63f0 on [inode 100019c52b7 [...2,head] /teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/ auth v35450 ap=2+1 f(v0 m2014-08-01 23:18:03.469804 10=8+2) n(v6 rc2014-08-01 23:18:03.469804 b3646332 299=294+5) (isnap sync r=1) (inest lock) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXs/pAsLsXsFs/-@3,333715=pAsLsXs/p@5} | request=0 lock=1 dirfrag=1 caps=1 authpin=1 0x3b9e5c00] now 2+1
2014-08-03 18:04:55.910788 7f9ad82fe700  7 mds.0.locker wrlock_start waiting on (ifile sync->lock) on [inode 100019c52b7 [...2,head] /teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/ auth v35450 ap=2+1 f(v0 m2014-08-01 23:18:03.469804 10=8+2) n(v6 rc2014-08-01 23:18:03.469804 b3646332 299=294+5) (isnap sync r=1) (inest lock) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXs/pAsLsXsFs/-@3,333715=pAsLsXs/p@5} | request=0 lock=1 dirfrag=1 caps=1 authpin=1 0x3b9e5c00]
2014-08-03 18:04:55.910801 7f9ad82fe700 10 mds.0.cache.ino(100019c52b7) add_waiter tag 40000000 0x54f86200 !ambig 1 !frozen 1 !freezing 1
2014-08-03 18:04:55.910805 7f9ad82fe700 10 mds.0.locker nudge_log (ifile sync->lock) on [inode 100019c52b7 [...2,head] /teuthology-archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392050/ auth v35450 ap=2+1 f(v0 m2014-08-01 23:18:03.469804 10=8+2) n(v6 rc2014-08-01 23:18:03.469804 b3646332 299=294+5) (isnap sync r=1) (inest lock) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXs/pAsLsXsFs/-@3,333715=pAsLsXs/p@5} | request=0 lock=1 dirfrag=1 caps=1 waiter=1 authpin=1 0x3b9e5c00]

debug mds = 10 logs are on burnupi21 (careful, they are very big!)

the above entry is at offset /var/log/ceph/ceph-mds.burnupi21.log byte 395877069361/408914568239 97% (press RETURN)

Actions #11

Updated by Sage Weil over 9 years ago

35139797        mds0    mkdir    #1000207b106/421099 (teuthology-archive/samuelj-2014-08-12_21:17:59-rados-wip-9054-testing-basic-multi/421099)
35151800        mds0    mkdir    #100020b68d0/422304 (teuthology-archive/sage-2014-08-13_12:02:27-rgw-wip-mds-contexts-testing-basic-mira/422304)

restarted at 2014-08-13 13:22:35.617241, so these ops are a bit before that.

client is 551831

Actions #12

Updated by Sage Weil over 9 years ago

2014-08-13 12:42:09.831606 7ff844faa700  7 mds.0.server dispatch_client_request client_request(client.551831:35139797 mkdir #1000207b106/421099)
...
2014-08-13 12:42:09.832104 7ff844faa700  7 mds.0.locker simple_lock on (ifile sync) on [inode 1000207b106 [...2,head] /teuthology-archive/samuelj-2014-08-12_21:17:59-rados-wip-9054-testing-basic-multi/ auth v6956066 ap=1+1 f(v0 m2014-08-13 12:41:24.208187 558=0+558) n(v4275 rc2014-08-13 12:42:09.350741 b107169850288 266688=262748+3940) (isnap sync r=1) (inest lock dirty) (iversion lock) caps={191425=pAsLsXsFs/-@2,204886=pAsLsXs/-@1,551831=pAsLsXs/p@72781} | dirtyscattered=1 request=0 lock=1 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=0 authpin=1 0xcb1e2e00]
2014-08-13 12:42:09.832121 7ff844faa700  7 mds.0.locker issue_caps allowed=pAsLsXsFc, xlocker allowed=pAsLsXsFc on [inode 1000207b106 [...2,head] /teuthology-archive/samuelj-2014-08-12_21:17:59-rados-wip-9054-testing-basic-multi/ auth v6956066 ap=1+1 f(v0 m2014-08-13 12:41:24.208187 558=0+558) n(v4275 rc2014-08-13 12:42:09.350741 b107169850288 266688=262748+3940) (isnap sync r=1) (inest lock dirty) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXsFs/-@2,204886=pAsLsXs/-@1,551831=pAsLsXs/p@72781} | dirtyscattered=1 request=0 lock=1 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=0 authpin=1 0xcb1e2e00]
2014-08-13 12:42:09.832140 7ff844faa700  7 mds.0.locker    sending MClientCaps to client.191425 seq 3 new pending pAsLsXs was pAsLsXsFs
2014-08-13 12:42:09.832145 7ff844faa700 10 mds.0.31 send_message_client_counted client.191425 seq 3297 client_caps(revoke ino 1000207b106 546031007 seq 3 caps=pAsLsXs dirty=- wanted=- follows 0 size 0/0 ts 1 mtime 2014-08-13 12:41:24.208187) v4
2014-08-13 12:42:09.832155 7ff844faa700  1 -- 10.214.134.10:6801/40004 --> 10.214.137.25:0/3163831945 -- client_caps(revoke ino 1000207b106 546031007 seq 3 caps=pAsLsXs dirty=- wanted=- follows 0 size 0/0 ts 1 mtime 2014-08-13 12:41:24.208187) v4 -- ?+0 0x25cb2800 con 0x47929a0
2014-08-13 12:42:09.832172 7ff844faa700 10 mds.0.cache.ino(1000207b106) auth_pin by 0xcb1e35f0 on [inode 1000207b106 [...2,head] /teuthology-archive/samuelj-2014-08-12_21:17:59-rados-wip-9054-testing-basic-multi/ auth v6956066 ap=2+1 f(v0 m2014-08-13 12:41:24.208187 558=0+558) n(v4275 rc2014-08-13 12:42:09.350741 b107169850288 266688=262748+3940) (isnap sync r=1) (inest lock dirty) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXs/pAsLsXsFs/-@3,204886=pAsLsXs/-@1,551831=pAsLsXs/p@72781} | dirtyscattered=1 request=0 lock=1 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=0 authpin=1 0xcb1e2e00] now 2+1
2014-08-13 12:42:09.832192 7ff844faa700  7 mds.0.locker wrlock_start waiting on (ifile sync->lock) on [inode 1000207b106 [...2,head] /teuthology-archive/samuelj-2014-08-12_21:17:59-rados-wip-9054-testing-basic-multi/ auth v6956066 ap=2+1 f(v0 m2014-08-13 12:41:24.208187 558=0+558) n(v4275 rc2014-08-13 12:42:09.350741 b107169850288 266688=262748+3940) (isnap sync r=1) (inest lock dirty) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXs/pAsLsXsFs/-@3,204886=pAsLsXs/-@1,551831=pAsLsXs/p@72781} | dirtyscattered=1 request=0 lock=1 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=0 authpin=1 0xcb1e2e00]
2014-08-13 12:42:09.832211 7ff844faa700 10 mds.0.cache.ino(1000207b106) add_waiter tag 40000000 0x2189d3e0 !ambig 1 !frozen 1 !freezing 1
2014-08-13 12:42:09.832214 7ff844faa700 10 mds.0.locker nudge_log (ifile sync->lock) on [inode 1000207b106 [...2,head] /teuthology-archive/samuelj-2014-08-12_21:17:59-rados-wip-9054-testing-basic-multi/ auth v6956066 ap=2+1 f(v0 m2014-08-13 12:41:24.208187 558=0+558) n(v4275 rc2014-08-13 12:42:09.350741 b107169850288 266688=262748+3940) (isnap sync r=1) (inest lock dirty) (ifile sync->lock) (iversion lock) caps={191425=pAsLsXs/pAsLsXsFs/-@3,204886=pAsLsXs/-@1,551831=pAsLsXs/p@72781} | dirtyscattered=1 request=0 lock=1 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=1 authpin=1 0xcb1e2e00]

...and client isn't releasing the Fs cap.

Actions #13

Updated by Sage Weil over 9 years ago

  • Subject changed from mds: unlink hangs due to inest lock hang in stray dir to kcephfs: client does not release revoked cap
  • Description updated (diff)
  • Category changed from 47 to 53
Actions #14

Updated by Sage Weil over 9 years ago

27898660        mds0    unlink   #10001a2e027/ceph-client.admin.10021.log.gz (teuthology-archive/teuthology-2014-08-02_09:30:01-upgrade:firefly-firefly-testing-basic-vps/394383/remote/ubuntu@vpm114.front.sepia.ceph.com/log/ceph-client.
admin.10021.log.gz)
33236750        mds0    unlink   #10001a5d579/ceph-client.admin.17160.log.gz (teuthology-archive/teuthology-2014-08-03_02:30:01-rados-next-testing-basic-plana/394888/remote/plana79/log/ceph-client.admin.17160.log.gz)
34531973        mds0    unlink   #10002ac81a2/.results.log.swx (teuthology-archive/john-2014-08-26_17:50:53-fs:basic-wip-9152-testing-basic-multi/.results.log.swx)
35586069        mds0    unlink   #10002acf844/tmpS1rm84 (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452015/remote/plana15/log/tmpS1rm84)
35586109        mds0    unlink   #10002acf680/tmphc4wEK (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452057/remote/plana93/tmphc4wEK)
35586150        mds0    unlink   #10002acf8b3/tmpxetA_M (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452124/remote/plana69/log/tmpxetA_M)
35586184        mds0    unlink   #10002acf8d6/tmpuzeuEn (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452120/remote/burnupi38/log/tmpuzeuEn)
35586199        mds0    unlink   #10002acf747/tmpbX2W_s (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452093/remote/burnupi49/tmpbX2W_s)
35586204        mds0    unlink   #10002acf795/tmpItnN8V (teuthology-archive/sage-2014-08-26_16:54:45-smoke-master-testing-basic-multi/453263/remote/plana58/tmpItnN8V)
35586217        mds0    unlink   #10002acf76e/tmpYJrkNt (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452113/remote/mira072/tmpYJrkNt)
35593280        mds0    unlink   #10002acfd27/tmptwtITj (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452108/remote/plana19/log/tmptwtITj)
35593352        mds0    unlink   #10002acfd6c/tmphgMufE (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452163/remote/plana09/log/tmphgMufE)
35593469        mds0    unlink   #10002acfd8e/tmpGvNVyb (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452175/remote/plana68/log/tmpGvNVyb)
35593601        mds0    unlink   #10002acfe01/tmp4W0qqi (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452142/remote/plana49/log/tmp4W0qqi)
35595950        mds0    unlink   #10002ad041b/tmpyxaMjK (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452133/remote/mira094/log/tmpyxaMjK)
35595969        mds0    unlink   #10002ad045e/tmpBsj_I5 (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452206/remote/plana65/tmpBsj_I5)
35596074        mds0    unlink   #10002ad045b/tmp04raza (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452141/remote/plana85/log/tmp04raza)
35601947        mds0    getattr  #10001a2e027
35613016        mds0    unlink   #10002ad1c50/tmpMSzLYm (teuthology-archive/samuelj-2014-08-26_19:30:53-rados-wip-sam-testing-testing-basic-multi/454102/remote/plana18/tmpMSzLYm)
35613047        mds0    unlink   #10002ad2177/tmpivUOro (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452171/remote/plana70/log/tmpivUOro)
35613095        mds0    unlink   #10002ad21a7/tmpwRVfX0 (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452099/remote/plana63/log/tmpwRVfX0)
35613921        mds0    symlink (unsafe) #10002ad21d1/worker.log (teuthology-archive/yehudasa-2014-08-26_09:38:51-rgw-wip-9148-testing-basic-multi/452252/worker.log) /var/lib/teuthworker/archive/worker_logs/worker.multi.4768


restarting now (Wed Aug 27 10:06:26 PDT 2014)
Actions #15

Updated by Sage Weil over 9 years ago

ubuntu@burnupi21:/var/log/ceph$ zgrep :27898660 ceph-mds.burnupi21.log.?.gz
ceph-mds.burnupi21.log.1.gz:2014-08-26 00:12:29.373631 7f0a37337700  1 -- 10.214.134.10:6801/19153 <== client.764753 10.214.137.23:0/3344997032 1803770 ==== client_request(client.764753:27898660 unlink #10001a2e027/ceph-client.admin.10021.log.gz) v2 ==== 248+0+0 (4113235506 0 0) 0x1ab06000 con 0x7eeb420
ceph-mds.burnupi21.log.1.gz:2014-08-26 00:12:29.373641 7f0a37337700  4 mds.0.server handle_client_request client_request(client.764753:27898660 unlink #10001a2e027/ceph-client.admin.10021.log.gz) v2
ceph-mds.burnupi21.log.1.gz:2014-08-26 00:12:29.373645 7f0a37337700  7 mds.0.cache request_start request(client.764753:27898660 cr=0x1ab06000)
ceph-mds.burnupi21.log.1.gz:2014-08-26 00:12:29.373789 7f0a37337700  7 mds.0.server dispatch_client_request client_request(client.764753:27898660 unlink #10001a2e027/ceph-client.admin.10021.log.gz) v2
ceph-mds.burnupi21.log.1.gz:2014-08-26 00:12:29.373829 7f0a37337700 10 mds.0.locker acquire_locks request(client.764753:27898660 cr=0x1ab06000)

Actions #16

Updated by Zheng Yan over 9 years ago

2014-08-26 00:12:29.374924 7f0a37337700 10 mds.0.locker nudge_log (ilink sync->lock) on [inode 10001a2e17b [2,head] /teuthology-archive/teuthology-2014-08-02_09:30:01-upgrade:firefly-firefly-testing-basic-vps/394383/remote/ubuntu@vpm114.front.sepia.ceph.com/log/ceph-client.admin.10021.log.gz auth v3370 ap=2+0 s=48 n(v0 b48 1=1+0) (ilink sync->lock) (iversion lock w=1 last_client=764753) caps={191425=pAsXsFscr/pAsLsXsFscr/-@3,764753=p/-@1} | lock=2 caps=1 waiter=1 authpin=1 0x2a4f3b00]

client.191425 is gitbuilder-archive, which runs 3.13.0-rc3-ceph-00078-g125d725 kernel. Its dmesg has:

[18222160.521420]  front: 0003e510: ff ff ff ff 00 00 00 00 b6 3e 06 00 00 00 00 00  .........>......
[18222160.521421]  front: 0003e520: 00 00 00 00 00 00 00 00 55 0d 00 00 00 00 00 00  ........U.......
[18222160.521423]  front: 0003e530: fc 8f 76 27 00 00 00 00 02 00 00 00 00 00 00 00  ..v'............
[18222160.521424]  front: 0003e540: 01 00 00 00 00 00 00 00 01 00 00 40 00 01 00 00  ...........@....
...

I think this bug should be fixed by "commit 54008399 ceph: preallocate buffer for readdir reply"

Actions #17

Updated by Sage Weil over 9 years ago

  • Status changed from 12 to Resolved

old bug; restarted those clients.

Actions #18

Updated by Greg Farnum almost 8 years ago

  • Component(FS) kceph added
Actions

Also available in: Atom PDF