Bug #23714

slow ceph_ll_sync_inode calls after setattr

Added by Jeff Layton about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
Start date:
04/13/2018
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:

Related issues

Copied to fs - Backport #23802: luminous: slow ceph_ll_sync_inode calls after setattr Resolved

History

#1 Updated by Jeff Layton about 1 year ago

We recently added some calls to ceph_ll_sync_inode in Ganesha, issued after each setattr request. Testing with cthon04, I'm seeing reproducible ~5s stalls in some of those calls. Snippet from the client log:

2018-04-13 14:22:40.254 7fdb947f8700  3 client.124124 ll_setattrx 0x100000002ad.head = 0
2018-04-13 14:22:40.254 7fdb947f8700 10 client.124124 put_inode on 0x100000002ad.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={2=0} mode=100666 size=0/4194304 btime=2018-04-13 14:22:32.881656 mtime=2018-04-13 14:22:32.881656 ctime=2018-04-13 14:22:40.253499 caps=pLsXsFscr(0=pLsXsFscr) objectset[0x100000002ad ts 0/0 objects 0 dirty_or_tx 0] parents=0x100000002aa.head["file.2"] 0x7fdb6c01b210)
2018-04-13 14:22:40.254 7fdb947f8700 15 inode.put on 0x7fdb6c01b210 0x100000002ad.head now 4
2018-04-13 14:22:40.254 7fdb947f8700  3 client.124124 ll_sync_inode 0x100000002ad.head(faked_ino=0 ref=4 ll_ref=1 cap_refs={} open={2=0} mode=100666 size=0/4194304 btime=2018-04-13 14:22:32.881656 mtime=2018-04-13 14:22:32.881656 ctime=2018-04-13 14:22:40.253499 caps=pLsXsFscr(0=pLsXsFscr) objectset[0x100000002ad ts 0/0 objects 0 dirty_or_tx 0] parents=0x100000002aa.head["file.2"] 0x7fdb6c01b210)
2018-04-13 14:22:40.254 7fdb947f8700  3 client.124124 _fsync on 0x100000002ad.head(faked_ino=0 ref=4 ll_ref=1 cap_refs={} open={2=0} mode=100666 size=0/4194304 btime=2018-04-13 14:22:32.881656 mtime=2018-04-13 14:22:32.881656 ctime=2018-04-13 14:22:40.253499 caps=pLsXsFscr(0=pLsXsFscr) objectset[0x100000002ad ts 0/0 objects 0 dirty_or_tx 0] parents=0x100000002aa.head["file.2"] 0x7fdb6c01b210) (data+metadata)
2018-04-13 14:22:40.254 7fdb947f8700 15 inode.get on 0x7fdb6c01b210 0x100000002ad.head now 5
2018-04-13 14:22:40.254 7fdb947f8700 10 client.124124 _flush 0x100000002ad.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={2=0} mode=100666 size=0/4194304 btime=2018-04-13 14:22:32.881656 mtime=2018-04-13 14:22:32.881656 ctime=2018-04-13 14:22:40.253499 caps=pLsXsFscr(0=pLsXsFscr) objectset[0x100000002ad ts 0/0 objects 0 dirty_or_tx 0] parents=0x100000002aa.head["file.2"] 0x7fdb6c01b210)
2018-04-13 14:22:40.254 7fdb947f8700 10 client.124124  nothing to flush         
2018-04-13 14:22:40.254 7fdb947f8700 15 client.124124 using return-valued form of _fsync
2018-04-13 14:22:40.254 7fdb947f8700 10 client.124124 no metadata needs to commit
2018-04-13 14:22:40.254 7fdb947f8700 15 client.124124 waiting on unsafe requests, last tid 492
2018-04-13 14:22:40.375 7fdb94ff9700 20 client.124124 trim_cache size 11 max 16384
2018-04-13 14:22:41.375 7fdb94ff9700 10 client.124124 renew_caps()              
2018-04-13 14:22:41.375 7fdb94ff9700 15 client.124124 renew_caps requesting from mds.0
2018-04-13 14:22:41.375 7fdb94ff9700 10 client.124124 renew_caps mds.0          
2018-04-13 14:22:41.375 7fdb94ff9700 20 client.124124 trim_cache size 11 max 16384
2018-04-13 14:22:41.375 7fdb777fe700 10 client.124124 handle_client_session client_session(renewcaps seq 6) v1 from mds.0
2018-04-13 14:22:42.375 7fdb94ff9700 20 client.124124 trim_cache size 11 max 16384
2018-04-13 14:22:43.375 7fdb94ff9700 20 client.124124 trim_cache size 11 max 16384
2018-04-13 14:22:44.375 7fdb94ff9700 20 client.124124 trim_cache size 11 max 16384
2018-04-13 14:22:45.179 7fdb777fe700 20 client.124124 handle_client_reply got a reply. Safe:1 tid 492
2018-04-13 14:22:45.179 7fdb777fe700 10 client.124124 insert_trace from 2018-04-13 14:22:40.253564 mds.0 is_target=0 is_dentry=0
2018-04-13 14:22:45.179 7fdb777fe700 10 client.124124 insert_trace -- already got unsafe; ignoring
2018-04-13 14:22:45.179 7fdb947f8700 10 client.124124 put_inode on 0x100000002ad.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={2=0} mode=100666 size=0/4194304 btime=2018-04-13 14:22:32.881656 mtime=2018-04-13 14:22:32.881656 

...and mds log:

2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.locker set_xlocks_done on (iauth xlock x=1 by 0x55ab21213700) [inode 0x100000002ad [2,head] /nfsclnt.test/file.2 auth v32 pv34 ap=2+0 dirtyparent s=0 n(v0 rc2018-04-13 14:22:40.222450 1=1+0) (iauth xlock x=1 by 0x55ab21213700) (iversion lock w=1 last_client=124124) cr={124124=0-4194304@1} caps={124124=pLsXsFscr/-@4} | request=1 lock=3 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55ab2085ca00]
2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.server early_reply 0 ((0) Success) client_request(client.124124:492 setattr mode=0666 #0x100000002ad 2018-04-13 14:22:40.253499 caller_uid=4447, caller_gid=4447{10,4447,}) v4
2018-04-13 14:22:40.253 7ff0e80ef700 20 mds.0.server set_trace_dist snapid head 
2018-04-13 14:22:40.253 7ff0e80ef700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55ab20304780) have_past_parents_open [1,head]
2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.server set_trace_dist snaprealm snaprealm(0x1 seq 1 lc 0 cr 0 cps 1 snaps={} past_parent_snaps= 0x55ab20304780) len=48
2018-04-13 14:22:40.253 7ff0e80ef700 20 mds.0.cache.ino(0x100000002ad)  pfile 0 pauth 1 plink 0 pxattr 0 plocal 1 ctime 2018-04-13 14:22:40.253499 valid=1
2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.cache.ino(0x100000002ad) encode_inodestat issuing pLsXsFscr seq 5
2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.cache.ino(0x100000002ad) encode_inodestat caps pLsXsFscr seq 5 mseq 0 xattrv 0 len 0
2018-04-13 14:22:40.253 7ff0e80ef700 20 mds.0.server set_trace_dist added in   [inode 0x100000002ad [2,head] /nfsclnt.test/file.2 auth v32 pv34 ap=2+0 dirtyparent s=0 n(v0 rc2018-04-13 14:22:40.222450 1=1+0) (iauth xlockdone x=1) (iversion lock w=1 last_client=124124) cr={124124=0-4194304@1} caps={124124=pLsXsFscr/-@5} | request=1 lock=3 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55ab2085ca00]
2018-04-13 14:22:40.253 7ff0e80ef700 20 mds.0.server lat 0.000655               
2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.cache create_subtree_map 2 subtrees, 2 fullauth
2018-04-13 14:22:40.253 7ff0e80ef700 15 mds.0.cache show_subtrees               
2018-04-13 14:22:40.253 7ff0e80ef700 10 mds.0.cache |__ 0    auth [dir 0x100 ~mds0/ [2,head] auth v=1392 cv=0/0 dir_auth=0 state=1610612737|complete f(v0 10=0+10) n(v7 rc2018-04-13 14:22:32.818877 237=161+76) hs=10+0,ss=0+0 dirty=2 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x55ab2071c000]
2018-04-13 14:22:40.254 7ff0e80ef700 10 mds.0.cache |__ 0    auth [dir 0x1 / [2,head] auth pv=401 v=399 cv=0/0 dir_auth=0 ap=1+1+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.839915 1=0+1) n(v15 rc2018-04-13 14:22:40.222450 11=10+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x55ab200a1800]
2018-04-13 14:22:40.254 7ff0e80ef700 15 mds.0.cache  subtree [dir 0x1 / [2,head] auth pv=401 v=399 cv=0/0 dir_auth=0 ap=1+1+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.839915 1=0+1) n(v15 rc2018-04-13 14:22:40.222450 11=10+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x55ab200a1800]
2018-04-13 14:22:40.254 7ff0e80ef700 15 mds.0.cache  subtree [dir 0x100 ~mds0/ [2,head] auth v=1392 cv=0/0 dir_auth=0 state=1610612737|complete f(v0 10=0+10) n(v7 rc2018-04-13 14:22:32.818877 237=161+76) hs=10+0,ss=0+0 dirty=2 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x55ab2071c000]
2018-04-13 14:22:40.254 7ff0e80ef700 20 mds.0.journal EMetaBlob::add_dir_context final:
2018-04-13 14:22:40.254 7ff0e80ef700 20 mds.0.journal EMetaBlob::add_dir_context final:
2018-04-13 14:22:40.254 7ff0e80ef700 15 mds.0.cache  subtrees {0x1=[],0x100=[]} 
2018-04-13 14:22:40.254 7ff0e80ef700 15 mds.0.cache  ambiguous_subtrees         
2018-04-13 14:22:40.254 7ff0e10e1700  5 mds.0.log _submit_thread 10340520~2439 : EUpdate setattr [metablob 0x1, 2 dirs]
2018-04-13 14:22:40.254 7ff0e10e1700  5 mds.0.log _submit_thread 10342979~872 : ESubtreeMap 2 subtrees , 0 ambiguous [metablob 0x1, 2 dirs]
2018-04-13 14:22:41.314 7ff0e50e9700 10 mds.beacon.a _send up:active seq 38     
2018-04-13 14:22:41.314 7ff0e80ef700 10 mds.beacon.a handle_mds_beacon up:active seq 38 rtt 0.000497
2018-04-13 14:22:41.375 7ff0e80ef700 20 mds.0.22 get_session have 0x55ab20b21880 client.124124 192.168.1.3:0/2079452257 state open
2018-04-13 14:22:41.375 7ff0e80ef700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 6) v1 from client.124124
2018-04-13 14:22:41.375 7ff0e80ef700 10 mds.0.sessionmap touch_session s=0x55ab20b21880 name=client.124124
2018-04-13 14:22:45.155 7ff0e58ea700  7 mds.0.cache trim bytes_used=402kB limit=1GB reservation=0.05% count=0
2018-04-13 14:22:45.155 7ff0e58ea700  7 mds.0.cache trim_lru trimming 0 items from LRU size=25 mid=0 pintail=0 pinned=25
2018-04-13 14:22:45.155 7ff0e58ea700  7 mds.0.cache trim_lru trimmed 0 items    
2018-04-13 14:22:45.155 7ff0e58ea700 10 mds.0.cache trim_client_leases          
2018-04-13 14:22:45.155 7ff0e58ea700 10 mds.0.cache trim_client_leases pool 1 trimmed 0 leases, 1 left
2018-04-13 14:22:45.156 7ff0e58ea700  2 mds.0.cache check_memory_usage total 439916, rss 39424, heap 323776, baseline 323776, buffers 0, 16 / 28 inodes have caps, 16 caps, 0.571429 caps per inode
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.log trim 5 / 128 segments, 4401 / -1 events, 0 (0) expiring, 0 (0) expired
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.log _trim_expired_segments waiting for 1/4194304 to expire
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.server find_idle_sessions.  laggy until 0.000000
2018-04-13 14:22:45.156 7ff0e58ea700 20 mds.0.server laggiest active session is client.124124 192.168.1.3:0/2079452257 and sufficiently new (2018-04-13 14:22:41.375817)
2018-04-13 14:22:45.156 7ff0e58ea700 20 mds.0.server not evicting a slow client, because there is only one
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.locker scatter_tick               
2018-04-13 14:22:45.156 7ff0e58ea700 20 mds.0.locker caps_tick 0 revoking caps  
2018-04-13 14:22:45.156 7ff0e58ea700 20 mds.0.bal export_pin_queue size=0       
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.bal auth tree [dir 0x1 / [2,head] auth pv=401 v=399 cv=0/0 dir_auth=0 ap=1+1+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.839915 1=0+1) n(v15 rc2018-04-13 14:22:40.222450 11=10+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x55ab200a1800] export_pin=-1
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.bal auth tree [dir 0x100 ~mds0/ [2,head] auth v=1392 cv=0/0 dir_auth=0 state=1610612737|complete f(v0 10=0+10) n(v7 rc2018-04-13 14:22:32.818877 237=161+76) hs=10+0,ss=0+0 dirty=2 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x55ab2071c000] export_pin=-1
2018-04-13 14:22:45.156 7ff0e58ea700 15 mds.0.bal tick last_sample now 28000.7s 
2018-04-13 14:22:45.156 7ff0e58ea700 15 mds.0.bal get_load mdsload<[2.29226,18.5347 39.3617]/[2.29226,18.5347 39.3617], req 12.3342, hr 0, qlen 0, cpu 0.21>
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.cache find_stale_fragment_freeze  
2018-04-13 14:22:45.156 7ff0e58ea700 10 mds.0.snap check_osd_map - version unchanged
2018-04-13 14:22:45.156 7ff0e58ea700 20 mds.0.22 updating export targets, currently 0 ranks are targets
2018-04-13 14:22:45.156 7ff0e58ea700 20 mds.beacon.a 0 slow request found       
2018-04-13 14:22:45.179 7ff0e20e3700 10 MDSIOContextBase::complete: 25C_MDS_inode_update_finish
2018-04-13 14:22:45.179 7ff0e20e3700 15 mds.0.cache.ino(0x100000002ad) pop_and_dirty_projected_inode 0x100000002ad v34
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.ino(0x100000002ad) mark_dirty [inode 0x100000002ad [2,head] /nfsclnt.test/file.2 auth v32 pv34 ap=2+0 dirtyparent s=0 n(v0 rc2018-04-13 14:22:40.222450 1=1+0) (iauth xlockdone x=1) (iversion lock w=1 last_client=124124) cr={124124=0-4194304@1} caps={124124=pLsXsFscr/-@5} | request=1 lock=3 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55ab2085ca00]
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.den(0x100000002aa file.2) mark_dirty [dentry #0x1/nfsclnt.test/file.2 [2,head] auth (dversion lock) pv=34 v=32 ap=0+2 ino=0x100000002ad state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55ab208f1c00]
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.dir(0x100000002aa) _mark_dirty (already dirty) [dir 0x100000002aa /nfsclnt.test/ [2,head] auth pv=35 v=34 cv=0/0 ap=1+2+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.904765 10=10+0) n(v0 rc2018-04-13 14:22:40.222450 10=10+0) hs=10+0,ss=0+0 dirty=10 | child=1 dirty=1 authpin=1 0x55ab2085a000] version 34
2018-04-13 14:22:45.179 7ff0e20e3700 15 mds.0.cache.ino(0x100000002aa) pop_and_dirty_projected_inode 0x100000002aa v400
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.ino(0x100000002aa) mark_dirty [inode 0x100000002aa [...2,head] /nfsclnt.test/ auth v398 pv400 ap=1+1 dirtyparent f(v0 m2018-04-13 14:22:32.904765 10=10+0) n(v0 rc2018-04-13 14:22:40.222450 11=10+1) (isnap sync r=1) (inest lock w=1) (ifile excl) (iversion lock w=1 last_client=124124) caps={124124=pAsLsXsFsx/Fx@12},l=124124 | request=0 lock=3 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55ab2085b500]
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.den(0x1 nfsclnt.test) mark_dirty [dentry #0x1/nfsclnt.test [2,head] auth (dn sync l) (dversion lock) pv=400 v=398 ap=0+2 ino=0x100000002aa state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 waiter=0 authpin=0 clientlease=1 0x55ab202dbc00]
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.dir(0x1) _mark_dirty (already dirty) [dir 0x1 / [2,head] auth pv=401 v=400 cv=0/0 dir_auth=0 ap=1+1+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.839915 1=0+1) n(v15 rc2018-04-13 14:22:40.222450 11=10+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x55ab200a1800] version 400
2018-04-13 14:22:45.179 7ff0e20e3700 15 mds.0.cache.ino(0x1) pop_and_dirty_projected_inode 0x1 v107
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.ino(0x1) mark_dirty [inode 0x1 [...2,head] / auth v106 pv107 ap=1+0 snaprealm=0x55ab20304780 f(v0 m2018-04-13 14:22:32.839915 1=0+1) n(v15 rc2018-04-13 14:22:40.222450 12=10+2)/n(v0 1=0+1) (isnap sync r=1) (inest lock w=1 dirty) (iversion lock w=1 last_client=124124) caps={124124=pAsLsXsFs/-@18} | dirtyscattered=1 request=0 lock=3 dirfrag=1 caps=1 openingsnapparents=0 dirty=1 waiter=0 authpin=1 0x55ab200a1100]
2018-04-13 14:22:45.179 7ff0e20e3700 15 mds.0.cache.dir(0x100000002aa) pop_and_dirty_projected_fnode 0x55ab20814c10 v35
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.dir(0x100000002aa) _mark_dirty (already dirty) [dir 0x100000002aa /nfsclnt.test/ [2,head] auth pv=35 v=35 cv=0/0 ap=1+2+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.904765 10=10+0) n(v0 rc2018-04-13 14:22:40.253499 10=10+0) hs=10+0,ss=0+0 dirty=10 | child=1 dirty=1 authpin=1 0x55ab2085a000] version 35
2018-04-13 14:22:45.179 7ff0e20e3700 15 mds.0.cache.dir(0x1) pop_and_dirty_projected_fnode 0x55ab20814b10 v401
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.cache.dir(0x1) _mark_dirty (already dirty) [dir 0x1 / [2,head] auth pv=401 v=401 cv=0/0 dir_auth=0 ap=1+1+2 state=1610874881|complete f(v0 m2018-04-13 14:22:32.839915 1=0+1) n(v15 rc2018-04-13 14:22:40.253499 11=10+1)/n(v15 rc2018-04-13 14:22:40.222450 11=10+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x55ab200a1800] version 401
2018-04-13 14:22:45.179 7ff0e20e3700 20 mds.0.bal hit_dir 1 pop is 5.50103, frag * size 10
2018-04-13 14:22:45.179 7ff0e20e3700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.124124:492 setattr mode=0666 #0x100000002ad 2018-04-13 14:22:40.253499 caller_uid=4447, caller_gid=4447{10,4447,}) v4
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.locker xlock_finish on (iauth xlockdone x=1) [inode 0x100000002ad [2,head] /nfsclnt.test/file.2 auth v34 ap=2+0 dirtyparent s=0 n(v0 rc2018-04-13 14:22:40.253499 1=1+0) (iauth xlockdone x=1) (iversion lock w=1 last_client=124124) cr={124124=0-4194304@1} caps={124124=pLsXsFscr/-@5} | request=1 lock=3 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55ab2085ca00]
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.locker eval_gather (iauth xlockdone) on [inode 0x100000002ad [2,head] /nfsclnt.test/file.2 auth v34 ap=2+0 dirtyparent s=0 n(v0 rc2018-04-13 14:22:40.253499 1=1+0) (iauth xlockdone) (iversion lock w=1 last_client=124124) cr={124124=0-4194304@1} caps={124124=pLsXsFscr/-@5} | request=1 lock=2 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55ab2085ca00]
2018-04-13 14:22:45.179 7ff0e20e3700 10 mds.0.locker  next state is sync issued/allows loner /s xlocker /s other /s
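Reading the two logs together: the client's _fsync starts waiting on unsafe tid 492 at 14:22:40.254, and the safe reply only arrives once the MDS journal commit completes at 14:22:45.179. The gap from the timestamps (a quick sanity check, not Ceph code):

```python
from datetime import datetime

# Compute the observed stall from the two log timestamps quoted above:
# _fsync begins "waiting on unsafe requests, last tid 492" at 14:22:40.254,
# and "handle_client_reply got a reply. Safe:1 tid 492" lands at 14:22:45.179.
fmt = "%Y-%m-%d %H:%M:%S.%f"
wait_start = datetime.strptime("2018-04-13 14:22:40.254000", fmt)
safe_reply = datetime.strptime("2018-04-13 14:22:45.179000", fmt)
stall = (safe_reply - wait_start).total_seconds()
print(stall)  # → 4.925
```

So the wait spans almost exactly the interval until the MDS's next periodic journal flush, which is what points at the mdlog commit cadence rather than at the setattr itself.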

#2 Updated by Jeff Layton about 1 year ago

I'll see if I can cook up a standalone libcephfs testcase for this.

#3 Updated by Patrick Donnelly about 1 year ago

Sounds like Ganesha is blocked on a journal flush by the MDS.

#4 Updated by dongdong tao about 1 year ago

I think Patrick is right. Maybe we should call flush_mdlog_sync to make the MDS flush the mdlog before we wait on the unsafe request list.
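The shape of the proposed fix can be sketched with a toy model (hypothetical Python, not the actual Client/MDS code; the 5s flush interval is an assumption for illustration): an fsync that merely waits for its tid to become safe stalls until the next periodic journal commit, while one that first asks the MDS to flush the mdlog returns immediately.

```python
class ToyMDS:
    """Toy stand-in for an MDS that journals requests on a periodic tick."""

    def __init__(self, flush_interval=5):
        self.flush_interval = flush_interval  # assumed seconds between commits
        self.pending = []                     # submitted but not yet journaled
        self.journaled = set()                # safely committed tids
        self.clock = 0

    def submit(self, tid):
        self.pending.append(tid)

    def flush_journal(self):
        # commit everything pending (models an mdlog flush)
        self.journaled.update(self.pending)
        self.pending.clear()

    def tick(self):
        # one second of MDS time; commit only on the periodic boundary
        self.clock += 1
        if self.clock % self.flush_interval == 0:
            self.flush_journal()

def fsync_wait(mds, tid, request_flush=False):
    """Return seconds spent waiting for tid to become safe."""
    if request_flush:
        # the proposed fix: ask the MDS to flush the mdlog before waiting
        mds.flush_journal()
    waited = 0
    while tid not in mds.journaled:
        mds.tick()
        waited += 1
    return waited

mds = ToyMDS()
mds.submit(492)                              # the setattr tid from the log
print(fsync_wait(mds, 492))                  # → 5 (stalls until the periodic flush)

mds2 = ToyMDS()
mds2.submit(492)
print(fsync_wait(mds2, 492, request_flush=True))  # → 0 (flush-on-demand, no stall)
```

The point of the sketch is only the ordering: triggering the flush before blocking converts a worst-case full-interval wait into a single round trip.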

#5 Updated by Patrick Donnelly about 1 year ago

  • Assignee set to Jeff Layton
  • Priority changed from Normal to High

#6 Updated by Jeff Layton about 1 year ago

I tried to roll a standalone testcase for this, but it didn't stall out in the same way. I'm not quite sure what causes the problem at this point, but I'll take a look at the proposed solution and see if it helps the (more complex) testcase that I do have.

#7 Updated by Jeff Layton about 1 year ago

Thanks, dongdong! That seems to resolve the problem. Pull request is up here:

https://github.com/ceph/ceph/pull/21498

I also have an integration build going in shaman now, and plan to run this through the teuthology fs test suite.

#8 Updated by Patrick Donnelly about 1 year ago

  • Status changed from New to Pending Backport
  • Source set to Development
  • Backport set to luminous
  • Component(FS) Client added

#9 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #23802: luminous: slow ceph_ll_sync_inode calls after setattr added

#10 Updated by Nathan Cutler about 1 year ago

  • Status changed from Pending Backport to Resolved
