
Bug #13546

mv of directories hung Ceph filesystem

Added by Eric Eastman over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
Start date:
10/20/2015
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

On a test Ceph cluster running Ceph v9.1.0 with the 4.3rc6 kernel on Trusty, I tried to move two directories in a Ceph filesystem into a sub-directory with the commands:

root@ede-c1-adm01:/cephfs/top# ls
ede-c1-gw01 ede-c1-gw02
root@ede-c1-adm01:/cephfs/top# mkdir z
root@ede-c1-adm01:/cephfs/top# mv e* z/

and the Ceph filesystem hung on all mounted clients. On every client, an ls /cephfs/ hangs, and neither ^C nor ^Z will interrupt it.

The system I ran the mv command on had the filesystem fuse mounted with these mount options:
cat /proc/mounts | grep ceph
ceph-fuse /cephfs fuse.ceph-fuse rw,noatime,user_id=0,group_id=0,default_permissions,allow_other 0 0

Two client systems had the Ceph filesystem kernel mounted with these mount options:
cat /proc/mounts | grep ceph
10.14.2.11,10.14.2.12,10.14.2.13:/ /cephfs ceph rw,noatime,name=cephfs,secret=<hidden>,acl 0 0

Before I ran the mv command, I had stopped the 2 write processes per kernel mounted system, so there should have been no I/O going to the Ceph filesystem at the time of the mv operation.

In another window I had ceph -w running; after the mv command it began showing:

2015-10-20 14:55:26.511106 mon.0 [INF] pgmap v5643: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail
2015-10-20 14:55:36.723555 mon.0 [INF] pgmap v5644: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail; 204 B/s wr, 0 op/s
2015-10-20 14:55:37.735373 mon.0 [INF] pgmap v5645: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail; 456 B/s wr, 0 op/s
2015-10-20 14:56:06.526391 mds.0 [WRN] 4 slow requests, 1 included below; oldest blocked for > 134822.090727 secs
2015-10-20 14:56:06.526414 mds.0 [WRN] slow request 31.369423 seconds old, received at 2015-10-20 14:55:35.156855: client_request(client.4240:131 rename #10000000001/ede-c1-gw01 #10000000000/ede-c1-gw01 2015-10-20 14:55:35.142806) currently failed to xlock, waiting
2015-10-20 14:56:31.019164 mon.0 [INF] pgmap v5646: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail
2015-10-20 14:56:36.527636 mds.0 [WRN] 4 slow requests, 1 included below; oldest blocked for > 134852.091916 secs
2015-10-20 14:56:36.527681 mds.0 [WRN] slow request 61.370613 seconds old, received at 2015-10-20 14:55:35.156855: client_request(client.4240:131 rename #10000000001/ede-c1-gw01 #10000000000/ede-c1-gw01 2015-10-20 14:55:35.142806) currently failed to xlock, waiting
2015-10-20 14:56:41.527858 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 134857.092109 secs
2015-10-20 14:56:41.527868 mds.0 [WRN] slow request 31.129815 seconds old, received at 2015-10-20 14:56:10.397846: client_request(client.14128:14187 getattr pAsLsXsFs #10000000000 2015-10-20 14:56:10.395375) currently failed to rdlock, waiting
2015-10-20 14:56:57.166362 mon.0 [INF] pgmap v5647: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail
2015-10-20 14:57:11.528982 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 134887.093320 secs
2015-10-20 14:57:11.529010 mds.0 [WRN] slow request 61.131026 seconds old, received at 2015-10-20 14:56:10.397846: client_request(client.14128:14187 getattr pAsLsXsFs #10000000000 2015-10-20 14:56:10.395375) currently failed to rdlock, waiting
2015-10-20 14:57:26.527825 mon.0 [INF] pgmap v5648: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail
2015-10-20 14:57:36.742034 mon.0 [INF] pgmap v5649: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail
2015-10-20 14:57:37.754074 mon.0 [INF] pgmap v5650: 832 pgs: 832 active+clean; 55106 MB data, 162 GB used, 863 GB / 1061 GB avail
2015-10-20 14:57:36.530023 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 134912.094357 secs
2015-10-20 14:57:36.530046 mds.0 [WRN] slow request 121.373053 seconds old, received at 2015-10-20 14:55:35.156855: client_request(client.4240:131 rename #10000000001/ede-c1-gw01 #10000000000/ede-c1-gw01 2015-10-20 14:55:35.142806) currently failed to xlock, waiting
2015-10-20 14:58:11.531409 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 134947.095728 secs
2015-10-20 14:58:11.531435 mds.0 [WRN] slow request 121.133434 seconds old, received at 2015-10-20 14:56:10.397846: client_request(client.14128:14187 getattr pAsLsXsFs #10000000000 2015-10-20 14:56:10.395375) currently failed to rdlock, waiting
2015-10-20 14:59:36.534872 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 135032.099205 secs
2015-10-20 14:59:36.534895 mds.0 [WRN] slow request 241.377901 seconds old, received at 2015-10-20 14:55:35.156855: client_request(client.4240:131 rename #10000000001/ede-c1-gw01 #10000000000/ede-c1-gw01 2015-10-20 14:55:35.142806) currently failed to xlock, waiting
2015-10-20 15:00:00.000975 mon.0 [INF] HEALTH_WARN; mds0: Client ede-c1-gw01:cephfs failing to respond to capability release; mds0: Client ede-c1-gw02:cephfs failing to respond to capability release
2015-10-20 15:00:11.536200 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 135067.100527 secs
2015-10-20 15:00:11.536245 mds.0 [WRN] slow request 241.138233 seconds old, received at 2015-10-20 14:56:10.397846: client_request(client.14128:14187 getattr pAsLsXsFs #10000000000 2015-10-20 14:56:10.395375) currently failed to rdlock, waiting
2015-10-20 15:03:36.543495 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 135272.107818 secs
2015-10-20 15:03:36.543520 mds.0 [WRN] slow request 481.386514 seconds old, received at 2015-10-20 14:55:35.156855: client_request(client.4240:131 rename #10000000001/ede-c1-gw01 #10000000000/ede-c1-gw01 2015-10-20 14:55:35.142806) currently failed to xlock, waiting
2015-10-20 15:04:11.544806 mds.0 [WRN] 5 slow requests, 1 included below; oldest blocked for > 135307.109083 secs
2015-10-20 15:04:11.544827 mds.0 [WRN] slow request 481.146789 seconds old, received at 2015-10-20 14:56:10.397846: client_request(client.14128:14187 getattr pAsLsXsFs #10000000000 2015-10-20 14:56:10.395375) currently failed to rdlock, waiting
...
2015-10-20 15:33:46.625316 mds.0 [WRN] slow request 482.361401 seconds old, received at 2015-10-20 15:25:44.263752: client_request(client.4240:139 getattr pAsLsXsFs #10000000000 2015-10-20 15:25:44.249554) currently failed to rdlock, waiting
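The dependency chain suggested by the warnings above can be sketched as a small Python model (purely illustrative, not Ceph source code): the rename needs an exclusive lock (xlock) on the destination directory, the xlock cannot be granted while kernel clients still hold caps on it, and later requests such as the getattr queue behind the stuck xlock waiter.

```python
# Hypothetical model of the hang in the MDS log above (not Ceph code).
# A rename needs an xlock; the xlock needs client caps revoked; one
# client never acks the revoke, so the rename and everything queued
# behind it (the getattr) wait forever.

class Inode:
    def __init__(self, name):
        self.name = name
        self.caps = {}      # client_id -> caps held on this inode
        self.waiting = []   # requests blocked on this inode's locks

class MDS:
    def __init__(self):
        self.slow_requests = []

    def try_xlock(self, inode, request):
        # an xlock is only grantable once no client holds caps
        if inode.caps:
            inode.waiting.append(request)
            self.slow_requests.append((request, "failed to xlock, waiting"))
            return False
        return True

    def try_rdlock(self, inode, request):
        # a pending xlock waiter blocks later readers too
        if inode.waiting:
            self.slow_requests.append((request, "failed to rdlock, waiting"))
            return False
        return True

mds = MDS()
top = Inode("#10000000000")                  # the /cephfs/top directory
top.caps = {"client.14128": {"pAsLsXsFs"}}   # kernel client holding caps

# the fuse client (client.4240) issues the rename; it stalls on the xlock
assert not mds.try_xlock(top, "rename ede-c1-gw01")

# the kernel client never releases its caps ("failing to respond to
# capability release"), so a later getattr queues behind the xlock waiter
assert not mds.try_rdlock(top, "getattr pAsLsXsFs")

assert [s for _, s in mds.slow_requests] == [
    "failed to xlock, waiting",
    "failed to rdlock, waiting",
]
```

This matches the observed symptoms: one stuck rename is enough to wedge every subsequent lookup of the parent directory on all clients.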

System info:

ceph -s
    cluster 75a91bbe-b287-11e4-889f-001517987704
     health HEALTH_WARN
            mds0: Client ede-c1-gw01:cephfs failing to respond to capability release
            mds0: Client ede-c1-gw02:cephfs failing to respond to capability release
     monmap e1: 3 mons at {ede-c1-mon01=10.14.2.11:6789/0,ede-c1-mon02=10.14.2.12:6789/0,ede-c1-mon03=10.14.2.13:6789/0}
            election epoch 12, quorum 0,1,2 ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
     mdsmap e22: 1/1/1 up {0=ede-c1-mds02=up:active}, 2 up:standby
     osdmap e49: 18 osds: 18 up, 18 in
            flags sortbitwise
      pgmap v5650: 832 pgs, 4 pools, 55106 MB data, 25170 objects
            162 GB used, 863 GB / 1061 GB avail
                 832 active+clean

ceph health detail
HEALTH_WARN mds0: Client ede-c1-gw01:cephfs failing to respond to capability release; mds0: Client ede-c1-gw02:cephfs failing to respond to capability release
mds0: Client ede-c1-gw01:cephfs failing to respond to capability release(client_id: 14128)
mds0: Client ede-c1-gw02:cephfs failing to respond to capability release(client_id: 14138)

ceph -v
ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)

Kernel mounted system
uname -a
Linux ede-c1-gw01 4.3.0-040300rc6-generic #201510182030 SMP Mon Oct 19 00:31:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

fuse mounted system
uname -a
Linux ede-c1-adm01 4.3.0-040300rc6-generic #201510182030 SMP Mon Oct 19 00:31:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

dpkg -l | grep fuse
ii  ceph-fuse                             9.1.0-1trusty                    amd64        FUSE-based client for the Ceph distributed file system
ii  fuse                                  2.9.2-4ubuntu4.14.04.1           amd64        Filesystem in Userspace
ii  libfuse2:amd64                        2.9.2-4ubuntu4.14.04.1           amd64        Filesystem in Userspace (library)

From the fuse mount:
df 
Filesystem      1K-blocks      Used Available Use% Mounted on
/dev/sda5        35723920   7704560  26181640  23% /
none                    4         0         4   0% /sys/fs/cgroup
udev              1014384        12   1014372   1% /dev
tmpfs              204892       740    204152   1% /run
none                 5120         0      5120   0% /run/lock
none              1024444        12   1024432   1% /run/shm
none               102400         0    102400   0% /run/user
ceph-fuse      1113567232 208039936 905527296  19% /cephfs

From one of the systems using the kernel mount:
df 
Filesystem                          1K-blocks      Used Available Use% Mounted on
/dev/sda5                            35723920   3158012  30728188  10% /
none                                        4         0         4   0% /sys/fs/cgroup
udev                                  1014012         4   1014008   1% /dev
tmpfs                                  204892       664    204228   1% /run
none                                     5120         0      5120   0% /run/lock
none                                  1024444         8   1024436   1% /run/shm
none                                   102400         0    102400   0% /run/user
10.14.2.11,10.14.2.12,10.14.2.13:/ 1113567232 208039936 905527296  19% /cephfs


I am not seeing any errors in /var/log/kern.log or /var/log/syslog.

ceph-mon.ede-c1-mon01.log.bz2 (119 KB) Eric Eastman, 10/20/2015 08:11 PM

ceph-mds.ede-c1-mds02.log.1.gz (5.33 KB) Eric Eastman, 10/20/2015 08:11 PM

ceph-mds.ede-c1-mds02.log.bz2 (1.54 KB) Eric Eastman, 10/20/2015 08:11 PM

ceph.log.mon.bz2 (22.2 KB) Eric Eastman, 10/20/2015 08:11 PM

Associated revisions

Revision 5d8d6662 (diff)
Added by Yan, Zheng over 3 years ago

mds: fix completed cap flush handling

Client re-send cap flush when MDS restarts. The cap flush message
may release some caps even if the corresponding flush is already
completed.

Fixes: #13546
Signed-off-by: Yan, Zheng <>
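The idea behind the fix can be sketched roughly as follows (a hypothetical Python model, not the actual patch; all names are made up): the MDS tracks which flush tids have already completed, so a cap flush re-sent after an MDS restart is acknowledged without dropping caps a second time based on stale state.

```python
# Hypothetical sketch of the fix's idea (not the actual Ceph patch):
# remember completed flush tids so a re-sent cap flush after an MDS
# restart cannot release caps that were already handled.

class Session:
    def __init__(self):
        self.completed_flushes = set()  # tids whose flush already finished

class CapState:
    def __init__(self, issued):
        self.issued = set(issued)

def handle_cap_flush(session, cap, flush_tid, caps_to_release):
    """Process a (possibly re-sent) client cap flush message."""
    if flush_tid in session.completed_flushes:
        # Duplicate of an already-completed flush: acknowledge it,
        # but do NOT release caps again.
        return "ack-duplicate"
    cap.issued -= set(caps_to_release)
    session.completed_flushes.add(flush_tid)
    return "flushed"

session = Session()
cap = CapState({"pAsxLsXsxFsx"})

# first flush completes and releases the caps
assert handle_cap_flush(session, cap, 7, {"pAsxLsXsxFsx"}) == "flushed"

# the client re-sends the same flush after an MDS restart; without the
# completed-flush check it would mutate cap state a second time
cap.issued = {"pAsxLsXsxFsx"}   # caps re-issued after reconnect
assert handle_cap_flush(session, cap, 7, {"pAsxLsXsxFsx"}) == "ack-duplicate"
assert cap.issued == {"pAsxLsXsxFsx"}
```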

History

#2 Updated by Greg Farnum over 3 years ago

  • Priority changed from Normal to High

Bumping this up as we ought to examine it and I think it's been lost in the shuffle.

#3 Updated by Zheng Yan over 3 years ago

  • Status changed from New to Verified

#4 Updated by Greg Farnum over 3 years ago

  • Status changed from Verified to In Progress

Do you have a PR or any other commits to go with that? Is it safe to unilaterally not drop caps in a case like this?

#5 Updated by Zheng Yan over 3 years ago

  • Status changed from In Progress to Need Review

#6 Updated by Greg Farnum over 3 years ago

  • Status changed from Need Review to Resolved
