Bug #271: dropping dirty+flushing AxFxw state
Status: Closed
Description
This weekend I tried to rsync kernel.org to my cluster, just to get some data on it.
At first it all seemed to go fine, until the following messages started raining down in my dmesg:
[89636.033739] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c32dc8 1099511929817
[89636.033743] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c33320 1099511929818
[89636.033746] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c33878 1099511929819
[89636.033750] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c33dd0 1099511929820
[89636.033754] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c34328 1099511929821
[89636.033758] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c34880 1099511929822
[89636.033761] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c34dd8 1099511929823
[89636.033765] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c35330 1099511929824
[89636.033769] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c35888 1099511929825
[89636.033773] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c36338 1099511929826
[89636.033777] ceph: dropping dirty+flushing AxFx state for ffff8800b0c36890 1099511929827
[89636.033781] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c36de8 1099511929828
[89636.033785] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c37340 1099511929829
[89636.033788] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c37898 1099511929830
[89636.033792] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c38308 1099511929831
[89636.033796] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c38860 1099511929832
[89636.033800] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c38db8 1099511929833
[89636.033804] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c39310 1099511929834
[89636.033808] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c39868 1099511929835
[89636.033811] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c39dc0 1099511929836
[89636.033815] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3a318 1099511929837
[89636.033819] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3a870 1099511929838
[89636.033822] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3adc8 1099511929839
[89636.033826] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3b320 1099511929840
[89636.033830] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3b878 1099511929841
[89636.033834] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3bdd0 1099511929842
[89636.033838] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3c328 1099511929843
[89636.033842] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3c880 1099511929844
[89636.033845] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3cdd8 1099511929845
[89636.033849] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3d330 1099511929846
[89636.033853] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3d888 1099511929847
[89636.033857] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3dde0 1099511929848
[89636.033861] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3e338 1099511929849
[89636.033865] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3e890 1099511929850
[89636.033868] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3ede8 1099511929851
[89636.033872] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3f340 1099511929852
[89636.033876] ceph: dropping dirty+flushing AxFxw state for ffff8800b0c3f898 1099511929853
[89636.033880] ceph: dropping dirty+flushing AxFxw state for ffff880027540308 1099511929854
[89636.033883] ceph: dropping dirty+flushing AxFxw state for ffff880027540860 1099511929855
[89636.033887] ceph: dropping dirty+flushing AxFxw state for ffff880027540db8 1099511929856
[89636.033891] ceph: dropping dirty+flushing AxFxw state for ffff880027541310 1099511929857
[89636.033895] ceph: dropping dirty+flushing AxFxw state for ffff880027541868 1099511929858
[89636.033898] ceph: dropping dirty+flushing AxFxw state for ffff880027541dc0 1099511929859
[89636.033902] ceph: dropping dirty+flushing AxFxw state for ffff880027542318 1099511929860
[89636.033906] ceph: dropping dirty+flushing AxFxw state for ffff880027542870 1099511929861
[89636.033909] ceph: dropping dirty+flushing AxFxw state for ffff880027542dc8 1099511929862
[89636.033913] ceph: dropping dirty+flushing AxFxw state for ffff880027543320 1099511929863
[89636.033917] ceph: dropping dirty+flushing AxFxw state for ffff880027543878 1099511929864
[89636.033921] ceph: dropping dirty+flushing AxFxw state for ffff880027543dd0 1099511929865
[89636.033924] ceph: dropping dirty+flushing AxFxw state for ffff880027544328 1099511929866
[89636.033928] ceph: dropping dirty+flushing AxFxw state for ffff880027544880 1099511929867
[89636.033932] ceph: dropping dirty+flushing AxFxw state for ffff880027544dd8 1099511929868
[89636.033935] ceph: dropping dirty+flushing AxFxw state for ffff880027545330 1099511929869
[89636.033939] ceph: dropping dirty+flushing AxFxw state for ffff880027545888 1099511929870
[89636.033943] ceph: dropping dirty+flushing AxFxw state for ffff880027545de0 1099511929871
[89636.033947] ceph: dropping dirty+flushing AxFxw state for ffff880027546338 1099511929872
[89636.033951] ceph: dropping dirty+flushing AxFxw state for ffff880027546890 1099511929873
[89636.033955] ceph: dropping dirty+flushing AxFxw state for ffff880027546de8 1099511929874
[89636.033958] ceph: dropping dirty+flushing AxFxw state for ffff880027547340 1099511929875
[89636.033962] ceph: dropping dirty+flushing AxFxw state for ffff880027547898 1099511929876
[89636.033966] ceph: dropping dirty+flushing AxFxw state for ffff880027548308 1099511929877
[89636.033970] ceph: dropping dirty+flushing AxFxw state for ffff880027548860 1099511929878
[89636.033973] ceph: dropping dirty+flushing AxFxw state for ffff880027548db8 1099511929879
[89636.033977] ceph: dropping dirty+flushing AxFxw state for ffff880027549310 1099511929880
[89636.033981] ceph: dropping dirty+flushing AxFxw state for ffff880027549868 1099511929881
[89636.033985] ceph: dropping dirty+flushing AxFxw state for ffff880027549dc0 1099511929882
[89636.033988] ceph: dropping dirty+flushing AxFxw state for ffff88002754a318 1099511929883
[89636.033992] ceph: dropping dirty+flushing AxFxw state for ffff88002754a870 1099511929884
[89636.033996] ceph: dropping dirty+flushing AxFxw state for ffff88002754adc8 1099511929885
[89636.033999] ceph: dropping dirty+flushing AxFxw state for ffff88002754b320 1099511929886
[89636.034003] ceph: dropping dirty+flushing AxFxw state for ffff88002754b878 1099511929887
[89636.034007] ceph: dropping dirty+flushing AxFxw state for ffff88002754bdd0 1099511929888
[89636.034010] ceph: dropping dirty+flushing AxFxw state for ffff88002754c328 1099511929889
[89636.034014] ceph: dropping dirty+flushing AxFxw state for ffff88002754c880 1099511929890
[89636.034018] ceph: dropping dirty+flushing Fx state for ffff88005bb37898 1099511928796
[89636.034022] ceph: dropping dirty+flushing Fx state for ffff88005bb35330 1099511927341
[89636.034026] ceph: dropping dirty+flushing AxFxw state for ffff88002754cdd8 1099511929891
[89636.034030] ceph: dropping dirty+flushing AxFxw state for ffff88002754d330 1099511929892
[89636.034034] ceph: dropping dirty+flushing AxFxw state for ffff88002754d888 1099511929893
[89636.034038] ceph: dropping dirty+flushing AxFxw state for ffff88002754dde0 1099511929894
[89636.034042] ceph: dropping dirty+flushing AxFxw state for ffff88002754e338 1099511929895
[89636.034045] ceph: dropping dirty+flushing AxFxw state for ffff88002754e890 1099511929896
[89636.034049] ceph: dropping dirty+flushing AxFxw state for ffff88002754ede8 1099511929897
[89636.034059] ceph: dropping dirty+flushing Fx state for ffff88007b39cdd8 1099511633194
[89636.034064] ceph: dropping dirty+flushing Fx state for ffff88007b380860 1099511927313
This kept on going until the rsync stalled.
I was able to cancel the rsync with CTRL+C, but then I tried a "sync", which also hung:
[146281.380053] INFO: task sync:11834 blocked for more than 120 seconds.
[146281.381687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
Reading data from the cluster was still possible, but writes all hung.
After a reboot of the client, I could continue the rsync and it finished.
Kernel: 2.6.35-rc1
Client: master branch (running IPv6 only)
Updated by Sage Weil almost 14 years ago
- Status changed from New to Can't reproduce
If you see this again, can you save the MDS logs? It looks like one of the MDS sessions is closing, but I have no idea why that would happen.
Updated by Wido den Hollander almost 14 years ago
Just had this issue again.
Situation:
- Last night my 2 mons crashed during two rsyncs (have not reported this crash yet)
- One of my MDS (mds1) crashed too
- I restarted the mons
- Restarted the crashed MDS
- The messages started to appear again
I've uploaded the logs at logger.ceph.widodh.nl in /srv/ceph/issues/issue_271
mds.0.log is zero-sized since the disk filled up with core dumps...
Updated by Wido den Hollander almost 14 years ago
While this was going on, one of my MDSes was at 92% CPU, with the following cluster state:
10.07.19_21:34:18.613919 pg v24768: 7968 pgs: 7968 active+clean; 647 GB data, 1306 GB used, 5077 GB / 6383 GB avail
10.07.19_21:34:18.637467 mds e36: 1/1/1 up {0=up:active}, 1 up:standby(laggy or crashed)
Updated by Wido den Hollander almost 14 years ago
I just saw this again and saved the logs on logger.ceph.widodh.nl in /srv/ceph/issues/issue_271
Jul 21 11:32:06 client02 kernel: [ 5876.617147] ceph: dropping dirty+flushing AxFxw state for ffff8800067c6de8 1099513032197
Jul 21 11:32:06 client02 kernel: [ 5876.617154] ceph: dropping dirty+flushing AxFxw state for ffff8800067c7340 1099513032198
Jul 21 11:32:06 client02 kernel: [ 5876.617159] ceph: dropping dirty+flushing AxFxw state for ffff8800067c7898 1099513032199
Jul 21 11:32:06 client02 kernel: [ 5876.617163] ceph: dropping dirty+flushing AxFxw state for ffff8800067c8308 1099513032200
Jul 21 11:32:06 client02 kernel: [ 5876.617171] ceph: dropping dirty+flushing AxFxw state for ffff8800067c9868 1099513032201
Jul 21 11:32:06 client02 kernel: [ 5876.617177] ceph: dropping dirty+flushing AxFxw state for ffff8800067cadc8 1099513032202
Jul 21 11:32:06 client02 kernel: [ 5876.617625] ceph: dropping dirty+flushing Fx state for ffff88000673a870 1099512958343
Jul 21 11:32:06 client02 kernel: [ 5876.617632] ceph: dropping dirty+flushing Fx state for ffff88000665a870 1099512958023
Jul 21 11:32:06 client02 kernel: [ 5876.617861] ceph: dropping dirty+flushing Fx state for ffff8800067ba318 1099512958034
Jul 21 11:32:06 client02 kernel: [ 5876.618412] ceph: dropping dirty+flushing AxFxw state for ffff880009b12dc8 1099513032203
Jul 21 11:32:06 client02 kernel: [ 5876.618917] ceph: dropping dirty+flushing Fx state for ffff880009a8c328 1099512960344
Jul 21 11:32:06 client02 kernel: [ 5876.621489] ceph: dropping dirty+flushing AxFxw state for ffff880006042318 1099513032204
Jul 21 11:32:06 client02 kernel: [ 5876.623815] ceph: dropping dirty+flushing AxFxw state for ffff880006082318 1099513032205
Jul 21 11:32:06 client02 kernel: [ 5876.623847] ceph: dropping dirty+flushing AxFxw state for ffff8800061a6338 1099513032206
Jul 21 11:32:06 client02 kernel: [ 5876.623882] ceph: dropping dirty+flushing AxFxw state for ffff8800061a6890 1099513032207
Jul 21 11:32:06 client02 kernel: [ 5876.623900] ceph: dropping dirty+flushing Fx state for ffff880008ffcdd8 1099512959266
Jul 21 11:32:06 client02 kernel: [ 5876.623908] ceph: dropping dirty+flushing Fx state for ffff880006041868 1099512966128
Jul 21 11:32:06 client02 kernel: [ 5876.623915] ceph: dropping dirty+flushing Fx state for ffff8800060b6890 1099512966247
Jul 21 11:32:06 client02 kernel: [ 5876.623921] ceph: dropping dirty+flushing Fx state for ffff8800060b7340 1099512966251
Then I checked mds.1.log, which says:
10.07.21_11:32:06.317738 7f142f4a9710 mds0.server no longer in reconnect state, ignoring reconnect, sending close
10.07.21_11:32:06.319997 7f142f4a9710 log [INF] : denied reconnect attempt (mds is up:active) from client6409 [2001:16f8:10:2::c3c3:8c7a]:0/23980306 after 312.370185 (allowed interval 45)
10.07.21_11:32:07.003051 7f141e3eb710 -- [2001:16f8:10:2::c3c3:2e5c]:6800/25675 >> [2001:16f8:10:2::c3c3:8c7a]:0/23980306 pipe(0x66b73180 sd=-1 pgs=794270 cs=1 l=0).fault initiating reconnect
10.07.21_11:32:07.011850 7f141e4ec710 -- [2001:16f8:10:2::c3c3:2e5c]:6800/25675 >> [2001:16f8:10:2::c3c3:8c7a]:0/23980306 pipe(0x66d77810 sd=43 pgs=0 cs=0 l=0).accept peer addr is really [2001:16f8:10:2::c3c3:8c7a]:0/23980306 (socket is [2001:16f8:10:2::c3c3:8c7a]:39671/0)
Updated by Yehuda Sadeh almost 14 years ago
It does look like a multiple-MDS-only bug. This message can occur when ci->i_caps is empty while ci->i_flushing_item is not. Items are removed from ci->i_caps in __ceph_remove_cap(), which is called either from ceph_destroy_inode() (IMO, by the time it's called ci->i_flushing_item should be empty) or from one of the import/export handlers.