Project

General

Profile

Actions

Bug #271

closed

dropping dirty+flushing AxFxw state for

Added by Wido den Hollander almost 14 years ago. Updated over 13 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

This weekend i tried to rsync kernel.org to my cluster, just to get some data on it.

At first it all seems to go fine, until i started to rain the following messages in my dmesg:

[89636.033739] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c32dc8 1099511929817
[89636.033743] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c33320 1099511929818
[89636.033746] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c33878 1099511929819
[89636.033750] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c33dd0 1099511929820
[89636.033754] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c34328 1099511929821
[89636.033758] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c34880 1099511929822
[89636.033761] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c34dd8 1099511929823
[89636.033765] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c35330 1099511929824
[89636.033769] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c35888 1099511929825
[89636.033773] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c36338 1099511929826
[89636.033777] ceph:  dropping dirty+flushing AxFx state for ffff8800b0c36890 1099511929827
[89636.033781] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c36de8 1099511929828
[89636.033785] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c37340 1099511929829
[89636.033788] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c37898 1099511929830
[89636.033792] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c38308 1099511929831
[89636.033796] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c38860 1099511929832
[89636.033800] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c38db8 1099511929833
[89636.033804] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c39310 1099511929834
[89636.033808] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c39868 1099511929835
[89636.033811] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c39dc0 1099511929836
[89636.033815] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3a318 1099511929837
[89636.033819] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3a870 1099511929838
[89636.033822] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3adc8 1099511929839
[89636.033826] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3b320 1099511929840
[89636.033830] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3b878 1099511929841
[89636.033834] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3bdd0 1099511929842
[89636.033838] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3c328 1099511929843
[89636.033842] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3c880 1099511929844
[89636.033845] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3cdd8 1099511929845
[89636.033849] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3d330 1099511929846
[89636.033853] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3d888 1099511929847
[89636.033857] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3dde0 1099511929848
[89636.033861] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3e338 1099511929849
[89636.033865] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3e890 1099511929850
[89636.033868] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3ede8 1099511929851
[89636.033872] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3f340 1099511929852
[89636.033876] ceph:  dropping dirty+flushing AxFxw state for ffff8800b0c3f898 1099511929853
[89636.033880] ceph:  dropping dirty+flushing AxFxw state for ffff880027540308 1099511929854
[89636.033883] ceph:  dropping dirty+flushing AxFxw state for ffff880027540860 1099511929855
[89636.033887] ceph:  dropping dirty+flushing AxFxw state for ffff880027540db8 1099511929856
[89636.033891] ceph:  dropping dirty+flushing AxFxw state for ffff880027541310 1099511929857
[89636.033895] ceph:  dropping dirty+flushing AxFxw state for ffff880027541868 1099511929858
[89636.033898] ceph:  dropping dirty+flushing AxFxw state for ffff880027541dc0 1099511929859
[89636.033902] ceph:  dropping dirty+flushing AxFxw state for ffff880027542318 1099511929860
[89636.033906] ceph:  dropping dirty+flushing AxFxw state for ffff880027542870 1099511929861
[89636.033909] ceph:  dropping dirty+flushing AxFxw state for ffff880027542dc8 1099511929862
[89636.033913] ceph:  dropping dirty+flushing AxFxw state for ffff880027543320 1099511929863
[89636.033917] ceph:  dropping dirty+flushing AxFxw state for ffff880027543878 1099511929864
[89636.033921] ceph:  dropping dirty+flushing AxFxw state for ffff880027543dd0 1099511929865
[89636.033924] ceph:  dropping dirty+flushing AxFxw state for ffff880027544328 1099511929866
[89636.033928] ceph:  dropping dirty+flushing AxFxw state for ffff880027544880 1099511929867
[89636.033932] ceph:  dropping dirty+flushing AxFxw state for ffff880027544dd8 1099511929868
[89636.033935] ceph:  dropping dirty+flushing AxFxw state for ffff880027545330 1099511929869
[89636.033939] ceph:  dropping dirty+flushing AxFxw state for ffff880027545888 1099511929870
[89636.033943] ceph:  dropping dirty+flushing AxFxw state for ffff880027545de0 1099511929871
[89636.033947] ceph:  dropping dirty+flushing AxFxw state for ffff880027546338 1099511929872
[89636.033951] ceph:  dropping dirty+flushing AxFxw state for ffff880027546890 1099511929873
[89636.033955] ceph:  dropping dirty+flushing AxFxw state for ffff880027546de8 1099511929874
[89636.033958] ceph:  dropping dirty+flushing AxFxw state for ffff880027547340 1099511929875
[89636.033962] ceph:  dropping dirty+flushing AxFxw state for ffff880027547898 1099511929876
[89636.033966] ceph:  dropping dirty+flushing AxFxw state for ffff880027548308 1099511929877
[89636.033970] ceph:  dropping dirty+flushing AxFxw state for ffff880027548860 1099511929878
[89636.033973] ceph:  dropping dirty+flushing AxFxw state for ffff880027548db8 1099511929879
[89636.033977] ceph:  dropping dirty+flushing AxFxw state for ffff880027549310 1099511929880
[89636.033981] ceph:  dropping dirty+flushing AxFxw state for ffff880027549868 1099511929881
[89636.033985] ceph:  dropping dirty+flushing AxFxw state for ffff880027549dc0 1099511929882
[89636.033988] ceph:  dropping dirty+flushing AxFxw state for ffff88002754a318 1099511929883
[89636.033992] ceph:  dropping dirty+flushing AxFxw state for ffff88002754a870 1099511929884
[89636.033996] ceph:  dropping dirty+flushing AxFxw state for ffff88002754adc8 1099511929885
[89636.033999] ceph:  dropping dirty+flushing AxFxw state for ffff88002754b320 1099511929886
[89636.034003] ceph:  dropping dirty+flushing AxFxw state for ffff88002754b878 1099511929887
[89636.034007] ceph:  dropping dirty+flushing AxFxw state for ffff88002754bdd0 1099511929888
[89636.034010] ceph:  dropping dirty+flushing AxFxw state for ffff88002754c328 1099511929889
[89636.034014] ceph:  dropping dirty+flushing AxFxw state for ffff88002754c880 1099511929890
[89636.034018] ceph:  dropping dirty+flushing Fx state for ffff88005bb37898 1099511928796
[89636.034022] ceph:  dropping dirty+flushing Fx state for ffff88005bb35330 1099511927341
[89636.034026] ceph:  dropping dirty+flushing AxFxw state for ffff88002754cdd8 1099511929891
[89636.034030] ceph:  dropping dirty+flushing AxFxw state for ffff88002754d330 1099511929892
[89636.034034] ceph:  dropping dirty+flushing AxFxw state for ffff88002754d888 1099511929893
[89636.034038] ceph:  dropping dirty+flushing AxFxw state for ffff88002754dde0 1099511929894
[89636.034042] ceph:  dropping dirty+flushing AxFxw state for ffff88002754e338 1099511929895
[89636.034045] ceph:  dropping dirty+flushing AxFxw state for ffff88002754e890 1099511929896
[89636.034049] ceph:  dropping dirty+flushing AxFxw state for ffff88002754ede8 1099511929897
[89636.034059] ceph:  dropping dirty+flushing Fx state for ffff88007b39cdd8 1099511633194
[89636.034064] ceph:  dropping dirty+flushing Fx state for ffff88007b380860 1099511927313

This kept on going, until the rsync stalled.

I was able to cancel the rsync with CTRL+C, but then i tried a "sync", which then failed:

[146281.380053] INFO: task sync:11834 blocked for more than 120 seconds.
[146281.381687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

Reading data from the cluster was still possible, but writes all hung.

After a reboot of the client, i could continue the rsync and it finished.

Kernel: 2.6.35-rc1
Client: master branch (Running IPv6 only)

Actions #1

Updated by Sage Weil almost 14 years ago

single or multiple mds's?

Actions #2

Updated by Wido den Hollander almost 14 years ago

Multiple, 2 to be exact.

Actions #3

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Can't reproduce

If you see this again, can you save the mds logs? It looks like one of the MDS sessions is closing, but i have no idea why that would happen.

Actions #4

Updated by Wido den Hollander almost 14 years ago

Just had this issue again.

Situation:
  • Last night my 2 mons crashed during two rsync's (have not reported this crash yet)
  • One of my MDS (mds1) crashed too
  • I restarted the mons
  • Restarted the crashed MDS
  • Messages started to appear.

I've uploaded the logs at logger.ceph.widodh.nl in /srv/ceph/issues/issue_271

mds.0.log has a zero-size since the disk got filled up with coredumps...

Actions #5

Updated by Wido den Hollander almost 14 years ago

While this was going on, one of my MDS'es was pulling 92% CPU, with the following cluster state:

10.07.19_21:34:18.613919    pg v24768: 7968 pgs: 7968 active+clean; 647 GB data, 1306 GB used, 5077 GB / 6383 GB avail
10.07.19_21:34:18.637467   mds e36: 1/1/1 up {0=up:active}, 1 up:standby(laggy or crashed)
Actions #6

Updated by Wido den Hollander almost 14 years ago

I just saw this again, saved the logs on logger.ceph.widodh.nl in /srv/ceph/issues/issue_271

Jul 21 11:32:06 client02 kernel: [ 5876.617147] ceph:  dropping dirty+flushing AxFxw state for ffff8800067c6de8 1099513032197
Jul 21 11:32:06 client02 kernel: [ 5876.617154] ceph:  dropping dirty+flushing AxFxw state for ffff8800067c7340 1099513032198
Jul 21 11:32:06 client02 kernel: [ 5876.617159] ceph:  dropping dirty+flushing AxFxw state for ffff8800067c7898 1099513032199
Jul 21 11:32:06 client02 kernel: [ 5876.617163] ceph:  dropping dirty+flushing AxFxw state for ffff8800067c8308 1099513032200
Jul 21 11:32:06 client02 kernel: [ 5876.617171] ceph:  dropping dirty+flushing AxFxw state for ffff8800067c9868 1099513032201
Jul 21 11:32:06 client02 kernel: [ 5876.617177] ceph:  dropping dirty+flushing AxFxw state for ffff8800067cadc8 1099513032202
Jul 21 11:32:06 client02 kernel: [ 5876.617625] ceph:  dropping dirty+flushing Fx state for ffff88000673a870 1099512958343
Jul 21 11:32:06 client02 kernel: [ 5876.617632] ceph:  dropping dirty+flushing Fx state for ffff88000665a870 1099512958023
Jul 21 11:32:06 client02 kernel: [ 5876.617861] ceph:  dropping dirty+flushing Fx state for ffff8800067ba318 1099512958034
Jul 21 11:32:06 client02 kernel: [ 5876.618412] ceph:  dropping dirty+flushing AxFxw state for ffff880009b12dc8 1099513032203
Jul 21 11:32:06 client02 kernel: [ 5876.618917] ceph:  dropping dirty+flushing Fx state for ffff880009a8c328 1099512960344
Jul 21 11:32:06 client02 kernel: [ 5876.621489] ceph:  dropping dirty+flushing AxFxw state for ffff880006042318 1099513032204
Jul 21 11:32:06 client02 kernel: [ 5876.623815] ceph:  dropping dirty+flushing AxFxw state for ffff880006082318 1099513032205
Jul 21 11:32:06 client02 kernel: [ 5876.623847] ceph:  dropping dirty+flushing AxFxw state for ffff8800061a6338 1099513032206
Jul 21 11:32:06 client02 kernel: [ 5876.623882] ceph:  dropping dirty+flushing AxFxw state for ffff8800061a6890 1099513032207
Jul 21 11:32:06 client02 kernel: [ 5876.623900] ceph:  dropping dirty+flushing Fx state for ffff880008ffcdd8 1099512959266
Jul 21 11:32:06 client02 kernel: [ 5876.623908] ceph:  dropping dirty+flushing Fx state for ffff880006041868 1099512966128
Jul 21 11:32:06 client02 kernel: [ 5876.623915] ceph:  dropping dirty+flushing Fx state for ffff8800060b6890 1099512966247
Jul 21 11:32:06 client02 kernel: [ 5876.623921] ceph:  dropping dirty+flushing Fx state for ffff8800060b7340 1099512966251

Then i checked the mds.1.log, which says:

10.07.21_11:32:06.317738 7f142f4a9710 mds0.server  no longer in reconnect state, ignoring reconnect, sending close
10.07.21_11:32:06.319997 7f142f4a9710 log [INF] : denied reconnect attempt (mds is up:active) from client6409 [2001:16f8:10:2::c3c3:8c7a]:0/23980306 after 312.370185 (allowed interval 45)
10.07.21_11:32:07.003051 7f141e3eb710 -- [2001:16f8:10:2::c3c3:2e5c]:6800/25675 >> [2001:16f8:10:2::c3c3:8c7a]:0/23980306 pipe(0x66b73180 sd=-1 pgs=794270 cs=1 l=0).fault initiating reconnect
10.07.21_11:32:07.011850 7f141e4ec710 -- [2001:16f8:10:2::c3c3:2e5c]:6800/25675 >> [2001:16f8:10:2::c3c3:8c7a]:0/23980306 pipe(0x66d77810 sd=43 pgs=0 cs=0 l=0).accept peer addr is really [2001:16f8:10:2::c3c3:8c7a]:0/23980306 (socket is [2001:16f8:10:2::c3c3:8c7a]:39671/0)
Actions #7

Updated by Yehuda Sadeh almost 14 years ago

It does look like a multiple mds only bug. This message can occur when ci->i_caps is empty while the ci->i_flushing_item is not. Items are removed from ci->i_caps in __ceph_remove_cap(), and this one is either called from either ceph_destroy_inode() (IMO, when it's called ci->i_flushing_item should be empty) or from one of the import/export handlers.

Actions

Also available in: Atom PDF