Bug #24405
I/O errors reported by client copy apps (rsync/cp)
Status: Closed
Description
Hello,
Overall CephFS is working fine (thanks for that!), except for one point: sometimes, after a random amount of copy work, the filesystem returns an I/O error to the program.
(Mounted with "mount -t ceph" on a client that also acts as OSD/mgr/mon/MDS.)
During a large copy (~30 TB), it kills rsync or makes cp skip files:
tom@telegeo02:~> rsync -avvv /somewhere/* /mnt/cephfs/pool_21p3/
...
S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTANX_L2VALD_14QPF____20161228_QLT_R2.DBL.TIF
     90,464,592 100%   63.53MB/s    0:00:01 (xfr#31253, ir-chk=1014/45890)
S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTANX_L2VALD_14QPF____20161228_QLT_R2.HDR
          2,946 100%    8.01kB/s    0:00:00 (xfr#31254, ir-chk=1013/45890)
S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QPF____20161228_FRE_R1.DBL.TIF
    521,175,040  54%   18.79MB/s    0:00:23
rsync: [sender] write error: Broken pipe (32)
rsync: write failed on "/mnt/cephfs/pool_21p3/S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QPF____20161228_FRE_R1.DBL.TIF": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
tom@telegeo02:~> cp -rnv /somewhere/* /mnt/cephfs/pool_21p3/ > /export/miro/tom/logs/log_ceph_testcp
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170223T165311_N0204_R026_T14QNE_20170223T170837.SAFE/S2A_OPER_SSC_L2VALD_14QNE____20170223.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QNE____20170223_FRE_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170223T165311_N0204_R026_T14QQK_20170223T165747.SAFE/S2A_OPER_SSC_L2VALD_14QQK____20170223.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QQK____20170223_FRE_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170224T162331_N0204_R040_T15PXT_20170224T163738.SAFE/S2A_OPER_SSC_L2VALD_15PXT____20170224.DBL.DIR/S2A_OPER_SSC_PDTANX_L2VALD_15PXT____20170224_QLT_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170224T162331_N0204_R040_T15QZE_20170224T162512.SAFE/S2A_OPER_SSC_L2VALD_15QZE____20170224.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_15QZE____20170224_FRE_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170224T162331_N0204_R040_T15QZV_20170224T163738.SAFE/S2A_OPER_SSC_L2VALD_15QZV____20170224.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_15QZV____20170224_FRE_R1.DBL.TIF': Input/output error
I'm using decommissioned hardware to run "real-world" stress tests, and as you can see (in the linked status file) 3 OSDs are already dead. When they died, such an error appeared; that has a logical cause, but it would not be acceptable in production. The same happened when an OSD got killed by an out-of-memory error (160% of the configured maximum!), an issue that is solved now (set to 512 MB => ~1 GB used). However, the error is still present, and I have no trace of it in any log (neither syslog nor the Ceph logs, checked on all machines)!
For me, there are 2 issues:
- Any disk error raised before the OSD goes down should be handled, if not at the OSD level, then at least by the Ceph client. Only if the error is "real" (e.g. not enough OSDs available to write the data) should it be reported to filesystem users as an I/O error.
- There is literally no information to locate the source of the remaining issue (a malfunctioning disk? a network/memory overload?). The info in /sys/kernel/debug/ceph/ does not help (it is empty most of the time, so it does not seem to be a "lag" issue).
Any idea to locate the issue, or any fix, is welcome!
Thanks!
Files
Updated by Zheng Yan almost 6 years ago
After the errors happen, please retry rsyncing the files that failed to write. If errors still occur, enable Ceph debugging (echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control; echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control) and retry again. The log should show where the errors come from.
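For example, an untested sketch of that toggle as a small script (needs root and a mounted debugfs; the script structure and messages are mine, only the "module ... +p" lines are the real dynamic-debug syntax):

```shell
#!/bin/sh
# Enable (or disable) kernel dynamic debug output for the ceph and libceph
# modules. Pass "-p" instead of the default "+p" to turn the logging back off.
CTL=/sys/kernel/debug/dynamic_debug/control
ACTION="${1:-+p}"
for mod in ceph libceph; do
  echo "applying: module $mod $ACTION"
  if [ -w "$CTL" ]; then
    echo "module $mod $ACTION" > "$CTL"
  fi
done
```

The resulting debug lines then appear in the kernel log (dmesg/syslog), which is where the source of the EIO should show up.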
Updated by Thomas De Maet almost 6 years ago
- File cmd_debug_mode.txt cmd_debug_mode.txt added
- File trace_ceph_io_error trace_ceph_io_error added
- File trace_ceph_io_error_2 trace_ceph_io_error_2 added
- File trace_ceph_io_error_3 trace_ceph_io_error_3 added
- File trace_ceph_io_error_4 trace_ceph_io_error_4 added
Hello !
Thank you for your quick answer!
I've activated the debug mode and you can find the outputs of 4 attempts on the same file attached (on the 5th attempt it actually worked!).
It seems to point to mds0 and mds1 (2 active MDS), for which I also had issue #24403. Could they be linked?
Apart from that, this does not make much sense to me, sorry...
Can you help a bit more?
Updated by Zheng Yan almost 6 years ago
There are lots of -EFBIG errors in the log. Have you changed the file layout (a very large stripe unit?)
Updated by Thomas De Maet almost 6 years ago
Yes indeed, the file layout is modified:
ceph.dir.layout="stripe_unit=1376256 stripe_count=1 object_size=704643072 pool=cephfs_data_21p3"
The stripe_unit is below the default, though (1.31 MiB vs the default 4 MiB; I wanted to limit the effect of very small files on used space).
The object_size is larger (672 MiB) to reduce the number of objects per file (I have files of ~1 GB). No documentation mentions that this can be a wrong setup.
Both are multiples of my erasure-coding size (21+3) and of 64 KiB. I felt it was better that way (is it?).
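To spell out that arithmetic, a quick shell sketch (the variable names are mine, not Ceph's; the values come from the layout above):

```shell
#!/bin/sh
# Sanity-check the custom CephFS layout against the 21+3 EC profile.
stripe_unit=1376256          # 1.31 MiB, from ceph.dir.layout
object_size=704643072        # 672 MiB, from ceph.dir.layout
k=21                         # EC data chunks (21+3 profile)
align=65536                  # 64 KiB

# stripe_unit happens to equal 21 * 64 KiB, so both checks pass
[ $((stripe_unit % align)) -eq 0 ] && echo "stripe_unit is 64 KiB aligned"
[ $((stripe_unit % k)) -eq 0 ] && echo "stripe_unit divides evenly by k=$k"
[ $((object_size % stripe_unit)) -eq 0 ] && \
    echo "object_size = $((object_size / stripe_unit)) stripe units"
```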
I had issue #24402 with the used space. Could it be related?
Thanks
Updated by Zheng Yan almost 6 years ago
A big object size causes problems for OSD recovery. The default value of osd_max_object_size is 128M.
Updated by Zheng Yan almost 6 years ago
- Status changed from New to Fix Under Review
Updated by Zheng Yan almost 6 years ago
- Status changed from Fix Under Review to 4
Updated by Thomas De Maet almost 6 years ago
Hello
Sorry, I'm still running bunches of tests...
If I'm not completely misunderstanding how Ceph EC works, in the previous tests the object_size of 672 MiB resolves into 21 objects of 32 MiB each on the OSDs (+3 coding chunks), which is under osd_max_object_size=128 MiB??
In the latest tests I've reduced the object_size to 84 MiB. At first the issue reappeared several times (some old metadata, I guess), but by now I have copied ~6 TB without hitting the bug. I have another issue with the total raw used space exploding, but I will open a new ticket for it.
I should also mention that I have been running recent Linux kernels since the resolution of #24403.
As it is a random and rare issue, I can't be sure it is gone. I suppose we can close the bug, and I will reopen it if necessary.
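To make the numbers explicit (my guess, unverified, is that the size check applies to the logical RADOS object before EC splitting, not to the per-OSD chunk, which would explain the EFBIG):

```shell
#!/bin/sh
# Old layout: logical object size vs per-OSD EC chunk, for a 21+3 profile.
object_size=704643072                   # 672 MiB logical RADOS object
k=21                                    # EC data chunks
osd_max=$((128 * 1024 * 1024))          # default osd_max_object_size

chunk=$((object_size / k))
echo "per-OSD chunk: $((chunk / 1048576)) MiB"          # 32 MiB
echo "logical object: $((object_size / 1048576)) MiB"   # 672 MiB
if [ "$object_size" -gt "$osd_max" ]; then
  echo "logical object exceeds osd_max_object_size"
fi
```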
Thanks again!