Bug #24405

I/O errors reported by client copy apps (rsync/cp)

Added by Thomas De Maet almost 6 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
fs/ceph
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Overall, CephFS is working fine (thanks for that!), except for one point: sometimes, after a random amount of copy work, the filesystem returns an I/O error to the program.

(Mounted with "mount -t ceph" on a client that also acts as an OSD/MGR/MON/MDS node.)

During a large copy (~30TB), it kills rsync or makes cp skip files:

tom@telegeo02:~> rsync -avvv /somewhere/* /mnt/cephfs/pool_21p3/
...
S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTANX_L2VALD_14QPF____20161228_QLT_R2.DBL.TIF
     90,464,592 100%   63.53MB/s    0:00:01 (xfr#31253, ir-chk=1014/45890)
S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTANX_L2VALD_14QPF____20161228_QLT_R2.HDR
          2,946 100%    8.01kB/s    0:00:00 (xfr#31254, ir-chk=1013/45890)
S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QPF____20161228_FRE_R1.DBL.TIF
    521,175,040  54%   18.79MB/s    0:00:23  
rsync: [sender] write error: Broken pipe (32)
rsync: write failed on "/mnt/cephfs/pool_21p3/S2A_MSIL2A_20161228T170712_N0204_R069_T14QPF_20161228T171337.SAFE/S2A_OPER_SSC_L2VALD_14QPF____20161228.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QPF____20161228_FRE_R1.DBL.TIF": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
tom@telegeo02:~> cp -rnv /somewhere/* /mnt/cephfs/pool_21p3/ > /export/miro/tom/logs/log_ceph_testcp
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170223T165311_N0204_R026_T14QNE_20170223T170837.SAFE/S2A_OPER_SSC_L2VALD_14QNE____20170223.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QNE____20170223_FRE_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170223T165311_N0204_R026_T14QQK_20170223T165747.SAFE/S2A_OPER_SSC_L2VALD_14QQK____20170223.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_14QQK____20170223_FRE_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170224T162331_N0204_R040_T15PXT_20170224T163738.SAFE/S2A_OPER_SSC_L2VALD_15PXT____20170224.DBL.DIR/S2A_OPER_SSC_PDTANX_L2VALD_15PXT____20170224_QLT_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170224T162331_N0204_R040_T15QZE_20170224T162512.SAFE/S2A_OPER_SSC_L2VALD_15QZE____20170224.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_15QZE____20170224_FRE_R1.DBL.TIF': Input/output error
cp: error writing '/mnt/cephfs/pool_21p3/S2A_MSIL2A_20170224T162331_N0204_R040_T15QZV_20170224T163738.SAFE/S2A_OPER_SSC_L2VALD_15QZV____20170224.DBL.DIR/S2A_OPER_SSC_PDTIMG_L2VALD_15QZV____20170224_FRE_R1.DBL.TIF': Input/output error

I'm using retired hardware to run "real-world" stress tests, and as you can see in the attached status file, 3 OSDs are already dead. When they died, such an error appeared; that is a logical cause, but not acceptable for production. The same happened when OSDs were killed by a "no memory left" error (160% of the configured maximum!), an issue that is solved now (limit set to 512MB => ~1GB used). However, the error is still present, and I have no trace of it in any log (neither syslog nor the Ceph logs, checked on all machines).

For me, there are 2 issues:

- any disk error raised before the OSD goes down should be handled, if not at the OSD level, then at least by the Ceph client. Only if the error is "real" (e.g. not enough OSDs available to write the data) should it be reported to filesystem users as an I/O error.

- there is literally no information to locate the source of the remaining issue (a malfunctioning disk? a network or memory overload?). The info in /sys/kernel/debug/ceph/ is not helping (it is empty most of the time, so it does not seem to be a "lag" issue); see the sketch after this list.
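
(For reference, a minimal sketch of what can be inspected there, assuming the usual kernel-client debugfs layout; each mount shows up as a <fsid>.client<id> directory:)

# in-flight requests tracked by the kernel client
cat /sys/kernel/debug/ceph/*/osdc   # pending OSD requests
cat /sys/kernel/debug/ceph/*/mdsc   # pending MDS requests
cat /sys/kernel/debug/ceph/*/monc   # pending monitor requests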

Any idea on how to locate the issue, or any fix, is welcome!

Thanks!


Files

status.txt (9.61 KB), Thomas De Maet, 06/04/2018 11:35 AM
cmd_debug_mode.txt (10.7 KB), commands used, Thomas De Maet, 06/04/2018 01:36 PM
trace_ceph_io_error (679 KB), 10000 lines after 1st bug, Thomas De Maet, 06/04/2018 01:36 PM
trace_ceph_io_error_2 (720 KB), 10000 lines after 2nd bug, Thomas De Maet, 06/04/2018 01:36 PM
trace_ceph_io_error_3 (674 KB), 10000 lines after 3rd bug, Thomas De Maet, 06/04/2018 01:36 PM
trace_ceph_io_error_4 (675 KB), 10000 lines after 4th bug, Thomas De Maet, 06/04/2018 01:36 PM
Actions #1

Updated by Zheng Yan almost 6 years ago

After errors happen, please retry rsync on the files that failed to write. If errors still happen, enable ceph debugging (echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control; echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control) and retry again. The log should show where the errors come from.
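
(A minimal sketch of how the resulting messages can be captured while reproducing the error, assuming they land in the kernel ring buffer; the output file name is just an example:)

# after enabling the dynamic_debug switches above, follow the kernel
# log and keep a copy while the failing copy job runs
dmesg -w | tee /tmp/ceph_debug_trace

# turn verbose debugging off again afterwards
echo 'module ceph -p' > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph -p' > /sys/kernel/debug/dynamic_debug/control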

Actions #2

Updated by Thomas De Maet almost 6 years ago

Hello !

Thank you for your quick answer!

I've activated the debug mode, and you can find the outputs of 4 attempts on the same file attached (on the 5th attempt it actually worked!).

It seems to point to mds0 and mds1 (2 active MDSes), for which I also had issue #24403. Could they be linked?
Apart from that, this does not make much sense to me, sorry...

Can you help a bit more?

Actions #3

Updated by Zheng Yan almost 6 years ago

There are lots of -EFBIG errors in the log. Have you changed the file layout (a very large stripe unit, perhaps)?

Actions #4

Updated by Thomas De Maet almost 6 years ago

yes indeed, the file layout is modified:

ceph.dir.layout="stripe_unit=1376256 stripe_count=1 object_size=704643072 pool=cephfs_data_21p3"

The stripe_unit is below the default, though (1.31MB vs. the 4MB default; I wanted to limit the effect of very small files on used space).

The object_size is larger (672MB) to reduce the number of objects per file (my files are ~1GB). No documentation mentions that this could be a wrong setup.

Both are multiples of my erasure-coding width (21+3) and of 64KB. I felt it was better that way (is it?).
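
(A minimal sketch, assuming the standard ceph.dir.layout virtual xattrs of the kernel client, of how such a layout can be set field by field on a directory and read back; the path is the one used above:)

# a new layout only affects files created afterwards
setfattr -n ceph.dir.layout.stripe_unit -v 1376256 /mnt/cephfs/pool_21p3
setfattr -n ceph.dir.layout.stripe_count -v 1 /mnt/cephfs/pool_21p3
setfattr -n ceph.dir.layout.object_size -v 704643072 /mnt/cephfs/pool_21p3
setfattr -n ceph.dir.layout.pool -v cephfs_data_21p3 /mnt/cephfs/pool_21p3

# read the whole layout back to verify
getfattr -n ceph.dir.layout /mnt/cephfs/pool_21p3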

I had issue #24402 with the used space. Could it be related?

Thanks

Actions #5

Updated by Zheng Yan almost 6 years ago

A big object size causes problems for OSD recovery. The default value of osd_max_object_size is 128M.
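
(A minimal sketch of how the current limit can be checked on a running OSD via the admin socket; the OSD id is just an example:)

# default is 128MiB = 134217728 bytes
ceph daemon osd.0 config get osd_max_object_size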

Actions #6

Updated by Zheng Yan almost 6 years ago

  • Status changed from New to Fix Under Review
Actions #7

Updated by Zheng Yan almost 6 years ago

  • Status changed from Fix Under Review to 4
Actions #8

Updated by Thomas De Maet almost 6 years ago

Hello

Sorry, still doing bunches of tests...

If I'm not completely misunderstanding how Ceph EC works, in the previous tests the object_size=672MB is resolved into 21 chunks of 32MB on the OSDs (+3 coding chunks), which is under osd_max_object_size=128MB, isn't it?

In the latest tests I've reduced the object_size to 84MB. At the beginning the issue reappeared several times (some old metadata, I guess), but since then I have copied ~6TB without hitting the bug. I have another issue with the total raw used space exploding, but I will open a new ticket for that.
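
(A small worked sketch of the arithmetic behind the question, assuming the data is split across the k=21 data shards of the 21+3 profile:)

# per-OSD data-shard size for the old 672MiB objects
echo $(( 704643072 / 21 ))      # -> 33554432 bytes = 32MiB
# per-OSD data-shard size for the new 84MiB objects
echo $(( 88080384 / 21 ))       # -> 4194304 bytes = 4MiB
# osd_max_object_size default
echo $(( 128 * 1024 * 1024 ))   # -> 134217728 bytes = 128MiB

(Whether the OSD limit applies to the logical RADOS object or to the per-shard chunk is exactly the open question above.)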

I should also mention that I have been working with newer Linux kernels since the resolution of #24403.

As it is a random and rare issue, I can't be sure it is gone. I suppose we can close the bug, and I will reopen it if necessary.

Thanks again!

Actions #9

Updated by Zheng Yan almost 6 years ago

  • Status changed from 4 to Closed