Bug #8300
closed
Regression in 3.14: "No such device or address" reading file content
Added by Markus Blank-Burian almost 10 years ago.
Updated almost 10 years ago.
Description
On kernel 3.14.2 with ceph 0.72.2, reading some files fails with the error message "No such device or address". Kernels 3.10.36, 3.12.17 and 3.13.11 read the same files' content successfully. The file is visible via "ls", but a simple "cat" fails. I see the same error on different nodes, and it persists across remounts. A ceph-mds debug trace captured during the "cat" test is attached. The affected file is "T0.01_N260_S0.005/r21/lowtempshear.h5".
Are there any messages in dmesg on the affected node? Do you have debugfs enabled?
I should note that the MDS is behaving fine according to that log; Zheng thinks there's been a regression in the CRUSH code since nothing else generates an ENXIO.
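For anyone checking whether they hit the same symptom: on Linux, "No such device or address" is the usual message text for errno ENXIO. The following is a minimal Python sketch, not part of the original report, that attempts to read the start of a file and reports the symbolic errno name, so that ENXIO can be told apart from other failures. The helper name `try_read` and the 4096-byte read size are assumptions made for illustration.

```python
import errno
import os


def try_read(path, nbytes=4096):
    """Try to read the first nbytes of path.

    Returns (True, None, None) on success, or
    (False, errno_name, message) if the open or read fails.
    try_read is a hypothetical helper, not a Ceph tool.
    """
    try:
        with open(path, "rb") as f:
            f.read(nbytes)
        return (True, None, None)
    except OSError as e:
        # Map the numeric errno to its symbolic name, e.g. "ENXIO".
        name = errno.errorcode.get(e.errno, str(e.errno))
        return (False, name, os.strerror(e.errno))


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        ok, name, msg = try_read(p)
        print(p, "OK" if ok else f"FAILED {name}: {msg}")
```

Run against the affected path, a result of `FAILED ENXIO` would match the error described in this report; any other errno name points to a different problem.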
dmesg shows nothing special without debugging enabled. I attached the kernel debug output as well as the osdmap. Could it be a problem that the latter contains non-existent hosts?
- Assignee set to Ilya Dryomov
Hi Markus,
Judging by the debug output, I assume you can build your own kernels?
Yes, we build our own kernels, so patching and testing are possible.
OK, please try the attached patch (on top of 3.14.2) and see if it fixes the problem.
Yes, your patch fixes the problem. Thank you very much for looking into this!
- Status changed from New to Resolved
Great, this patch is in 3.15-rc1 ("crush: fix off-by-one errors in total_tries refactor"). I'll make sure it gets into 3.14 stable.
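As a generic illustration of the kind of bug the patch title names, the sketch below shows how the same numeric limit allows a different number of attempts depending on whether it is read as a count of retries or a count of total tries. This is not the kernel's CRUSH code, and the thread does not say that this exact pattern was the defect. The function names are hypothetical, and every attempt is assumed to fail so that the limit is always reached.

```python
def attempts_with_retry_limit(limit):
    """Limit counts retries: one initial attempt plus up to `limit` retries."""
    attempts = 0
    failures = 0
    while True:
        attempts += 1
        failures += 1  # assumption: every attempt fails
        if failures > limit:  # stop once more than `limit` failures occurred
            return attempts


def attempts_with_try_limit(limit):
    """Limit counts total tries: at most `limit` attempts overall."""
    attempts = 0
    failures = 0
    while True:
        attempts += 1
        failures += 1  # assumption: every attempt fails
        if failures >= limit:  # stop as soon as `limit` failures occurred
            return attempts


if __name__ == "__main__":
    for n in (1, 3, 50):
        print(n, attempts_with_retry_limit(n), attempts_with_try_limit(n))
```

If a refactor switches from one reading to the other without adjusting the stored value by one, the loop gives up one attempt earlier or later than before, which only shows up in cases that need the last attempt.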