Bug #8300
closed
Regression in 3.14: "No such device or address" reading file content
Description
On kernel 3.14.2 with ceph 0.72.2, reading some files fails with the error message "No such device or address". Kernels 3.10.36, 3.12.17 and 3.13.11 can read the files' content successfully. The file is visible via "ls", but a simple "cat" fails. I see the same error on different nodes, and it persists across remounts. A ceph-mds debug trace made during the "cat" test is attached. The affected file is "T0.01_N260_S0.005/r21/lowtempshear.h5".
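The "ls" works but "cat" fails check described above can be scripted to run over many files or nodes. The following is a minimal sketch, not part of the original report: `probe_read` is a hypothetical helper name, and it only assumes that the failed read surfaces as an `OSError` whose errno is ENXIO (the code usually rendered as "No such device or address" on Linux).

```python
import errno


def probe_read(path, nbytes=4096):
    """Try to read the first bytes of `path`.

    Returns (True, None) on success, or (False, errno_name) on failure,
    e.g. (False, "ENXIO") for the symptom described in this report.
    """
    try:
        with open(path, "rb") as f:
            f.read(nbytes)
        return True, None
    except OSError as exc:
        # Map the numeric errno to its symbolic name when one is known.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    import sys

    # Usage: python probe_read.py FILE [FILE ...]
    for p in sys.argv[1:]:
        ok, err = probe_read(p)
        print(p, "OK" if ok else "FAILED ({})".format(err))
```

Running it against the affected file on a 3.14.2 node and on a node with an older kernel would show whether the failure follows the kernel version.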
Files
Updated by Greg Farnum almost 10 years ago
Are there any messages in dmesg on the affected node? Do you have debugfs enabled?
Updated by Greg Farnum almost 10 years ago
I should note that the MDS is behaving fine according to that log; Zheng thinks there's been a regression in the CRUSH code since nothing else generates an ENXIO.
Updated by Markus Blank-Burian almost 10 years ago
- File kaa-94.txt added
- File osdmap.txt added
dmesg shows nothing special without debugging enabled. I attached the kernel debug output as well as the osdmap. Could it be a problem that the osdmap contains non-existent hosts?
Updated by Ilya Dryomov almost 10 years ago
- Assignee set to Ilya Dryomov
Hi Markus,
Judging by debug output, I'm assuming you can build your own kernels?
Updated by Markus Blank-Burian almost 10 years ago
Yes, we build our own kernels, so patching and testing are possible.
Updated by Ilya Dryomov almost 10 years ago
OK, please try the attached patch (on top of 3.14.2) and see if it fixes the problem.
Updated by Markus Blank-Burian almost 10 years ago
Yes, your patch fixes the problem. Thank you very much for looking into this!
Updated by Ilya Dryomov almost 10 years ago
- Status changed from New to Resolved
Great, this patch is in 3.15-rc1 ("crush: fix off-by-one errors in total_tries refactor"). I'll make sure it gets into 3.14 stable.
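For readers unfamiliar with this class of bug, the sketch below illustrates how an off-by-one in a retry limit changes the number of attempts. It is a toy model written for this note, not the kernel CRUSH code and not the patch itself; `count_attempts` and its parameters are hypothetical names. It only shows that comparing a failure counter against a limit with `<=` versus `<` decides whether the limit counts retries or total tries.

```python
def count_attempts(limit, inclusive):
    """Count attempts made when every attempt fails (toy model).

    inclusive=True  -> retry while failures <= limit (limit counts retries)
    inclusive=False -> retry while failures <  limit (limit counts tries)
    """
    failures = 0
    attempts = 0
    while True:
        attempts += 1
        failures += 1  # in this model, every attempt fails
        if inclusive:
            retry = failures <= limit
        else:
            retry = failures < limit
        if not retry:
            return attempts


if __name__ == "__main__":
    for limit in (1, 3):
        print(limit, count_attempts(limit, True), count_attempts(limit, False))
```

With the same limit, the two comparisons differ by exactly one attempt, so a caller and callee that disagree on the convention end up giving up one attempt too early or too late.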