Bug #17825

4.8.6's cephfs.ko can't read any files from old fs running ceph-10.2.3, but 4.7.9's could

Added by Alexandre Oliva almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature:

Description

After I upgraded my cephfs.ko clients from 4.7.9 to 4.8.6, I had to resort to ceph-fuse to access a cephfs I've had for several years. cephfs.ko can list directories all right, but attempting to read files yields EIO, and dmesg logs messages such as:
WARNING: CPU: 0 PID: 25961 at net/ceph/osd_client.c:550 ceph_osdc_alloc_messages+0x171/0x1a0 [libceph]
[38009.116654] Call Trace:
[38009.116662] [<ffffffffb13e5ebd>] dump_stack+0x63/0x86
[38009.116669] [<ffffffffb10a0e8b>] __warn+0xcb/0xf0
[38009.116677] [<ffffffffb10a0fbd>] warn_slowpath_null+0x1d/0x20
[38009.116703] [<ffffffffc0b22151>] ceph_osdc_alloc_messages+0x171/0x1a0 [libceph]
[38009.116730] [<ffffffffc0b81952>] ceph_pool_perm_check+0x5e2/0x980 [ceph]
[38009.116758] [<ffffffffc0b88c2b>] ceph_get_caps+0x4b/0x3d0 [ceph]
[38009.116782] [<ffffffffc0b7a2f0>] ? ceph_write_iter+0xbe0/0xbe0 [ceph]
[38009.116789] [<ffffffffb12725a4>] ? mntput+0x24/0x40
[38009.116796] [<ffffffffb125ad80>] ? terminate_walk+0xe0/0xf0
[38009.116821] [<ffffffffc0b7a725>] ceph_read_iter+0xe5/0xb40 [ceph]
[38009.116848] [<ffffffffc0b85019>] ? __ceph_caps_issued_mask+0x1e9/0x1f0 [ceph]
[38009.116856] [<ffffffffb124f7e2>] __vfs_read+0xe2/0x150
[38009.116863] [<ffffffffb12508c6>] vfs_read+0x96/0x130
[38009.116870] [<ffffffffb1251da5>] SyS_read+0x55/0xc0
[38009.116877] [<ffffffffb1802572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[38009.116923] ---[ end trace 6bff712d7c4cbd24 ]---
[38024.526160] libceph: tid 1 pool does not exist
[38024.526184] libceph: tid 2 pool does not exist

History

#1 Updated by Greg Farnum almost 4 years ago

  • Assignee set to Zheng Yan

Did we do something terrible with pool namespaces or something that caused this? :/

#2 Updated by Zheng Yan almost 4 years ago

Which version of MDS do you use?

#3 Updated by Alexandre Oliva almost 4 years ago

All of the userland components are running 10.2.3.

#4 Updated by Alexandre Oliva almost 4 years ago

Although I'm running 10.2.3 now, the filesystem is pretty old. I don't recall what version I was running when it was created, but maybe this mds dump can offer some useful insights:

created 2013-03-02 22:37:37.588505
modified 2016-11-09 02:53:22.161632
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 486230
last_failure_osd_epoch 322601
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in 0
up {0=7095734}
failed
damaged
stopped
data_pools 0
metadata_pool 1
inline_data disabled
7095734: 172.31.160.7:6800/26934 'frit' mds.0.535650 up:active seq 20082 (standby for rank 0)
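
For anyone comparing their own cluster against this dump, the pool fields can be pulled out with a few lines of Python. This is only a rough sketch: it assumes the dump is plain "key value" text like the excerpt above, and the helper names `parse_mds_dump` and `pool_ids` are made up for illustration, not part of any Ceph tool.

```python
def parse_mds_dump(text):
    """Split each non-empty line into a key and the rest of the line."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def pool_ids(fields, key):
    """Return the pool ids listed under `key` as integers (empty list if absent)."""
    return [int(tok) for tok in fields.get(key, "").replace(",", " ").split()]


# A few lines copied from the dump above.
sample = """\
max_mds 1
data_pools 0
metadata_pool 1
inline_data disabled
"""

fields = parse_mds_dump(sample)
data_pools = pool_ids(fields, "data_pools")
metadata_pools = pool_ids(fields, "metadata_pool")
print("data pools:", data_pools, "metadata pool:", metadata_pools)
```

On this filesystem the only data pool has id 0, which is the detail the next comment picks up on.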

#5 Updated by Zheng Yan almost 4 years ago

'data_pools 0' likely causes this issue. I will check.
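
To illustrate how a data pool with id 0 could plausibly end up reported as "pool does not exist", here is a hypothetical Python sketch. It is not the kernel code and not the patch discussed in this thread. The layout fields, the `NO_POOL` sentinel and both functions are assumptions made up for the example. The idea is that a conversion step which treats pool id 0 as "unset" would map a valid pool to a non-existent one, whereas checking that the whole layout is zeroed would not.

```python
NO_POOL = -1  # hypothetical sentinel meaning "no pool set"


def pool_from_layout_naive(layout):
    """Hypothetical buggy conversion: any pool id of 0 is taken as 'unset'."""
    pool = layout["pool"]
    return NO_POOL if pool == 0 else pool


def pool_from_layout_careful(layout):
    """Hypothetical safer conversion: only an all-zero layout counts as 'unset'."""
    if all(value == 0 for value in layout.values()):
        return NO_POOL
    return layout["pool"]


# An old-style layout whose data lives in pool 0 (values are illustrative).
old_layout = {
    "stripe_unit": 4194304,
    "stripe_count": 1,
    "object_size": 4194304,
    "pool": 0,
}
empty_layout = {"stripe_unit": 0, "stripe_count": 0, "object_size": 0, "pool": 0}

print(pool_from_layout_naive(old_layout))     # pool 0 is lost
print(pool_from_layout_careful(old_layout))   # pool 0 is kept
print(pool_from_layout_careful(empty_layout)) # genuinely unset layout
```

Under these assumptions, a filesystem created long ago with its data in pool 0 would be affected while newer filesystems with higher pool ids would not, which would match the symptoms reported here.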

#7 Updated by Alexandre Oliva almost 4 years ago

Thanks for the patch. I've just rebuilt libceph with that change, and with it I no longer have problems accessing the file data.

#9 Updated by Ilya Dryomov almost 4 years ago

  • Status changed from Pending Backport to Resolved

Fixed in 4.8.9.
