Bug #17825: 4.8.6's cephfs.ko can't read any files from old fs running ceph-10.2.3, but 4.7.9's could - Linux kernel client - Ceph

Bug #17825

closed

Added by Alexandre Oliva over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Zheng Yan
Category:
Target version:
% Done:
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

After upgrading my cephfs.ko clients from 4.7.9 to 4.8.6, I had to resort to ceph-fuse to access a cephfs that I have had for several years. cephfs.ko can still list directories, but any attempt to read a file fails with EIO, and dmesg logs messages such as:
WARNING: CPU: 0 PID: 25961 at net/ceph/osd_client.c:550 ceph_osdc_alloc_messages+0x171/0x1a0 [libceph]
[38009.116654] Call Trace:
[38009.116662] [<ffffffffb13e5ebd>] dump_stack+0x63/0x86
[38009.116669] [<ffffffffb10a0e8b>] __warn+0xcb/0xf0
[38009.116677] [<ffffffffb10a0fbd>] warn_slowpath_null+0x1d/0x20
[38009.116703] [<ffffffffc0b22151>] ceph_osdc_alloc_messages+0x171/0x1a0 [libceph]
[38009.116730] [<ffffffffc0b81952>] ceph_pool_perm_check+0x5e2/0x980 [ceph]
[38009.116758] [<ffffffffc0b88c2b>] ceph_get_caps+0x4b/0x3d0 [ceph]
[38009.116782] [<ffffffffc0b7a2f0>] ? ceph_write_iter+0xbe0/0xbe0 [ceph]
[38009.116789] [<ffffffffb12725a4>] ? mntput+0x24/0x40
[38009.116796] [<ffffffffb125ad80>] ? terminate_walk+0xe0/0xf0
[38009.116821] [<ffffffffc0b7a725>] ceph_read_iter+0xe5/0xb40 [ceph]
[38009.116848] [<ffffffffc0b85019>] ? __ceph_caps_issued_mask+0x1e9/0x1f0 [ceph]
[38009.116856] [<ffffffffb124f7e2>] __vfs_read+0xe2/0x150
[38009.116863] [<ffffffffb12508c6>] vfs_read+0x96/0x130
[38009.116870] [<ffffffffb1251da5>] SyS_read+0x55/0xc0
[38009.116877] [<ffffffffb1802572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[38009.116923] --[ end trace 6bff712d7c4cbd24 ]---
[38024.526160] libceph: tid 1 pool does not exist
[38024.526184] libceph: tid 2 pool does not exist
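For anyone triaging similar reports, here is a minimal Python sketch that pulls the warning site and the failing request tids out of a dmesg excerpt like the one above. The function name `summarize_dmesg` and both regular expressions are illustrative assumptions modeled only on the lines quoted in this report; other kernel versions may format these messages differently.

```python
import re

# Patterns modeled on the log lines quoted in this report (an assumption:
# other kernels may word or lay out these messages differently).
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: (?P<pid>\d+) at (?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)
POOL_RE = re.compile(r"libceph: tid (?P<tid>\d+) pool does not exist")


def summarize_dmesg(text):
    """Return (warning sites, tids of requests that hit a missing pool)."""
    warnings = []
    missing_pool_tids = []
    for line in text.splitlines():
        m = WARN_RE.search(line)
        if m:
            warnings.append((m.group("file"), int(m.group("line")), m.group("func")))
        m = POOL_RE.search(line)
        if m:
            missing_pool_tids.append(int(m.group("tid")))
    return warnings, missing_pool_tids


# Three lines copied from the dmesg output above.
DMESG_SAMPLE = """\
WARNING: CPU: 0 PID: 25961 at net/ceph/osd_client.c:550 ceph_osdc_alloc_messages+0x171/0x1a0 [libceph]
[38024.526160] libceph: tid 1 pool does not exist
[38024.526184] libceph: tid 2 pool does not exist
"""

if __name__ == "__main__":
    print(summarize_dmesg(DMESG_SAMPLE))
```

In practice the input would be captured dmesg text rather than the embedded sample.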

Updated by Greg Farnum over 7 years ago

Assignee set to Zheng Yan

Did we do something terrible with pool namespaces or something that caused this? :/

Updated by Zheng Yan over 7 years ago

Which version of the MDS are you running?

Updated by Alexandre Oliva over 7 years ago

All of the userland components are running 10.2.3.

Updated by Alexandre Oliva over 7 years ago

Although I'm running 10.2.3 now, the filesystem is pretty old. I don't recall what version I was running when it was created, but maybe this mds dump can offer some useful insights:

created 2013-03-02 22:37:37.588505
modified 2016-11-09 02:53:22.161632
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 486230
last_failure_osd_epoch 322601
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in 0
up {0=7095734}
failed
damaged
stopped
data_pools 0
metadata_pool 1
inline_data disabled
7095734: 172.31.160.7:6800/26934 'frit' mds.0.535650 up:active seq 20082 (standby for rank 0)
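Since the kernel log complains that a pool does not exist, the pool IDs in this dump are probably the fields of most interest. The following is a rough Python sketch for extracting them from dump text in the format shown above. The helper names `parse_mds_dump` and `pool_ids` are hypothetical, and the parser assumes the simple "key value" line layout seen here, which may not hold for other Ceph releases.

```python
def parse_mds_dump(text):
    """Parse 'key value' lines; keys that have no value map to an empty string."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        # Skip per-daemon lines such as "7095734: 172.31.160.7:6800/...".
        if key.endswith(":"):
            continue
        fields[key] = value.strip()
    return fields


def pool_ids(fields):
    """Return (list of data pool IDs, metadata pool ID) as integers."""
    # Assumption: multiple data pools are separated by spaces or commas.
    data = [int(p) for p in fields.get("data_pools", "").replace(",", " ").split()]
    metadata = int(fields["metadata_pool"])
    return data, metadata


# A subset of the lines from the dump above.
DUMP_SAMPLE = """\
created 2013-03-02 22:37:37.588505
max_mds 1
failed
data_pools 0
metadata_pool 1
inline_data disabled
7095734: 172.31.160.7:6800/26934 'frit' mds.0.535650 up:active seq 20082 (standby for rank 0)
"""

if __name__ == "__main__":
    print(pool_ids(parse_mds_dump(DUMP_SAMPLE)))
```

For this filesystem the sketch reports data pool 0 and metadata pool 1, matching the `data_pools` and `metadata_pool` lines in the dump.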