Bug #46828 (closed)

cephfs kernel client s390x: Missing the first created directory when running ls

Added by Tuan Hoang almost 4 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

Hi ceph masters,

I am running ceph octopus from the Ubuntu repository on Ubuntu 20.04, s390x arch. The cluster was created manually.

I am seeing a behavior where, when creating directories with the cephfs kernel client, the first created directory is always missing from the output of ls, BUT if I list it specifically, it is there. For example:

... creating the cluster ...
... creating the data/metadata cephfs pools ...

# mount -t ceph :/ /mnt -o name=admin

# mkdir /mnt/1

# ls -al /mnt
total 4
drwxr-xr-x  3 root root    1 Aug  4 09:24 .
drwxr-xr-x 18 root root 4096 Apr 30 07:35 ..

# ls -al /mnt/1
total 0
drwxr-xr-x 3 root root 1 Aug  4 09:24 ..

# mkdir /mnt/2

# ls -al /mnt
total 4
drwxr-xr-x  4 root root    2 Aug  4 09:24 .
drwxr-xr-x 18 root root 4096 Apr 30 07:35 ..
drwxr-xr-x  2 root root    0 Aug  4 09:24 2
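
In short, the repro is just this right after mounting (a rough sketch; directory names are arbitrary):

# repro sketch: assumes a freshly created cephfs mounted at /mnt as above
for i in 1 2 3; do
    mkdir /mnt/dir$i
    ls -al /mnt            # dir1 never shows up in this listing
    ls -ald /mnt/dir$i     # but every directory resolves when named explicitly
done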

At first I saw another behavior: all of the created directories seemed to be missing if they were created less than ~20 seconds after the cluster came up (HEALTH_OK, etc.) and the data/metadata pools were created (that is, before/while the PG autoscaler is running). But that has become quite hard to reproduce, while the missing-first-directory behavior shows up more often and more consistently. So I think the latter is the real issue ...

I have tried many different clusters with various configurations:
- Different kernels (5.4.0-37+ on Ubuntu) + ceph deb versions (15.2.1-0ubuntu1 and 15.2.3-0ubuntu0.20.04.1)
- Client machine either as a node of the cluster (osd, mds, etc.) or outside of the cluster
- Clusters with 1 or 3 MONs/MGRs, 1-3 OSDs, 1-3 MDSs
- Creating the directory during/after PG autoscaling and/or a few seconds after the data/metadata pools are created

In the meantime, I am trying an x86_64 client instead of an s390x client, while the cluster stays pure s390x. This behavior does not exist on a pure x86_64 cluster+client.

I have tried umounting and mounting the cephfs again on the same and on different client machines; the directory is still missing from ls, but shows up if I list it specifically. So I guess it is more an issue with the MDS than with the cephfs kernel client: the kernel driver seems to correctly tell the RADOS backend (and the MDS?) to create the directory, because the directory can be read when it is addressed directly; it is only listing its parent (readdir) that misses it.
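
To double-check whether the dentry actually reaches the metadata pool, I can try something like the following (a rough sketch; cephfs_metadata is just an example pool name from my setup, and as far as I understand the root directory's entries are stored as omap keys on object 1.00000000, which may only be populated after the MDS flushes its journal):

# list the cephfs pools to find the metadata pool name
ceph fs ls

# optionally ask the MDS to flush its journal first so that recently
# created dentries are written back to the dirfrag object
# (mds.a is just a placeholder for the actual MDS name)
ceph daemon mds.a flush journal

# dentries of the root directory are omap keys on object 1.00000000;
# both "1_head" and "2_head" should show up here if the MDS stored them
rados -p cephfs_metadata listomapkeys 1.00000000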

I wonder if you have seen symptoms like this in the past. Also, the logs don't seem to show anything out of order; if you have any suggestions about which part of the logs I should look into, please kindly let me know.
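
If it would help, I can also raise the debug levels on both sides while reproducing; roughly like this (assuming the kernel was built with dynamic debug support):

# kernel client: enable ceph module debug messages and watch dmesg
echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
dmesg -w

# MDS: raise debug logging while reproducing, then lower it again afterwards
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1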

Also, I think this is

Thanks !

