Bug #46374

open

ceph-fuse blocks forever, fails to start, emits no errors

Added by John Mulligan almost 4 years ago. Updated almost 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

For testing purposes we're running ceph-fuse in a container: a setup script (code here [1]) runs the ceph-fuse command to mount cephfs. Up to and including Ceph v15.2.3 there's been no issue with this.
However, with v15.2.4 the ceph-fuse command now blocks forever with no output or other indication of issues.

I've tried passing '-o debug' as well as '-d' to the command line to enable debugging, but nothing is printed to stdout/stderr.

This strikes me as a regression in behavior because the script has been functioning correctly on ceph luminous through octopus until v15.2.4. However, if there's something the script is doing wrong I'm happy to work with the ceph-fuse experts to improve it as long as the script can function on all the above listed ceph versions.

Since I get no output, I did try running strace on the command. Here's a snippet of output:

1994150 openat(AT_FDCWD, "/usr/lib64/ceph/liburcu-common.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1994150 openat(AT_FDCWD, "/lib64/liburcu-common.so.6", O_RDONLY|O_CLOEXEC) = 3
1994150 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\23\0\0\0\0\0\0"..., 832) = 832
1994150 lseek(3, 13232, SEEK_SET)       = 13232
1994150 read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
1994150 fstat(3, {st_mode=S_IFREG|0755, st_size=21592, ...}) = 0
1994150 lseek(3, 13232, SEEK_SET)       = 13232
1994150 read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
1994150 mmap(NULL, 2113672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4ec2766000
1994150 mprotect(0x7f4ec276a000, 2093056, PROT_NONE) = 0
1994150 mmap(0x7f4ec2969000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f4ec2969000
1994150 mmap(0x7f4ec296a000, 136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4ec296a000
1994150 close(3)                        = 0
1994150 mprotect(0x7f4ec2969000, 4096, PROT_READ) = 0
1994150 mprotect(0x7f4ec2b73000, 4096, PROT_READ) = 0
1994150 mprotect(0x7f4ec2d7b000, 4096, PROT_READ) = 0
1994150 mprotect(0x7f4ec2f86000, 4096, PROT_READ) = 0
1994150 membarrier(MEMBARRIER_CMD_QUERY, 0) = -1 EPERM (Operation not permitted)
1994150 munmap(0x7f4ecf7b0000, 26382)   = 0
1994150 futex(0x7f4ec432b040, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1994150 uname({sysname="Linux", nodename="popcorn", ...}) = 0
1994150 mmap(NULL, 10883072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4ec1d05000
1994150 brk(NULL)                       = 0x56549e154000
1994150 brk(0x56549e176000)             = 0x56549e176000
1994150 getrandom("\xda", 1, 0)         = 1
1994150 getpid()                        = 778
1994150 prctl(PR_GET_NAME, "ceph-fuse") = 0
1994150 futex(0x56549e0f26e8, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
1994150 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
1994150 +++ killed by SIGINT +++
1993556 <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], WSTOPPED|WCONTINUED, NULL) = 778
1993556 rt_sigprocma

Since it was not obviously related to networking or file access issues, I did not glean much from the above but perhaps it can serve as a clue for others.

This issue occurs on my laptop running fedora 31 + podman as well as our ci running ubuntu with docker.

Any guidance to get additional debugging info/state would be appreciated. This is currently blocking our ci.
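Until the cause of the hang is understood, one way to keep a CI job from stalling is to wrap the mount command in a timeout, so that a hang becomes a visible failure. The following is a minimal sketch only, not part of the linked script. It assumes the command is expected to return once the mount is up; the function names, the 60-second default, and the mount point in the usage comment are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it did not exit in time."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return None


def mount_or_fail(cmd, timeout_s=60):
    """Return True if cmd exits with status 0 in time; otherwise explain on stderr."""
    rc = run_with_timeout(cmd, timeout_s)
    if rc is None:
        print(f"{cmd[0]} did not exit within {timeout_s}s; treating it as hung",
              file=sys.stderr)
        return False
    if rc != 0:
        print(f"{cmd[0]} exited with status {rc}", file=sys.stderr)
        return False
    return True


# Hypothetical usage (the mount point is an assumption):
#   if not mount_or_fail(["ceph-fuse", "/mnt/cephfs"]):
#       sys.exit(1)
```

This does not explain why the process blocks; it only turns a silent hang into an error the CI can report.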

[1] - https://github.com/ceph/go-ceph/blob/d4440eb8c2966508ca4bf41240e6d3aefd144e97/micro-osd.sh


Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #47964: ceph-fuse RPM package must same-version ceph rpm (New)

Actions #1

Updated by John Mulligan almost 4 years ago

Apologies, I found one very important item that I had previously overlooked. Because we use this environment to test our project, we need to install the ceph -devel packages, and this triggers an upgrade of the ceph rpms in the container.
What I missed was that the ceph-fuse package did not get upgraded along with other ceph packages. When running the script most packages were v15.2.4 but ceph-fuse was still ceph-fuse-15.2.3-0.el8.x86_64.
Upgrading the package makes the setup script work again, and ceph-fuse no longer blocks.

I would like to leave this issue open for now as I am not sure if these versions are supposed to work together. If they are not, I would like to suggest a better behavior, such as failing with an error message, rather than blocking forever.
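As a stopgap on the test-script side, a setup step could compare installed package versions before calling ceph-fuse and fail early on a mismatch. This is a rough sketch under stated assumptions, not a change to ceph-fuse itself: the function names are hypothetical, the choice of ceph-common and librados2 as packages to compare against is an assumption, and the 15.2.4 release string in the usage comment is illustrative (only the ceph-fuse 15.2.3-0.el8 version comes from the report above).

```python
import subprocess


def installed_version(package):
    """Return 'VERSION-RELEASE' for an installed RPM package, using rpm -q."""
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", package],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def find_mismatches(versions, reference):
    """Return {package: version} for packages that differ from the reference package."""
    expected = versions[reference]
    return {pkg: ver for pkg, ver in versions.items() if ver != expected}


# Hypothetical usage on an RPM-based container image:
#   pkgs = ["ceph-common", "ceph-fuse", "librados2"]  # assumed package set
#   versions = {p: installed_version(p) for p in pkgs}
#   bad = find_mismatches(versions, "ceph-common")
#   if bad:
#       raise SystemExit(f"ceph package version mismatch: {bad}")
#
# With versions like those described above, e.g.
#   {"ceph-common": "15.2.4-0.el8", "ceph-fuse": "15.2.3-0.el8"}
# the check flags ceph-fuse as the mismatched package.
```

The comparison is kept separate from the rpm query so that it can be exercised without rpm installed, and so that a different package manager could supply the version strings.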

Actions #2

Updated by Patrick Donnelly almost 4 years ago

John Mulligan wrote:

Apologies, I found one very important item that I had previously overlooked. Because we use this environment to test our project, we need to install the ceph -devel packages, and this triggers an upgrade of the ceph rpms in the container.
What I missed was that the ceph-fuse package did not get upgraded along with other ceph packages. When running the script most packages were v15.2.4 but ceph-fuse was still ceph-fuse-15.2.3-0.el8.x86_64.
Upgrading the package makes the setup script work again, and ceph-fuse no longer blocks.

I would like to leave this issue open for now as I am not sure if these versions are supposed to work together. If they are not, I would like to suggest a better behavior, such as failing with an error message, rather than blocking forever.

The ceph-common library is internal and is linked to by most ceph executables. If that changes, then the old executables (like ceph-fuse) won't be able to start. I don't see a good way to give you runtime diagnostics about that. I am curious why it just hangs though.

Actions #3

Updated by Patrick Donnelly over 3 years ago

  • Related to Bug #47964: ceph-fuse RPM package must same-version ceph rpm added