Bug #46374
ceph-fuse blocks forever, fails to start, emits no errors
Description
For testing purposes we run ceph-fuse in a container (code here [1]) that mounts CephFS with the ceph-fuse command. Up to and including Ceph v15.2.3 this has worked without issue.
With v15.2.4, however, the ceph-fuse command blocks forever, with no output or any other indication of a problem.
I've tried passing '-o debug' as well as '-d' on the command line to enable debugging, but nothing is printed to stdout/stderr.
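`-d` and `-o debug` are libfuse options, so they mainly control FUSE-level debug output. Ceph's own client logging is governed by separate config options and typically goes to the client's log file rather than the terminal. The sketch below builds a ceph-fuse argv that stays in the foreground and asks for verbose client and messenger logs on stderr. It assumes that the Ceph options `debug_client`, `debug_ms` and `log_to_stderr` can be overridden on the command line as `--name=value`, as with other Ceph tools; the levels chosen and the mount point `/mnt/cephfs` are illustrative only.

```python
import shlex
from typing import List


def build_ceph_fuse_cmd(mountpoint: str) -> List[str]:
    """Return an argv for running ceph-fuse in the foreground with verbose
    Ceph client logging sent to stderr.

    Assumption: debug_client, debug_ms and log_to_stderr can be overridden
    on the command line as --name=value; check against the release in use.
    """
    return [
        "ceph-fuse",
        "-f",                    # FUSE option: stay in the foreground
        "--debug-client=20",     # assumed override: verbose client-side logging
        "--debug-ms=1",          # assumed override: messenger (network) logging
        "--log-to-stderr=true",  # assumed override: log to stderr, not a log file
        mountpoint,
    ]


if __name__ == "__main__":
    # Print the command as a shell-ready string, e.g. for a CI script.
    print(shlex.join(build_ceph_fuse_cmd("/mnt/cephfs")))
```

Running it prints `ceph-fuse -f --debug-client=20 --debug-ms=1 --log-to-stderr=true /mnt/cephfs`, which can be pasted into the container's entrypoint in place of the current invocation.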
This looks like a regression to me, because the script has worked correctly on every Ceph release from Luminous through Octopus until v15.2.4. However, if the script is doing something wrong, I'm happy to work with the ceph-fuse experts to improve it, as long as it keeps working on all of the Ceph versions listed above.
Since I get no output, I tried running the command under strace. Here's a snippet of the output:
994150 openat(AT_FDCWD, "/usr/lib64/ceph/liburcu-common.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
1994150 openat(AT_FDCWD, "/lib64/liburcu-common.so.6", O_RDONLY|O_CLOEXEC) = 3
1994150 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\23\0\0\0\0\0\0"..., 832) = 832
1994150 lseek(3, 13232, SEEK_SET) = 13232
1994150 read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
1994150 fstat(3, {st_mode=S_IFREG|0755, st_size=21592, ...}) = 0
1994150 lseek(3, 13232, SEEK_SET) = 13232
1994150 read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
1994150 mmap(NULL, 2113672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4ec2766000
1994150 mprotect(0x7f4ec276a000, 2093056, PROT_NONE) = 0
1994150 mmap(0x7f4ec2969000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f4ec2969000
1994150 mmap(0x7f4ec296a000, 136, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4ec296a000
1994150 close(3) = 0
1994150 mprotect(0x7f4ec2969000, 4096, PROT_READ) = 0
1994150 mprotect(0x7f4ec2b73000, 4096, PROT_READ) = 0
1994150 mprotect(0x7f4ec2d7b000, 4096, PROT_READ) = 0
1994150 mprotect(0x7f4ec2f86000, 4096, PROT_READ) = 0
1994150 membarrier(MEMBARRIER_CMD_QUERY, 0) = -1 EPERM (Operation not permitted)
1994150 munmap(0x7f4ecf7b0000, 26382) = 0
1994150 futex(0x7f4ec432b040, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1994150 uname({sysname="Linux", nodename="popcorn", ...}) = 0
1994150 mmap(NULL, 10883072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4ec1d05000
1994150 brk(NULL) = 0x56549e154000
1994150 brk(0x56549e176000) = 0x56549e176000
1994150 getrandom("\xda", 1, 0) = 1
1994150 getpid() = 778
1994150 prctl(PR_GET_NAME, "ceph-fuse") = 0
1994150 futex(0x56549e0f26e8, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
1994150 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
1994150 +++ killed by SIGINT +++
1993556 <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], WSTOPPED|WCONTINUED, NULL) = 778
1993556 rt_sigprocma
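In a trace like the one above, the most useful question is which syscall each thread was in when it stopped making progress. The PID prefix on every line suggests the trace was recorded with `strace -f` (follow threads and child processes), optionally with `-o` to write it to a file. Here the last call for 1994150 before the SIGINT is `futex(..., FUTEX_WAIT_PRIVATE, 2, NULL)`; a NULL timeout means the thread waits indefinitely for the futex to be released, which is consistent with the observed hang. The sketch below, whose names are hypothetical, assumes one syscall per line prefixed by a numeric PID, and reports the last syscall started by each PID/TID. A call that strace splits into `<unfinished ...>` and `<... resumed>` halves is counted at its first half.

```python
import re
from typing import Dict, Iterable

# A syscall line from `strace -f`: "<pid> <name>(<args>) = <result>".
# Signal lines ("--- SIG... ---"), exit lines ("+++ ... +++") and
# "<... resumed>" continuations do not match and are skipped.
SYSCALL_RE = re.compile(r"^(\d+)\s+([a-z_][a-z0-9_]*)\(")

# A few lines copied from the trace in this report.
SAMPLE_TRACE = [
    '1994150 getpid() = 778',
    '1994150 prctl(PR_GET_NAME, "ceph-fuse") = 0',
    '1994150 futex(0x56549e0f26e8, FUTEX_WAIT_PRIVATE, 2, NULL) = ? '
    'ERESTARTSYS (To be restarted if SA_RESTART is set)',
    '1994150 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---',
    '1994150 +++ killed by SIGINT +++',
    '1993556 <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], '
    'WSTOPPED|WCONTINUED, NULL) = 778',
]


def last_syscall_per_pid(lines: Iterable[str]) -> Dict[int, str]:
    """Map each PID/TID to the name of the last syscall it was seen starting."""
    last: Dict[int, str] = {}
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            last[int(match.group(1))] = match.group(2)
    return last


if __name__ == "__main__":
    print(last_syscall_per_pid(SAMPLE_TRACE))  # {1994150: 'futex'}
```

On a full trace file, `last_syscall_per_pid(open("trace.txt"))` lists every thread that was still blocked, which is easier to scan than the raw log when ceph-fuse has many threads.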
Because nothing in it was obviously related to networking or file access, I could not glean much from the trace, but perhaps it can serve as a clue for others.
The issue occurs both on my laptop (Fedora 31 with podman) and in our CI (Ubuntu with docker).
Any guidance on getting additional debugging information or state would be appreciated; this is currently blocking our CI.
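Since the hang currently blocks CI, one option is to wrap the mount command in a watchdog that gives up after a deadline and, before killing the process, records what each of its threads is doing. This is a minimal sketch using Linux procfs: `/proc/<pid>/task/<tid>/stat` for each thread's state letter and `/proc/<pid>/task/<tid>/wchan` for the kernel wait channel. The function names and timeout are hypothetical and not part of any Ceph tooling. It only inspects the process it started, so if the command forks or daemonizes, the PID it sees may not be the one that hangs.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple

# tid -> (state letter, kernel wait channel)
ThreadStates = Dict[int, Tuple[str, str]]


def thread_states(pid: int) -> ThreadStates:
    """Snapshot the state of every thread of `pid` from Linux procfs."""
    task_dir = f"/proc/{pid}/task"
    try:
        tids = os.listdir(task_dir)
    except OSError:
        return {}  # process already gone
    states: ThreadStates = {}
    for tid in tids:
        base = os.path.join(task_dir, tid)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # thread exited while we were looking
        # The command name is in parentheses and may contain spaces, so the
        # state letter (S = sleeping, D = uninterruptible, R = running, ...)
        # is the first field after the last ')'.
        state = stat.rsplit(")", 1)[1].split()[0]
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = ""  # may be unreadable depending on kernel and permissions
        states[int(tid)] = (state, wchan)
    return states


def run_with_watchdog(cmd: List[str], timeout: float) -> Tuple[Optional[int], ThreadStates]:
    """Run `cmd`. If it exits within `timeout` seconds, return (returncode, {}).
    Otherwise snapshot its threads, kill it, and return (None, snapshot)."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout), {}
    except subprocess.TimeoutExpired:
        snapshot = thread_states(proc.pid)
        proc.kill()
        proc.wait()  # reap the killed process
        return None, snapshot
```

For example, `rc, threads = run_with_watchdog(["ceph-fuse", "-f", "/mnt/cephfs"], timeout=60)` (mount point hypothetical): a `None` return code means the deadline passed, and `threads` holds a per-thread snapshot that can be attached to a report like this one.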
[1] - https://github.com/ceph/go-ceph/blob/d4440eb8c2966508ca4bf41240e6d3aefd144e97/micro-osd.sh