Bug #47033

Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'

client: inode ref leak

Added by Zheng Yan 5 months ago. Updated 5 months ago.

Status: Duplicate
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature:

Description

It can be easily reproduced with the following program.

#define _FILE_OFFSET_BITS 64
#include <features.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <stdbool.h>
#include <cephfs/libcephfs.h>

int main(int argc, char *argv[]) {

        struct ceph_mount_info *cmount = NULL;
        int n = 64;
        bool parent = true;

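        /* argv[1] is the test directory and is required; argv[2] is optional */
        if (argc < 2) {
                fprintf(stderr, "usage: %s <dir> [nprocs]\n", argv[0]);
                return 1;
        }
        if (argc > 2)
                n = atoi(argv[2]);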

        while (--n >= 0) {
                pid_t pid = fork();
                if (pid < 0) {
                        printf("fork fail %d\n", pid);
                        exit(-1);
                }
                if (pid == 0) {
                        parent = false;
                        break;
                }
        }
        if (parent) {
                pid_t pid;
                int status;
                while ((pid = wait(&status)) > 0);
                return 0;
        }

        ceph_create(&cmount, "admin");
        ceph_conf_read_file(cmount, "./ceph.conf");
        ceph_mount(cmount, NULL);

        ceph_chdir(cmount, argv[1]);

        char buf[4096];
        sprintf(buf, "dir%d", n);
        int ret = ceph_mkdir(cmount, buf, 0755);
        if (ret < 0 && ret != -EEXIST) {
                printf("ceph_mkdir fail %d\n", ret);
                return 0;
        }

        ceph_chdir(cmount, buf);

        /*
        struct ceph_dir_result *dirp;
        ret = ceph_opendir(cmount, ".", &dirp);
        if (ret < 0) {
                printf("ceph_opendir fail %d\n", ret);
                return 0;
        }

        while (ceph_readdir(cmount, dirp))
                ;

        ceph_closedir(cmount, dirp);
        */

        int count = 0;
        time_t start = time(NULL);
        for (int i = 0; i < 20000; ++i) {
                sprintf(buf, "file%d", i);
                int fd = ceph_open(cmount, buf, O_CREAT|O_RDONLY, 0644);
                if (fd < 0) {
                        printf("ceph_open fail %d\n", fd);
                        exit(-1);
                }
                /*
                ret = ceph_fchmod(cmount, fd, 0666);
                if (ret < 0) {
                        printf("ceph_fchmod fail %d\n", ret);
                        exit(-1);
                }
                */

                ceph_close(cmount, fd);
                count++;
                if (time(NULL) > start) {
                        printf("%d\n", count);
                        count = 0;
                        start = time(NULL);
                }
        }
        ceph_unmount(cmount);
        return 0;
}
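
Note that the parent branch of the program above discards every child's wait status and always returns 0, so a child that is killed (for example by SIGABRT, if a failed assertion calls abort()) does not change the exit code of test_create. A minimal sketch of a stricter wait loop follows; wait_all_children is a hypothetical helper name that is not part of the original reproducer.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical replacement for the parent's wait loop (not from the report):
 * reap every child and return how many of them did not exit cleanly.
 */
static int wait_all_children(void)
{
        int failed = 0;
        int status;
        pid_t pid;

        while ((pid = wait(&status)) > 0) {
                if (WIFSIGNALED(status)) {
                        /* e.g. SIGABRT if a failed assertion calls abort() */
                        fprintf(stderr, "child %d killed by signal %d\n",
                                (int)pid, WTERMSIG(status));
                        failed++;
                } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
                        fprintf(stderr, "child %d exited with status %d\n",
                                (int)pid, WEXITSTATUS(status));
                        failed++;
                }
        }
        /* wait() returned <= 0: no children are left (or wait() failed) */
        return failed;
}
```

With this helper, the parent branch could end with `return wait_all_children() ? 1 : 0;` so that a crashing child becomes visible in the exit status.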

Pre-create testdir at the root of CephFS and change its mode to 0777.

Repeatedly run './test_create testdir 1' (without cleaning up the data left by previous runs).
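
The "repeatedly run" step can be automated with a small driver. The following is only a sketch: run_until_failure is a hypothetical helper, and the command line is assumed to be the './test_create testdir 1' invocation quoted above. It stops early only if the command itself exits non-zero or is killed by a signal; because test_create as posted returns 0 from its parent process regardless of how its children exit, a hang or a crash inside a child still has to be spotted from the output.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the report): run argv[0] with the given
 * arguments up to max_runs times. Returns the 1-based index of the first run
 * that exits non-zero or is killed by a signal, 0 if every run succeeded,
 * or -1 if fork() or waitpid() fails.
 */
static int run_until_failure(char *const argv[], int max_runs)
{
        for (int run = 1; run <= max_runs; ++run) {
                pid_t pid = fork();
                if (pid < 0)
                        return -1;
                if (pid == 0) {
                        execvp(argv[0], argv);
                        _exit(127);     /* only reached if exec failed */
                }

                int status;
                if (waitpid(pid, &status, 0) < 0)
                        return -1;
                if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                        return run;
        }
        return 0;
}
```

A caller would pass a NULL-terminated argument vector such as `{"./test_create", "testdir", "1", NULL}` together with the number of repetitions to attempt.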

The last good commit is aef8569b807dc946f7dabc44b20c5d986c44e364. Taking client_lock in Client::put_inode does not work.

History

#1 Updated by Xiubo Li 5 months ago

  • Status changed from New to In Progress

I will take a look at this. Thanks :-)

#2 Updated by Xiubo Li 5 months ago

Zheng Yan wrote:

It can be easily reproduced with the following program.

[...]

Pre-create testdir at the root of CephFS and change its mode to 0777.

Repeatedly run './test_create testdir 1' (without cleaning up the data left by previous runs).

The last good commit is aef8569b807dc946f7dabc44b20c5d986c44e364. Taking client_lock in Client::put_inode does not work.

BTW, the above commit is invalid, and I also couldn't get any information from "taking client_lock in Client::put_inode does not work".

#3 Updated by Zheng Yan 5 months ago

The good commit is c8b5f84f49ef74609ba3ea69dea0764ef925ae85.

#4 Updated by Xiubo Li 5 months ago

With [1] and [2], I ran the test for a very long time and didn't see any errors.

[1] https://github.com/ceph/ceph/pull/36580
[2] https://github.com/ceph/ceph/pull/36553

#5 Updated by Patrick Donnelly 5 months ago

  • Priority changed from Normal to High
  • Target version set to v16.0.0
  • Source set to Development
  • Backport set to octopus,nautilus
  • Component(FS) Client added

#6 Updated by Xiubo Li 5 months ago

  • Status changed from In Progress to Duplicate
  • Parent task set to #46882

#7 Updated by Xiubo Li 5 months ago

Xiubo Li wrote:

With [1] and [2], I ran the test for a very long time and didn't see any errors.

[1] https://github.com/ceph/ceph/pull/36580
[2] https://github.com/ceph/ceph/pull/36553

I ran this for a whole night and couldn't reproduce it with [1] above.

#8 Updated by Zheng Yan 5 months ago

  • Status changed from Duplicate to New

It fails immediately with the following trace.

/home/zhyan/Ceph/ceph/src/client/Client.cc: In function 'void Client::delay_put_requests(bool)' thread 7fffee7c0080 time 2020-08-20T14:05:43.636450+0800
/home/zhyan/Ceph/ceph/src/client/Client.cc: 1922: FAILED ceph_assert(!true)
ceph version 16.0.0-4491-gc7857aef5a (c7857aef5a46841cd201faeb3f6e6589bb1a33dc) pacific (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fffeef1f76e]
2: (()+0x2518f9) [0x7fffeef1f8f9]
3: (()+0x45ef0) [0x7ffff7e89ef0]
4: (()+0xac143) [0x7ffff7ef0143]
5: (()+0xacb62) [0x7ffff7ef0b62]
6: (ceph_mount()+0x88) [0x7ffff7e7a0b8]
7: ./test_create() [0x4012fb]
8: (__libc_start_main()+0xf2) [0x7ffff7950042]
9: ./test_create() [0x40114e]

https://github.com/ukernel/ceph/commits/wip-47033

#9 Updated by Zheng Yan 5 months ago

  • Status changed from New to Duplicate
