Bug #63151

open

reef: cephadm: deployment of nfs over rgw fails with "free(): invalid pointer"

Added by Adam King 7 months ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When deploying nfs over rgw, the nfs daemon fails to start. The logs show:

Oct 09 13:50:11 smithi031 systemd[1]: Starting Ceph nfs.foo.0.0.smithi031.filhfh for 5ec1f23a-66a7-11ee-8db6-212e2dc638e7...
Oct 09 13:50:12 smithi031 podman[177412]: 
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] init_logging :LOG :NULL :LOG: Setting log level for all components to NIV_EVENT
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 5.5
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] fsal_init_fds_limit :MDCACHE LRU :EVENT :Setting the system-imposed limit on FDs to 1048576.
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
Oct 09 13:50:12 smithi031 bash[177412]: 390c01b6108f3642a8198bb92183bc5c41d65b452f5fe8f4d22957155488d77b
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
Oct 09 13:50:12 smithi031 systemd[1]: Started Ceph nfs.foo.0.0.smithi031.filhfh for 5ec1f23a-66a7-11ee-8db6-212e2dc638e7.
Oct 09 13:50:18 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:18 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
Oct 09 13:50:18 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:18 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
Oct 09 13:50:24 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:24 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
Oct 09 13:50:24 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: free(): invalid pointer

With full debug logging enabled, we see:

ganesha.nfsd-7[main] do_block_init :CONFIG :F_DBG :0x55645ac80eb8 name=FSAL type=CONFIG_BLOCK
ganesha.nfsd-7[main] export_display :EXPORT :M_DBG :DEFAULTS 0x55645ac80df0 Export 1 pseudo ((null)) with path ((null)) and tag ((null)) perms (options=03303002/00000000               ,     ,    ,               ,               ,         ,                ,                ,                )
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":1): do_block_load EXPORT
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80fa8 name=Export_id type=CONFIG_UINT16
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80ed0 name=Path type=CONFIG_PATH
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80ed8 name=Pseudo type=CONFIG_PATH
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80fa0 name=Security_Label type=CONFIG_BOOLBIT
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Access_Type type=CONFIG_ENUM
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_ENUM Access_Type mask=000001e0 flags=000001e0 value=033031e2
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Protocols type=CONFIG_LIST
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_LIST Protocols mask=00700000 flags=00200000 value=032031e2
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Transports type=CONFIG_LIST
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_LIST Transports mask=07000000 flags=02000000 value=022031e2
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Squash type=CONFIG_LIST
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_LIST Squash mask=00000007 flags=00000000 value=022031e0
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f80 name=Attr_Expiration_Time type=CONFIG_INT32
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80eb8 name=FSAL type=CONFIG_BLOCK
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): process block FSAL link_mem = 0x55645ac80eb8
ganesha.nfsd-7[main] fsal_init :CONFIG :F_DBG :Allocating args 0x55645ac80eb8/0x55645b6df930
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): do_block_init FSAL
ganesha.nfsd-7[main] do_block_init :CONFIG :F_DBG :0x55645b6df930 name=Name type=CONFIG_STRING
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): do_block_load FSAL
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645b6df930 name=Name type=CONFIG_STRING
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): commit FSAL
ganesha.nfsd-7[main] fsal_cfg_commit :EXPORT :F_DBG :get export ref for id 1 /, refcount = 2
ganesha.nfsd-7[main] gsh_refstr_release :EXPORT :F_DBG :Releasing refstr /
ganesha.nfsd-7[main] gsh_refstr_release :EXPORT :F_DBG :Releasing refstr /foouser
ganesha.nfsd-7[main] lookup_fsal :RW LOCK :F_DBG :Acquired mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:437
ganesha.nfsd-7[main] lookup_fsal :RW LOCK :F_DBG :Released mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:451
ganesha.nfsd-7[main] load_fsal :RW LOCK :F_DBG :Acquired mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:306
ganesha.nfsd-7[main] load_fsal :RW LOCK :F_DBG :Released mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:324
ganesha.nfsd-7[main] load_fsal :NFS STARTUP :DEBUG :Loading FSAL RGW with /usr/lib64/ganesha/libfsalrgw.so
free(): invalid pointer

The last line before the "free(): invalid pointer" error seems to imply an issue with loading /usr/lib64/ganesha/libfsalrgw.so.
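One way to check whether the abort depends on ganesha at all is to load the same library from a throwaway process inside the container. The sketch below is only a diagnostic idea, not something that was run for this report. It assumes Python 3 is available in the container (the report does not say so), `try_dlopen` is a hypothetical helper name, and `RTLD_NOW | RTLD_DEEPBIND` is used because those are the low bits of the `mode` value that gdb prints for the `_dl_open` call later in this report. A Python host process is not identical to ganesha.nfsd, so a clean load here would not rule the library out.

```python
import subprocess
import sys

# Child script: dlopen the library given in argv[1]; argv[2] == "1" adds
# RTLD_DEEPBIND (available on Linux/glibc only) to RTLD_NOW.
CHILD = """
import ctypes, os, sys
mode = os.RTLD_NOW
if sys.argv[2] == "1":
    mode |= os.RTLD_DEEPBIND
ctypes.CDLL(sys.argv[1], mode=mode)
"""


def try_dlopen(path, deepbind=True):
    """dlopen `path` in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, path, "1" if deepbind else "0"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = try_dlopen("/usr/lib64/ganesha/libfsalrgw.so")
    # A negative code means the child died from a signal (-6 is SIGABRT).
    print("exit status:", rc)
    print(err.strip())
```

Running the load in a child process keeps an abort inside the library's static initializers from taking down the calling script; comparing `deepbind=True` against `deepbind=False` would show whether the binding mode matters.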

Running gdb on the ganesha process gave the trace

(gdb) bt
#0  0x00007f1ba1930acf in raise () from target:/lib64/libc.so.6
#1  0x00007f1ba1903ea5 in abort () from target:/lib64/libc.so.6
#2  0x00007f1ba1971cc7 in __libc_message () from target:/lib64/libc.so.6
#3  0x00007f1ba1978fcc in malloc_printerr () from target:/lib64/libc.so.6
#4  0x00007f1ba197ab54 in _int_free () from target:/lib64/libc.so.6
#5  0x00007f1b672cafe1 in arrow::internal::GetEnvVar[abi:cxx11](char const*) () from target:/lib64/librgw.so.2
#6  0x00007f1b6719c43d in arrow::(anonymous namespace)::DefaultBackend() () from target:/lib64/librgw.so.2
#7  0x00007f1b6719d1d0 in arrow::default_memory_pool() () from target:/lib64/librgw.so.2
#8  0x00007f1b664b6b9a in _GLOBAL__sub_I_interfaces.cc () from target:/lib64/librgw.so.2
#9  0x00007f1ba43a6f4a in call_init (l=<optimized out>, argc=argc@entry=6, argv=argv@entry=0x7ffdd5523f48, env=env@entry=0x7ffdd5523f80) at dl-init.c:72
#10 0x00007f1ba43a704a in call_init (env=0x7ffdd5523f80, argv=0x7ffdd5523f48, argc=6, l=<optimized out>) at dl-init.c:30
#11 _dl_init (main_map=0x560f5db457e0, argc=6, argv=0x7ffdd5523f48, env=0x7ffdd5523f80) at dl-init.c:119
#12 0x00007f1ba1a4ae2c in _dl_catch_exception () from target:/lib64/libc.so.6
#13 0x00007f1ba43ae79e in dl_open_worker (a=0x7ffdd5522ff0) at dl-open.c:794
#14 dl_open_worker (a=0x7ffdd5522ff0) at dl-open.c:757
#15 0x00007f1ba1a4add4 in _dl_catch_exception () from target:/lib64/libc.so.6
#16 0x00007f1ba43ae981 in _dl_open (file=0x7ffdd55232a0 "/usr/lib64/ganesha/libfsalrgw.so", mode=-2147483638, caller_dlopen=0x7f1ba3fd3b7d <load_fsal+477>, 
    nsid=<optimized out>, argc=6, argv=<optimized out>, env=0x7ffdd5523f80) at dl-open.c:876
#17 0x00007f1ba2e8df8a in dlopen_doit () from target:/lib64/libdl.so.2
#18 0x00007f1ba1a4add4 in _dl_catch_exception () from target:/lib64/libc.so.6
#19 0x00007f1ba1a4ae93 in _dl_catch_error () from target:/lib64/libc.so.6
#20 0x00007f1ba2e8e52e in _dlerror_run () from target:/lib64/libdl.so.2
#21 0x00007f1ba2e8e02a in dlopen@@GLIBC_2.2.5 () from target:/lib64/libdl.so.2
#22 0x00007f1ba3fd3b7d in load_fsal () from target:/lib64/libganesha_nfsd.so.5.5
#23 0x00007f1ba3fd4e39 in fsal_load_init () from target:/lib64/libganesha_nfsd.so.5.5
#24 0x00007f1ba406047a in fsal_cfg_commit () from target:/lib64/libganesha_nfsd.so.5.5
#25 0x00007f1ba401378c in proc_block () from target:/lib64/libganesha_nfsd.so.5.5
#26 0x00007f1ba4013059 in proc_block () from target:/lib64/libganesha_nfsd.so.5.5
#27 0x00007f1ba4014237 in load_config_from_parse () from target:/lib64/libganesha_nfsd.so.5.5
#28 0x00007f1ba4061477 in ReadExports () from target:/lib64/libganesha_nfsd.so.5.5
#29 0x0000560f5c17c86e in main ()

If extra debuginfo for ganesha exists, I did not have it installed when capturing this trace.
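The `mode=-2147483638` argument in frame #16 can be decoded to see how the loader was asked to open the FSAL. The sketch below is only an aid for reading the trace: the flag values are the ones glibc uses on Linux (an assumption about the platform, though the trace does show glibc's `dl-open.c`), `__RTLD_DLOPEN` is an internal glibc bit, and `decode_dlopen_mode` is a hypothetical helper name that is not part of gdb, glibc, or ganesha.

```python
# Flag values as defined by glibc on Linux (assumed; other libcs differ).
GLIBC_DLOPEN_FLAGS = {
    0x00001: "RTLD_LAZY",
    0x00002: "RTLD_NOW",
    0x00004: "RTLD_NOLOAD",
    0x00008: "RTLD_DEEPBIND",
    0x00100: "RTLD_GLOBAL",
    0x01000: "RTLD_NODELETE",
    0x80000000: "__RTLD_DLOPEN",  # internal glibc marker added by dlopen()
}


def decode_dlopen_mode(mode):
    """Return (flag_names, unknown_bits) for a dlopen mode printed by gdb."""
    bits = mode & 0xFFFFFFFF  # gdb prints the int as signed; view it unsigned
    names = [name for bit, name in sorted(GLIBC_DLOPEN_FLAGS.items()) if bits & bit]
    known = 0
    for bit in GLIBC_DLOPEN_FLAGS:
        known |= bit
    return names, bits & ~known


if __name__ == "__main__":
    names, unknown = decode_dlopen_mode(-2147483638)
    print(" | ".join(names), hex(unknown))
```

Under these assumptions the value decodes to `RTLD_NOW | RTLD_DEEPBIND` plus the internal dlopen marker, with no unknown bits. This only describes how the library was loaded; it does not by itself establish why `free()` aborted.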

Ganesha version from the container

[root@smithi031 /]# /usr/bin/ganesha.nfsd -v       
NFS-Ganesha Release = V5.5

Installed ganesha packages in the container

[root@smithi031 /]# rpm -qa | grep ganesha
nfs-ganesha-5.5-2.el8s.x86_64
nfs-ganesha-rados-grace-5.5-2.el8s.x86_64
nfs-ganesha-selinux-5.5-2.el8s.noarch
nfs-ganesha-ceph-5.5-2.el8s.x86_64
nfs-ganesha-rados-urls-5.5-2.el8s.x86_64
nfs-ganesha-rgw-5.5-2.el8s.x86_64
Actions #2

Updated by Adam King 7 months ago

the ganesha conf of the nfs daemon

[root@smithi031 ~]# cat /var/lib/ceph/5ec1f23a-66a7-11ee-8db6-212e2dc638e7/nfs.foo.0.0.smithi031.filhfh/etc/ganesha/ganesha.conf 
# This file is generated by cephadm.
NFS_CORE_PARAM {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
        NFS_Port = 12049;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
}

RADOS_KV {
        UserId = "nfs.foo.0.0.smithi031.filhfh";
        nodeid = "nfs.foo.0";
        pool = ".nfs";
        namespace = "foo";
}

RADOS_URLS {
        UserId = "nfs.foo.0.0.smithi031.filhfh";
        watch_url = "rados://.nfs/foo/conf-nfs.foo";
}

RGW {
        cluster = "ceph";
        name = "client.nfs.foo.0.0.smithi031.filhfh-rgw";
}

%url    rados://.nfs/foo/conf-nfs.foo

Actions #3

Updated by Adam King 7 months ago

export info

[ceph: root@smithi031 /]# ceph nfs export info foo /foouser
{
  "access_type": "RW",
  "clients": [],
  "cluster_id": "foo",
  "export_id": 1,
  "fsal": {
    "access_key_id": "X6S22LRHDZ06FUUPMHFZ",
    "name": "RGW",
    "secret_access_key": "SPqEgOygIQQ4bHJPKaUMD3kzoWHv68qXONmDdKbs",
    "user_id": "foouser" 
  },
  "path": "/",
  "protocols": [
    4
  ],
  "pseudo": "/foouser",
  "security_label": true,
  "squash": "none",
  "transports": [
    "TCP" 
  ]
}

Actions #4

Updated by Matt Benjamin 6 months ago

I think what you're experiencing is:

https://tracker.ceph.com/issues/63394

This fix hasn't merged yet, but it passed our downstream QA (rhcs-7.0).

regards,

Matt
