Bug #63151
openreef: cephadm: deployment of nfs over rgw fails with "free(): invalid pointer"
Description
When trying to deploy nfs over rgw, the nfs daemon fails to start. The logs say:
Oct 09 13:50:11 smithi031 systemd[1]: Starting Ceph nfs.foo.0.0.smithi031.filhfh for 5ec1f23a-66a7-11ee-8db6-212e2dc638e7...
Oct 09 13:50:12 smithi031 podman[177412]:
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] init_logging :LOG :NULL :LOG: Setting log level for all components to NIV_EVENT
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 5.5
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] fsal_init_fds_limit :MDCACHE LRU :EVENT :Setting the system-imposed limit on FDs to 1048576.
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
Oct 09 13:50:12 smithi031 bash[177412]: 390c01b6108f3642a8198bb92183bc5c41d65b452f5fe8f4d22957155488d77b
Oct 09 13:50:12 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:12 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
Oct 09 13:50:12 smithi031 systemd[1]: Started Ceph nfs.foo.0.0.smithi031.filhfh for 5ec1f23a-66a7-11ee-8db6-212e2dc638e7.
Oct 09 13:50:18 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:18 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
Oct 09 13:50:18 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:18 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
Oct 09 13:50:24 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: 09/10/2023 13:50:24 : epoch 65240514 : smithi031 : ganesha.nfsd-7[main] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
Oct 09 13:50:24 smithi031 ceph-5ec1f23a-66a7-11ee-8db6-212e2dc638e7-nfs-foo-0-0-smithi031-filhfh[177429]: free(): invalid pointer
With full debug logging enabled, we see:
ganesha.nfsd-7[main] do_block_init :CONFIG :F_DBG :0x55645ac80eb8 name=FSAL type=CONFIG_BLOCK
ganesha.nfsd-7[main] export_display :EXPORT :M_DBG :DEFAULTS 0x55645ac80df0 Export 1 pseudo ((null)) with path ((null)) and tag ((null)) perms (options=03303002/00000000 , , , , , , , , )
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":1): do_block_load EXPORT
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80fa8 name=Export_id type=CONFIG_UINT16
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80ed0 name=Path type=CONFIG_PATH
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80ed8 name=Pseudo type=CONFIG_PATH
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80fa0 name=Security_Label type=CONFIG_BOOLBIT
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Access_Type type=CONFIG_ENUM
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_ENUM Access_Type mask=000001e0 flags=000001e0 value=033031e2
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Protocols type=CONFIG_LIST
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_LIST Protocols mask=00700000 flags=00200000 value=032031e2
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Transports type=CONFIG_LIST
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_LIST Transports mask=07000000 flags=02000000 value=022031e2
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 name=Squash type=CONFIG_LIST
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f84 CONFIG_LIST Squash mask=00000007 flags=00000000 value=022031e0
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80f80 name=Attr_Expiration_Time type=CONFIG_INT32
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645ac80eb8 name=FSAL type=CONFIG_BLOCK
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): process block FSAL link_mem = 0x55645ac80eb8
ganesha.nfsd-7[main] fsal_init :CONFIG :F_DBG :Allocating args 0x55645ac80eb8/0x55645b6df930
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): do_block_init FSAL
ganesha.nfsd-7[main] do_block_init :CONFIG :F_DBG :0x55645b6df930 name=Name type=CONFIG_STRING
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): do_block_load FSAL
ganesha.nfsd-7[main] do_block_load :CONFIG :F_DBG :0x55645b6df930 name=Name type=CONFIG_STRING
ganesha.nfsd-7[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/foo/export-1":2): commit FSAL
ganesha.nfsd-7[main] fsal_cfg_commit :EXPORT :F_DBG :get export ref for id 1 /, refcount = 2
ganesha.nfsd-7[main] gsh_refstr_release :EXPORT :F_DBG :Releasing refstr /
ganesha.nfsd-7[main] gsh_refstr_release :EXPORT :F_DBG :Releasing refstr /foouser
ganesha.nfsd-7[main] lookup_fsal :RW LOCK :F_DBG :Acquired mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:437
ganesha.nfsd-7[main] lookup_fsal :RW LOCK :F_DBG :Released mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:451
ganesha.nfsd-7[main] load_fsal :RW LOCK :F_DBG :Acquired mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:306
ganesha.nfsd-7[main] load_fsal :RW LOCK :F_DBG :Released mutex 0x7f1fc30eaec0 (&fsal_lock) at /builddir/build/BUILD/nfs-ganesha-5.5/src/FSAL/fsal_manager.c:324
ganesha.nfsd-7[main] load_fsal :NFS STARTUP :DEBUG :Loading FSAL RGW with /usr/lib64/ganesha/libfsalrgw.so
free(): invalid pointer
The last line before the "invalid pointer" error seems to imply an issue with loading /usr/lib64/ganesha/libfsalrgw.so.
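One way to check whether merely loading a shared library is enough to trigger the abort is to dlopen it in a throwaway child process. The following is a minimal sketch, not something that was run for this ticket: the helper names `try_dlopen` and `describe` are hypothetical, and loading the FSAL outside of ganesha.nfsd may fail for unrelated reasons (for example, unresolved symbols), so a non-crashing failure proves nothing either way.

```python
import subprocess
import sys

# One-liner executed in the child: dlopen() the path given as argv[1].
LOADER = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def try_dlopen(path):
    """Load `path` in a child Python process so a crash cannot kill the caller.

    Returns (returncode, stderr). On POSIX, a negative return code -N means
    the child was terminated by signal N (SIGABRT is 6 on Linux).
    """
    proc = subprocess.run(
        [sys.executable, "-c", LOADER, path],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr.strip()


def describe(returncode):
    """Turn the child's return code into a short human-readable verdict."""
    if returncode == 0:
        return "loaded cleanly"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return "failed without crashing (for example, file not found)"


if __name__ == "__main__":
    # Path taken from the debug log above; pass a different one as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/lib64/ganesha/libfsalrgw.so"
    rc, err = try_dlopen(target)
    print(target, "->", describe(rc))
    if err:
        print(err)
```

Run inside the nfs container, a result of "killed by signal 6" would be consistent with the abort seen in the logs.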
Running gdb on the ganesha process gave the following trace:
(gdb) bt
#0  0x00007f1ba1930acf in raise () from target:/lib64/libc.so.6
#1  0x00007f1ba1903ea5 in abort () from target:/lib64/libc.so.6
#2  0x00007f1ba1971cc7 in __libc_message () from target:/lib64/libc.so.6
#3  0x00007f1ba1978fcc in malloc_printerr () from target:/lib64/libc.so.6
#4  0x00007f1ba197ab54 in _int_free () from target:/lib64/libc.so.6
#5  0x00007f1b672cafe1 in arrow::internal::GetEnvVar[abi:cxx11](char const*) () from target:/lib64/librgw.so.2
#6  0x00007f1b6719c43d in arrow::(anonymous namespace)::DefaultBackend() () from target:/lib64/librgw.so.2
#7  0x00007f1b6719d1d0 in arrow::default_memory_pool() () from target:/lib64/librgw.so.2
#8  0x00007f1b664b6b9a in _GLOBAL__sub_I_interfaces.cc () from target:/lib64/librgw.so.2
#9  0x00007f1ba43a6f4a in call_init (l=<optimized out>, argc=argc@entry=6, argv=argv@entry=0x7ffdd5523f48, env=env@entry=0x7ffdd5523f80) at dl-init.c:72
#10 0x00007f1ba43a704a in call_init (env=0x7ffdd5523f80, argv=0x7ffdd5523f48, argc=6, l=<optimized out>) at dl-init.c:30
#11 _dl_init (main_map=0x560f5db457e0, argc=6, argv=0x7ffdd5523f48, env=0x7ffdd5523f80) at dl-init.c:119
#12 0x00007f1ba1a4ae2c in _dl_catch_exception () from target:/lib64/libc.so.6
#13 0x00007f1ba43ae79e in dl_open_worker (a=0x7ffdd5522ff0) at dl-open.c:794
#14 dl_open_worker (a=0x7ffdd5522ff0) at dl-open.c:757
#15 0x00007f1ba1a4add4 in _dl_catch_exception () from target:/lib64/libc.so.6
#16 0x00007f1ba43ae981 in _dl_open (file=0x7ffdd55232a0 "/usr/lib64/ganesha/libfsalrgw.so", mode=-2147483638, caller_dlopen=0x7f1ba3fd3b7d <load_fsal+477>, nsid=<optimized out>, argc=6, argv=<optimized out>, env=0x7ffdd5523f80) at dl-open.c:876
#17 0x00007f1ba2e8df8a in dlopen_doit () from target:/lib64/libdl.so.2
#18 0x00007f1ba1a4add4 in _dl_catch_exception () from target:/lib64/libc.so.6
#19 0x00007f1ba1a4ae93 in _dl_catch_error () from target:/lib64/libc.so.6
#20 0x00007f1ba2e8e52e in _dlerror_run () from target:/lib64/libdl.so.2
#21 0x00007f1ba2e8e02a in dlopen@@GLIBC_2.2.5 () from target:/lib64/libdl.so.2
#22 0x00007f1ba3fd3b7d in load_fsal () from target:/lib64/libganesha_nfsd.so.5.5
#23 0x00007f1ba3fd4e39 in fsal_load_init () from target:/lib64/libganesha_nfsd.so.5.5
#24 0x00007f1ba406047a in fsal_cfg_commit () from target:/lib64/libganesha_nfsd.so.5.5
#25 0x00007f1ba401378c in proc_block () from target:/lib64/libganesha_nfsd.so.5.5
#26 0x00007f1ba4013059 in proc_block () from target:/lib64/libganesha_nfsd.so.5.5
#27 0x00007f1ba4014237 in load_config_from_parse () from target:/lib64/libganesha_nfsd.so.5.5
#28 0x00007f1ba4061477 in ReadExports () from target:/lib64/libganesha_nfsd.so.5.5
#29 0x0000560f5c17c86e in main ()
Note that if there is extra debuginfo available for ganesha, I didn't have it installed when taking this trace.
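For anyone triaging similar traces, the interesting frame is usually the first one outside libc, which here is `arrow::internal::GetEnvVar` in librgw.so.2, reached from a static initializer during `dlopen`. A minimal sketch of extracting that automatically from pasted `bt` output follows. The helper names `parse_frames` and `first_frame_outside` are hypothetical, the parsing is a heuristic that assumes gdb's usual `#N [ADDR in] FUNC (ARGS) [from LIB | at FILE:LINE]` layout, and the sample is a shortened excerpt of the trace above.

```python
import re

# Shortened excerpt of the backtrace in this ticket, flattened onto one line
# the way it often ends up when pasted into a tracker.
SAMPLE = (
    "(gdb) bt "
    "#0 0x00007f1ba1930acf in raise () from target:/lib64/libc.so.6 "
    "#4 0x00007f1ba197ab54 in _int_free () from target:/lib64/libc.so.6 "
    "#5 0x00007f1b672cafe1 in arrow::internal::GetEnvVar[abi:cxx11](char const*) () "
    "from target:/lib64/librgw.so.2 "
    "#9 0x00007f1ba43a6f4a in call_init (l=<optimized out>, argc=argc@entry=6) "
    "at dl-init.c:72"
)


def parse_frames(bt_text):
    """Split gdb `bt` output into frames, tolerating missing newlines."""
    starts = [m.start() for m in re.finditer(r"#\d+\s", bt_text)]
    ends = starts[1:] + [len(bt_text)]
    frames = []
    for begin, end in zip(starts, ends):
        chunk = bt_text[begin:end]
        # Frame number, optional "0xADDR in ", then the function name up to
        # the first " (" that opens gdb's argument list.
        head = re.match(r"#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*?) \(", chunk)
        if head is None:
            continue
        lib = re.search(r" from (?:target:)?(\S+)", chunk)
        frames.append(
            {
                "index": int(head.group(1)),
                "function": head.group(2),
                "library": lib.group(1) if lib else None,
            }
        )
    return frames


def first_frame_outside(frames, skipped_suffix="libc.so.6"):
    """Return the first frame with a known library that is not `skipped_suffix`."""
    for frame in frames:
        if frame["library"] and not frame["library"].endswith(skipped_suffix):
            return frame
    return None


if __name__ == "__main__":
    culprit = first_frame_outside(parse_frames(SAMPLE))
    print(culprit["function"], "in", culprit["library"])
```

Frames that gdb reports with `at FILE:LINE` instead of `from LIB` carry no library name here, so they are skipped when looking for the culprit.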
Ganesha version from the container
[root@smithi031 /]# /usr/bin/ganesha.nfsd -v
NFS-Ganesha Release = V5.5
Installed ganesha packages in the container
[root@smithi031 /]# rpm -qa | grep ganesha
nfs-ganesha-5.5-2.el8s.x86_64
nfs-ganesha-rados-grace-5.5-2.el8s.x86_64
nfs-ganesha-selinux-5.5-2.el8s.noarch
nfs-ganesha-ceph-5.5-2.el8s.x86_64
nfs-ganesha-rados-urls-5.5-2.el8s.x86_64
nfs-ganesha-rgw-5.5-2.el8s.x86_64
Updated by Adam King 7 months ago
https://pulpito.ceph.com/adking-2023-10-06_23:08:11-orch:cephadm-wip-adk4-testing-2023-10-06-1539-reef-distro-default-smithi/7416071
https://pulpito.ceph.com/adking-2023-10-06_23:08:11-orch:cephadm-wip-adk4-testing-2023-10-06-1539-reef-distro-default-smithi/7416080
https://pulpito.ceph.com/adking-2023-10-06_23:08:11-orch:cephadm-wip-adk4-testing-2023-10-06-1539-reef-distro-default-smithi/7416139
https://pulpito.ceph.com/adking-2023-10-06_23:08:11-orch:cephadm-wip-adk4-testing-2023-10-06-1539-reef-distro-default-smithi/7416144
Updated by Adam King 7 months ago
The ganesha conf of the nfs daemon:
[root@smithi031 ~]# cat /var/lib/ceph/5ec1f23a-66a7-11ee-8db6-212e2dc638e7/nfs.foo.0.0.smithi031.filhfh/etc/ganesha/ganesha.conf
# This file is generated by cephadm.
NFS_CORE_PARAM {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
        NFS_Port = 12049;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
}

RADOS_KV {
        UserId = "nfs.foo.0.0.smithi031.filhfh";
        nodeid = "nfs.foo.0";
        pool = ".nfs";
        namespace = "foo";
}

RADOS_URLS {
        UserId = "nfs.foo.0.0.smithi031.filhfh";
        watch_url = "rados://.nfs/foo/conf-nfs.foo";
}

RGW {
        cluster = "ceph";
        name = "client.nfs.foo.0.0.smithi031.filhfh-rgw";
}

%url    rados://.nfs/foo/conf-nfs.foo
Updated by Adam King 7 months ago
Export info:
[ceph: root@smithi031 /]# ceph nfs export info foo /foouser
{
  "access_type": "RW",
  "clients": [],
  "cluster_id": "foo",
  "export_id": 1,
  "fsal": {
    "access_key_id": "X6S22LRHDZ06FUUPMHFZ",
    "name": "RGW",
    "secret_access_key": "SPqEgOygIQQ4bHJPKaUMD3kzoWHv68qXONmDdKbs",
    "user_id": "foouser"
  },
  "path": "/",
  "protocols": [
    4
  ],
  "pseudo": "/foouser",
  "security_label": true,
  "squash": "none",
  "transports": [
    "TCP"
  ]
}
Updated by Matt Benjamin 6 months ago
I think what you're experiencing is:
https://tracker.ceph.com/issues/63394
This fix hasn't merged yet, but it passed our downstream QA (rhcs-7.0).
regards,
Matt