Bug #55824
ceph-fuse[88614]: ceph mount failed with (65536) Unknown error 65536
Status: Closed
Description
Mounting via ceph-fuse failed:
2022-05-31T03:13:43.772 DEBUG:teuthology.orchestra.run:got remote process result: 32
2022-05-31T03:13:43.773 INFO:teuthology.orchestra.run:Running command with timeout 300
2022-05-31T03:13:43.773 DEBUG:teuthology.orchestra.run.smithi045:> ls /sys/fs/fuse/connections
2022-05-31T03:13:43.806 INFO:tasks.cephfs.fuse_mount.ceph-fuse.admin.smithi045.stderr:2022-05-31T03:13:43.806+0000 7f7ece3c53c0 -1 init, newargv = 0x55c8f14132f0 newargc=15
2022-05-31T03:13:43.806 INFO:tasks.cephfs.fuse_mount.ceph-fuse.admin.smithi045.stderr:ceph-fuse[88614]: starting ceph client
2022-05-31T03:13:43.811 INFO:tasks.cephfs.fuse_mount.ceph-fuse.admin.smithi045.stderr:ceph-fuse[88614]: probably no MDS server is up?
2022-05-31T03:13:43.812 INFO:tasks.cephfs.fuse_mount.ceph-fuse.admin.smithi045.stderr:ceph-fuse[88614]: ceph mount failed with (65536) Unknown error 65536
2022-05-31T03:13:43.975 INFO:tasks.cephfs.fuse_mount.ceph-fuse.admin.smithi045.stderr:daemon-helper: command failed with exit status 1
2022-05-31T03:13:44.931 DEBUG:teuthology.orchestra.run.smithi045:> sudo modprobe fuse
2022-05-31T03:13:45.023 INFO:teuthology.orchestra.run:Running command with timeout 30
2022-05-31T03:13:45.023 DEBUG:teuthology.orchestra.run.smithi045:> sudo mount -t fusectl /sys/fs/fuse/connections /sys/fs/fuse/connections
2022-05-31T03:13:45.106 INFO:teuthology.orchestra.run.smithi045.stderr:mount: /sys/fs/fuse/connections: /sys/fs/fuse/connections already mounted or mount point busy.
2022-05-31T03:13:45.110 DEBUG:teuthology.orchestra.run:got remote process result: 32
2022-05-31T03:13:45.111 INFO:teuthology.orchestra.run:Running command with timeout 300
2022-05-31T03:13:45.111 DEBUG:teuthology.orchestra.run.smithi045:> ls /sys/fs/fuse/connections
2022-05-31T03:13:45.165 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-05-31T03:13:45.165 INFO:tasks.cephfs.fuse_mount:mount command failed.
2022-05-31T03:13:45.165 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_1b30281f276b97a8186594b7f92fe1f728418ada/teuthology/contextutil.py", line 31, in nested
    vars.append(enter())
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/ceph.py", line 447, in cephfs_setup
    fs = Filesystem(ctx, fs_config=temp, name=name, create=True)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/filesystem.py", line 512, in __init__
    self.create()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/filesystem.py", line 680, in create
    self.run_client_payload(f"setfattr -n ceph.dir.layout.pool -v {ec_data_pool_name} . && getfattr -n ceph.dir.layout .")
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/filesystem.py", line 718, in run_client_payload
    m.mount_wait()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/mount.py", line 453, in mount_wait
    self.mount(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/fuse_mount.py", line 52, in mount
    return self._mount(mntopts, check_status)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/fuse_mount.py", line 68, in _mount
    retval = self._run_mount_cmd(mntopts, check_status)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/fuse_mount.py", line 96, in _run_mount_cmd
    check_status, pre_mount_conns, mountcmd_stdout, mountcmd_stderr)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_9e1d1677a609de7d04588e347b7aaa17ce9b38a2/qa/tasks/cephfs/fuse_mount.py", line 171, in _wait_and_record_our_fuse_conn
    self.fuse_daemon.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_1b30281f276b97a8186594b7f92fe1f728418ada/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_1b30281f276b97a8186594b7f92fe1f728418ada/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi045 with status 1: "(cd /home/ubuntu/cephtest && exec sudo nsenter --net=/var/run/netns/ceph-ns--home-ubuntu-cephtest-mnt.admin sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-fuse -f --admin-socket '/var/run/ceph/$cluster-$name.$pid.asok' /home/ubuntu/cephtest/mnt.admin --id admin --client_fs=cephfs)"
2022-05-31T03:13:45.168 INFO:teuthology.misc:Shutting down mds daemons...
2022-05-31T03:13:45.168 DEBUG:tasks.ceph.mds.a:waiting for process to exit
2022-05-31T03:13:45.168 INFO:teuthology.orchestra.run:waiting for 300
From the mds.0 logs:
2022-05-31T03:13:43.482+0000 7f21b5089700 10 mds.0.cache cache not ready for trimming
2022-05-31T03:13:43.482+0000 7f21b5089700 20 mds.0.cache upkeep thread waiting interval 1.000000000s
2022-05-31T03:13:43.493+0000 7f21b708d700 5 mds.beacon.d Sending beacon up:active seq 5
2022-05-31T03:13:43.493+0000 7f21b708d700 1 -- [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825] --> [v2:172.21.15.136:3300/0,v1:172.21.15.136:6789/0] -- mdsbeacon(4270/d up:active seq=5 v5) v8 -- 0x5639ce331340 con 0x5639ce159400
2022-05-31T03:13:43.493+0000 7f21b708d700 20 mds.beacon.d sender thread waiting interval 4s
2022-05-31T03:13:43.957+0000 7f21b9091700 1 -- [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825] <== mon.1 v2:172.21.15.136:3300/0 49 ==== mdsmap(e 6) v2 ==== 1094+0+0 (secure 0 0 0) 0x5639ce1bb380 con 0x5639ce159400
2022-05-31T03:13:43.957+0000 7f21b9091700 1 mds.d Updating MDS map to version 6 from mon.1
2022-05-31T03:13:43.957+0000 7f21b9091700 10 mds.d my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31T03:13:43.957+0000 7f21b9091700 10 mds.d mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31T03:13:43.957+0000 7f21b9091700 10 mds.d my gid is 4270
2022-05-31T03:13:43.957+0000 7f21b9091700 10 mds.d map says I am mds.0.4 state up:active
2022-05-31T03:13:43.957+0000 7f21b9091700 10 mds.d msgr says I am [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825]
2022-05-31T03:13:43.957+0000 7f21b9091700 10 mds.d handle_mds_map: handling map as rank 0
2022-05-31T03:13:43.959+0000 7f21bc097700 1 -- [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825] <== mon.1 v2:172.21.15.136:3300/0 50 ==== mdsbeacon(4270/d up:active seq=5 v6) v8 ==== 130+0+0 (secure 0 0 0) 0x5639ce331080 con 0x5639ce159400
2022-05-31T03:13:43.959+0000 7f21bc097700 5 mds.beacon.d received beacon reply up:active seq 5 rtt 0.46599
2022-05-31T03:13:44.051+0000 7f21b688c700 1 -- [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825] --> [v2:172.21.15.136:6832/87765,v1:172.21.15.136:6833/87765] -- mgrreport(unknown.d +0-0 packed 1534 task_status=0) v9 -- 0x5639cd42d180 con 0x5639ce159800
The client logs:
2022-05-31T03:13:43.811+0000 7f7eaa7fc700 20 client.4441 tick
2022-05-31T03:13:43.811+0000 7f7eaa7fc700 20 client.4441 collect_and_send_metrics
2022-05-31T03:13:43.811+0000 7f7eaa7fc700 20 client.4441 collect_and_send_global_metrics
2022-05-31T03:13:43.811+0000 7f7eaa7fc700 5 client.4441 collect_and_send_global_metrics: no session with rank=0 -- not sending metric
2022-05-31T03:13:43.811+0000 7f7eaa7fc700 20 client.4441 trim_cache size 0 max 16384
2022-05-31T03:13:43.811+0000 7f7eaa7fc700 20 client.4441 upkeep thread waiting interval 1.000000000s
2022-05-31T03:13:43.812+0000 7f7eabfff700 1 client.4441 handle_mds_map epoch 5
2022-05-31T03:13:43.812+0000 7f7ece3c53c0 10 client.4441 mds cluster unavailable: epoch=5
2022-05-31T03:13:43.812+0000 7f7ece3c53c0 1 client.4441 shutdown
2022-05-31T03:13:43.812+0000 7f7ece3c53c0 10 client.4441 shutdown stopping cache invalidator finisher
2022-05-31T03:13:43.812+0000 7f7ece3c53c0 10 client.4441 shutdown stopping dentry invalidator finisher
2022-05-31T03:13:43.812+0000 7f7ece3c53c0 10 client.4441 shutdown stopping interrupt finisher
2022-05-31T03:13:43.812+0000 7f7ece3c53c0 10 client.4441 shutdown stopping remount finisher
It is still not clear why ceph-fuse didn't detect any active MDS.
Updated by Xiubo Li almost 2 years ago
Xiubo Li wrote:
Mounting via ceph-fuse failed:
[...]
From the mds.0 logs:
[...]
The client logs:
[...]
Checking the mds.0 logs: at epoch 5, mds.0 was still in the up:creating state:
.21.15.136:6789/0] -- mdsbeacon(4270/d up:active seq=4 v4) v8 -- 0x5639ce331080 con 0x5639ce159400
2022-05-31T03:13:39.494+0000 7f21b3085700 10 mds.0.snapclient sync
2022-05-31T03:13:39.494+0000 7f21b3085700 10 mds.0.snapclient refresh want 1
2022-05-31T03:13:39.494+0000 7f21b9091700 1 -- [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825] <== mon.1 v2:172.21.15.136:3300/0 46 ==== mdsmap(e 5) v2 ==== 1094+0+0 (secure 0 0 0) 0x5639ce1bb380 con 0x5639ce159400
2022-05-31T03:13:39.494+0000 7f21b9091700 1 mds.d Updating MDS map to version 5 from mon.1
2022-05-31T03:13:39.494+0000 7f21b9091700 10 mds.d my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31T03:13:39.494+0000 7f21b9091700 10 mds.d mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2022-05-31T03:13:39.494+0000 7f21b9091700 10 mds.d my gid is 4270
2022-05-31T03:13:39.494+0000 7f21b9091700 10 mds.d map says I am mds.0.4 state up:creating
2022-05-31T03:13:39.494+0000 7f21b9091700 10 mds.d msgr says I am [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825]
2022-05-31T03:13:39.494+0000 7f21b9091700 10 mds.d handle_mds_map: handling map as rank 0
2022-05-31T03:13:39.494+0000 7f21b9091700 10 notify_mdsmap: mds.metrics
2022-05-31T03:13:39.494+0000 7f21b9091700 10 notify_mdsmap: mds.metrics: rank0 is unavailable
2022-05-31T03:13:39.494+0000 7f21b9091700 10 reset_seq: mds.metrics: last_updated_seq=0
2022-05-31T03:13:39.494+0000 7f21b9091700 20 set_next_seq: mds.metrics: current sequence number 0, setting next sequence number 0
2022-05-31T03:13:39.495+0000 7f21b9091700 1 -- [v2:172.21.15.136:6835/1045808825,v1:172.21.15.136:6837/1045808825] <== mon.1 v2:172.21.15.136:3300/0 47 ==== osd_map(26..26 src has 1..26) v4 ==== 685+0+0 (secure 0 0 0) 0x5639cd41f6c0 con 0x5639ce159400
2022-05-31T03:13:39.495+0000 7f21b9091700 7 mds.0.server operator(): full = 0 epoch = 26
2022-05-31T03:13:39.495+0000 7f21b9091700 10 mds.0.server apply_blocklist: killed 0
So at that point no MDS rank was up:active yet.
It is still not clear why ceph-fuse didn't detect any active MDS.
Updated by Xiubo Li almost 2 years ago
Maybe we should wait for a while when mounting the ceph-fuse?
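One way such a wait could be implemented (a hypothetical sketch, not the actual fix that landed): poll the MDS map until rank 0 reports up:active before attempting the mount. The `rank0_active` and `wait_for_rank0_active` helpers below are illustrative; the JSON layout they assume (a `filesystems` list whose entries carry `mdsmap.info` dicts with `rank` and `state` fields, as produced by `ceph fs dump --format json`) should be verified against the running Ceph version.

```python
import json
import subprocess
import time


def rank0_active(fs_dump: dict) -> bool:
    """Return True if some filesystem's rank 0 is in the up:active state.

    Assumes the `ceph fs dump --format json` layout: each filesystem has
    mdsmap.info, a dict of gid -> daemon info with 'rank' and 'state'
    fields. (Illustrative assumption, not a documented contract.)
    """
    for fs in fs_dump.get("filesystems", []):
        info = fs.get("mdsmap", {}).get("info", {})
        for daemon in info.values():
            if daemon.get("rank") == 0 and daemon.get("state") == "up:active":
                return True
    return False


def wait_for_rank0_active(timeout=60.0, interval=2.0):
    """Poll the cluster until rank 0 is up:active or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.check_output(["ceph", "fs", "dump", "--format", "json"])
        if rank0_active(json.loads(out)):
            return
        time.sleep(interval)
    raise RuntimeError("timed out waiting for rank 0 to become up:active")
```

In the failing run above, the map at epoch 5 still showed `up:creating` for rank 0, so a check like this would have kept waiting instead of letting the mount fail.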
Updated by Venky Shankar almost 2 years ago
- Category set to Administration/Usability
- Assignee set to Xiubo Li
- Target version set to v18.0.0
- Backport set to quincy, pacific
Updated by Xiubo Li almost 2 years ago
EC is enabled, and the test will create an EC pool and set it in the layout:
ceph:
  cephfs:
    ec_profile:
      - m=2
      - k=2
      - crush-failure-domain=osd
    max_mds: 5
    session_timeout: 300
    standby_replay: true
The logs:
2022-06-06T22:30:37.348 INFO:teuthology.orchestra.run:Running command with timeout 300
2022-06-06T22:30:37.348 DEBUG:teuthology.orchestra.run.smithi102:> stat --file-system '--printf=%T
2022-06-06T22:30:37.349 DEBUG:teuthology.orchestra.run.smithi102:> ' -- /home/ubuntu/cephtest/mnt.admin
2022-06-06T22:30:37.405 INFO:teuthology.orchestra.run.smithi102.stdout:fuseblk
2022-06-06T22:30:37.405 INFO:tasks.cephfs.fuse_mount:ceph-fuse is mounted on /home/ubuntu/cephtest/mnt.admin
2022-06-06T22:30:37.406 INFO:teuthology.orchestra.run:Running command with timeout 300
2022-06-06T22:30:37.406 DEBUG:teuthology.orchestra.run.smithi102:> sudo chmod 1777 /home/ubuntu/cephtest/mnt.admin
2022-06-06T22:30:37.493 INFO:teuthology.orchestra.run:Running command with timeout 300
2022-06-06T22:30:37.494 DEBUG:teuthology.orchestra.run.smithi102:> (cd /home/ubuntu/cephtest/mnt.admin && exec bash -c 'setfattr -n ceph.dir.layout.pool -v cephfs_data_ec . && getfattr -n ceph.dir.layout .')
2022-06-06T22:30:37.654 INFO:teuthology.orchestra.run.smithi102.stdout:# file: .
2022-06-06T22:30:37.654 INFO:teuthology.orchestra.run.smithi102.stdout:ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs_data_ec"
2022-06-06T22:30:37.655 INFO:teuthology.orchestra.run.smithi102.stdout:
2022-06-06T22:30:37.656 INFO:tasks.cephfs.fuse_mount:Running fusermount -u on ubuntu@smithi102.front.sepia.ceph.com...
2022-06-06T22:30:37.657 INFO:teuthology.orchestra.run:Running command with timeout 300
2022-06-06T22:30:37.657 DEBUG:teuthology.orchestra.run.smithi102:> sudo fusermount -u /home/ubuntu/cephtest/mnt.admin
2022-06-06T22:30:37.737 INFO:tasks.cephfs.fuse_mount.ceph-fuse.admin.smithi102.stderr:ceph-fuse[124989]: fuse finished with error 0 and tester_r 0
2022-06-06T22:30:37.743 INFO:teuthology.orchestra.run:waiting for 300
2022-06-06T22:30:37.771 INFO:tasks.cephfs.mount:Cleaning up mount ubuntu@smithi102.front.sepia.ceph.com
In the code in qa/tasks/cephfs/filesystem.py, Line #680 mounts a fuse client and then sets the EC pool in the layout:
626     def create(self):
...
653         if self.metadata_overlay:
654             self.mon_manager.raw_cluster_cmd('fs', 'new',
655                 self.name, self.metadata_pool_name, data_pool_name,
656                 '--allow-dangerous-metadata-overlay')
657         else:
658             self.mon_manager.raw_cluster_cmd('fs', 'new',
659                 self.name,
660                 self.metadata_pool_name,
661                 data_pool_name)
662
663         if self.ec_profile and 'disabled' not in self.ec_profile:
664             ec_data_pool_name = data_pool_name + "_ec"
665             log.debug("EC profile is %s", self.ec_profile)
666             cmd = ['osd', 'erasure-code-profile', 'set', ec_data_pool_name]
667             cmd.extend(self.ec_profile)
668             self.mon_manager.raw_cluster_cmd(*cmd)
669             self.mon_manager.raw_cluster_cmd(
670                 'osd', 'pool', 'create', ec_data_pool_name,
671                 'erasure', ec_data_pool_name,
672                 '--pg_num_min', str(self.pg_num_min),
673                 '--target_size_ratio', str(self.target_size_ratio_ec))
674             self.mon_manager.raw_cluster_cmd(
675                 'osd', 'pool', 'set',
676                 ec_data_pool_name, 'allow_ec_overwrites', 'true')
677             self.add_data_pool(ec_data_pool_name, create=False)
678             self.check_pool_application(ec_data_pool_name)
679
680             self.run_client_payload(f"setfattr -n ceph.dir.layout.pool -v {ec_data_pool_name} . && getfattr -n ceph.dir.layout .")
681 ...
The code of run_client_payload():
713     def run_client_payload(self, cmd):
714         # avoid circular dep by importing here:
715         from tasks.cephfs.fuse_mount import FuseMount
716         d = misc.get_testdir(self._ctx)
717         m = FuseMount(self._ctx, d, "admin", self.client_remote, cephfs_name=self.name)
718         m.mount_wait()
719         m.run_shell_payload(cmd)
720         m.umount_wait(require_clean=True)
721
But this ran too quickly, before the MDSes in the Ceph cluster were ready. So we need to wait until at least rank 0 is up:active.
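The shape of such a fix is to retry the mount until the MDS is ready rather than failing on the first attempt. Below is a generic retry-until-success sketch; `flaky_mount` and the timings are purely illustrative stand-ins for the real FuseMount mount attempt, not the actual teuthology change.

```python
import time


def retry_until_success(fn, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Call fn() repeatedly until it returns without raising.

    If fn keeps raising past the deadline, the last exception propagates.
    `sleep` is injectable so tests can avoid real delays.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fn()
        except Exception:
            if time.monotonic() >= deadline:
                raise
            sleep(interval)


# Hypothetical usage: a "mount" that fails twice (MDS not yet active),
# then succeeds once the cluster is ready.
attempts = {"n": 0}


def flaky_mount():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("ceph mount failed with (65536) Unknown error 65536")
    return "mounted"


result = retry_until_success(flaky_mount, timeout=10.0, interval=0.0)
```

Making `sleep` and the interval parameters rather than hard-coding them keeps the helper testable and lets the QA harness tune how long it is willing to wait for rank 0.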
Updated by Xiubo Li almost 2 years ago
- Status changed from New to Fix Under Review
Updated by Venky Shankar almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56105: pacific: ceph-fuse[88614]: ceph mount failed with (65536) Unknown error 65536 added
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56106: quincy: ceph-fuse[88614]: ceph mount failed with (65536) Unknown error 65536 added
Updated by Xiubo Li almost 2 years ago
- Status changed from Pending Backport to Resolved