Bug #17847
open"Fuse mount failed to populate /sys/ after 31 seconds" in jewel 10.2.4
0%
Description
Run: http://pulpito.front.sepia.ceph.com/yuriw-2016-11-09_15:03:11-upgrade:hammer-x-jewel-distro-basic-vps/
Job: 534539
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-11-09_15:03:11-upgrade:hammer-x-jewel-distro-basic-vps/534539/teuthology.log
2016-11-09T15:34:25.496 INFO:teuthology.task.sequential:In sequential, running task print...
2016-11-09T15:34:25.496 INFO:teuthology.task.print:**** done ceph.healthy
2016-11-09T15:34:25.496 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 63, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/full_sequential.py", line 30, in task
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=confg)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/ceph-qa-suite_wip-17734-jewel/tasks/ceph_fuse.py", line 126, in task
    mount.mount()
  File "/home/teuthworker/src/ceph-qa-suite_wip-17734-jewel/tasks/cephfs/fuse_mount.py", line 123, in mount
    waited
RuntimeError: Fuse mount failed to populate /sys/ after 31 seconds
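For readers not familiar with the check that raises this error: FuseMount.mount() in ceph-qa-suite snapshots /sys/fs/fuse/connections before launching ceph-fuse and then polls until a new connection entry appears, raising the RuntimeError above when it times out. A simplified sketch of that wait loop follows; the names are illustrative and the 30-second timeout is only inferred from the "31 seconds" in the error, so this is not the exact fuse_mount.py code.

import os
import time


def list_fuse_connections():
    """Return the set of entries under /sys/fs/fuse/connections."""
    try:
        return set(os.listdir('/sys/fs/fuse/connections'))
    except OSError:
        # The directory does not exist until the fuse module is loaded.
        return set()


def wait_for_fuse_mount(pre_mount_conns, timeout=30, interval=1):
    """Poll /sys/fs/fuse/connections until a new connection appears.

    pre_mount_conns is the snapshot taken before ceph-fuse was started.
    Raises RuntimeError if no new connection shows up within timeout
    seconds, mirroring the error seen in the traceback above.
    """
    waited = 0
    while list_fuse_connections() == pre_mount_conns:
        time.sleep(interval)
        waited += interval
        if waited > timeout:
            raise RuntimeError(
                'Fuse mount failed to populate /sys/ after '
                '{0} seconds'.format(waited))

So the failure means ceph-fuse never registered a fuse connection with the kernel within the timeout, i.e. the client never finished mounting.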
Updated by John Spray over 7 years ago
For those not familiar with the test, can you explain what versions are in play here, and what versions of the client/MDS are running when the issue happens?
Updated by Zheng Yan over 7 years ago
2016-11-09 15:28:35.478741 7f215c02b7c0 0 ceph version 0.94.9-14-gf776699 (f776699fd00ff5f0deea15ec4f2efb991831b055), process ceph-fuse, pid 20493
2016-11-09 15:28:35.480593 7f215c02b7c0 -1 init, newargv = 0x45296f0 newargc=11
2016-11-09 15:28:47.232559 7f2154885700 0 monclient: hunting for new mon
2016-11-09 15:28:47.232673 7f2154885700 0 client.4151 ms_handle_reset on 172.21.2.35:6790/0
2016-11-09 15:28:53.235001 7f214e5fa700 0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6808/20373 pipe(0x45367f0 sd=7 :38378 s=2 pgs=7 cs=1 l=0 c=0x45308b0).fault, initiating reconnect
2016-11-09 15:28:53.235588 7f214e6fb700 0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6808/20373 pipe(0x45367f0 sd=7 :38378 s=1 pgs=7 cs=2 l=0 c=0x45308b0).fault
2016-11-09 15:35:19.130826 7f214e6fb700 0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6800/22430 pipe(0x7f2140018090 sd=7 :36690 s=2 pgs=10 cs=1 l=0 c=0x7f2140012f60).fault, initiating reconnect
2016-11-09 15:35:19.131705 7f214e5fa700 0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6800/22430 pipe(0x7f2140018090 sd=7 :36690 s=1 pgs=10 cs=2 l=0 c=0x7f2140012f60).fault
2016-11-09 15:35:49.144190 7f2154885700 0 monclient: hunting for new mon
2016-11-09 15:35:49.144379 7f2154885700 0 client.4151 ms_handle_reset on 172.21.2.131:6789/0
2016-11-09 15:35:55.149552 7f2154885700 0 monclient: hunting for new mon
2016-11-09 15:35:55.149678 7f2154885700 0 client.4151 ms_handle_reset on 172.21.2.35:6790/0
2016-11-09 15:36:01.151528 7f2154885700 0 monclient: hunting for new mon
2016-11-09 15:36:01.151576 7f2154885700 0 client.4151 ms_handle_reset on 172.21.2.35:6789/0
2016-11-09 15:36:01.152171 7f215c029700 0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6789/0 pipe(0x7f214001d570 sd=0 :0 s=1 pgs=0 cs=0 l=1 c=0x7f2140011020).fault
2016-11-09 15:36:08.495666 7f214e7fc700 0 -- 172.21.2.119:0/3605853097 >> 172.21.2.35:6790/0 pipe(0x7f212c0082a0 sd=0 :0 s=1 pgs=0 cs=0 l=1 c=0x7f212c00c540).fault
It's likely that someone adjusted the host's time; note how the timestamps in the log above jump forward.
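As an illustration of the observation above, one quick way to spot such a jump is to parse the timestamps in the client log and flag unusually large gaps between consecutive lines. This is a hypothetical helper, not part of teuthology or ceph-qa-suite, and the 60-second threshold is arbitrary; a gap can also just mean the client logged nothing for a while.

from datetime import datetime


def find_time_jumps(log_lines, threshold_secs=60):
    """Report pairs of consecutive log timestamps that differ by more
    than threshold_secs, e.g. the gap between 15:28:53 and 15:35:19 above."""
    jumps = []
    prev = None
    for line in log_lines:
        # ceph client log lines start with "YYYY-MM-DD HH:MM:SS.ffffff"
        try:
            ts = datetime.strptime(line[:26], '%Y-%m-%d %H:%M:%S.%f')
        except ValueError:
            continue
        if prev is not None and (ts - prev).total_seconds() > threshold_secs:
            jumps.append((prev, ts))
        prev = ts
    return jumps


if __name__ == '__main__':
    import sys
    with open(sys.argv[1]) as f:
        for before, after in find_time_jumps(f):
            print('jump: {0} -> {1}'.format(before, after))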
Updated by John Spray over 7 years ago
- Status changed from New to Rejected
Looks like bad clocks in the test environment
Updated by Yuri Weinstein over 7 years ago
Related to https://github.com/ceph/ceph-qa-suite/pull/1256?
Updated by John Spray over 7 years ago
Yuri: probably not related, unless there's something I'm missing. Zheng was pointing out that the clock in the log seemed to jump forward, which could easily have broken the mon cluster and would explain this ceph-fuse client failing to come online.
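As background on why a clock jump matters here: the monitors only hold a stable quorum when their clocks agree within mon_clock_drift_allowed (0.05s by default), and skew is surfaced as a "clock skew detected" health warning. A hedged sketch of how one might check for that from a test node follows; it assumes the ceph CLI is on the path and is not something the suite does automatically.

import subprocess


def mon_clock_skew_reported():
    """Return True if the cluster health output mentions clock skew."""
    out = subprocess.check_output(['ceph', 'health', 'detail'])
    return b'clock skew' in out.lower()


if __name__ == '__main__':
    print('clock skew reported' if mon_clock_skew_reported()
          else 'no clock skew warning')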
Updated by Yuri Weinstein over 7 years ago
John - it's reliably reproducible :(
Just re-ran it: http://pulpito.front.sepia.ceph.com/yuriw-2016-11-14_16:55:44-upgrade:hammer-x-jewel-distro-basic-vps/
Updated by Sage Weil over 7 years ago
- Status changed from Rejected to 12
It's a hammer ceph-fuse trying to mount... my guess is a compat issue.
Updated by Sage Weil over 7 years ago
https://github.com/ceph/ceph-qa-suite/pull/1280
Added debugging. The mon connection looks fine, but the client isn't doing well with the MDS.