Project

General

Profile

Actions

Bug #17847

open

"Fuse mount failed to populate /sys/ after 31 seconds" in jewel 10.2.4

Added by Yuri Weinstein over 7 years ago. Updated over 4 years ago.

Status:
New
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/hammer-x
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.front.sepia.ceph.com/yuriw-2016-11-09_15:03:11-upgrade:hammer-x-jewel-distro-basic-vps/
Job: 534539
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-11-09_15:03:11-upgrade:hammer-x-jewel-distro-basic-vps/534539/teuthology.log

016-11-09T15:34:25.496 INFO:teuthology.task.sequential:In sequential, running task print...
2016-11-09T15:34:25.496 INFO:teuthology.task.print:**** done ceph.healthy
2016-11-09T15:34:25.496 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 63, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/full_sequential.py", line 30, in task
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=confg)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/ceph-qa-suite_wip-17734-jewel/tasks/ceph_fuse.py", line 126, in task
    mount.mount()
  File "/home/teuthworker/src/ceph-qa-suite_wip-17734-jewel/tasks/cephfs/fuse_mount.py", line 123, in mount
    waited
RuntimeError: Fuse mount failed to populate /sys/ after 31 seconds
Actions #1

Updated by John Spray over 7 years ago

For those not familiar with the test, can you explain what versions are in play here, and what version of the client/mds are running when the issue happens?

Actions #2

Updated by Zheng Yan over 7 years ago

2016-11-09 15:28:35.478741 7f215c02b7c0  0 ceph version 0.94.9-14-gf776699 (f776699fd00ff5f0deea15ec4f2efb991831b055), process ceph-fuse, pid 20493
2016-11-09 15:28:35.480593 7f215c02b7c0 -1 init, newargv = 0x45296f0 newargc=11
2016-11-09 15:28:47.232559 7f2154885700  0 monclient: hunting for new mon
2016-11-09 15:28:47.232673 7f2154885700  0 client.4151 ms_handle_reset on 172.21.2.35:6790/0
2016-11-09 15:28:53.235001 7f214e5fa700  0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6808/20373 pipe(0x45367f0 sd=7 :38378 s=2 pgs=7 cs=1 l=0 c=0x45308b0).fault, initiating reconnect
2016-11-09 15:28:53.235588 7f214e6fb700  0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6808/20373 pipe(0x45367f0 sd=7 :38378 s=1 pgs=7 cs=2 l=0 c=0x45308b0).fault
2016-11-09 15:35:19.130826 7f214e6fb700  0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6800/22430 pipe(0x7f2140018090 sd=7 :36690 s=2 pgs=10 cs=1 l=0 c=0x7f2140012f60).fault, initiating reconnect
2016-11-09 15:35:19.131705 7f214e5fa700  0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6800/22430 pipe(0x7f2140018090 sd=7 :36690 s=1 pgs=10 cs=2 l=0 c=0x7f2140012f60).fault
2016-11-09 15:35:49.144190 7f2154885700  0 monclient: hunting for new mon
2016-11-09 15:35:49.144379 7f2154885700  0 client.4151 ms_handle_reset on 172.21.2.131:6789/0
2016-11-09 15:35:55.149552 7f2154885700  0 monclient: hunting for new mon
2016-11-09 15:35:55.149678 7f2154885700  0 client.4151 ms_handle_reset on 172.21.2.35:6790/0
2016-11-09 15:36:01.151528 7f2154885700  0 monclient: hunting for new mon
2016-11-09 15:36:01.151576 7f2154885700  0 client.4151 ms_handle_reset on 172.21.2.35:6789/0
2016-11-09 15:36:01.152171 7f215c029700  0 -- 172.21.2.119:0/3605853097 >> 172.21.2.131:6789/0 pipe(0x7f214001d570 sd=0 :0 s=1 pgs=0 cs=0 l=1 c=0x7f2140011020).fault
2016-11-09 15:36:08.495666 7f214e7fc700  0 -- 172.21.2.119:0/3605853097 >> 172.21.2.35:6790/0 pipe(0x7f212c0082a0 sd=0 :0 s=1 pgs=0 cs=0 l=1 c=0x7f212c00c540).fault

it's likely someone adjust the host's time

Actions #3

Updated by John Spray over 7 years ago

  • Status changed from New to Rejected

Looks like bad clocks in the test environment

Actions #5

Updated by John Spray over 7 years ago

Yuri: probably not related unless there's something I'm missing? Zheng was pointing out that the clock in the log seemed to jump forward, which could easily have broken the mon cluster and would explain this ceph-fuse client failing to come online.

Actions #7

Updated by Sage Weil over 7 years ago

  • Status changed from Rejected to 12

It's a hammer ceph-fuse trying to mount.. my guess is a compat issue

Actions #8

Updated by Sage Weil over 7 years ago

https://github.com/ceph/ceph-qa-suite/pull/1280

added debugging. the mon connection looks fine, but it isn't doing well with the mds

Actions #10

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
Actions

Also available in: Atom PDF