Bug #40102

qa: probable kernel deadlock/oops during umount on testing branch

Added by Patrick Donnelly 5 months ago. Updated 13 days ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature:

Description

2019-05-31T03:05:07.400 INFO:teuthology.orchestra.run.smithi109:Running:
2019-05-31T03:05:07.400 INFO:teuthology.orchestra.run.smithi109:> sudo adjust-ulimits daemon-helper kill python -c '
2019-05-31T03:05:07.400 INFO:teuthology.orchestra.run.smithi109:> import os
2019-05-31T03:05:07.400 INFO:teuthology.orchestra.run.smithi109:> import stat
2019-05-31T03:05:07.400 INFO:teuthology.orchestra.run.smithi109:> import json
2019-05-31T03:05:07.400 INFO:teuthology.orchestra.run.smithi109:> import sys
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:>
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:> try:
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:>     s = os.stat("/home/ubuntu/cephtest/mnt.0/datafile")
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:> except OSError as e:
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:>     sys.exit(e.errno)
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:>
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:> attrs = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid", "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]
2019-05-31T03:05:07.401 INFO:teuthology.orchestra.run.smithi109:> print json.dumps(
2019-05-31T03:05:07.402 INFO:teuthology.orchestra.run.smithi109:>     dict([(a, getattr(s, a)) for a in attrs]),
2019-05-31T03:05:07.402 INFO:teuthology.orchestra.run.smithi109:>     indent=2)
2019-05-31T03:05:07.402 INFO:teuthology.orchestra.run.smithi109:> '
2019-05-31T03:05:07.466 INFO:teuthology.orchestra.run.smithi109.stdout:{
2019-05-31T03:05:07.466 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_ctime": 1559271907.390514,
2019-05-31T03:05:07.466 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_mtime": 1559271907.390514,
2019-05-31T03:05:07.466 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_nlink": 1,
2019-05-31T03:05:07.466 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_gid": 0,
2019-05-31T03:05:07.466 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_dev": 43,
2019-05-31T03:05:07.467 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_size": 33554432,
2019-05-31T03:05:07.467 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_ino": 1099511627776,
2019-05-31T03:05:07.467 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_uid": 0,
2019-05-31T03:05:07.467 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_mode": 33188,
2019-05-31T03:05:07.467 INFO:teuthology.orchestra.run.smithi109.stdout:  "st_atime": 1559271906.966523
2019-05-31T03:05:07.467 INFO:teuthology.orchestra.run.smithi109.stdout:}
2019-05-31T03:05:07.661 DEBUG:tasks.cephfs.kernel_mount:Unmounting client client.0...
2019-05-31T03:05:07.661 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-05-31T03:05:07.661 INFO:teuthology.orchestra.run.smithi109:Running:
2019-05-31T03:05:07.661 INFO:teuthology.orchestra.run.smithi109:> sudo umount /home/ubuntu/cephtest/mnt.0
2019-05-31T03:05:29.665 INFO:teuthology.orchestra.run.smithi002:Running:
2019-05-31T03:05:29.665 INFO:teuthology.orchestra.run.smithi002:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2019-05-31T03:05:29.669 INFO:teuthology.orchestra.run.smithi036:Running:
2019-05-31T03:05:29.669 INFO:teuthology.orchestra.run.smithi036:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2019-05-31T03:05:29.673 INFO:teuthology.orchestra.run.smithi079:Running:
2019-05-31T03:05:29.674 INFO:teuthology.orchestra.run.smithi079:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2019-05-31T03:05:29.677 INFO:teuthology.orchestra.run.smithi109:Running:
2019-05-31T03:05:29.678 INFO:teuthology.orchestra.run.smithi109:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2019-05-31T03:20:07.668 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 307, in copy_file_to
    copy_to_log(src, logger, capture=stream)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 276, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1305, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 715, in recv_stderr
    raise socket.timeout()
timeout
2019-05-31T03:20:07.674 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 307, in copy_file_to
    copy_to_log(src, logger, capture=stream)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 276, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1293, in _read
    return self.channel.recv(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 667, in recv
    raise socket.timeout()
timeout
2019-05-31T03:21:27.992 ERROR:paramiko.transport:Socket exception: No route to host (113)
2019-05-31T03:21:27.993 DEBUG:teuthology.orchestra.run:got remote process result: None
2019-05-31T03:21:27.993 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2019-05-31T03:21:27.994 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'smithi109.front.sepia.ceph.com', 'timeout': 60}
2019-05-31T03:21:28.001 DEBUG:tasks.ceph:Missed logrotate, host unreachable
2019-05-31T03:21:31.064 DEBUG:teuthology.orchestra.remote:[Errno None] Unable to connect to port 22 on 172.21.15.109
2019-05-31T03:21:31.065 INFO:tasks.cephfs_test_runner:test_rebuild_nondefault_layout (tasks.cephfs.test_data_scan.TestDataScan) ... ERROR

From: /ceph/teuthology-archive/yuriw-2019-05-30_20:50:30-kcephfs-mimic_v13.2.6_QE-testing-basic-smithi/3989164/teuthology.log

This was with mimic and the testing branch of the kclient. It probably has nothing to do with mimic.
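In the log above, the umount starts at 03:05:07 under a 900-second timeout and the timeout exceptions appear at 03:20:07, which is how the hang surfaced. The following is a minimal sketch of the same idea using only the Python standard library; it is not teuthology's implementation, and the helper name `run_with_timeout` is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit status, or None if it ran past timeout_s.

    Hypothetical helper for illustration; teuthology uses its own
    remote-execution machinery, not this function.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising. A process stuck
        # inside the kernel may not react to that kill, so the caller should
        # still treat the host as suspect.
        return None
    return completed.returncode
```

A call such as `run_with_timeout(["sudo", "umount", "/home/ubuntu/cephtest/mnt.0"], 900)` would return `None` when the unmount does not finish in time. In this report the client host also became unreachable, so a local timeout alone would not have been enough to recover.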

History

#1 Updated by Patrick Donnelly 5 months ago

Another: /ceph/teuthology-archive/yuriw-2019-05-30_20:50:30-kcephfs-mimic_v13.2.6_QE-testing-basic-smithi/3989039/teuthology.log

#2 Updated by Patrick Donnelly 5 months ago

Another: /ceph/teuthology-archive/yuriw-2019-05-30_20:50:30-kcephfs-mimic_v13.2.6_QE-testing-basic-smithi/3989013/teuthology.log

#3 Updated by Zheng Yan 5 months ago

it's kernel BUG at fs/ceph/mds_client.c:1500!

BUG_ON(session->s_nr_caps > 0);

No idea how it can happen.

#4 Updated by Zheng Yan 5 months ago

  • Status changed from New to Testing

It's a longstanding bug. Fixed by "ceph: use ceph_evict_inode to cleanup inode's resource" in https://github.com/ceph/ceph-client/tree/testing

#5 Updated by Patrick Donnelly 5 months ago

  • Target version set to v15.0.0

#7 Updated by Ilya Dryomov about 1 month ago

The backport to 4.19 was incorrect, 4.19.76 is busted. Fixed in 4.19.77.

#8 Updated by Ilya Dryomov 19 days ago

Ilya Dryomov wrote:

The backport to 4.19 was incorrect, 4.19.76 is busted. Fixed in 4.19.77.

This goes for Ubuntu Disco 5.0.0-32 kernel as well: https://marc.info/?l=ceph-users&m=157167769117987&w=2
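To check a client against the kernels called out in this thread, something like the sketch below could be used. It only knows the versions named above (4.19.76 and the Ubuntu Disco 5.0.0-32 kernel). The function name `is_reported_bad` and the assumption that Ubuntu release strings look like `5.0.0-32-generic` are mine, not from the thread, and the sketch says nothing about whether other kernels are affected.

```python
import re

# Upstream stable release reported as broken in this thread.
# 4.19.77 is reported as fixed, so it is deliberately not listed.
BAD_UPSTREAM = {(4, 19, 76)}

# Ubuntu Disco kernel reported as affected. The "-32" ABI suffix layout
# is an assumption about how the release string is formatted.
BAD_UBUNTU = "5.0.0-32"


def is_reported_bad(release):
    """Return True if release matches a kernel named as broken in this thread."""
    if release == BAD_UBUNTU or release.startswith(BAD_UBUNTU + "-"):
        return True
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return False
    return tuple(int(part) for part in match.groups()) in BAD_UPSTREAM
```

On a running client, the input would typically come from `platform.release()` or `uname -r`.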

#9 Updated by Alex Litvak 14 days ago

Was 5.0.32 actually fixed?

#10 Updated by Nathan Fish 13 days ago
