Bug #55620

open

ceph pacific fails to perform fs/multifs test

Added by Aliaksei Makarau almost 2 years ago. Updated over 1 year ago.

Status: Pending Backport
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During execution of the integration tests (IBM Z, big-endian), the fs/multifs suite produces a set of errors related to monitor segfaults.

teuthology.log:

2021-12-24T16:29:37.961 INFO:teuthology.orchestra.run.m1306035.stderr:2021-12-24T16:29:37.952+0100 3ff7f7fe900  1 -- 172.18.232.35:0/1713868722 <== mon.2 v2:172.18.232.30:3301/0 2 ==== config(0 keys) v1 ==== 4+0+0 (secure 0 0 0) 0x3ff8c003d90 con 0x3ff7806b290
2021-12-24T16:29:37.962 INFO:teuthology.orchestra.run.m1306035.stderr:2021-12-24T16:29:37.952+0100 3ff7f7fe900  1 -- 172.18.232.35:0/1713868722 <== mon.2 v2:172.18.232.30:3301/0 3 ==== mgrmap(e 4) v1 ==== 70370+0+0 (secure 0 0 0) 0x3ff8c0220c0 con 0x3ff7806b290
2021-12-24T16:29:38.298 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-12-24T16:29:38.299 INFO:tasks.cephfs_test_runner:test_mount_all_caps_absent (tasks.cephfs.test_multifs_auth.TestClientsWithoutAuth) ... ERROR
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:======================================================================
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:ERROR: test_mount_all_caps_absent (tasks.cephfs.test_multifs_auth.TestClientsWithoutAuth)
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/cephfs_test_case.py", line 212, in tearDown
2021-12-24T16:29:38.300 INFO:tasks.cephfs_test_runner:    self.mds_cluster.delete_all_filesystems()
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/filesystem.py", line 479, in delete_all_filesystems
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:    for fs in self.status().get_filesystems():
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/filesystem.py", line 382, in status
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:    return FSStatus(self.mon_manager, epoch)
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/filesystem.py", line 79, in __init__
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:    self.map = json.loads(self.mon.raw_cluster_cmd(*cmd))
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/ceph_manager.py", line 1581, in raw_cluster_cmd
2021-12-24T16:29:38.301 INFO:tasks.cephfs_test_runner:    p = self.run_cluster_cmd(args=args, stdout=stdout, **kwargs)
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/ceph_manager.py", line 1574, in run_cluster_cmd
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:    return self.controller.run(**kwargs)
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/remote.py", line 509, in run
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/run.py", line 455, in run
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:    r.wait()
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/run.py", line 161, in wait
2021-12-24T16:29:38.302 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/run.py", line 181, in _raise_for_status
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:    raise CommandFailedError(
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed on m1306035 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs dump --format=json'
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:Ran 1 test in 164.104s
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:
2021-12-24T16:29:38.303 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:======================================================================
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:ERROR: test_mount_all_caps_absent (tasks.cephfs.test_multifs_auth.TestClientsWithoutAuth)
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/cephfs_test_case.py", line 212, in tearDown
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:    self.mds_cluster.delete_all_filesystems()
2021-12-24T16:29:38.304 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/filesystem.py", line 479, in delete_all_filesystems
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:    for fs in self.status().get_filesystems():
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/filesystem.py", line 382, in status
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:    return FSStatus(self.mon_manager, epoch)
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs/filesystem.py", line 79, in __init__
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:    self.map = json.loads(self.mon.raw_cluster_cmd(*cmd))
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/ceph_manager.py", line 1581, in raw_cluster_cmd
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:    p = self.run_cluster_cmd(args=args, stdout=stdout, **kwargs)
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/ceph_manager.py", line 1574, in run_cluster_cmd
2021-12-24T16:29:38.305 INFO:tasks.cephfs_test_runner:    return self.controller.run(**kwargs)
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/remote.py", line 509, in run
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/run.py", line 455, in run
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:    r.wait()
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/run.py", line 161, in wait
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:  File "/home/teuthology/src/teuthology_pacific/teuthology/orchestra/run.py", line 181, in _raise_for_status
2021-12-24T16:29:38.306 INFO:tasks.cephfs_test_runner:    raise CommandFailedError(
2021-12-24T16:29:38.307 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed on m1306035 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs dump --format=json'
2021-12-24T16:29:38.307 INFO:tasks.cephfs_test_runner:
2021-12-24T16:29:38.307 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthology/src/teuthology_pacific/teuthology/run_tasks.py", line 94, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ibm-s390-cloud_ceph_c39ba7d47040c91efe2793b55ab9465a9a4ec66b/qa/tasks/cephfs_test_runner.py", line 211, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_mount_all_caps_absent (tasks.cephfs.test_multifs_auth.TestClientsWithoutAuth)
2021-12-24T16:29:38.307 DEBUG:teuthology.run_tasks:Unwinding manager cephfs_test_runner
2021-12-24T16:29:38.311 DEBUG:teuthology.run_tasks:Unwinding manager kclient
2021-12-24T16:29:38.315 INFO:tasks.kclient:Unmounting kernel clients...
2021-12-24T16:29:38.316 INFO:teuthology.orchestra.run:Running command with timeout 300
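
The traceback above is the test's tearDown path: cephfs_test_case.py calls self.mds_cluster.delete_all_filesystems(), which first asks the monitors for the current FSMap via "ceph fs dump --format=json". That is the command that fails with status 124, i.e. the 120-second timeout wrapper expired because the monitor never answered. Roughly, the teardown boils down to the loop sketched below (a simplified paraphrase of the helper in qa/tasks/cephfs/filesystem.py; the local subprocess wrapper and the JSON field names are illustrative assumptions, the real code runs the commands on a remote node through teuthology):

import json
import subprocess

def ceph(*args, timeout=120):
    # Illustrative stand-in for the QA framework's raw_cluster_cmd(); the
    # real helper runs "ceph" on a remote node under the same 120 s timeout.
    out = subprocess.run(["ceph", "--cluster", "ceph", *args],
                         check=True, capture_output=True, timeout=timeout)
    return out.stdout

def delete_all_filesystems():
    # tearDown first fetches the current FSMap from the monitors ...
    fsmap = json.loads(ceph("fs", "dump", "--format=json"))
    # ... then removes every filesystem it finds (field names assumed from
    # the "fs dump" JSON output).
    for fs in fsmap["filesystems"]:
        name = fs["mdsmap"]["fs_name"]
        ceph("fs", "fail", name)
        ceph("fs", "rm", name, "--yes-i-really-mean-it")

In the failing run the very first "fs dump" never returns: the monitor it queries appears to have segfaulted already (see the ceph-mon.a.log.gz excerpt below), so the timeout wrapper kills the command after 120 seconds with exit status 124, which surfaces as the CommandFailedError above.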

The test triggers a segfault in the monitor; the relevant excerpt from ceph-mon.a.log.gz is:

   -15> 2021-12-24T16:27:34.503+0100 3ff937fe900 20 mon.a@1(peon).mgr e4 Sending map to subscriber 0x3ff6c065910 v2:172.18.232.30:6834/2136209486
   -14> 2021-12-24T16:27:34.503+0100 3ff937fe900  1 -- [v2:172.18.232.35:3300/0,v1:172.18.232.35:6789/0] --> [v2:172.18.232.30:6834/2136209486,v1:172.18.232.30:6835/2136209486] -- mgrmap(e 4) v1 -- 0x3ff5c5d1d90 con 0x3ff6c065910
   -13> 2021-12-24T16:27:34.503+0100 3ff937fe900 10 mon.a@1(peon).monmap v1 check_sub monmap next 2 have 1
   -12> 2021-12-24T16:27:34.503+0100 3ff937fe900  1 -- [v2:172.18.232.35:3300/0,v1:172.18.232.35:6789/0] <== client.5551 v1:192.168.0.1:0/2933524942 3 ==== mon_subscribe({fsmap.user=0,monmap=2+,osdmap=0}) v2 ==== 65+0+0 (unknown 2575153023 0 0) 0x3ff5c4a9ad0 con 0x3ff6c01a3f0
   -11> 2021-12-24T16:27:34.503+0100 3ff937fe900 20 mon.a@1(peon) e1 _ms_dispatch existing session 0x3ff6c0834c0 for client.5551
   -10> 2021-12-24T16:27:34.503+0100 3ff937fe900 20 mon.a@1(peon) e1  entity_name client.testuser global_id 5551 (reclaim_ok) caps allow r fsname=cephfs
    -9> 2021-12-24T16:27:34.503+0100 3ff937fe900 10 mon.a@1(peon) e1 handle_subscribe mon_subscribe({fsmap.user=0,monmap=2+,osdmap=0}) v2
    -8> 2021-12-24T16:27:34.503+0100 3ff937fe900 20 is_capable service=mon command= read addr v1:192.168.0.1:0/2933524942 on cap allow r
    -7> 2021-12-24T16:27:34.503+0100 3ff937fe900 20  allow so far , doing grant allow r
    -6> 2021-12-24T16:27:34.503+0100 3ff937fe900 20  match
    -5> 2021-12-24T16:27:34.503+0100 3ff937fe900 10 mon.a@1(peon) e1 handle_subscribe: MDS sub 'fsmap.user'
    -4> 2021-12-24T16:27:34.503+0100 3ff937fe900 20 is_capable service=mds command= read addr v1:192.168.0.1:0/2933524942 on cap allow r
    -3> 2021-12-24T16:27:34.503+0100 3ff937fe900 20  allow so far , doing grant allow r
    -2> 2021-12-24T16:27:34.503+0100 3ff937fe900 20  match
    -1> 2021-12-24T16:27:34.503+0100 3ff937fe900 20 mon.a@1(peon).mds e18 check_sub: fsmap.user
     0> 2021-12-24T16:27:34.513+0100 3ff937fe900 -1 *** Caught signal (Segmentation fault) **
 in thread 3ff937fe900 thread_name:ms_dispatch

 ceph version 16.2.6-710-geaff0ba3695 (eaff0ba3695f9a68cd1eda6939be4347a55bf703) pacific (stable)
 1: [0x3ff937fa35e]
 2: (MDSMonitor::check_sub(Subscription*)+0x2a0) [0x2aa2d55f008]
 3: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0x13ec) [0x2aa2d2f5314]
 4: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x680) [0x2aa2d327d48]
 5: (Monitor::_ms_dispatch(Message*)+0x34e) [0x2aa2d328ebe]
 6: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x82) [0x2aa2d3647ca]
 7: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x484) [0x3ffa6835d14]
 8: (DispatchQueue::entry()+0x53c) [0x3ffa683332c]
 9: (DispatchQueue::DispatchThread::entry()+0x18) [0x3ffa68fbd98]
 10: /lib/s390x-linux-gnu/libpthread.so.0(+0x9986) [0x3ffa6089986]
 11: /lib/s390x-linux-gnu/libc.so.6(+0x103cc6) [0x3ffa5b03cc6]
 12: [(nil)]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
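
Putting the two logs together: the subscriber the monitor was servicing is client.testuser, whose caps are restricted to one filesystem (allow r fsname=cephfs), and the segfault happens inside MDSMonitor::check_sub while the peon handles that client's fsmap.user subscription, at a point where the test's tearDown has been deleting filesystems. The sketch below outlines the sequence the test appears to exercise; it uses standard ceph CLI commands but is only an illustration of the suspected trigger, not a confirmed minimal reproducer:

import subprocess

def ceph(*args):
    # Thin illustrative wrapper around the ceph CLI.
    subprocess.run(["ceph", *args], check=True)

# 1. A filesystem exists and a client is authorized for that filesystem only;
#    this is what produces the "allow r fsname=cephfs" cap seen in the mon log.
ceph("fs", "authorize", "cephfs", "client.testuser", "/", "r")

# 2. The client connects (in the test, via a kernel mount) and subscribes to
#    fsmap.user -- the subscription visible in the mon log above.

# 3. tearDown then removes every filesystem while that client is still
#    connected and subscribed.
ceph("fs", "fail", "cephfs")
ceph("fs", "rm", "cephfs", "--yes-i-really-mean-it")

# 4. Servicing the client's next mon_subscribe({fsmap.user=...}) takes the
#    monitor through MDSMonitor::check_sub, where the affected pacific build
#    segfaults (see the backtrace above).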


Related issues (2: 1 open, 1 closed)

Copied to CephFS - Backport #55924: pacific: ceph pacific fails to perform fs/multifs test (Rejected; Venky Shankar)
Copied to CephFS - Backport #55925: quincy: ceph pacific fails to perform fs/multifs test (New; Venky Shankar)
#1

Updated by Laura Flores almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 46371
#2

Updated by Venky Shankar almost 2 years ago

  • Category set to Correctness/Safety
  • Status changed from Fix Under Review to Pending Backport
  • Assignee set to Aliaksei Makarau
  • Target version changed from v16.2.8 to v18.0.0
  • Backport set to quincy, pacific
#3

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55924: pacific: ceph pacific fails to perform fs/multifs test added
#4

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55925: quincy: ceph pacific fails to perform fs/multifs test added
#5

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
