Bug #46831

nautilus: mds: SIGSEGV in MDCache::finish_uncommitted_slave

Added by Patrick Donnelly about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature:

Description

2020-08-04T09:18:26.606 INFO:tasks.ceph.mds.c.smithi163.stderr:*** Caught signal (Segmentation fault) **
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: in thread 7fc450483700 thread_name:md_log_replay
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: ceph version 14.2.10-256-gf23ff76200 (f23ff7620014d0d1324261eb383e8e25c588bdae) nautilus (stable)
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 1: (()+0x128a0) [0x7fc4604408a0]
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 2: (MDCache::finish_uncommitted_slave(metareqid_t, bool)+0x21e) [0x55d4c5f8004e]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 3: (ESlaveUpdate::replay(MDSRank*)+0xf9) [0x55d4c617de89]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 4: (MDLog::_replay_thread()+0x8b2) [0x55d4c611b6f2]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 5: (MDLog::ReplayThread::entry()+0xd) [0x55d4c5e7d80d]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 6: (()+0x76db) [0x7fc4604356db]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 7: (clone()+0x3f) [0x7fc45f61ba3f]
2020-08-04T09:18:26.609 INFO:tasks.ceph.mds.c.smithi163.stderr:2020-08-04 09:18:26.600 7fc450483700 -1 *** Caught signal (Segmentation fault) **

From: /ceph/teuthology-archive/teuthology-2020-08-04_01:12:01-fs-nautilus-distro-basic-smithi/5285373/teuthology.log
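
For triage, the frames in a backtrace like the one above can be pulled out of the teuthology log mechanically. The following is only a minimal sketch: the regular expression is inferred from the log lines quoted in this ticket rather than from any documented teuthology format, and `parse_frames` and `first_named_frame` are hypothetical helper names, not part of Ceph or teuthology.

```python
import re

# Assumed line shape (taken from the excerpt in this ticket):
#   "<timestamp> INFO:...stderr: <n>: (<symbol>+0x<offset>) [0x<address>]"
FRAME_RE = re.compile(
    r"stderr: (?P<index>\d+): \((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\) "
    r"\[0x(?P<addr>[0-9a-f]+)\]$"
)


def parse_frames(lines):
    """Return (index, symbol, offset) tuples for every backtrace frame line."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line.rstrip())
        if m:
            frames.append((int(m["index"]), m["symbol"], int(m["offset"], 16)))
    return frames


def first_named_frame(frames):
    """Return the first frame whose symbol was resolved (not the bare '()')."""
    for frame in frames:
        if frame[1] != "()":
            return frame
    return None


# Three lines copied from the description above.
SAMPLE = [
    "2020-08-04T09:18:26.606 INFO:tasks.ceph.mds.c.smithi163.stderr:*** Caught signal (Segmentation fault) **",
    "2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 1: (()+0x128a0) [0x7fc4604408a0]",
    "2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 2: (MDCache::finish_uncommitted_slave(metareqid_t, bool)+0x21e) [0x55d4c5f8004e]",
]

frames = parse_frames(SAMPLE)
print(first_named_frame(frames)[1])
```

Run against the full excerpt, the first resolved frame is the `MDCache::finish_uncommitted_slave` call named in the ticket title.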


Related issues

Related to fs - Backport #45709: nautilus: mds: wrong link count under certain circumstance (Resolved)
Duplicated by fs - Bug #46675: nautilus: fs/upgrade test: Crash: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds (Duplicate)

History

#1 Updated by Patrick Donnelly about 2 months ago

  • Status changed from In Progress to New
  • Assignee changed from Patrick Donnelly to Zheng Yan

This appears to occur shortly after the upgrade from Luminous, within about five seconds of the MDS daemons being restarted:

2020-08-04T09:18:21.934 INFO:teuthology.run_tasks:Running task ceph.restart...
2020-08-04T09:18:21.947 INFO:tasks.ceph.mds.a:Restarting daemon
2020-08-04T09:18:21.947 INFO:teuthology.orchestra.run.smithi163:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i a
2020-08-04T09:18:21.977 INFO:tasks.ceph.mds.a:Started
2020-08-04T09:18:21.978 INFO:tasks.ceph.mds.b:Restarting daemon
2020-08-04T09:18:21.978 INFO:teuthology.orchestra.run.smithi163:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i b
2020-08-04T09:18:21.980 INFO:tasks.ceph.mds.b:Started
2020-08-04T09:18:21.981 INFO:tasks.ceph.mds.c:Restarting daemon
2020-08-04T09:18:21.981 INFO:teuthology.orchestra.run.smithi163:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i c
2020-08-04T09:18:21.983 INFO:tasks.ceph.mds.c:Started
...
2020-08-04T09:18:26.606 INFO:tasks.ceph.mds.c.smithi163.stderr:*** Caught signal (Segmentation fault) **
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: in thread 7fc450483700 thread_name:md_log_replay
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: ceph version 14.2.10-256-gf23ff76200 (f23ff7620014d0d1324261eb383e8e25c588bdae) nautilus (stable)
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 1: (()+0x128a0) [0x7fc4604408a0]
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 2: (MDCache::finish_uncommitted_slave(metareqid_t, bool)+0x21e) [0x55d4c5f8004e]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 3: (ESlaveUpdate::replay(MDSRank*)+0xf9) [0x55d4c617de89]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 4: (MDLog::_replay_thread()+0x8b2) [0x55d4c611b6f2]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 5: (MDLog::ReplayThread::entry()+0xd) [0x55d4c5e7d80d]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 6: (()+0x76db) [0x7fc4604356db]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 7: (clone()+0x3f) [0x7fc45f61ba3f]
2020-08-04T09:18:26.609 INFO:tasks.ceph.mds.c.smithi163.stderr:2020-08-04 09:18:26.600 7fc450483700 -1 *** Caught signal (Segmentation fault) **
2020-08-04T09:18:26.609 INFO:tasks.ceph.mds.c.smithi163.stderr: in thread 7fc450483700 thread_name:md_log_replay

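And, indeed, the MDS crashes while replaying an ESlaveUpdate journal event (frame 3 of the backtrace, in the md_log_replay thread). Zheng, can you take a closer look at this?

This ticket targets 16.0.0 for now, on the assumption that the bug is also present in master.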
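
To put a number on "shortly after", the delay between the restart of mds.c and the crash can be computed from the teuthology timestamps quoted in this comment. This is a minimal sketch that assumes every log line begins with a `YYYY-MM-DDTHH:MM:SS.mmm` timestamp followed by a space, as in the excerpt; `log_time` is a hypothetical helper name.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading teuthology timestamp of a log line."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")


# Both lines are copied from the log excerpt in this comment.
started = "2020-08-04T09:18:21.983 INFO:tasks.ceph.mds.c:Started"
crashed = (
    "2020-08-04T09:18:26.606 INFO:tasks.ceph.mds.c.smithi163.stderr:"
    "*** Caught signal (Segmentation fault) **"
)

delay = (log_time(crashed) - log_time(started)).total_seconds()
print(f"mds.c crashed {delay:.3f} s after restart")  # prints: mds.c crashed 4.623 s after restart
```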

#2 Updated by Zheng Yan about 2 months ago

  • Status changed from New to Fix Under Review

#3 Updated by Ramana Raja about 2 months ago

This issue was already reported at https://tracker.ceph.com/issues/46675

#4 Updated by Patrick Donnelly about 2 months ago

  • Target version changed from v16.0.0 to v14.2.11
  • Backport deleted (octopus,nautilus)

The bug exists only in nautilus.

#5 Updated by Patrick Donnelly about 2 months ago

  • Pull request ID set to 36462

#6 Updated by Patrick Donnelly about 2 months ago

  • Related to Backport #45709: nautilus: mds: wrong link count under certain circumstance added

#7 Updated by Patrick Donnelly about 2 months ago

  • Duplicated by Bug #46675: nautilus: fs/upgrade test: Crash: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds added

#9 Updated by Zheng Yan about 1 month ago

  • Status changed from Fix Under Review to Resolved
