Bug #46831
nautilus: mds: SIGSEGV in MDCache::finish_uncommitted_slave
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2020-08-04T09:18:26.606 INFO:tasks.ceph.mds.c.smithi163.stderr:*** Caught signal (Segmentation fault) **
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: in thread 7fc450483700 thread_name:md_log_replay
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: ceph version 14.2.10-256-gf23ff76200 (f23ff7620014d0d1324261eb383e8e25c588bdae) nautilus (stable)
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 1: (()+0x128a0) [0x7fc4604408a0]
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 2: (MDCache::finish_uncommitted_slave(metareqid_t, bool)+0x21e) [0x55d4c5f8004e]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 3: (ESlaveUpdate::replay(MDSRank*)+0xf9) [0x55d4c617de89]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 4: (MDLog::_replay_thread()+0x8b2) [0x55d4c611b6f2]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 5: (MDLog::ReplayThread::entry()+0xd) [0x55d4c5e7d80d]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 6: (()+0x76db) [0x7fc4604356db]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 7: (clone()+0x3f) [0x7fc45f61ba3f]
2020-08-04T09:18:26.609 INFO:tasks.ceph.mds.c.smithi163.stderr:2020-08-04 09:18:26.600 7fc450483700 -1 *** Caught signal (Segmentation fault) **
From: /ceph/teuthology-archive/teuthology-2020-08-04_01:12:01-fs-nautilus-distro-basic-smithi/5285373/teuthology.log
Updated by Patrick Donnelly over 3 years ago
- Status changed from In Progress to New
- Assignee changed from Patrick Donnelly to Zheng Yan
Looks like this occurs shortly after the upgrade from Luminous:
2020-08-04T09:18:21.934 INFO:teuthology.run_tasks:Running task ceph.restart...
2020-08-04T09:18:21.947 INFO:tasks.ceph.mds.a:Restarting daemon
2020-08-04T09:18:21.947 INFO:teuthology.orchestra.run.smithi163:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i a
2020-08-04T09:18:21.977 INFO:tasks.ceph.mds.a:Started
2020-08-04T09:18:21.978 INFO:tasks.ceph.mds.b:Restarting daemon
2020-08-04T09:18:21.978 INFO:teuthology.orchestra.run.smithi163:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i b
2020-08-04T09:18:21.980 INFO:tasks.ceph.mds.b:Started
2020-08-04T09:18:21.981 INFO:tasks.ceph.mds.c:Restarting daemon
2020-08-04T09:18:21.981 INFO:teuthology.orchestra.run.smithi163:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i c
2020-08-04T09:18:21.983 INFO:tasks.ceph.mds.c:Started
...
2020-08-04T09:18:26.606 INFO:tasks.ceph.mds.c.smithi163.stderr:*** Caught signal (Segmentation fault) **
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: in thread 7fc450483700 thread_name:md_log_replay
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: ceph version 14.2.10-256-gf23ff76200 (f23ff7620014d0d1324261eb383e8e25c588bdae) nautilus (stable)
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 1: (()+0x128a0) [0x7fc4604408a0]
2020-08-04T09:18:26.607 INFO:tasks.ceph.mds.c.smithi163.stderr: 2: (MDCache::finish_uncommitted_slave(metareqid_t, bool)+0x21e) [0x55d4c5f8004e]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 3: (ESlaveUpdate::replay(MDSRank*)+0xf9) [0x55d4c617de89]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 4: (MDLog::_replay_thread()+0x8b2) [0x55d4c611b6f2]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 5: (MDLog::ReplayThread::entry()+0xd) [0x55d4c5e7d80d]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 6: (()+0x76db) [0x7fc4604356db]
2020-08-04T09:18:26.608 INFO:tasks.ceph.mds.c.smithi163.stderr: 7: (clone()+0x3f) [0x7fc45f61ba3f]
2020-08-04T09:18:26.609 INFO:tasks.ceph.mds.c.smithi163.stderr:2020-08-04 09:18:26.600 7fc450483700 -1 *** Caught signal (Segmentation fault) **
2020-08-04T09:18:26.609 INFO:tasks.ceph.mds.c.smithi163.stderr: in thread 7fc450483700 thread_name:md_log_replay
And, indeed, the MDS is replaying ESlaveUpdate. Zheng, can you take a closer look at this?
This ticket is targeting 16.0.0 for now, assuming it's also a bug in master.
Updated by Zheng Yan over 3 years ago
- Status changed from New to Fix Under Review
Updated by Ramana Raja over 3 years ago
This issue was already reported at https://tracker.ceph.com/issues/46675
Updated by Patrick Donnelly over 3 years ago
- Target version changed from v16.0.0 to v14.2.11
- Backport deleted (octopus,nautilus)
Bug is only in nautilus.
Updated by Patrick Donnelly over 3 years ago
- Related to Backport #45709: nautilus: mds: wrong link count under certain circumstance added
Updated by Patrick Donnelly over 3 years ago
- Has duplicate Bug #46675: nautilus: fs/upgrade test: Crash: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds added
Updated by Yuri Weinstein over 3 years ago
Updated by Zheng Yan over 3 years ago
- Status changed from Fix Under Review to Resolved