Bug #23249
ceph osd safe-to-destroy crashes the mgr
Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Description
On luminous v12.2.4:
# ceph osd safe-to-destroy osd.240
Error EAGAIN: 34816 pgs have unknown state; cannot draw any conclusions
The mgr backtrace is missing the details:
-5> 2018-03-06 16:48:36.319467 7f311cddf700 5 -- 188.184.88.158:6800/673570 >> 137.138.121.140:0/197582884 conn(0x55e3e2649000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=1). rx client.1588712763 seq 1 0x55e44f36d200 command(tid 0: {"prefix": "osd safe-to-destroy", "ids": ["osd.240"], "target": ["mgr", ""]}) v1
-4> 2018-03-06 16:48:36.319531 7f30f70b0700 1 -- 188.184.88.158:6800/673570 <== client.1588712763 137.138.121.140:0/197582884 1 ==== command(tid 0: {"prefix": "osd safe-to-destroy", "ids": ["osd.240"], "target": ["mgr", ""]}) v1 ==== 100+0+0 (1483619597 0 0) 0x55e44f36d200 con 0x55e3e2649000
-3> 2018-03-06 16:48:36.319602 7f30f70b0700 4 mgr.server handle_command decoded 3
-2> 2018-03-06 16:48:36.319612 7f30f70b0700 4 mgr.server handle_command prefix=osd safe-to-destroy
-1> 2018-03-06 16:48:36.319633 7f30f70b0700 0 log_channel(audit) log [DBG] : from='client.1588712763 137.138.121.140:0/197582884' entity='client.admin' cmd=[{"prefix": "osd safe-to-destroy", "ids": ["osd.240"], "target": ["mgr", ""]}]: dispatch
0> 2018-03-06 16:48:36.322268 7f30f70b0700 -1 *** Caught signal (Segmentation fault) **
in thread 7f30f70b0700 thread_name:ms_dispatch
ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
1: (()+0x3ee911) [0x55e3d4671911]
2: (()+0xf5e0) [0x7f3121a3b5e0]
3: (DaemonServer::handle_command(MCommand*)+0x606c) [0x55e3d453669c]
4: (DaemonServer::ms_dispatch(Message*)+0x105) [0x55e3d4536965]
5: (DispatchQueue::entry()+0x792) [0x55e3d496b372]
6: (DispatchQueue::DispatchThread::entry()+0xd) [0x55e3d475a41d]
7: (()+0x7e25) [0x7f3121a33e25]
8: (clone()+0x6d) [0x7f3120b1634d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
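The frames in the backtrace follow a regular `N: (symbol+0xOFFSET) [0xADDRESS]` layout, with `()` standing in where no symbol was resolved. As a rough sketch for anyone triaging similar dumps, the Python below pulls out the first frame that carries a symbol name. The helper names `parse_frames` and `first_named_frame` are made up for illustration and are not part of Ceph; the frame text is copied from the dump in this report.

```python
import re

# Frames copied verbatim from the backtrace in this report.
BACKTRACE = """\
1: (()+0x3ee911) [0x55e3d4671911]
2: (()+0xf5e0) [0x7f3121a3b5e0]
3: (DaemonServer::handle_command(MCommand*)+0x606c) [0x55e3d453669c]
4: (DaemonServer::ms_dispatch(Message*)+0x105) [0x55e3d4536965]
5: (DispatchQueue::entry()+0x792) [0x55e3d496b372]
6: (DispatchQueue::DispatchThread::entry()+0xd) [0x55e3d475a41d]
7: (()+0x7e25) [0x7f3121a33e25]
8: (clone()+0x6d) [0x7f3120b1634d]
"""

# index: (symbol+0xoffset) [0xaddress]
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]$")


def parse_frames(text):
    """Return (index, symbol or None, offset, address) for each frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # skip anything that is not a frame line
        index, symbol, offset, address = match.groups()
        frames.append((
            int(index),
            None if symbol == "()" else symbol,  # "()" means unresolved
            int(offset, 16),
            int(address, 16),
        ))
    return frames


def first_named_frame(frames):
    """Return the innermost frame that has a resolved symbol, or None."""
    for frame in frames:
        if frame[1] is not None:
            return frame
    return None


frames = parse_frames(BACKTRACE)
crash_frame = first_named_frame(frames)
print(crash_frame[1], hex(crash_frame[2]))
```

For this dump the first named frame is frame 3, `DaemonServer::handle_command(MCommand*)`; the two frames above it are unresolved, which is why the executable or debug symbols are needed to narrow the crash down further.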
The cluster status at the moment I ran ceph osd safe-to-destroy was:
# ceph status
  cluster:
    id:     eecca9ab-161c-474c-9521-0e5118612dbb
    health: HEALTH_WARN
            386402/168248376 objects misplaced (0.230%)
            Degraded data redundancy: 4005/168248376 objects degraded (0.002%), 5 pgs degraded, 5 pgs undersized

  services:
    mon: 3 daemons, quorum cepherin-mon-084bea7b06,cepherin0,cepherin1
    mgr: cepherin1(active), standbys: cepherin-mon-084bea7b06, cepherin0
    osd: 1018 osds: 1006 up, 1004 in; 181 remapped pgs

  data:
    pools:   3 pools, 34816 pgs
    objects: 82115k objects, 1522 TB
    usage:   3048 TB used, 1146 TB / 4194 TB avail
    pgs:     4005/168248376 objects degraded (0.002%)
             386402/168248376 objects misplaced (0.230%)
             34531 active+clean
             159   active+remapped+backfill_wait
             62    active+clean+scrubbing+deep
             42    active+clean+scrubbing
             17    active+remapped+backfilling
             5     active+undersized+degraded+remapped+backfilling
and osd.240 was down and fully out.
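As a quick sanity check on the error message, the per-state PG counts in the status output can be summed and compared with the pool total. The short Python sketch below uses only the numbers from this report; the variable names are illustrative.

```python
# PG state counts copied from the "ceph status" output in this report.
PG_STATES = {
    "active+clean": 34531,
    "active+remapped+backfill_wait": 159,
    "active+clean+scrubbing+deep": 62,
    "active+clean+scrubbing": 42,
    "active+remapped+backfilling": 17,
    "active+undersized+degraded+remapped+backfilling": 5,
}

TOTAL_PGS = 34816      # "3 pools, 34816 pgs"
REPORTED_UNKNOWN = 34816  # from the EAGAIN message

# Sum of all PGs that have a reported state.
accounted = sum(PG_STATES.values())

# PGs whose state string contains "unknown" in the status output.
unknown_in_status = sum(
    count for state, count in PG_STATES.items() if "unknown" in state
)

print(f"accounted for: {accounted} of {TOTAL_PGS}")
print(f"unknown in status: {unknown_in_status}, reported by command: {REPORTED_UNKNOWN}")
```

The states listed by ceph status account for every PG and none is shown as unknown, while safe-to-destroy reported all 34816 as unknown. That mismatch is only an observation from the numbers above; this report does not say what caused it.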
Updated by Greg Farnum about 6 years ago
- Project changed from Ceph to mgr
- Priority changed from Normal to High
Updated by Dan van der Ster about 6 years ago
Thanks, I do!
ceph-post-file: afa874de-8d64-4748-b642-0de18c7835a7
Updated by Sage Weil almost 6 years ago
- Status changed from New to Fix Under Review
- Backport set to mimic,luminous
Updated by Sage Weil almost 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Patrick Donnelly almost 6 years ago
- Copied to Backport #24697: luminous: ceph osd safe-to-destroy crashes the mgr added
Updated by Patrick Donnelly almost 6 years ago
- Copied to Backport #24708: mimic: ceph osd safe-to-destroy crashes the mgr added
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved".