Bug #23249

closed

ceph osd safe-to-destroy crashes the mgr

Added by Dan van der Ster about 6 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On luminous v12.2.4:

# ceph osd safe-to-destroy osd.240
Error EAGAIN: 34816 pgs have unknown state; cannot draw any conclusions

The mgr backtrace is missing the details:

    -5> 2018-03-06 16:48:36.319467 7f311cddf700  5 -- 188.184.88.158:6800/673570 >> 137.138.121.140:0/197582884 conn(0x55e3e2649000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=1). rx client.1588712763 seq 1 0x55e44f36d200 command(tid 0: {"prefix": "osd safe-to-destroy", "ids": ["osd.240"], "target": ["mgr", ""]}) v1
    -4> 2018-03-06 16:48:36.319531 7f30f70b0700  1 -- 188.184.88.158:6800/673570 <== client.1588712763 137.138.121.140:0/197582884 1 ==== command(tid 0: {"prefix": "osd safe-to-destroy", "ids": ["osd.240"], "target": ["mgr", ""]}) v1 ==== 100+0+0 (1483619597 0 0) 0x55e44f36d200 con 0x55e3e2649000
    -3> 2018-03-06 16:48:36.319602 7f30f70b0700  4 mgr.server handle_command decoded 3
    -2> 2018-03-06 16:48:36.319612 7f30f70b0700  4 mgr.server handle_command prefix=osd safe-to-destroy
    -1> 2018-03-06 16:48:36.319633 7f30f70b0700  0 log_channel(audit) log [DBG] : from='client.1588712763 137.138.121.140:0/197582884' entity='client.admin' cmd=[{"prefix": "osd safe-to-destroy", "ids": ["osd.240"], "target": ["mgr", ""]}]: dispatch
     0> 2018-03-06 16:48:36.322268 7f30f70b0700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f30f70b0700 thread_name:ms_dispatch

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0x3ee911) [0x55e3d4671911]
 2: (()+0xf5e0) [0x7f3121a3b5e0]
 3: (DaemonServer::handle_command(MCommand*)+0x606c) [0x55e3d453669c]
 4: (DaemonServer::ms_dispatch(Message*)+0x105) [0x55e3d4536965]
 5: (DispatchQueue::entry()+0x792) [0x55e3d496b372]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x55e3d475a41d]
 7: (()+0x7e25) [0x7f3121a33e25]
 8: (clone()+0x6d) [0x7f3120b1634d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The cluster status at the moment I ran ceph osd safe-to-destroy was:

# ceph status
  cluster:
    id:     eecca9ab-161c-474c-9521-0e5118612dbb
    health: HEALTH_WARN
            386402/168248376 objects misplaced (0.230%)
            Degraded data redundancy: 4005/168248376 objects degraded (0.002%), 5 pgs degraded, 5 pgs undersized

  services:
    mon: 3 daemons, quorum cepherin-mon-084bea7b06,cepherin0,cepherin1
    mgr: cepherin1(active), standbys: cepherin-mon-084bea7b06, cepherin0
    osd: 1018 osds: 1006 up, 1004 in; 181 remapped pgs

  data:
    pools:   3 pools, 34816 pgs
    objects: 82115k objects, 1522 TB
    usage:   3048 TB used, 1146 TB / 4194 TB avail
    pgs:     4005/168248376 objects degraded (0.002%)
             386402/168248376 objects misplaced (0.230%)
             34531 active+clean
             159   active+remapped+backfill_wait
             62    active+clean+scrubbing+deep
             42    active+clean+scrubbing
             17    active+remapped+backfilling
             5     active+undersized+degraded+remapped+backfilling

and osd.240 was down and fully out.
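
As a quick cross-check of the numbers above, the per-state PG counts in the `ceph status` output can be summed and compared with the 34816 PGs that the EAGAIN message reports as unknown. The snippet below is only a sketch: the state lines are copied from the status output in this report, and the helper name `parse_pg_states` is made up for illustration, not part of any Ceph tooling.

```python
import re

# PG state lines copied from the `ceph status` output in this report.
PG_STATE_LINES = """\
34531 active+clean
159   active+remapped+backfill_wait
62    active+clean+scrubbing+deep
42    active+clean+scrubbing
17    active+remapped+backfilling
5     active+undersized+degraded+remapped+backfilling
"""


def parse_pg_states(text):
    """Return {state_string: count} from '<count> <state>' lines (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


pg_states = parse_pg_states(PG_STATE_LINES)

# Total number of PGs accounted for by the listed states.
total_reported = sum(pg_states.values())

# Number of PGs whose state string contains an "unknown" component.
unknown = sum(n for s, n in pg_states.items() if "unknown" in s.split("+"))

print(total_reported, unknown)
```

The listed states add up to all 34816 PGs and none of them is `unknown`, so the status output and the EAGAIN message disagree about the same set of PGs. One possible reading is that the mgr handled the command with a different view of PG state than the one shown by `ceph status`, but that is an inference and not something this report confirms.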


Related issues 2 (0 open, 2 closed)

Copied to mgr - Backport #24697: luminous: ceph osd safe-to-destroy crashes the mgr (Resolved, Nathan Cutler)
Copied to mgr - Backport #24708: mimic: ceph osd safe-to-destroy crashes the mgr (Resolved, Nathan Cutler)
Actions #1

Updated by Greg Farnum about 6 years ago

  • Project changed from Ceph to mgr
  • Priority changed from Normal to High
Actions #2

Updated by Sage Weil about 6 years ago

You don't by chance have a core file?

Actions #3

Updated by Dan van der Ster about 6 years ago

Thanks, I do! ceph-post-file: afa874de-8d64-4748-b642-0de18c7835a7

Actions #4

Updated by Sage Weil almost 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to mimic,luminous
Actions #5

Updated by Sage Weil almost 6 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Patrick Donnelly almost 6 years ago

  • Copied to Backport #24697: luminous: ceph osd safe-to-destroy crashes the mgr added
Actions #7

Updated by Patrick Donnelly almost 6 years ago

  • Copied to Backport #24708: mimic: ceph osd safe-to-destroy crashes the mgr added
Actions #8

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved".
