Bug #41240 (closed)

All of the cluster SSDs aborted at around the same time and will not start.

Added by Troy Ablan almost 5 years ago. Updated about 4 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There are 14 SSDs and a few hundred HDDs in this cluster.

The SSDs all crashed at around the same time, and they all abort immediately when started again.

They contain the rgw index and log pools, along with a few smaller ones; the bluestore side was nowhere near full.

Example logs via ceph-post-file:

36177605-7519-437f-b4d7-0a2d9482d2f6
77bf5a07-1be3-4e77-badd-c6a9780aa011

Here's an exchange from IRC (times UTC-0700); rough sketches of the commands discussed appear after the log.

2019-08-13 17:05:12 < MooingLemur> I'm sorry to have this spill over into -devel, but, I have a cluster which had every device for several pools crash and won't restart. It immediately aborts. http://dpaste.com/0R5RFQ7
2019-08-13 17:05:47 < MooingLemur> I have a feeling this is due to bluefs not being balanced, because when I use ceph-kvstore-tool to compact, it aborts with bluefs enospc
2019-08-13 17:06:33 < MooingLemur> This is on Mimic 13.2.6, each OSD is a single device (not LVM, and no separate wal/db)
2019-08-13 17:07:35 < MooingLemur> Is there any way to verify that bluefs being short on space is what's going on, and if it is, what's my best remedy?
2019-08-13 17:13:42 < joshd> MooingLemur: you can set bluefs_alloc_size smaller than the default 1M, e.g. 256K or 64K to get ceph-kvstore-tool running. ENOSPC from bluefs means there aren't enough free extents of the right size. compaction will end up using more space until it deletes old tables, so that's consistent with your observations
2019-08-13 17:25:08 < MooingLemur> joshd: that option passed on the command line, env var, or ceph.conf? (does ceph-kvstore-tool even read ceph.conf?)
2019-08-13 17:28:55 < MooingLemur> 2019-08-14 00:27:15.146 7ff5de70c700 1 bluefs _allocate failed to allocate 0x4150000 on bdev 1, free 0xca0000; fallback to bdev 2
2019-08-13 17:28:56 < joshd> iirc it might only read ceph.conf
2019-08-13 17:29:30 < MooingLemur> looks like that works out to about 13MB free
2019-08-13 17:30:06 < joshd> that's when it'll fallback to getting space from bdev 2 (bluestore's block dev)
2019-08-13 17:30:18 < MooingLemur> 2019-08-14 00:27:15.146 7ff5de70c700 -1 bluefs _allocate failed to allocate 0x4150000 on bdev 2, dne
2019-08-13 17:31:26 < MooingLemur> if there's no separate db device, I would suspect that bdev 1 is the only place
2019-08-13 17:31:58 < joshd> there's still an internal separation between bluefs and bluestore
2019-08-13 17:32:55 < joshd> with separate bdev objects internally
2019-08-13 17:36:28 < MooingLemur> joshd: here's the full output of ceph-kvstore-tool http://dpaste.com/04Y5PHD
2019-08-13 17:37:19 < MooingLemur> (the option was in ceph.conf as well: "bluefs_alloc_size = 65536" under [global])
2019-08-13 17:46:05 < joshd> MooingLemur: I'd suggest trying to use ceph-objectstore-tool's get-map/set-map to replace the corrupted osdmap, without compacting
2019-08-13 17:48:19 < joshd> I've got to go now, good luck
2019-08-13 17:48:40 < MooingLemur> pull the osdmap from another osd?
2019-08-13 17:48:45 < MooingLemur> thanks for the pointers
2019-08-13 17:51:37 < MooingLemur> https://bpaste.net/show/z4y_ when running ceph-objectstore-tool
2019-08-13 17:57:57 < MooingLemur> aw man, can't even run `ceph-bluestore-tool repair` anymore without it aborting due to enospc
2019-08-13 18:33:49 < MooingLemur> welp, it appears that I can't set the osdmap (after getting the osdmap from another osd)
2019-08-13 18:34:02 < MooingLemur> osdmap (#-1:9b48f9f2:::osdmap.81046:0#) does not exist.
2019-08-13 18:34:26 < MooingLemur> perhaps I'm misunderstanding how this tool works, documentation is very sparse
2019-08-13 18:34:39 < MooingLemur> But it also could be that something more serious is amiss here
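
For reference, here is roughly what joshd's bluefs_alloc_size workaround and the compaction attempt looked like on my end. Treat it as a sketch rather than an exact transcript: osd.53 and the default OSD path are just examples, and the only thing I'm sure of is that the option was picked up from ceph.conf under [global].

    # /etc/ceph/ceph.conf -- shrink the BlueFS allocation unit (default 1M)
    # so the offline tools can find free extents; 64K is the value I used
    [global]
    bluefs_alloc_size = 65536

    # offline compaction of the OSD's RocksDB through BlueFS
    # (the OSD was already down, since it aborts on start)
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-53 compact

This is the compaction that aborted with bluefs enospc in the dpaste output linked above.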

I tried compacting osd.53 using ceph-kvstore-tool and somehow made this OSD worse. As of filing this issue, I don't believe I've made any other changes to any of the OSDs.
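
And this is approximately how I attempted the osdmap replacement joshd pointed me at. As far as I can tell the relevant ops are get-osdmap/set-osdmap (he said "get-map/set-map" on IRC); the healthy source OSD, the paths, and the file name are just examples from my attempt.

    # pull the current osdmap from a healthy OSD (osd.0 is only an example)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op get-osdmap --file /tmp/osdmap.current

    # try to install it on the broken OSD; this is the step that failed with
    # "osdmap (#-1:9b48f9f2:::osdmap.81046:0#) does not exist."
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-53 \
        --op set-osdmap --file /tmp/osdmap.current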


Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #39525: lz4 compressor corrupts data when buffers are unaligned (Resolved)
