Bug #24196

closed

When using 5 machines, 3 monitors, 3 mon/mgr, and 5 OSDs, one of the OSDs cannot stay up

Added by li li almost 6 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello everyone,
I am confused by this issue.
Ceph version:

#ceph -v
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)

#ceph -s
  cluster:
    id:     242af076-e2b8-42c0-87b9-8b724ceff26a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum fuxi-dl-master-01,fuxi-dl-master-02,fuxi-dl-master-03
    mgr: mon_mgr(active, starting)
    osd: 5 osds: 4 up, 4 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   4113 MB used, 291 TB / 291 TB avail
    pgs:

OSD log:

2018-05-18 18:05:17.946046 7f575ea81700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.4 down, but it is still running
2018-05-18 18:05:17.946061 7f575ea81700 0 log_channel(cluster) log [DBG] : map e32 wrongly marked me down at e32
2018-05-18 18:05:17.946065 7f575ea81700 0 osd.4 32 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
2018-05-18 18:05:17.946073 7f575ea81700 1 osd.4 32 start_waiting_for_healthy
2018-05-18 18:05:17.947562 7f575ea81700 0 osd.4 32 _committed_osd_maps shutdown OSD via async signal
2018-05-18 18:05:17.947643 7f5750264700 -1 Fail to open '/proc/0/cmdline' error = (2) No such file or directory
2018-05-18 18:05:17.947668 7f5750264700 -1 received signal: Interrupt from PID: 0 task name: <unknown> UID: 0
2018-05-18 18:05:17.947673 7f5750264700 -1 osd.4 32 *** Got signal Interrupt ***
2018-05-18 18:05:17.947679 7f5750264700 0 osd.4 32 prepare_to_stop starting shutdown
2018-05-18 18:05:17.947682 7f5750264700 -1 osd.4 32 shutdown
2018-05-18 18:05:21.950289 7f5750264700 1 bluestore(/var/lib/ceph/osd/ceph-4) umount
2018-05-18 18:05:21.953154 7f5750264700 1 stupidalloc 0x0x56408b759f70 shutdown
2018-05-18 18:05:21.953168 7f5750264700 1 freelist shutdown
2018-05-18 18:05:21.953220 7f5750264700 4 rocksdb: [/build/ceph-12.2.5/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2018-05-18 18:05:21.953445 7f5750264700 4 rocksdb: [/build/ceph-12.2.5/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2018-05-18 18:05:21.953487 7f5750264700 1 bluefs umount
2018-05-18 18:05:21.953495 7f5750264700 1 stupidalloc 0x0x56408b759bf0 shutdown
2018-05-18 18:05:21.953539 7f5750264700 1 bdev(0x56408b459200 /var/lib/ceph/osd/ceph-4/block) close
2018-05-18 18:05:22.256879 7f5750264700 1 bdev(0x56408b458d80 /var/lib/ceph/osd/ceph-4/block) close

What should I do? Thank you for your reply.
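The log above is the key: osd.4 was marked down 6 times within the last 600 seconds, exceeding osd_max_markdown_count (5), so it shut itself down on purpose. A minimal sketch of how one might inspect and temporarily raise that threshold while debugging, assuming the commands are run on the host carrying osd.4 and that it is a systemd deployment (this only masks the symptom; the repeated mark-downs themselves still need to be explained):

#ceph daemon osd.4 config get osd_max_markdown_count
#ceph daemon osd.4 config get osd_max_markdown_period
#ceph tell osd.4 injectargs '--osd_max_markdown_count 10'
#systemctl restart ceph-osd@4

The first two commands read the current limit and the time window from the OSD's admin socket, injectargs raises the limit at runtime for this OSD only, and the restart brings back the daemon that already exited.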

Actions #1

Updated by John Spray almost 6 years ago

If an OSD is getting wrongly marked down, it is usually either a network issue or an overloaded host. Check whether all your OSDs can see each other and the mons over the network, and check whether any of your OSD or mon nodes is out of memory or swapping.
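A rough checklist along those lines, assuming the hostnames from the ceph -s output above (adapt the addresses to your own public and cluster networks), run from the host carrying osd.4:

#ceph osd tree
#ceph health detail
#ping -c 3 fuxi-dl-master-01
#free -h
#vmstat 1 5
#dmesg | grep -i -e oom -e 'out of memory'

ceph osd tree and ceph health detail show which OSDs the monitors currently consider up; ping checks basic reachability to each mon and OSD host; free and vmstat show whether the node is short on memory or swapping; dmesg shows whether the kernel OOM killer has fired.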

Actions #2

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from New to Closed

Please take this question to the ceph-users mailing list.
