Bug #9077

closed

Cluster is up in MON node even if Ceph is uninstalled in OSD node

Added by Ramakrishnan P over 9 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Configuration:
1 MON node and 1 OSD node; number of OSDs: 7

Steps followed:

1. Bring the cluster up on a single node and ensure the cluster state is "OK"
2. Purge and uninstall Ceph on the OSD node
3. OSD node status after uninstallation:

:~$ sudo ceph -s
sudo: ceph: command not found

:~$ sudo ceph -v
sudo: ceph: command not found

:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 55.4G 0 disk
├─sda1 8:1 0 9.3G 0 part /
├─sda2 8:2 0 1K 0 part
├─sda5 8:5 0 1.9G 0 part [SWAP]
└─sda6 8:6 0 44.2G 0 part /home
sdb 8:16 0 372.6G 0 disk
├─sdb1 8:17 0 367.6G 0 part
└─sdb2 8:18 0 5G 0 part
sdc 8:32 0 372.6G 0 disk
├─sdc1 8:33 0 367.6G 0 part
└─sdc2 8:34 0 5G 0 part
sdd 8:48 0 372.6G 0 disk
├─sdd1 8:49 0 367.6G 0 part
└─sdd2 8:50 0 5G 0 part
sde 8:64 0 372.6G 0 disk
├─sde1 8:65 0 367.6G 0 part
└─sde2 8:66 0 5G 0 part
sdf 8:80 0 372.6G 0 disk
sdg 8:96 0 372.6G 0 disk
sdh 8:112 0 372.6G 0 disk
sdi 8:128 0 372.6G 0 disk
├─sdi1 8:129 0 367.6G 0 part
└─sdi2 8:130 0 5G 0 part
sdj 8:144 0 372.6G 0 disk
├─sdj1 8:145 0 367.6G 0 part
└─sdj2 8:146 0 5G 0 part
sdk 8:160 0 372.6G 0 disk
├─sdk1 8:161 0 367.6G 0 part
└─sdk2 8:162 0 5G 0 part
sdl 8:176 0 372.6G 0 disk
├─sdl1 8:177 0 367.6G 0 part
└─sdl2 8:178 0 5G 0 part
sdm 8:192 0 372.6G 0 disk
sdn 8:208 0 745.2G 0 disk
sr0 11:0 1 1024M 0 rom

4. MON node status:
:~$ sudo ceph -s
cluster 443ad6db-f0f0-44d5-b44c-36f86d29abee
health HEALTH_WARN 1391 pgs stale; 1391 pgs stuck stale
monmap e1: 1 mons at {rack1-2=0.0.0.0:6789/0}, election epoch 1, quorum 0 rack1-2
osdmap e1592: 7 osds: 2 up, 2 in
pgmap v63658: 1878 pgs, 7 pools, 389 GB data, 99717 objects
318 GB used, 416 GB / 734 GB avail
1391 stale+active+clean
487 active+clean

:~$ sudo ceph osd tree

# id weight type name up/down reweight
-1 2.54 root default
-2 2.54 host rack1-1
0 0.38 osd.0 down 0
1 0.36 osd.1 down 0
2 0.36 osd.2 up 1
4 0.36 osd.4 down 0
6 0.36 osd.6 down 0
3 0.36 osd.3 up 1
5 0.36 osd.5 down 0

:~$ sudo ceph osd stat
osdmap e1592: 7 osds: 2 up, 2 in

:~$ sudo ceph pg stat
v63658: 1878 pgs: 1391 stale+active+clean, 487 active+clean; 389 GB data, 318 GB used, 416 GB / 734 GB avail

:~$ sudo ceph mon stat
e1: 1 mons at {rack1-2=0.0.0.0:6789/0}, election epoch 1, quorum 0 rack1-2

The MON node still shows 2 OSDs as up, even though there are no OSD mounts present on the OSD node, and the cluster was observed in the same state after 2 days.
Tried running IO; the IOs did not proceed.
There are no OSD or other logs, since Ceph was uninstalled.
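As a possible workaround while this is being debugged, the stale OSDs can be forced down and out by hand from the MON node; a minimal sketch against the live cluster (the ids 2 and 3 are the two OSDs still shown "up" in the ceph osd tree output above):

```shell
# Sketch only -- run on the MON node against the live cluster.
# osd.2 and osd.3 are the two OSDs the mon still reports as "up".
ceph osd down 2 3     # mark them down in the osdmap
ceph osd out 2 3      # mark them out so their PGs can be remapped
ceph osd stat         # re-check the osdmap summary
```

This only clears the symptom; it does not explain why the mon never marked them down on its own.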


Files

Logs.zip (149 KB) Logs.zip Ramakrishnan P, 08/13/2014 12:27 AM
ceph-mon.ems2.log1.gz (51.8 KB) ceph-mon.ems2.log1.gz Mon logs Ramakrishnan P, 09/22/2014 02:55 AM
dmesg.log (54.9 KB) dmesg.log dmesg info of mon node Ramakrishnan P, 09/22/2014 02:55 AM
dmesg1.log (82.3 KB) dmesg1.log dmesg info of OSD node Ramakrishnan P, 09/22/2014 02:55 AM
Actions #1

Updated by Sage Weil over 9 years ago

  • Status changed from New to Need More Info

Can you turn up mon logging (if it isn't up already) and attach the log from the leader? These should get marked down after 10-15 minutes.
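For reference, a minimal sketch of raising mon logging (stock option names and paths, not taken from this report; the runtime alternative is `ceph tell mon.* injectargs '--debug-mon 20 --debug-ms 1'`):

```ini
# /etc/ceph/ceph.conf -- raise mon verbosity, then restart the mon daemon;
# the log lands in /var/log/ceph/ceph-mon.<id>.log by default
[mon]
    debug mon = 20
    debug ms = 1
```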

Actions #2

Updated by Josh Durgin over 9 years ago

  • Project changed from rbd to Ceph
Actions #3

Updated by Ramakrishnan P over 9 years ago

Mon logs and dmesg logs of the mon node are attached.

Actions #4

Updated by Samuel Just over 9 years ago

Can you reproduce and verify that the two OSDs are actually not still running?
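A quick way to check this on the OSD node (a sketch; the bracketed grep pattern keeps grep from matching its own command line in the process list):

```shell
# Look for surviving ceph-osd daemons after the purge:
ps aux | grep '[c]eph-osd' || echo "no ceph-osd processes"
# And for data directories still mounted under the default OSD path:
mount | grep '/var/lib/ceph' || echo "no ceph mounts"
```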

Actions #5

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce

Actions #6

Updated by Ramakrishnan P over 9 years ago

Issue reproduced; see the following info.

Attaching mon and dmesg log of monitor node

Executed the following commands on the OSD node and checked the status of the cluster on the MON node:
ceph-deploy purge ems3
ceph-deploy uninstall ems3
ceph-deploy purgedata ems3
ceph-deploy purge ems3

Cluster status:

ems@ems2:~$ sudo ceph -s
cluster 7149b6de-40f7-4816-ab85-b89ed136a108
health HEALTH_WARN 251 pgs stale; 251 pgs stuck stale; crush map has legacy tunables
monmap e1: 1 mons at {ems2=10.242.28.255:6789/0}, election epoch 2, quorum 0 ems2
osdmap e81: 11 osds: 3 up, 3 in
pgmap v164: 348 pgs, 2 pools, 32 bytes data, 5 objects
112 MB used, 1288 GB / 1288 GB avail
251 stale+active+clean
97 active+clean

Note: There is only one MON and OSD node in cluster (single node cluster with mon running separately)

Actions #7

Updated by Ramakrishnan P over 9 years ago

As per the documentation at http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/, the mon marks an OSD down after receiving 3 down reports (or however many "mon osd min down reports" is set to). In this case, if the mon has received 2 down reports for a particular OSD, and the reporting OSD itself goes down before the 3rd report arrives, would that cause the problem reported above?
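For context, these are the mon/OSD-interaction options in question, shown with what I believe were the stock defaults of that era (a sketch; verify against the running version with `ceph daemon mon.<id> config show`):

```ini
[mon]
    # down reports required before the mon marks an OSD down
    mon osd min down reports = 3
    # distinct reporting OSDs required
    mon osd min down reporters = 1
    # with no reports at all, the mon marks an OSD down after this many seconds
    mon osd report timeout = 900
    # grace period before a down OSD is also marked out
    mon osd down out interval = 300
```

Note that in this single-OSD-node setup there are no surviving peer OSDs left to send down reports at all, so the `mon osd report timeout` path is the one that would have to kick in.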

Actions #9

Updated by Ramakrishnan P over 9 years ago

What will be the state of the OSDs in a 3-node cluster? In a 3-node cluster there will be other OSDs running on the other nodes, so will they report to the mon that these particular OSDs are down?
Is this behaviour expected?
