Support #10024

open

Cluster unreachable after restart

Added by Luca Mazzaferro over 9 years ago. Updated over 9 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

Dear Support,
I'm a fairly new user. I already asked this question on the users list but did not get a solution.
I see that others have had the same problem.
Following the instructions here:
http://ceph.com/docs/master/start/quick-ceph-deploy/
I was able to install a new cluster.
I have 4 VMs on the same host: 3 for the OSD/monitor nodes and 1 for the admin-node.
Everything went fine.
Then I turned off all the machines, and when I turned them on again none of the OSDs could be restarted,
while the monitors are running.
From one node:

[root@ceph-node1 ~]# service ceph -a status
=== mon.ceph-node1 ===
mon.ceph-node1: running {"version":"0.80.7"}
=== osd.0 ===
osd.0: not running.
[root@ceph-node1 ~]# service ceph restart osd.0
=== osd.0 ===
=== osd.0 ===
Stopping Ceph osd.0 on ceph-node1...done
=== osd.0 ===
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 0.02 host=ceph-node1 root=default'

The same happens for the other OSDs.

From the admin node, when I try to run ceph -w or ceph health:

[rzgceph@admin-node my-cluster]$ ceph -w
2014-11-06 17:38:28.840380 7f97267ac700 0 monclient(hunting): authenticate timed out after 300
2014-11-06 17:38:28.840454 7f97267ac700 0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut

So the cluster seems to be unreachable. There are no errors in the log files.

I also removed the whole cluster and reinstalled it. Everything was fine until I
stopped and started it again.

Can you please help me solve this issue?
Thank you.
Best Regards

Luca Mazzaferro
Actions #1

Updated by Luca Mazzaferro over 9 years ago

Hi,
Have I missed anything?
Did I do something wrong?
I ask because I haven't received any answer after more than a week.
Thanks
Best Regards,

Luca