Bug #4924

closed

ceph-deploy: gatherkeys fails on raring (cuttlefish)

Added by Tamilarasi muthamizhan almost 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: ceph-deploy
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor

Description

ceph version 0.61 (237f3f1e8d8c3b85666529860285dcdffdeda4c5)

Trying to test ceph-deploy on raring, and it fails at gatherkeys:

raring@ubuntu:~/ceph-dep/ceph-deploy$ ./ceph-deploy mon create ubuntu
raring@ubuntu:~/ceph-dep/ceph-deploy$ ps -ef | grep ceph
root 64906 1 0 13:09 ? 00:00:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f
raring 65116 8119 0 13:22 pts/5 00:00:00 grep --color=auto ceph

raring@ubuntu:~/ceph-dep/ceph-deploy$ ./ceph-deploy gatherkeys ubuntu
Unable to find /etc/ceph/ceph.client.admin.keyring on ['ubuntu']
Unable to find /var/lib/ceph/bootstrap-osd/ceph.keyring on ['ubuntu']
Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring on ['ubuntu']

mon.log:

2013-05-07 13:09:45.928544 7fb255cbe7c0 0 ceph version 0.61 (237f3f1e8d8c3b85666529860285dcdffdeda4c5), process ceph-mon, pid 64900
2013-05-07 13:09:46.022762 7fb255cbe7c0 -1 asok(0x1736000) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-mon.ubuntu.asok': (2) No such file or directory
2013-05-07 13:09:46.154531 7ff6e145a7c0 0 ceph version 0.61 (237f3f1e8d8c3b85666529860285dcdffdeda4c5), process ceph-mon, pid 64906
2013-05-07 13:09:46.200257 7ff6dd25a700 -1 asok(0x26100e0) AdminSocket: request 'mon_status' not defined
2013-05-07 13:09:46.248738 7ff6e145a7c0 1 mon.ubuntu@-1(probing) e0 preinit fsid ea6647ef-4812-4b9f-bfdf-0c60fa6a8a5d
2013-05-07 13:09:46.248869 7ff6e145a7c0 1 mon.ubuntu@-1(probing) e0 initial_members ubuntu, filtering seed monmap
2013-05-07 13:09:46.249356 7ff6e145a7c0 0 mon.ubuntu@-1(probing) e0 my rank is now 0 (was -1)
2013-05-07 13:09:46.249396 7ff6e145a7c0 1 mon.ubuntu@0(probing) e0 win_standalone_election
2013-05-07 13:09:46.273579 7ff6e145a7c0 0 log [INF] : mon.ubuntu@0 won leader election with quorum 0
2013-05-07 13:09:46.276367 7ff6e145a7c0 0 log [INF] : pgmap v1: 0 pgs: ; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
2013-05-07 13:09:46.278029 7ff6e145a7c0 0 log [INF] : mdsmap e1: 0/0/1 up
2013-05-07 13:09:46.293720 7ff6e145a7c0 1 mon.ubuntu@0(leader).osd e1 e1: 0 osds: 0 up, 0 in
2013-05-07 13:09:46.318449 7ff6e145a7c0 0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
2013-05-07 13:09:46.318932 7ff6e145a7c0 0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
2013-05-07 13:09:46.338564 7ff6e145a7c0 0 mon.ubuntu@0(leader).paxos(paxos active c 1..6) prepare_bootstrap
2013-05-07 13:09:46.338658 7ff6e145a7c0 0 mon.ubuntu@0(leader).paxos(paxos active c 1..6) finish_proposal no more proposals; bootstraping.
2013-05-07 13:09:46.338709 7ff6e145a7c0 1 mon.ubuntu@0(probing) e1 win_standalone_election
2013-05-07 13:09:46.340577 7ff6e145a7c0 0 log [INF] : mon.ubuntu@0 won leader election with quorum 0
2013-05-07 13:09:46.340756 7ff6e145a7c0 0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
2013-05-07 13:09:46.340919 7ff6e145a7c0 0 log [INF] : mdsmap e1: 0/0/1 up
2013-05-07 13:09:46.341074 7ff6e145a7c0 0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
2013-05-07 13:09:46.341742 7ff6e145a7c0 0 log [INF] : monmap e1: 1 mons at {ubuntu=192.168.73.155:6789/0}
2013-05-07 13:10:46.249668 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3370836 avail 15111272
2013-05-07 13:11:46.250612 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3370824 avail 15111284
2013-05-07 13:12:46.251666 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3370824 avail 15111284
2013-05-07 13:13:46.252631 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3370824 avail 15111284
2013-05-07 13:14:46.253753 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:15:46.254635 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:16:46.255684 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:17:46.256401 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:18:46.257684 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:19:46.258639 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:20:46.259686 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369024 avail 15113084
2013-05-07 13:21:46.260719 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369056 avail 15113052
2013-05-07 13:22:46.261764 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369080 avail 15113028
2013-05-07 13:23:46.262737 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369080 avail 15113028
2013-05-07 13:24:46.263682 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369080 avail 15113028
2013-05-07 13:25:46.264719 7ff6dba57700 0 mon.ubuntu@0(leader).data_health(1) update_stats avail 77% total 19478204 used 3369092 avail 15113016


Files

ceph-mon.ceph.log (39.5 KB) ceph-mon.ceph.log Rob Taylor, 08/15/2013 04:32 PM
ceph_debug_div (16.8 KB) ceph_debug_div diverse info bernhard glomm, 08/27/2013 07:46 AM
ceph_debug_logfiles (490 KB) ceph_debug_logfiles first part of /var/log/ceph/ceph-mon.* bernhard glomm, 08/27/2013 07:46 AM
Actions #1

Updated by Anonymous almost 11 years ago

I saw a similar behavior on a 12.04 (precise) installation. Here are my logs.

2013-05-07 17:21:17.546091 7f7ced601780 0 ceph version 0.61 (237f3f1e8d8c3b85666529860285dcdffdeda4c5), process ceph-mon, pid 1210
2013-05-07 17:21:18.571090 7f7ced601780 1 mon.issdm-44@-1(probing) e0 preinit fsid 32044951-1acc-468d-acb4-47aef08dd408
2013-05-07 17:21:18.571201 7f7ced601780 1 mon.issdm-44@-1(probing) e0 initial_members issdm-44-c, filtering seed monmap
2013-05-07 17:21:18.687516 7f7ced5ee700 0 -- 192.168.140.144:6789/0 >> 0.0.0.0:0/1 pipe(0x25b6280 sd=21 :0 s=1 pgs=0 cs=0 l=0).fault
2013-05-07 17:22:18.626053 7f7ce7ea8700 0 mon.issdm-44@-1(probing).data_health(0) update_stats avail 85% total 235530756 used 22536532 avail 201203736
2013-05-07 17:23:18.626187 7f7ce7ea8700 0 mon.issdm-44@-1(probing).data_health(0) update_stats avail 85% total 235530756 used 22536536 avail 201203732

Actions #2

Updated by Anonymous almost 11 years ago

Although, now that I look at it, my logs do have this line:
2013-05-07 17:21:18.687516 7f7ced5ee700 0 -- 192.168.140.144:6789/0 >> 0.0.0.0:0/1 pipe(0x25b6280 sd=21 :0 s=1 pgs=0 cs=0 l=0).fault

which does indicate a fault of some sort. I'll uninstall/reinstall and see if that behavior changes.

Actions #3

Updated by Sage Weil almost 11 years ago

  • Status changed from New to 7

This looks like a timing issue: ceph-create-keys is racing with ceph-mon startup, and ceph is wrongly returning a successful exit status when the command is not recognized. wip-ceph-tool has a fix.
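Until that lands, a rough way to see the race and work around it is to wait until the mon actually reports quorum before running gatherkeys; for example (illustrative only, using the admin-socket path pattern shown in the logs above):

ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname`.asok mon_status
# repeat until "state" is "leader" or "peon" and the "quorum" list is non-empty,
# then run ceph-deploy gatherkeys <mon-host> from the admin node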

Actions #4

Updated by Tamilarasi muthamizhan almost 11 years ago

  • Status changed from 7 to Resolved

tested the fix on wip-ceph-tool. works fine.

Actions #5

Updated by Tamilarasi muthamizhan almost 11 years ago

  • Status changed from Resolved to 7
Actions #6

Updated by Sage Weil almost 11 years ago

  • Status changed from 7 to Resolved

commit:393c9372f82ef37fc6497dd46fc453507a463d42

Actions #7

Updated by Greg Poirier almost 11 years ago

I hate to kick a dead horse, but did this make it into 0.63 or will it be available in a later release? Ran into this on Scientific Linux last night. Was able to get around it by manually starting ceph-mon on my three mon nodes and then waiting for quorum before creating keys. I assume it is the same race.

Actions #8

Updated by Ian Colle almost 11 years ago

This fix landed in 0.61.1. Please try that (or a newer) version and see if you're still hitting it.

Actions #9

Updated by Greg Poirier almost 11 years ago

:/

0.61.2

[root@test-ceph-1001 ~]# yum list ceph
Loaded plugins: security
Installed Packages
ceph.x86_64 0.61.2-0.el6 @ceph

Scientific Linux 6.4

To reproduce, I installed ceph using ceph-deploy, ran ceph-deploy new, and added MONs on all three nodes. Then, running ceph-deploy gatherkeys, I saw the same error message as above.

The ceph-create-keys process was running indefinitely on each of the three machines in my cluster. I probably should have straced it or something useful, but then I found this bug and figured out the way around it.

If I can provide any additional information, logs, help, reproducing with configuration options, etc, please let me know.

Actions #10

Updated by Greg Poirier almost 11 years ago

Okay, so I tried reproducing this again today, and now I can't. I thought it was an iptables issue at first, but I was able to reproduce it even after setting iptables to a default-allow policy. Now, no matter what I try, I can't reproduce it. The first mon created takes a while to return, but once the cluster is in quorum it returns as expected.

So, nevermind.

Actions #11

Updated by Noah Watkins almost 11 years ago

I am seeing this same problem. I am using the latest master version of ceph-deploy, and the target node is Ubuntu 12.10.

After `mon create`, `gatherkeys` reports that it cannot find keys. I see that the `ceph-create-keys` process is still running after `mon create` returns.

nwatkins@kyoto:~$ ps aux | grep ceph
root     11847  0.0  0.0  35616  7040 ?        Ss   11:13   0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i kyoto
Actions #12

Updated by Noah Watkins almost 11 years ago

Actually, this looks like it is being caused by the monitor crashing, and ceph-create-keys not being able to connect to admin socket. I submitted an error report on the monitor issue to the mailing list.

Actions #13

Updated by Zoltan Arnold Nagy over 10 years ago

I'm still seeing this with the latest cuttlefish on ubuntu 13.04.

after doing ceph-deploy install ceph-dmon1 and then ceph-deploy mon create ceph-dmon1, gatherkeys fails.

logging in to ceph-dmon1 this is what I see:
root@ceph-dmon1:~# ps faux | grep ceph
root 9010 0.0 0.0 9436 912 pts/2 S+ 16:14 0:00 \_ grep --color=auto ceph
root 8764 0.1 0.0 132756 5972 ? Ssl 16:14 0:00 /usr/bin/ceph-mon --cluster=ceph -i ceph-dmon1 -f
root 8765 0.5 0.0 33808 7096 ? Ss 16:14 0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i ceph-dmon1
root@ceph-dmon1:~#

Actions #14

Updated by Sage Weil over 10 years ago

  • Status changed from Resolved to Need More Info

Can you add 'debug mon = 20' and 'debug ms = 1' and 'debug monc = 20' to your ceph.conf, restart ceph-mon, and attach the resulting /var/log/ceph/ceph-mon.dmon1.log?
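In ceph.conf that would look something like the following (shown under [global] for illustration):

[global]
    debug mon = 20
    debug ms = 1
    debug monc = 20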

Also, can you run ceph-create-keys manually (ceph-create-keys -i ceph-dmon1) and include that output as well?

Thanks!

Actions #15

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High
Actions #16

Updated by Zoltan Arnold Nagy over 10 years ago

So after reinstalling the server, this went away. Next time I run into this, I'll update.

Actions #17

Updated by Rob Taylor over 10 years ago

This has just happened to me, so log with 'debug mon = 20' and 'debug ms = 1' and 'debug monc = 20' is attached.

Running ceph-create-keys manually gives:

INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'

repeated ad infinitum.

I think I've been bitten by various guises of this issue over multiple attempts to get ceph working in a vbox VM (for test harness purposes). At least, I've seen that ceph-create-keys hasn't terminated in all cases, with the install failing at different points - particularly this one, and ceph-deploy osd activate / start ceph-osd-all failing to return.

Actions #18

Updated by bernhard glomm over 10 years ago

Running ceph-create-keys manually gives:

INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
repeated ad infinitum.

I can confirm the broken ceph-deploy / ceph-create-keys behavior.

I'd like to get familiar with ceph, so I'm running an automatically, freshly installed, up-to-date Raring Ringtail system.
I then want the following code to be executed from one of the mon systems ('nike' in this case):

------------------------------------------------------------------------

#!/bin/bash
# (c) bglomm@ecologic.eu 2013
# initialize the ceph cluster
#

# our ceph systems
ceph_osds="ping pong" 
ceph_mons="nike ping pong king kong" 
ceph_mds="nike king kong" 

# prepare them
#for host in $ceph_mons; do
#   getent hosts $ceph_mons | cut -f-4 -d\.| ssh cephmngmnt@$host sudo cat \>\> /etc/hosts
#done
# that has been done during installation meanwhile

# install ceph
ceph-deploy install $ceph_mons

# deploy ceph
ceph-deploy new $ceph_mons

# create the monitors
ceph-deploy --overwrite-conf mon create $ceph_mons

# get the keys
for host in $ceph_mons; do
    ceph-deploy gatherkeys $host
done

for host in $ceph_osds;do
    ceph-deploy osd create $host:/dev/sdb2
done

------------------------------------------------------------------------

The first error is the missing keys when trying to gather them.
Taking a look at the hosts, on all five there is the create-keys process

/usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i ping

running indefinitely; no keys are created.

All /var/lib/ceph/bootstrap-* directories are empty,
and nothing else (ceph -s etc.) works, since:
ERROR: missing keyring, cannot use cephx for authentication

By the way:
/usr/sbin/ceph-create-keys starts automatically again after a reboot,
and 'service ceph restart' doesn't seem to work at all:

................................

root@ping[/0]:~ # ps ax | egrep ceph
 3257 ?        Ssl    0:03 /usr/bin/ceph-mon --cluster=ceph -i ping -f
 3258 ?        Ss     0:03 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i ping
17774 pts/0    S+     0:00 egrep --color=auto ceph
root@ping[/0]:~ # service ceph restart
root@ping[/0]:~ # ps ax | egrep ceph
 3257 ?        Ssl    0:03 /usr/bin/ceph-mon --cluster=ceph -i ping -f
 3258 ?        Ss     0:03 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i ping
17816 pts/0    S+     0:00 egrep --color=auto ceph
root@ping[/0]:~ # kill -9 3257 3258 && sleep 5
root@ping[/0]:~ # ps ax | egrep ceph
17998 ?        Ss     0:00 /bin/sh -e -c /usr/bin/ceph-mon --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
17999 ?        Sl     0:00 /usr/bin/ceph-mon --cluster=ceph -i ping -f
18000 ?        Ss     0:00 /bin/sh -e -c /usr/sbin/ceph-create-keys --cluster="${cluster:-ceph}" -i "${id:-$(hostname)}" /bin/sh
18001 ?        S      0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i ping
18073 pts/0    S+     0:00 egrep --color=auto ceph

Does anybody have any ideas on how to deploy ceph at the moment?

best regards

Bernhard

Actions #19

Updated by bernhard glomm over 10 years ago

P.S.: 
root@nike[/1]:~ # ceph-deploy --version
1.2.1
root@nike[/1]:~ # ceph --version
ceph version 0.67 (e3b7bc5bce8ab330ec1661381072368af3c218a0)
Actions #20

Updated by Sage Weil over 10 years ago

bernhard: i think the problem in your case is that you have old keyrings in /etc/ceph from prior cluster instances. can you add, at the top of your script before the install,

ceph-deploy purge $all
ceph-deploy purgedata $all

to blow away /etc/ceph and /var/lib/ceph contents.

Actions #21

Updated by Bartlomiej Palmowski over 10 years ago

Hi,

I'm hitting the same bug on Red Hat 6.3 (Santiago); purging /var/lib/ceph and /etc/ceph doesn't help.

Actions #22

Updated by Michael Potter over 10 years ago

Getting the same thing on a fresh install of CentOS 6.4

[## ceph-cluster]# ceph-deploy mon create ##
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ##
[ceph_deploy.mon][DEBUG ] detecting platform for host ## ...
[ceph_deploy.mon][INFO  ] distro info: CentOS 6.4 Final
[##][DEBUG ] deploying mon to ##
[##][DEBUG ] remote hostname: ##
[##][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[##][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-##
[##][INFO  ] create a done file to avoid re-doing the mon deployment
[##][INFO  ] create the init path if it does not exist
[##][INFO  ] locating `service` executable...
[##][INFO  ] found `service` executable: /sbin/service
[##][INFO  ] Running command: /sbin/service ceph start mon.##
[root@## ceph-cluster]# ceph-deploy gatherkeys ##
[ceph_deploy.gatherkeys][DEBUG ] Checking ## for /etc/ceph/ceph.client.admin
[ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ['##']
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Checking ## for /var/lib/ceph/bootstrap-osd
[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-osd/ceph.keyring on ['##']
[ceph_deploy.gatherkeys][DEBUG ] Checking ## for /var/lib/ceph/bootstrap-mds
[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring on ['##']
[root@## ceph-cluster]# ceph-deploy --version
1.2.1
[root@## ceph-cluster]# ceph --version
ceph version 0.67 (e3b7bc5bce8ab330ec1661381072368af3c218a0)
[root@## ceph-cluster]#  /usr/bin/python /usr/sbin/ceph-create-keys -i ##
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
Actions #23

Updated by Sage Weil over 10 years ago

Michael Potter wrote:

Getting the same thing on a fresh install of CentOS 6.4

[...]

Can you post the output of 'ceph daemon mon.### mon_status', and contents of /etc/ceph/ceph.conf?

Thanks!

Actions #24

Updated by Diego Woitasen over 10 years ago

I think the documentation is a little confusing. I had the same problem minutes ago and fixed it. In my scenario I have two nodes right now, ceph-admin and ceph-server-01.

Steps that failed:
root@ceph-admin:~/mycluster# ceph-deploy new ceph-admin
root@ceph-admin:~/mycluster# ceph-deploy install ceph-server-01
root@ceph-admin:~/mycluster# ceph-deploy mon create ceph-server-01
root@ceph-admin:~/mycluster# ceph-deploy gatherkeys ceph-server-01
Unable to find /etc/ceph/ceph.client.admin.keyring on ['ceph-server-01']
Unable to find /var/lib/ceph/bootstrap-osd/ceph.keyring on ['ceph-server-01']
Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring on ['ceph-server-01']

To clean everything:
root@ceph-admin:~/mycluster# ceph-deploy purge ceph-server-01
root@ceph-admin:~/mycluster# ceph-deploy purgedata ceph-server-01

Steps that worked:
root@ceph-admin:~/mycluster# ceph-deploy new ceph-server-01
root@ceph-admin:~/mycluster# ceph-deploy install ceph-server-01
root@ceph-admin:~/mycluster# ceph-deploy mon create ceph-server-01
root@ceph-admin:~/mycluster# ceph-deploy gatherkeys ceph-server-01

The documentation isn't very clear about which node we should specify when we execute "new".

Actions #26

Updated by Sage Weil over 10 years ago

Diego Woitasen wrote:

What do you think? https://github.com/ceph/ceph/pull/510

aha, i bet this is what is tripping up a lot of people. it's specifically the mon nodes that need to be named in the new step. i'll tweak the docs!

Actions #27

Updated by Michael Potter over 10 years ago

Sage Weil wrote:

Michael Potter wrote:

Getting the same thing on a fresh install of CentOS 6.4

[...]

Can you post the output of 'ceph daemon mon.### mon_status', and contents of /etc/ceph/ceph.conf?

Thanks!

Cheers Sage, hoping it's just an obvious trap some of us newcomers are falling into. I tried with and without osd/mon settings in the conf (I haven't got to the stage of creating OSDs yet anyway!).
The "ceph-mon is not in quorum" error looks like it should be obvious to resolve, but when there's only one node I'm not sure how that's even possible.

[root@#subdomain-identifier# ceph-cluster]# ceph daemon mon.#subdomain-identifier# mon_status
{ "name": "#subdomain-identifier#",
  "rank": 0,
  "state": "leader",
  "election_epoch": 1,
  "quorum": [
        0],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "f40acb72-33fb-43a9-bf62-691bb047f0b6",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "#subdomain-identifier#",
              "addr": "[::1]:6789\/0"}]}}

[root@#subdomain-identifier# ceph-cluster]# cat /etc/ceph/ceph.conf
[mon.a]
mon_addr = #subdomain-identifier#.#resolveable-domain#
host = #subdomain-identifier#

[osd.0]
host = #subdomain-identifier#

[global]
filestore_xattr_use_omap = true
mon_host = #ipaddr#
osd_journal_size = 1024
mon_initial_members = #subdomain-identifier#
auth_supported = cephx
osd_crush_chooseleaf_type = 0
fsid = 77cdf265-09d1-42a6-98b9-56a39fdefea1

[osd.1]
host = #subdomain-identifier#

[osd.2]
host = #subdomain-identifier#

Also -

Nice ideas, Diego. I'm still at the 'get a single node running Ceph' stage, though, so there might be more than one way to cause this one (from my failed google-fu on solving the issue, that seems quite probable).

Actions #28

Updated by Sage Weil over 10 years ago

Ah, I think the problem is

{ "rank": 0,
"name": "#subdomain-identifier#",
"addr": "[::1]:6789\/0"}]}}

It needs to be the IP address that matches the mon addr line. My first guess is that there is an /etc/hosts entry on the mon machine that maps the host.domain to [::1]. Try changing that to whatever DNS resolves to?
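i.e. an /etc/hosts entry along these lines (with the host's real address; placeholders as used below, just for illustration):

#ipaddr#   #subdomain-identifier#.#resolveable-domain# #subdomain-identifier#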

Actions #29

Updated by Michael Potter over 10 years ago

Hi Sage, took everything out of the hosts file except for #ipaddr# #subdomain-identifier#.#resolveable-domain#
Cleaned and re-installed everything. Now getting the below output.

[root@#subdomain-identifier# ceph-cluster]# ceph daemon mon.#subdomain-identifier# mon_status
{ "name": "#subdomain-identifier#",
  "rank": 0,
  "state": "leader",
  "election_epoch": 1,
  "quorum": [
        0],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "77cdf265-09d1-42a6-98b9-56a39fdefea1",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "#subdomain-identifier#",
              "addr": "#ipaddr#:6789\/0"}]}}

[root@#subdomain-identifier# ceph-cluster]# cat /etc/ceph/ceph.conf
[mon.a]
mon_addr = #ipaddr#:6789
...

Noticed I was now getting OSD warnings on service ceph restart, huzzah progress! But it still wasn't working...

[root@#subdomain-identifier# ceph-cluster]# ps ax | egrep ceph
 4570 pts/0    Sl     0:00 /usr/bin/ceph-mon -i #subdomain-identifier# --pid-file /var/run/ceph/mon.#subdomain-identifier#.pid -c /etc/ceph/ceph.conf
 4794 pts/0    S      0:00 /usr/bin/python /usr/sbin/ceph-create-keys -i a
 5768 pts/0    S+     0:00 egrep ceph
[root@#subdomain-identifier# ceph-cluster]# ceph-create-keys -i #subdomain-identifier#.#resolveable-domain#
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

Hmm...
Starting to look a bit more like [[http://tracker.ceph.com/issues/5777]] now.

[root@#subdomain-identifier# ceph-cluster]# ls /var/run/ceph/
ceph-mon.#subdomain-identifier#.asok  mon.#subdomain-identifier#.pid

[root@#subdomain-identifier# ceph-cluster]# ceph --admin-daemon /var/run/ceph/ceph-mon.#subdomain-identifier#.asok mon_status
Returns the same as ceph daemon mon.#subdomain-identifier# mon_status

Actions #30

Updated by Sage Weil over 10 years ago

Michael Potter wrote:

Hi Sage, took everything out of the host for except for #ipaddr# #subdomain-identifier#.#resolveable-domain#
Cleaned and re-installed everything. Now getting the below output.

[...]

Noticed I was now getting OSD warnings on service ceph restart, huzzah progress!

Yay! So just to confirm, the problem was that /etc/hosts had the loopback addr?

But it still wasn't working...

[...]

Hmm...
Starting to look a bit more like [[http://tracker.ceph.com/issues/5777]] now.

[...]

[root@#subdomain-identifier# ceph-cluster]# ceph --admin-daemon /var/run/ceph/ceph-mon.#subdomain-identifier#.asok mon_status
Returns the same as ceph daemon mon.#subdomain-identifier# mon_status

What do you get if you run

ceph-create-keys -v -i `hostname` ?
Actions #31

Updated by Michael Potter over 10 years ago

Hi Sage, sorted it just before your reply. Was idly scrolling back through the thread when I spotted the word 'iptables' and smacked my head on the desk.
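In case it helps anyone else: the monitors listen on TCP port 6789, so on each mon host something along these lines (adapted to your own ruleset, shown here only as an example) lets ceph-deploy and the mon peers through:

iptables -I INPUT -p tcp --dport 6789 -j ACCEPT
service iptables save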

[root@#subdomain-identifier# ceph-cluster]# ceph-deploy gatherkeys #subdomain-identifier#.#resolveable-domain#
[ceph_deploy.gatherkeys][DEBUG ] Checking #subdomain-identifier#.#resolveable-domain# for /etc/ceph/ceph.client.admin.keyring
[ceph_deploy.gatherkeys][DEBUG ] Got ceph.client.admin.keyring key from #subdomain-identifier#.#resolveable-domain#.
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Checking #subdomain-identifier#.#resolveable-domain# for /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-osd.keyring key from #subdomain-identifier#.#resolveable-domain#.
[ceph_deploy.gatherkeys][DEBUG ] Checking #subdomain-identifier#.#resolveable-domain# for /var/lib/ceph/bootstrap-mds/ceph.keyring
[ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-mds.keyring key from #subdomain-identifier#.#resolveable-domain#.
Actions #32

Updated by Sage Weil over 10 years ago

  • Status changed from Need More Info to Resolved

closing out this bug. i think i captured everything we learned in http://pad.ceph.com/p/quorum_pitfalls along with action items for making ceph-deploy more bullet-proof

Actions #33

Updated by bernhard glomm over 10 years ago

Sage Weil wrote:

bernhard: i think the problem in your case is that you have old keyrings in /etc/ceph from prior cluster instances. can you add, at the top of your script before the install,

ceph-deploy purge $all
ceph-deploy purgedata $all

to blow away /etc/ceph and /var/lib/ceph contents.

thnx Sage, but fresh install is fresh install ;-).
Install as in install, not image replay.
ceph-create-keys just runs on all mons without end, without doing anything...
Is there another way to create the keys?
I haven't found a straightforward howto for deploying the cluster without ceph-deploy/mkcephfs yet;
I could use my fai/cfengine to do the job if I know what needs to be done.

Actions #34

Updated by Sage Weil over 10 years ago

bernhard glomm wrote:

Sage Weil wrote:

bernhard: i think the problem in your case is that you have old keyrings in /etc/ceph from prior cluster instances. can you add, at the top of your script before the install,

ceph-deploy purge $all
ceph-deploy purgedata $all

to blow away /etc/ceph and /var/lib/ceph contents.

thnx Sage, but fresh install is fresh install ;-).
Install as in install, not image replay.
ceph-create-keys just runs on all mons without end, without doing anything...

Can you post the contents of /etc/ceph/ceph.conf, /etc/hosts, and on a couple of mons the output from 'ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname`.asok mon_status`? That should tell us what is going on.

Is there another way to create the keys?
I haven't found a straightforward howto for deploying the cluster without ceph-deploy/mkcephfs yet;
I could use my fai/cfengine to do the job if I know what needs to be done.

Key creation can be done in a few different ways, but whatever is preventing ceph-create-keys from working may throw a wrench in other methods too, and I'd really like to get to the bottom of it :) Thanks!

Actions #35

Updated by bernhard glomm over 10 years ago

Sage,

sorry for being late on this, other tasks kept me busy,
but here is the info you were asking for:

ceph-create-keys just runs on all mons without end, without doing anything...

Can you post the contents of /etc/ceph/ceph.conf, /etc/hosts, and on a couple of mons the output from 'ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname`.asok mon_status`? That should tell us what is going on.

Is there another way to create the keys?
Haven't found a straight howto for deploying the cluster without ceph-deploy/mkcephfs yet,
could use my fai/cfengine to do the job is I know what needs to be done.

Key creation can be done in a few different ways, but whatever is preventing ceph-create-keys from working may throw a wrench in other methods too, and I'd really like to get to the bottom of it :) Thanks!

The results look quite strange to me.
Yes, the five hostnames can be resolved back and forth through DNS
AND have /etc/hosts entries on all nodes:

# ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname`.asok mon_status
# on:
# nike:
{ "name": "nike",
  "rank": 1,
  "state": "electing",
  "election_epoch": 1,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "192.168.242.31:6789\/0",
        "192.168.242.32:6789\/0",
        "192.168.242.92:6789\/0",
        "192.168.242.93:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "65c1781b-95f0-40ec-b317-f2f0729a46ff",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "king",
              "addr": "192.168.242.31:6789\/0"},
            { "rank": 1,
              "name": "nike",
              "addr": "192.168.242.36:6789\/0"},
            { "rank": 2,
              "name": "ping",
              "addr": "192.168.242.92:6789\/0"},
            { "rank": 3,
              "name": "pong",
              "addr": "192.168.242.93:6789\/0"},
            { "rank": 4,
              "name": "kong",
              "addr": "0.0.0.0:0\/4"}]}}

# king
{ "name": "king",
  "rank": 0,
  "state": "electing",
  "election_epoch": 1,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "192.168.242.32:6789\/0",
        "192.168.242.36:6789\/0",
        "192.168.242.92:6789\/0",
        "192.168.242.93:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "65c1781b-95f0-40ec-b317-f2f0729a46ff",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "king",
              "addr": "192.168.242.31:6789\/0"},
            { "rank": 1,
              "name": "ping",
              "addr": "192.168.242.92:6789\/0"},
            { "rank": 2,
              "name": "pong",
              "addr": "192.168.242.93:6789\/0"},
            { "rank": 3,
              "name": "nike",
              "addr": "0.0.0.0:0\/3"},
            { "rank": 4,
              "name": "kong",
              "addr": "0.0.0.0:0\/4"}]}}

# kong
{ "name": "kong",
  "rank": 0,
  "state": "electing",
  "election_epoch": 1,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "192.168.242.31:6789\/0",
        "192.168.242.36:6789\/0",
        "192.168.242.92:6789\/0",
        "192.168.242.93:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "65c1781b-95f0-40ec-b317-f2f0729a46ff",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "kong",
              "addr": "192.168.242.32:6789\/0"},
            { "rank": 1,
              "name": "ping",
              "addr": "192.168.242.92:6789\/0"},
            { "rank": 2,
              "name": "pong",
              "addr": "192.168.242.93:6789\/0"},
            { "rank": 3,
              "name": "nike",
              "addr": "0.0.0.0:0\/3"},
            { "rank": 4,
              "name": "king",
              "addr": "0.0.0.0:0\/4"}]}}

# ping
{ "name": "ping",
  "rank": 1,
  "state": "electing",
  "election_epoch": 1,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "192.168.242.31:6789\/0",
        "192.168.242.32:6789\/0",
        "192.168.242.36:6789\/0",
        "192.168.242.93:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "65c1781b-95f0-40ec-b317-f2f0729a46ff",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "nike",
              "addr": "192.168.242.36:6789\/0"},
            { "rank": 1,
              "name": "ping",
              "addr": "192.168.242.92:6789\/0"},
            { "rank": 2,
              "name": "pong",
              "addr": "192.168.242.93:6789\/0"},
            { "rank": 3,
              "name": "king",
              "addr": "0.0.0.0:0\/3"},
            { "rank": 4,
              "name": "kong",
              "addr": "0.0.0.0:0\/4"}]}}

# pong
{ "name": "pong",
  "rank": 2,
  "state": "electing",
  "election_epoch": 1,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "192.168.242.31:6789\/0",
        "192.168.242.32:6789\/0",
        "192.168.242.36:6789\/0",
        "192.168.242.92:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "65c1781b-95f0-40ec-b317-f2f0729a46ff",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "nike",
              "addr": "192.168.242.36:6789\/0"},
            { "rank": 1,
              "name": "ping",
              "addr": "192.168.242.92:6789\/0"},
            { "rank": 2,
              "name": "pong",
              "addr": "192.168.242.93:6789\/0"},
            { "rank": 3,
              "name": "king",
              "addr": "0.0.0.0:0\/3"},
            { "rank": 4,
              "name": "kong",
              "addr": "0.0.0.0:0\/4"}]}}

# /etc/ceph/ceph.conf
[global]
fsid = 65c1781b-95f0-40ec-b317-f2f0729a46ff
mon_initial_members = ping, pong, nike, king, kong
mon_host = 192.168.242.92,192.168.242.93,192.168.242.36,192.168.242.31,192.168.242.32
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true

# files/folders created during ceph_deploy
# on ping=OSD

root@ping[/1]:~ # tree -A /var/lib/ceph/
/var/lib/ceph/
├── bootstrap-mds
├── bootstrap-osd
├── mds
├── mon
│   └── ceph-ping
│       ├── done
│       ├── keyring
│       ├── store.db
│       │   ├── 000005.sst
│       │   ├── 000006.log
│       │   ├── CURRENT
│       │   ├── LOCK
│       │   ├── LOG
│       │   ├── LOG.old
│       │   └── MANIFEST-000004
│       └── upstart
├── osd
└── tmp

root@ping[/0]:~ # tree -A /etc/ceph/
/etc/ceph/
├── ceph.conf
└── rbdmap

0 directories, 2 files

# on nike
/var/lib/ceph/
├── bootstrap-mds
├── bootstrap-osd
├── mds
├── mon
│   └── ceph-nike
│       ├── done
│       ├── keyring
│       ├── store.db
│       │   ├── 000005.sst
│       │   ├── 000006.log
│       │   ├── CURRENT
│       │   ├── LOCK
│       │   ├── LOG
│       │   ├── LOG.old
│       │   └── MANIFEST-000004
│       └── upstart
├── osd
└── tmp

8 directories, 10 files

root@nike[/0]:~ # tree -A /etc/ceph/
/etc/ceph/
├── ceph.conf
├── ceph.log
├── ceph.mon.keyring
└── rbdmap

0 directories, 4 files

Actions #36

Updated by Sage Weil over 10 years ago

Ooh, I think I know what this is. This is probably cuttlefish v0.61.7 or older, right? There is a fix in dumpling (and backported for v0.61.8 cuttlefish) that forces an internal refresh of the monmap and prevents this sort of hang-up during the initial quorum convergence.

Actions #37

Updated by bernhard glomm over 10 years ago

Sage Weil wrote:

Ooh, I think I know what this is. This is probably cuttlefish v0.61.7 or older, right? There is a fix in dumpling (and backported for v0.61.8 cuttlefish) that forces and internal refresh of the monmap and prevents this sort of hang-up during the initial quorum convergence.

Sorry to disappoint you:

root@nike[/0]:~ # ceph --version
ceph version 0.67.2 (eb4380dd036a0b644c6283869911d615ed729ac8)

Is there an alternative howto to set the cluster up?
How can I generate the keys manually?

TIA

Actions #38

Updated by Sage Weil over 10 years ago

  • Status changed from Resolved to Need More Info

bernhard glomm wrote:

Sage Weil wrote:

Ooh, I think I know what this is. This is probably cuttlefish v0.61.7 or older, right? There is a fix in dumpling (and backported for v0.61.8 cuttlefish) that forces an internal refresh of the monmap and prevents this sort of hang-up during the initial quorum convergence.

Sorry to disappoint you:

root@nike[/0]:~ # ceph --version
ceph version 0.67.2 (eb4380dd036a0b644c6283869911d615ed729ac8)

Is there an alternative howto to set the cluster up?
How can I generate the keys manually?

TIA

In that case, can you set 'debug ms = 1' and 'debug mon = 20' in the ceph.conf (after ceph-deploy new, before ceph-deploy mon create), reproduce the situation, and then post the /var/log/ceph/ceph-mon.* logs somewhere where I can look? I would like to get to the bottom of this. I suspect you can work around it by starting with fewer mons, but if you have the time I'd like to identify the bug first. Thanks!

Actions #39

Updated by bernhard glomm over 10 years ago

Sage Weil wrote:

bernhard glomm wrote:

Sage Weil wrote:

Ooh, I think I know what this is. This is probably cuttlefish v0.61.7 or older, right? There is a fix in dumpling (and backported for v0.61.8 cuttlefish) that forces an internal refresh of the monmap and prevents this sort of hang-up during the initial quorum convergence.

Sorry to disappoint you:

root@nike[/0]:~ # ceph --version
ceph version 0.67.2 (eb4380dd036a0b644c6283869911d615ed729ac8)

Is there an alternative howto to set the cluster up?
How can I generate the keys manually?

TIA

In that case, can you set 'debug ms = 1' and 'debug mon = 20' in the ceph.conf (after ceph-deploy new, before ceph-deploy mon create), reproduce the situation, and then post the /var/log/ceph/ceph-mon.* logs somewhere where I can look? I would like to get to the bottom of this. I suspect you can work around it by starting with fewer mons, but if you have the time I'd like to identify the bug first. Thanks!

Here it comes...
I attached 2 files:
- "ceph_debug_div"
explains again what I want to do, has the ceph-deploy commands that I run from a script, and some more information - OS version and some of the results, including the output of "ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname`.asok mon_status"

- "ceph_debug_logfiles"
the first part of the /var/log/ceph/ceph-mon.* logs of the 5 machines. Since the logfiles grow by the second, I cut them somewhere after the repetition is obvious.
Hope that helps. I can run further tests if you like, just let me know.

Actions #40

Updated by Sage Weil over 10 years ago

  • Status changed from Need More Info to 7

Bernhard, thanks for those logs--I think I've identified the problem. Can you try with wip-4924 (based off of dumpling)? It will show up on gitbuilder.ceph.com shortly. Thanks!

Actions #41

Updated by bernhard glomm over 10 years ago

Sage Weil wrote:

Bernhard, thanks for those logs--I think I've identified the problem. Can you try with wip-4924 (based off of dumpling)? It will show up on gitbuilder.ceph.com shortly. Thanks!

sorry for my blankness ;-)
Could you give me a couple of lines on how I do that?

TIA

Actions #42

Updated by bernhard glomm over 10 years ago

sorry, I didn't realise at first glance that it was a normal package repository...
got it, will post results ASAP

Actions #43

Updated by bernhard glomm over 10 years ago

Sage Weil wrote:

Bernhard, thanks for those logs--I think I've identified the problem. Can you try with wip-4924 (based off of dumpling)? It will show up on gitbuilder.ceph.com shortly. Thanks!

Hi Sage

wip-4924 seems to do the job;
now I run into another show stopper ;-)
OSD creation/disk preparation fails.

I will have a look into that tomorrow;
here is just the output of ceph-deploy:

+ ceph-deploy -v gatherkeys atom02
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.client.admin.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-mds.keyring
+ for host in '$ceph_osds'
+ ceph-deploy -v osd create ping:/dev/sdb
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ping:/dev/sdb:
[ceph_deploy.osd][DEBUG ] Deploying osd to ping
[ceph_deploy.osd][DEBUG ] Host ping is now ready for osd use.
[ceph_deploy.osd][DEBUG ] Preparing host ping disk /dev/sdb journal None activate True
[ceph_deploy.osd][ERROR ] ceph-disk-prepare --cluster ceph -- /dev/sdb returned 1

***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format.
***************************************************************

Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
Non-GPT disk; not saving changes. Use -g to override.

INFO:ceph-disk:Will colocate journal with data on /dev/sdb
ceph-disk: Error: Command '['sgdisk', '--new=2:0:1024M', '--change-name=2:ceph journal', '--partition-guid=2:81b63820-f2a9-433d-925d-d70b238292de', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--', '/dev/sdb']' returned non-zero exit status 3

[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs

+ for host in '$ceph_osds'
+ ceph-deploy -v osd create pong:/dev/sdb
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks pong:/dev/sdb:
[ceph_deploy.osd][DEBUG ] Deploying osd to pong
[ceph_deploy.osd][DEBUG ] Host pong is now ready for osd use.
[ceph_deploy.osd][DEBUG ] Preparing host pong disk /dev/sdb journal None activate True
[ceph_deploy.osd][ERROR ] ceph-disk-prepare --cluster ceph -- /dev/sdb returned 1

***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format.
***************************************************************

Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
Non-GPT disk; not saving changes. Use -g to override.

INFO:ceph-disk:Will colocate journal with data on /dev/sdb
ceph-disk: Error: Command '['sgdisk', '--new=2:0:1024M', '--change-name=2:ceph journal', '--partition-guid=2:43eb9a2a-d315-4866-9fa7-e254ccc6d767', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--', '/dev/sdb']' returned non-zero exit status 3

[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
Actions #44

Updated by Sage Weil over 10 years ago

  • Status changed from 7 to Resolved

thanks for helping test this!

Actions #45

Updated by Sage Weil over 10 years ago

bernhard glomm wrote:

Sage Weil wrote:

Bernhard, thanks for those logs--I think I've identified the problem. Can you try with wip-4924 (based off of dumpling)? It will show up on gitbuilder.ceph.com shortly. Thanks!

Hi Sage

wip-4924 seems to do the job,
now I run into another show stopper ;-)
OSD creation fails/disk preparation fails.

will have a look into that tomorrow,
here just the output of ceph-deploy

[...]

add the --zap-disk argument to blow away the old partition table?
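e.g. something like (host and device as in your script, shown only as an example):

ceph-deploy -v osd create --zap-disk ping:/dev/sdb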

Actions #46

Updated by bernhard glomm over 10 years ago

Sage Weil wrote:

thanks for helping test this!

better free software ;-)

add the --zap-disk argument to blow away the old partition table?

arggghhh, thnx

Yepp, now it works as expected.
Let's c what I can break next...

Actions #47

Updated by bernhard glomm over 10 years ago

P.S.:

When will the fix make it into the main branch (dumpling)?
Should I keep working with wip-4924 for now?

Actions #48

Updated by Sage Weil over 10 years ago

  • Status changed from Resolved to Pending Backport
  • Priority changed from High to Urgent
Actions #49

Updated by Sage Weil over 10 years ago

this will go into the next dumpling point release. thanks again for helping track it down!

Actions #50

Updated by Sage Weil over 10 years ago

  • Status changed from Pending Backport to Resolved
Actions #51

Updated by Abhay Sachan over 10 years ago

Hi All,
I am still hitting this on RHEL 6.4 with the latest dumpling release (ceph-0.67.3-0.el6.x86_64).
For my setup I am trying to create 3 MONs; mon creation goes through, but create-keys hangs indefinitely.

Regards,
Abhay

Actions #52

Updated by Edward Hope-Morley over 10 years ago

I have just managed to deploy 3 ceph nodes successfully on Ubuntu Raring using Dumpling 0.67.3. The above issue looks a lot like one I hit a while back, whereby gatherkeys was failing following mon creation. I resolved it by ensuring that the mons were in quorum before executing gatherkeys, and I no longer see the issue.
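For reference, a quick way to check is the admin-socket command used earlier in this thread, run on each mon; every mon should report a non-empty "quorum" list before gatherkeys is run:

ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname`.asok mon_status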
