Bug #12264

calamari: sees servers but no cluster

Added by Anthony Alba over 5 years ago. Updated about 4 years ago.

Status:
Won't Fix
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

  1. working ceph cluster 0.94.2
  2. installed (rebuilt from RHCEPH)
    calamari-clients-1.3-2.el7.centos.x86_64
    calamari-server-1.3-7.el7.centos.x86_64
    diamond-3.4.67-4.el7.centos.noarch
  3. salt-2014.7.5-1.el7.noarch
  4. /graphite/dashboard shows data
  5. salt '*' ceph.get_heartbeats works

Calamari dashboard shows
"Ceph servers are connected to Calamari, but no Ceph cluster has been created yet"

History

#1 Updated by Anthony Alba over 5 years ago

1. What event causes calamari to actually recognize a cluster?

2. salt-run state.event pretty=True shows lots of events from all the minions.
The minions are running the 10s scheduled job.

3. cthulhu reports:
  1. _register_service looks encouraging
  2. but ignoring ceph/server looks bad

2015-07-10 09:09:42,038 - DEBUG - cthulhu TopLevelEvents: ignoring ceph/server
2015-07-10 09:09:46,318 - DEBUG - cthulhu TopLevelEvents: ignoring salt/auth
2015-07-10 09:09:46,420 - DEBUG - cthulhu TopLevelEvents: ignoring ceph/server
2015-07-10 09:09:46,421 - DEBUG - cthulhu.server_monitor ServerMonitor got ceph/server message from gordon1.tbs
2015-07-10 09:09:46,421 - DEBUG - cthulhu.server_monitor ServerMonitor.on_server_heartbeat: gordon1.tbs
2015-07-10 09:09:46,421 - DEBUG - cthulhu.server_monitor ServerMonitor._register_service: ServiceId(fsid='a2ca2b7f-4807-471b-80af-fc9778809bc9', service_type='osd', service_id='1')
2015-07-10 09:09:46,421 - DEBUG - cthulhu.server_monitor ServerMonitor._register_service: ServiceId(fsid='a2ca2b7f-4807-471b-80af-fc9778809bc9', service_type='osd', service_id='0')
2015-07-10 09:09:46,421 - DEBUG - cthulhu.server_monitor ServerMonitor._register_service: ServiceId(fsid='a2ca2b7f-4807-471b-80af-fc9778809bc9', service_type='mon', service_id='gordon1')
2015-07-10 09:09:50,743 - DEBUG - cthulhu utime: 11.383037
2015-07-10 09:09:50,743 - DEBUG - cthulhu stime: 1.089565
2015-07-10 09:09:50,744 - DEBUG - cthulhu maxrss: 90948
2015-07-10 09:09:50,744 - DEBUG - cthulhu ixrss: 0
2015-07-10 09:09:50,744 - DEBUG - cthulhu idrss: 0
2015-07-10 09:09:50,744 - DEBUG - cthulhu isrss: 0
2015-07-10 09:09:50,744 - DEBUG - cthulhu minflt: 32151
2015-07-10 09:09:50,745 - DEBUG - cthulhu majflt: 0
2015-07-10 09:09:50,745 - DEBUG - cthulhu nswap: 0
2015-07-10 09:09:50,745 - DEBUG - cthulhu inblock: 0
2015-07-10 09:09:50,745 - DEBUG - cthulhu oublock: 3584
2015-07-10 09:09:50,745 - DEBUG - cthulhu msgsnd: 0
2015-07-10 09:09:50,745 - DEBUG - cthulhu msgrcv: 0
2015-07-10 09:09:50,745 - DEBUG - cthulhu nsignals: 0
2015-07-10 09:09:50,746 - DEBUG - cthulhu nvcsw: 47257
2015-07-10 09:09:50,746 - DEBUG - cthulhu nivcsw: 16
2015-07-10 09:09:50,757 - DEBUG - cthulhu Eventer.on_tick
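
For reference, this is how I watch the raw event stream on the master (a minimal sketch using salt's Python API, equivalent to salt-run state.event pretty=True; it assumes the default master config path):

    import salt.config
    import salt.utils.event

    # Attach to the master's event socket and print every tag that goes by,
    # to see whether ceph/server and ceph/cluster/<fsid> events arrive at all.
    opts = salt.config.client_config('/etc/salt/master')
    event = salt.utils.event.MasterEvent(opts['sock_dir'])
    for ev in event.iter_events(full=True):
        print(ev['tag'])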

#2 Updated by Anthony Alba over 5 years ago

Major oddness:

If I manually poke cthulhu from a minion with

salt-call ceph.heartbeat

(even though the minion is already running this job at 10s intervals)

Suddenly calamari recognises 3 mons/3 osds (there should be 6 osds):

2015-07-10 09:13:47,826 - DEBUG - cthulhu.request_collection on_completion: jid=20150710091347589514 data={'fun_args': [{'cluster_name': 'ceph', 'since': None, 'sync_type': 'pg_summary'}], 'jid': '20150710091347589514', 'return': 'The minion function caused an exception: Traceback (most recent call last):\n File "/usr/lib/python2.7/site-packages/salt/minion.py", line 1020, in _thread_return\n return_data = func(*args, **kwargs)\n File "/var/cache/salt/minion/extmods/modules/ceph.py", line 349, in get_cluster_object\n cluster_handle.connect()\n File "/usr/lib/python2.7/site-packages/rados.py", line 429, in connect\n raise make_ex(ret, "error connecting to the cluster")\nPermissionError: error connecting to the cluster\n', 'success': False, 'cmd': '_return', '_stamp': '2015-07-10T01:13:47.822836', 'fun': 'ceph.get_cluster_object', 'id': 'gordon3.tbs', 'out': 'nested'}
2015-07-10 09:13:47,826 - WARNING - cthulhu.request_collection on_completion: unknown jid 20150710091347589514, return: The minion function caused an exception: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/salt/minion.py", line 1020, in _thread_return
return_data = func(*args, **kwargs)
File "/var/cache/salt/minion/extmods/modules/ceph.py", line 349, in get_cluster_object
cluster_handle.connect()
File "/usr/lib/python2.7/site-packages/rados.py", line 429, in connect
raise make_ex(ret, "error connecting to the cluster")
PermissionError: error connecting to the cluster
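
The PermissionError above comes out of librados at connect time, so it can be reproduced outside salt entirely (a minimal sketch against the same rados.py bindings; it assumes the default conffile path and whatever client key is present on that minion):

    import rados

    # Same call chain as ceph.py's get_cluster_object: build a handle from
    # ceph.conf and connect; a missing or unreadable client key raises the
    # same "error connecting to the cluster" PermissionError.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    try:
        cluster.connect()
        print('connected, fsid:', cluster.get_fsid())
        cluster.shutdown()
    except rados.Error as e:
        print('connect failed:', e)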

This also doesn't seem right, as the minion doesn't throw an exception:

  1. salt-call ceph.heartbeat
    [DEBUG ] Configuration file path: /etc/salt/minion
    [DEBUG ] Reading configuration from /etc/salt/minion
    [DEBUG ] Decrypting the current master AES key
    [DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
    [DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
    [DEBUG ] Mako not available
    [INFO ] Executing command "repoquery --plugins --queryformat '%{NAME}_|-%{VERSION}_|-%{RELEASE}_|-%{ARCH}_|-%{REPOID}' --all --pkgnarrow=installed" in directory '/root'
    [DEBUG ] MinionEvent PUB socket URI: ipc:///var/run/salt/minion/minion_event_713454f298_pub.ipc
    [DEBUG ] MinionEvent PULL socket URI: ipc:///var/run/salt/minion/minion_event_713454f298_pull.ipc
    [DEBUG ] Sending event - data = {'pretag': None, '_stamp': '2015-07-10T01:13:47.236751', 'tag': 'ceph/server', 'data': {'services': {'ceph-osd.5': {'status': None, 'cluster': 'ceph', 'version': u'0.94.2', 'type': 'osd', 'id': '5', 'fsid': u'a2ca2b7f-4807-471b-80af-fc9778809bc9'}, 'ceph-osd.4': {'status': None, 'cluster': 'ceph', 'version': u'0.94.2', 'type': 'osd', 'id': '4', 'fsid': u'a2ca2b7f-4807-471b-80af-fc9778809bc9'}, 'ceph-mon.gordon3': {'status': {u'election_epoch': 12, u'name': u'gordon3', u'outside_quorum': [], u'rank': 2, u'monmap': {u'epoch': 1, u'mons': [{u'name': u'gordon1', u'rank': 0, u'addr': u'10.88.20.164:6789/0'}, {u'name': u'gordon2', u'rank': 1, u'addr': u'10.88.20.165:6789/0'}, {u'name': u'gordon3', u'rank': 2, u'addr': u'10.88.20.166:6789/0'}], u'modified': u'0.000000', u'fsid': u'a2ca2b7f-4807-471b-80af-fc9778809bc9', u'created': u'0.000000'}, u'state': u'peon', u'extra_probe_peers': [], u'sync_provider': [], u'quorum': [0, 1, 2]}, 'cluster': 'ceph', 'version': u'0.94.2', 'type': 'mon', 'id': 'gordon3', 'fsid': u'a2ca2b7f-4807-471b-80af-fc9778809bc9'}}, 'boot_time': 1436233118, 'ceph_version': '0.94.2-0.el7.centos'}, 'events': None}
    [DEBUG ] MinionEvent PUB socket URI: ipc:///var/run/salt/minion/minion_event_713454f298_pub.ipc
    [DEBUG ] MinionEvent PULL socket URI: ipc:///var/run/salt/minion/minion_event_713454f298_pull.ipc
    [DEBUG ] Sending event - data = {'pretag': None, '_stamp': '2015-07-10T01:13:47.263980', 'tag': 'ceph/cluster/a2ca2b7f-4807-471b-80af-fc9778809bc9', 'data': {'name': 'ceph', 'fsid': u'a2ca2b7f-4807-471b-80af-fc9778809bc9', 'versions': {'osd_map': 62, 'pg_summary': '48bdf5f4f800481bb2904a48b1098872', 'mds_map': 1, 'mon_status': 12, 'health': '90ce2e186bcbe148d30143c6b0edb5db', 'mon_map': 1, 'config': 'b337c6192fa275b4ccdb94355041d17d'}}, 'events': None}
    [DEBUG ] Decrypting the current master AES key
    [DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
    local:
    None

After this the UI is sort of working, but with lots of Server 500 errors.

#3 Updated by Anthony Alba over 5 years ago

The scheduled 10s jobs are not accepted as 'updates'.
Only manually poking salt-call ceph.heartbeat from a minion works.
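
To take the 10s schedule out of the picture, the same heartbeat can be fired for all minions at once from the master (a minimal sketch using salt's LocalClient, equivalent to salt '*' ceph.heartbeat; it assumes the calamari ceph module is synced to the minions):

    import salt.client

    # Fire calamari's heartbeat on every accepted minion from the master side,
    # instead of running salt-call ceph.heartbeat on each node by hand.
    local = salt.client.LocalClient()
    print(local.cmd('*', 'ceph.heartbeat'))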

Also from calamari.log:

2015-07-10 09:19:44,119 - DEBUG - cthulhu RpcInterface >> get_sync_object(('a2ca2b7f-4807-471b-80af-fc9778809bc9', 'osd_map'), {})
response = self.handle_exception(exc)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 108, in handle_exception
return super(RPCViewSet, self).handle_exception(exc)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 396, in dispatch
response = handler(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 417, in get
osds, osds_by_pg_state = self.generate(pg_summary, osd_map, server_info, servers)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 369, in generate
for osd_id, osd_pg_summary in pg_summary['by_osd'].items():
TypeError: 'NoneType' object has no attribute '__getitem__'
2015-07-09 20:21:07,411 - ERROR - django.request Internal Server Error: /api/v1/cluster/a2ca2b7f-4807-471b-80af-fc9778809bc9/health_counters
Traceback (most recent call last):
File "/opt/calamari/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/viewsets.py", line 78, in view
return self.dispatch(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 91, in dispatch
return super(RPCViewSet, self).dispatch(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
return view_func(*args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 399, in dispatch
response = self.handle_exception(exc)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 108, in handle_exception
return super(RPCViewSet, self).handle_exception(exc)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 396, in dispatch
response = handler(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 417, in get
osds, osds_by_pg_state = self.generate(pg_summary, osd_map, server_info, servers)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 369, in generate
for osd_id, osd_pg_summary in pg_summary['by_osd'].items():
TypeError: 'NoneType' object has no attribute '__getitem__'
2015-07-09 20:21:07,411 - ERROR - django.request Internal Server Error: /api/v1/cluster/a2ca2b7f-4807-471b-80af-fc9778809bc9/health_counters
Traceback (most recent call last):
File "/opt/calamari/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/viewsets.py", line 78, in view
return self.dispatch(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 91, in dispatch
return super(RPCViewSet, self).dispatch(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
return view_func(*args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 399, in dispatch
response = self.handle_exception(exc)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 108, in handle_exception
return super(RPCViewSet, self).handle_exception(exc)
File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 396, in dispatch
response = handler(request, *args, **kwargs)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 315, in get
counters = self.generate(osd_data, mds_data, mon_status, pg_summary)
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 166, in generate
'osd': cls._calculate_osd_counters(osd_map),
File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 254, in _calculate_osd_counters
osds = osd_map['osds']
TypeError: 'NoneType' object has no attribute '__getitem__'
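
Those 500s all look like the same failure mode: the v1 views subscript a sync object (pg_summary, osd_map) that is still None because cthulhu has no cluster data yet. Illustrative only (not calamari's actual code), a guard of this shape would turn them into empty responses instead of tracebacks:

    # Hypothetical helper: treat a missing pg_summary as "no data yet"
    # instead of letting pg_summary['by_osd'] blow up with a TypeError.
    def by_osd_or_empty(pg_summary):
        if pg_summary is None:
            return {}
        return pg_summary.get('by_osd', {})

    print(by_osd_or_empty(None))  # {} instead of the TypeError above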

#4 Updated by Anthony Alba over 5 years ago

Is this a salt thing? Should I revert to salt 2014.1?

#5 Updated by Anthony Alba over 5 years ago

If I redo the setup and let Calamari accept the minions, the UI hangs until I manually poke it with 'salt-call ceph.heartbeat' from a minion.

#6 Updated by Anthony Alba over 5 years ago

It now works by reverting Salt to 2014.1.5.

Is this the blessed version?

#7 Updated by Gregory Meno over 5 years ago

Anthony,

That was a great idea to call the salt-master.
In general we test with salt 2014.1.5, so blessed, I suppose.
I've had working instances with 2014.7.Z; perhaps something has changed that I'm not aware of.

I will try to reproduce your findings today.
Thank you for providing the package versions you were running!

#8 Updated by Martin Palma about 5 years ago

Any updates here? We are having a hard time setting up calamari. We are getting the same error as described above.

#9 Updated by Brian Andrus about 5 years ago

Martin Palma wrote:

Any updates here? We are having a hard time setting up calamari. We are getting the same error as described above.

Did you try matching Salt versions as Anthony has done? That was a workaround for him. We have not been provided with information as to how salt was installed, so I'm not sure there is a bug to fix here.

#10 Updated by Martin Palma about 5 years ago

I tried salt 2014.7, but I always get the same error.

#11 Updated by Martin Palma about 5 years ago

OK, the problem is related to salt. It works with a salt version of 2014.1.x. I found the salt packages on rpmfind.net (used salt-2014.1.13-1.fc21.noarch.rpm) and it works now.

The odd thing now is to find a way to build my own salt packages, since in production I don't want to use packages downloaded from arbitrary sources. Anyway, with salt 2014.1.13 it works; with 2014.7 and 2015.5.5 it does not.

#12 Updated by Gregory Meno about 5 years ago

I can understand not wanting to use just any packages.

I am unable to reproduce this behavior.

If you can share steps to reproduce, it would help in getting this fixed.

#13 Updated by Martin Palma about 5 years ago

Gregory Meno wrote:

If you can share steps to reproduce, it would help in getting this fixed.

Currently I'm testing everything on my local machine, running it all in vagrant VMs using the box 'bento/centos-7.1'. It is composed of:
  • 1 monitor node
  • 2 osd nodes (each node with 3 x 10 GB disks besides the system disk)
  • 1 admin/mgmt node which is also the calamari server

Ceph was deployed using 'ceph-deploy'; it runs the latest Hammer release, and ceph health gives HEALTH_OK.

I built the calamari-server, calamari-clients and diamond packages myself according to the steps listed on the page here, using the provided vagrant boxes.

Installation steps on the calamari server:
sudo yum install calamari-server-1.3.0.1-139_g945d16a.el7.centos.x86_64.rpm
sudo yum install calamari-clients-1.2.2-34_g8bea195.el7.centos.x86_64.rpm
sudo yum install diamond-3.4.582-0.noarch.rpm
sudo calamari-ctl initialize

Next I fix the permissions of /var/log/calamari with sudo chown apache:apache /var/log/calamari/*, otherwise I get an Internal Server Error.

Then I log in to the calamari web UI and I get the following: New Calamari Installation...This appears to be the first time you have started Calamari and there are no clusters currently configured....

Connecting the Nodes

I copy the diamond-3.4.582-0.noarch.rpm package to every node and execute the following commands:

sudo yum install diamond-3.4.582-0.noarch.rpm
sudo yum install salt-minion

I then edit the /etc/salt/minion file and add the hostname of my calamari server as the salt master. After that I restart salt-minion and diamond.
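
As a sanity check that the edit took effect, I read the master setting back out of the minion config (a small sketch; 'calamari.example' stands in for my admin node's hostname):

    import yaml

    # The minion config is YAML; confirm it points at the calamari host
    # before blaming anything else.
    with open('/etc/salt/minion') as f:
        conf = yaml.safe_load(f) or {}
    print('master:', conf.get('master'))  # expect e.g. 'calamari.example'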

When I go back to the calamari web GUI and refresh, I see the exact message already mentioned in this issue: "Ceph servers are connected to Calamari, but no Ceph cluster has been created yet"

Running versions

CALAMARI SERVER

rpm -qa | grep -i calamari
calamari-clients-1.2.2-34_g8bea195.el7.centos.x86_64
calamari-server-1.3.0.1-139_g945d16a.el7.centos.x86_64

SALT ON CALAMARI SERVER

rpm -qa | grep -i salt
salt-minion-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-2015.5.5-1.el7.noarch

SALT ON CLIENTS

rpm -qa | grep -i salt
salt-minion-2015.5.5-1.el7.noarch
salt-2015.5.5-1.el7.noarch
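
Since version skew is the recurring theme in this thread, here is a one-shot check that the master and every minion agree on the salt version (a sketch using only salt built-ins, run on the master; equivalent to salt --version plus salt '*' test.version):

    import salt.client
    import salt.version

    # Print the master's own version, then ask every accepted minion for
    # theirs; calamari expects master and minion versions to match.
    print('master:', salt.version.__version__)
    print(salt.client.LocalClient().cmd('*', 'test.version'))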

Hope this helps.

#14 Updated by Benoit Petit almost 5 years ago

Hi,

My situation was:

- calamari server: Ubuntu 14.04 LTS
- ceph nodes: 3x Centos 7

The workaround that worked for me was to avoid installing calamari via ceph-deploy.

As said here: http://calamari.readthedocs.org/en/latest/operations/minion_connect.html

"Note: Calamari does not currently support 2015.5 salt please consider using 2014.7 instead Also it is important for salt-master and minion versions to match."

So I installed salt-master v2014.7 on my Calamari server from the PPA repo ppa:saltstack/salt2014-7, and salt-minion v2014.7 from rpm packages directly on my ceph nodes (not clean, but it was for debugging purposes...).

Reinitialized calamari and now it's working.

I now have:

salt-master --version
salt-master 2014.7.5 (Helium)

salt-minion --version
salt-minion 2014.7.0 (Helium)

Hope this helps.

#15 Updated by Ian Colle about 4 years ago

  • Status changed from New to Won't Fix
