Bug #38806

closed

ceph mgr/ansible module doesn't work

Added by 一帆 师 about 5 years ago. Updated over 4 years ago.

Status: Won't Fix
Priority: High
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I run ansible_runner_service as follows:

[root@node6 ansible-runner-service-master]# python3 ansible_runner_service.py 
Starting ansible-runner-service
Analysing local configuration options from ./config.yaml
/root/ansible-runner-service-master/runner_service/configuration.py:91: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  local_config = yaml.load(_cfg.read())
- setting passwords to {'admin': 'admin'}
- setting token_secret to secret
- setting token_hours to 24
Analysing runtime overrides from environment variables
No configuration settings overridden
2019-03-19 04:34:52,012 - root - INFO - Loaded logging configuration from ./logging.yaml
2019-03-19 04:34:52,012 - root - INFO - Run mode is: dev
2019-03-19 04:34:52,012 - runner_service.utils - DEBUG - Checking svctoken
2019-03-19 04:34:52,013 - runner_service.utils - INFO - svctoken exists, and is valid
2019-03-19 04:34:52,013 - root - INFO - SSH keys present in ./samples/env
2019-03-19 04:34:52,013 - runner_service.utils - DEBUG - Checking for the SSL keys in .
2019-03-19 04:34:52,013 - runner_service.utils - INFO - Using existing SSL files in .
 * Serving Flask app "runner_service" (lazy loading)
 * Environment: 
 * Debug mode: on
2019-03-19 04:34:52,041 - werkzeug - INFO -  * Running on https://0.0.0.0:5001/ (Press CTRL+C to quit)

Then I enabled the module:

ceph mgr module enable ansible

Then I set the mgr/ansible config and the orchestrator backend as follows:

[root@node6 ceph-ansible-4.0.0beta1]# ceph config dump
WHO   MASK LEVEL    OPTION                            VALUE                    RO 
  mgr      advanced mgr/ansible/password              admin                    *  
  mgr      advanced mgr/ansible/server_url            http://192.168.10.3:5001 *  
  mgr      advanced mgr/ansible/username              admin                    *  
  mgr      advanced mgr/ansible/verify_server         true                     *  
  mgr      advanced mgr/orchestrator_cli/orchestrator ansible                  *  
[root@node6 ceph-ansible-4.0.0beta1]# 

Then I tried to list the devices, but got an error. Did I miss a step, or are the docs at http://docs.ceph.com/docs/master/mgr/ansible/ incorrect?

[root@node6 ceph-ansible-4.0.0beta1]# ceph orchestrator device ls
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 864, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 336, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 21, in inner
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 79, in _list_devices
    completion = self.get_inventory(node_filter=nf, refresh=refresh)
  File "/usr/share/ceph/mgr/orchestrator.py", line 806, in inner
    return self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator.py", line 830, in _oremote
    return self.remote(o, meth, *args, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1220, in remote
    args, kwargs)
RuntimeError: Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/ansible/module.py", line 388, in get_inventory
    self._launch_operation(ansible_operation)
  File "/usr/share/ceph/mgr/ansible/module.py", line 528, in _launch_operation
    ansible_operation.execute_playbook()
  File "/usr/share/ceph/mgr/ansible/module.py", line 126, in execute_playbook
    self.pb_execution.launch()
  File "/usr/share/ceph/mgr/ansible/ansible_runner_svc.py", line 75, in launch
    response = self.rest_client.http_post(endpoint,
AttributeError: 'NoneType' object has no attribute 'http_post'

Every other orchestrator operation throws the same error.
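
The last frame of the traceback shows that `self.rest_client` was `None` when `launch()` ran. The sketch below illustrates how that kind of error can arise, assuming (the traceback does not confirm this) that the client object is created once at module startup and stays `None` when that setup fails. Every class, function, and endpoint name here is hypothetical and is not taken from the mgr/ansible source.

```python
class FakeRestClient:
    """Hypothetical stand-in for a REST client; not the real mgr/ansible class."""

    def http_post(self, endpoint, payload):
        return {"endpoint": endpoint, "payload": payload}


class PlaybookRun:
    """Hypothetical operation object that depends on an already-built client."""

    def __init__(self, rest_client):
        self.rest_client = rest_client

    def launch(self, endpoint):
        # If rest_client is None, this attribute lookup raises AttributeError.
        return self.rest_client.http_post(endpoint, {})


def try_launch(rest_client):
    """Return the error text instead of raising, so the failure is easy to inspect."""
    try:
        PlaybookRun(rest_client).launch("example/endpoint")  # placeholder endpoint
        return "ok"
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_launch(FakeRestClient()))
    print(try_launch(None))
```

If this assumption holds, the error on each orchestrator command is only a symptom, and the real failure would have happened earlier, when the client was supposed to be created.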

Actions #1

Updated by Sebastian Wagner about 5 years ago

  • Assignee set to Juan Miguel Olmo Martínez
  • Priority changed from Normal to High
Actions #2

Updated by Sebastian Wagner about 5 years ago

  • Category changed from ceph-mgr to orchestrator
Actions #3

Updated by Juan Miguel Olmo Martínez about 5 years ago

The issue is probably caused by the value of the "verify_server" setting, which controls whether the Ansible Runner Service server's TLS certificate is verified.

Please change it to false:

# ceph config set mgr mgr/ansible/verify_server false

and restart the ansible orchestrator module:

# ceph mgr module disable ansible
# ceph mgr module enable ansible

It would be very useful to watch the manager log while the Ansible orchestrator restarts. If the problem is different from the one suggested here, the log lines written while the orchestrator module loads should point to the root cause.

Please follow this procedure and report the result.
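
How the module consumes "verify_server" internally is not shown in this thread. As a general illustration of what turning off server certificate verification means for a TLS client, here is a minimal sketch using Python's standard `ssl` module. The function name `make_tls_context` is hypothetical.

```python
import ssl


def make_tls_context(verify_server: bool) -> ssl.SSLContext:
    """Build a client-side TLS context that either checks or skips the server certificate."""
    context = ssl.create_default_context()
    if not verify_server:
        # Order matters: hostname checking must be disabled before CERT_NONE is accepted.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


if __name__ == "__main__":
    for flag in (True, False):
        ctx = make_tls_context(flag)
        print(flag, ctx.verify_mode, ctx.check_hostname)
```

With verification enabled, a handshake against a self-signed certificate fails unless that certificate has been added to the client's trust store. Disabling verification removes that check, so it is best limited to test setups.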

Actions #4

Updated by 一帆 师 about 5 years ago


Now I am using https.

[root@node6 ansible-runner-service-master]# python3 ansible_runner_service.py 
Starting ansible-runner-service
Analysing local configuration options from ./config.yaml
/root/ansible-runner-service-master/runner_service/configuration.py:91: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  local_config = yaml.load(_cfg.read())
- setting passwords to {'admin': 'admin'}
- setting token_secret to secret
- setting token_hours to 24
Analysing runtime overrides from environment variables
No configuration settings overridden
2019-03-21 22:08:07,443 - root - INFO - Loaded logging configuration from ./logging.yaml
2019-03-21 22:08:07,443 - root - INFO - Run mode is: dev
2019-03-21 22:08:07,443 - runner_service.utils - DEBUG - Checking svctoken
2019-03-21 22:08:07,444 - runner_service.utils - INFO - svctoken created
2019-03-21 22:08:07,444 - root - DEBUG - No SSH keys present in ./samples/env
2019-03-21 22:08:07,444 - root - INFO - Creating SSH keys
2019-03-21 22:08:08,313 - runner_service.utils - INFO - Created SSH public key @ './samples/env/ssh_key.pub'
2019-03-21 22:08:08,313 - runner_service.utils - INFO - Created SSH private key @ './samples/env/ssh_key'
2019-03-21 22:08:08,314 - runner_service.utils - DEBUG - Checking for the SSL keys in .
2019-03-21 22:08:08,314 - runner_service.utils - INFO - Existing SSL files not found in .
2019-03-21 22:08:08,314 - runner_service.utils - INFO - Self-signed cert will be created - expiring in 3 years
2019-03-21 22:08:08,425 - runner_service.utils - DEBUG - Writing crt file to ./ansible_runner_service.crt
2019-03-21 22:08:08,425 - runner_service.utils - DEBUG - Writing key file to ./ansible_runner_service.key
 * Serving Flask app "runner_service" (lazy loading)
 * Environment: 
 * Debug mode: on
2019-03-21 22:08:08,442 - werkzeug - INFO -  * Running on https://0.0.0.0:5001/ (Press CTRL+C to quit)


  mgr      advanced mgr/ansible/password                                admin                                    *  
  mgr      advanced mgr/ansible/server_url                              192.168.10.3:5001                        *  
  mgr      advanced mgr/ansible/username                                admin                                    *  
  mgr      advanced mgr/ansible/verify_server                           true 
[root@node6 ~]# ceph orchestrator status
Backend: ansible
Available: True
[root@node6 ~]# 

The same error is still there.


[root@node6 ~]# ceph orchestrator device ls
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 864, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 336, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 21, in inner
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 79, in _list_devices
    completion = self.get_inventory(node_filter=nf, refresh=refresh)
  File "/usr/share/ceph/mgr/orchestrator.py", line 806, in inner
    return self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator.py", line 830, in _oremote
    return self.remote(o, meth, *args, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1220, in remote
    args, kwargs)
RuntimeError: Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/ansible/module.py", line 388, in get_inventory
    self._launch_operation(ansible_operation)
  File "/usr/share/ceph/mgr/ansible/module.py", line 528, in _launch_operation
    ansible_operation.execute_playbook()
  File "/usr/share/ceph/mgr/ansible/module.py", line 126, in execute_playbook
    self.pb_execution.launch()
  File "/usr/share/ceph/mgr/ansible/ansible_runner_svc.py", line 75, in launch
    response = self.rest_client.http_post(endpoint,
AttributeError: 'NoneType' object has no attribute 'http_post'

Actions #5

Updated by 一帆 师 about 5 years ago

I am sorry, I will set it to false and try again.

Actions #6

Updated by 一帆 师 about 5 years ago


[root@node6 ~]# ceph config set mgr mgr/ansible/server_url https://192.168.10.3:5001
[root@node6 ~]# ceph config set mgr mgr/ansible/username admin
[root@node6 ~]# ceph config set mgr mgr/ansible/password admin
[root@node6 ~]# ceph config set mgr mgr/ansible/verify_server false
[root@node6 ~]# ceph orchestrator device ls
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 864, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 336, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 21, in inner
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 79, in _list_devices
    completion = self.get_inventory(node_filter=nf, refresh=refresh)
  File "/usr/share/ceph/mgr/orchestrator.py", line 806, in inner
    return self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator.py", line 830, in _oremote
    return self.remote(o, meth, *args, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1220, in remote
    args, kwargs)
RuntimeError: Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/ansible/module.py", line 388, in get_inventory
    self._launch_operation(ansible_operation)
  File "/usr/share/ceph/mgr/ansible/module.py", line 528, in _launch_operation
    ansible_operation.execute_playbook()
  File "/usr/share/ceph/mgr/ansible/module.py", line 126, in execute_playbook
    self.pb_execution.launch()
  File "/usr/share/ceph/mgr/ansible/ansible_runner_svc.py", line 75, in launch
    response = self.rest_client.http_post(endpoint,
AttributeError: 'NoneType' object has no attribute 'http_post'

[root@node6 ~]# 

Actions #7

Updated by 一帆 师 about 5 years ago


I also tried removing the https:// prefix, as follows, and got the same error.

[root@node6 ~]# ceph config set mgr mgr/ansible/server_url 192.168.10.3:5001
[root@node6 ~]# ceph config set mgr mgr/ansible/username admin
[root@node6 ~]# ceph config set mgr mgr/ansible/password admin
[root@node6 ~]# ceph config set mgr mgr/ansible/verify_server false
[root@node6 ~]# ceph orchestrator device ls
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 864, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 336, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 21, in inner
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 79, in _list_devices
    completion = self.get_inventory(node_filter=nf, refresh=refresh)
  File "/usr/share/ceph/mgr/orchestrator.py", line 806, in inner
    return self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator.py", line 830, in _oremote
    return self.remote(o, meth, *args, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1220, in remote
    args, kwargs)
RuntimeError: Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/ansible/module.py", line 388, in get_inventory
    self._launch_operation(ansible_operation)
  File "/usr/share/ceph/mgr/ansible/module.py", line 528, in _launch_operation
    ansible_operation.execute_playbook()
  File "/usr/share/ceph/mgr/ansible/module.py", line 126, in execute_playbook
    self.pb_execution.launch()
  File "/usr/share/ceph/mgr/ansible/ansible_runner_svc.py", line 75, in launch
    response = self.rest_client.http_post(endpoint,
AttributeError: 'NoneType' object has no attribute 'http_post'

[root@node6 ~]# 
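
The thread uses three forms of server_url: "http://192.168.10.3:5001", "https://192.168.10.3:5001", and the bare "192.168.10.3:5001", while the service log reports that it is running on https. Whether the module adds a scheme on its own is not shown here. The sketch below only checks how each value looks to a standard URL parser. The helper name `describe_server_url` is hypothetical.

```python
from urllib.parse import urlsplit


def describe_server_url(url: str) -> str:
    """Classify a server_url value by whether it carries an explicit http(s) scheme."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in ("http", "https"):
        return scheme
    return "no http(s) scheme"


if __name__ == "__main__":
    for value in (
        "http://192.168.10.3:5001",
        "https://192.168.10.3:5001",
        "192.168.10.3:5001",
    ):
        print(value, "->", describe_server_url(value))
```

A check like this makes it easy to spot a configured "http" value pointing at a service that only listens on https, or a value with no scheme at all.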

Actions #8

Updated by Juan Miguel Olmo Martínez about 5 years ago

Sorry for the delay in answering.

The problem seems to be related to the connection with the external Ansible Runner Service. The logs you provided show the error raised when an operation ("device ls") is executed. They do not explain the root cause of the problem, only the error linked directly to the operation.

We need the logs that appear at the moment you enable the ansible orchestrator module, because those lines should show the root cause of the problem.

Please follow these steps:

1. Disable the ansible module

# ceph mgr module disable ansible

2. Open the manager log in a different terminal. This will let us see what happens when you enable the ansible module and the configuration values are applied.

3. Set the configuration (I used the same Ansible Runner Service URL and credentials from your previous posts)

# ceph config set mgr mgr/ansible/server_url 192.168.10.3:5001
# ceph config set mgr mgr/ansible/username admin
# ceph config set mgr mgr/ansible/password admin
# ceph config set mgr mgr/ansible/verify_server false

4. Set the Ansible orchestrator as the backend module

# ceph orchestrator set backend ansible

# ceph orchestrator status
Backend: ansible

5. Enable the ansible module. This applies all the configuration settings. This is the moment to check the manager log for errors.

# ceph mgr module enable ansible

6. If no errors appear in the manager log, the operation to get the device inventory should work.

# ceph orchestrator device ls
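
The order of the procedure above matters, so it can help to write it down as data before running anything. The following is a minimal dry-run sketch that only builds and prints the commands from this comment and does not execute them. The function name `build_reenable_steps` is hypothetical, and the manual steps (watching the manager log, checking `ceph orchestrator status`) are left out.

```python
def build_reenable_steps(server_url, username, password, verify_server):
    """Return the ceph commands from the procedure above, in order, as argv lists."""
    settings = [
        ("mgr/ansible/server_url", server_url),
        ("mgr/ansible/username", username),
        ("mgr/ansible/password", password),
        ("mgr/ansible/verify_server", "true" if verify_server else "false"),
    ]
    # Disable first, so the settings are read fresh when the module is enabled again.
    steps = [["ceph", "mgr", "module", "disable", "ansible"]]
    for key, value in settings:
        steps.append(["ceph", "config", "set", "mgr", key, value])
    steps.append(["ceph", "orchestrator", "set", "backend", "ansible"])
    steps.append(["ceph", "mgr", "module", "enable", "ansible"])
    steps.append(["ceph", "orchestrator", "device", "ls"])
    return steps


if __name__ == "__main__":
    for argv in build_reenable_steps("192.168.10.3:5001", "admin", "admin", False):
        print(" ".join(argv))
```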
Actions #9

Updated by 一帆 师 about 5 years ago


Which log should I look at in step 2? No error was thrown when I enabled the ansible module.

Actions #10

Updated by Juan Miguel Olmo Martínez about 5 years ago

By default, the manager log is located in the folder "/var/log/ceph" on the server where the active manager daemon is running.

Example: in my "Octopus" Ceph cluster installed with ceph-ansible:

1. Locate where your active manager daemon is running.

[vagrant@mon0 ~]$ ceph status
  cluster:
    id:     30d61f3e-7ee4-4bdc-8fe7-2ad5bb3f5317
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum mon0 (age 34m)
    mgr: mgr0(active, since 31m)                     <------ Manager daemon placement
    osd: 4 osds: 4 up (since 3m), 4 in (since 3m)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   4.0 GiB used, 192 GiB / 196 GiB avail
    pgs:     

2. Connect to the machine where your manager is running. (It could be the same machine where the monitor daemon runs.)

In this example the machine is "mgr0":

...
mgr: mgr0(active, since 31m) 
...

3. The manager daemon log file is located by default in the folder "/var/log/ceph". You will probably need root privileges to access this file:

[vagrant@mgr0]$ pwd
[vagrant@mgr0 ~]$ sudo su -
[root@mgr0 ~]# cd /var/log/ceph/
[root@mgr0 ceph]# ls
ceph-mgr.mgr0.log   <------------------------  This is the manager log file

4. Watch the file while you enable the ansible orchestrator on the monitor server. You will see something like:


[root@mgr0 ceph]# tail -f ceph-mgr.mgr0.log
2019-03-26 10:24:08.961 7faea2213b80  0 ceph version 14.2.0-507-g049ce9f (049ce9f40b8e6ec1dfcc77266d5d7035e6154140) octopus (dev), process ceph-mgr, pid 4973
2019-03-26 10:24:08.961 7faea2213b80  0 pidfile_write: ignore empty --pid-file
2019-03-26 10:24:08.995 7faea2213b80  1 mgr[py] Loading python module 'ansible'
2019-03-26 10:24:09.082 7faea2213b80  1 mgr[py] Loading python module 'balancer'
2019-03-26 10:24:09.097 7faea2213b80  1 mgr[py] Loading python module 'crash'
2019-03-26 10:24:09.111 7faea2213b80  1 mgr[py] Loading python module 'deepsea'
2019-03-26 10:24:09.185 7faea2213b80  1 mgr[py] Loading python module 'devicehealth'
2019-03-26 10:24:09.200 7faea2213b80  1 mgr[py] Loading python module 'influx'
2019-03-26 10:24:09.213 7faea2213b80  1 mgr[py] Loading python module 'insights'
2019-03-26 10:24:09.228 7faea2213b80  1 mgr[py] Loading python module 'iostat'
2019-03-26 10:24:09.241 7faea2213b80  1 mgr[py] Loading python module 'localpool'
2019-03-26 10:24:09.270 7faea2213b80  1 mgr[py] Loading python module 'orchestrator_cli'
2019-03-26 10:24:09.287 7faea2213b80  1 mgr[py] Loading python module 'pg_autoscaler'
2019-03-26 10:24:09.328 7faea2213b80  1 mgr[py] Loading python module 'progress'
2019-03-26 10:24:09.358 7faea2213b80  1 mgr[py] Loading python module 'prometheus'
2019-03-26 10:24:09.418 7faea2213b80  1 mgr[py] Loading python module 'rbd_support'
2019-03-26 10:24:09.433 7faea2213b80  1 mgr[py] Loading python module 'restful'
2019-03-26 10:24:09.555 7faea2213b80  1 mgr[py] Loading python module 'selftest'
2019-03-26 10:24:09.569 7faea2213b80  1 mgr[py] Loading python module 'status'
2019-03-26 10:24:09.626 7faea2213b80  1 mgr[py] Loading python module 'telegraf'
2019-03-26 10:24:09.650 7faea2213b80  1 mgr[py] Loading python module 'telemetry'
2019-03-26 10:24:09.727 7faea2213b80  1 mgr[py] Loading python module 'test_orchestrator'
2019-03-26 10:24:09.759 7faea2213b80  1 mgr[py] Loading python module 'volumes'
2019-03-26 10:24:09.796 7faea2213b80  1 mgr[py] Loading python module 'zabbix'
2019-03-26 10:24:09.814 7fae8dad7700  0 ms_deliver_dispatch: unhandled message 0x5596879afc00 mon_map magic: 0 v1 from mon.0 v2:192.168.42.10:3300/0
2019-03-26 10:24:09.908 7fae8dad7700  1 mgr handle_mgr_map Activating!
2019-03-26 10:24:09.909 7fae8dad7700  1 mgr handle_mgr_map I am now activating
2019-03-26 10:24:09.928 7fae7f6f4700  1 mgr load Constructed class from module: ansible
2019-03-26 10:24:09.929 7fae7f6f4700  1 mgr load Constructed class from module: balancer
2019-03-26 10:24:09.930 7fae7f6f4700  1 mgr load Constructed class from module: crash
2019-03-26 10:24:09.930 7fae7f6f4700  1 mgr load Constructed class from module: devicehealth
2019-03-26 10:24:09.931 7fae7f6f4700  1 mgr load Constructed class from module: orchestrator_cli
2019-03-26 10:24:09.931 7fae7f6f4700  1 mgr load Constructed class from module: progress
2019-03-26 10:24:09.932 7fae7f6f4700  1 mgr load Constructed class from module: status
2019-03-26 10:24:09.932 7fae7f6f4700  1 mgr load Constructed class from module: volumes

In this case I forced an error to show you what an error initializing the Ansible orchestrator module looks like:

2019-03-26 10:24:09.969 7fae7d6f0700  0 mgr[ansible] login error <<https://192.168.121.1:5001/api/v1/login>> (400):<html>
<head><title>400 No required SSL certificate was sent</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>No required SSL certificate was sent</center>
<hr><center>nginx/1.12.2</center>
</body>
</html>
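
On a busy manager the relevant lines are easy to miss among the module-loading messages. Here is a minimal sketch for picking them out, assuming the ansible module's own messages carry the "mgr[ansible]" tag as in the sample above. The function name `ansible_module_lines` is hypothetical. Continuation lines, such as the HTML body of the error, carry no tag and are not matched.

```python
SAMPLE_LOG = [
    "2019-03-26 10:24:08.995 7faea2213b80  1 mgr[py] Loading python module 'ansible'",
    "2019-03-26 10:24:09.928 7fae7f6f4700  1 mgr load Constructed class from module: ansible",
    "2019-03-26 10:24:09.969 7fae7d6f0700  0 mgr[ansible] login error "
    "<<https://192.168.121.1:5001/api/v1/login>> (400):<html>",
]


def ansible_module_lines(log_lines):
    """Keep only manager log lines emitted by the ansible module (tagged 'mgr[ansible]')."""
    return [line for line in log_lines if "mgr[ansible]" in line]


if __name__ == "__main__":
    for line in ansible_module_lines(SAMPLE_LOG):
        print(line)
```

To apply it to a real log, pass the lines of the manager log file (for example the ceph-mgr.mgr0.log file shown above) instead of `SAMPLE_LOG`.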

Actions #11

Updated by Sage Weil over 4 years ago

  • Status changed from New to Won't Fix
Actions #12

Updated by Sage Weil over 4 years ago

  • Project changed from mgr to Orchestrator
  • Category deleted (orchestrator)