Bug #64712

closed

the node-proxy daemon fails to send data to the mgr endpoint

Added by Guillaume Abrioux 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: squid,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When the node-proxy daemon tries to send its data to the mgr endpoint, the request fails with an HTTP 500 (Internal Server Error) response, and the Reporter thread stops.

Typical failure:

Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,049 - ceph_node_proxy.redfish_client - INFO - Initializing redfish client ceph_node_proxy.redfish_client
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,050 - ceph_node_proxy.baseredfishsystem - INFO - redfish system initialization, host: 169.254.1.1, user: root
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,051 - ceph_node_proxy.util - INFO - Starting RedfishDellSystem
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,051 - ceph_node_proxy.reporter - INFO - Reporter url set to https://10.250.2.12:7150/node-proxy/data
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,051 - ceph_node_proxy.redfish_client - INFO - Logging in to https://169.254.1.1:443 as 'root'
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,052 - ceph_node_proxy.util - INFO - Starting Reporter
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,052 - ceph_node_proxy.main - INFO - Starting node-proxy API...
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,053 - ceph_node_proxy.api - INFO - node-proxy API configuration...
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: [01/Mar/2024:18:22:22] ENGINE Bus STARTING
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: [01/Mar/2024:18:22:22] ENGINE Serving on https://0.0.0.0:9456
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: [01/Mar/2024:18:22:22] ENGINE Bus STARTED
Mar 01 13:22:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:22:22,276 - ceph_node_proxy.api - INFO - node-proxy API started.
Mar 01 13:30:20 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:30:20,991 - ceph_node_proxy.reporter - INFO - data has changed since last iteration.
Mar 01 13:30:20 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:30:20,992 - ceph_node_proxy.reporter - INFO - sending data to https://10.250.2.12:7150/node-proxy/data
Mar 01 13:30:21 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:30:21,046 - ceph_node_proxy.reporter - ERROR - The reporter couldn't send data to the mgr: HTTP Error 500: Internal Server Error
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib/python3.9/site-packages/ceph_node_proxy/util.py", line 94, in run
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     self.main()
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib/python3.9/site-packages/ceph_node_proxy/reporter.py", line 50, in main
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     http_req(hostname=self.reporter_hostname,
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib/python3.9/site-packages/ceph_node_proxy/util.py", line 178, in http_req
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     with urlopen(req, context=ssl_ctx, timeout=timeout) as response:
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib64/python3.9/urllib/request.py", line 214, in urlopen
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     return opener.open(url, data, timeout)
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib64/python3.9/urllib/request.py", line 523, in open
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     response = meth(req, response)
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib64/python3.9/urllib/request.py", line 632, in http_response
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     response = self.parent.error(
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib64/python3.9/urllib/request.py", line 561, in error
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     return self._call_chain(*args)
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib64/python3.9/urllib/request.py", line 494, in _call_chain
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     result = func(*args)
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:   File "/usr/lib64/python3.9/urllib/request.py", line 641, in http_error_default
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]:     raise HTTPError(req.full_url, code, msg, hdrs, fp)
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:30:22,461 - ceph_node_proxy.util - ERROR - Caught exception: HTTPError
Mar 01 13:30:22 ceph-node-01.pok.stglabs.ibm.com ceph-65942800-6254-11ee-b57b-c84bd69757d6-node-proxy-ceph-node-01[1389543]: 2024-03-01 18:30:22,461 - ceph_node_proxy.main - ERROR - <Reporter(Reporter, stopped daemon 140681760683584)> not running: HTTPError: HTTP Error 500: Internal Server Error
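
The log only shows the status line of the failed request. When debugging this kind of failure, it can help to also read the body of the 500 response, since urllib's HTTPError is itself a file-like response object. The following is a minimal sketch using only the standard library; `describe_http_error` is a hypothetical helper and is not part of ceph_node_proxy, and the error body used here is invented for illustration.

```python
import io
from urllib.error import HTTPError


def describe_http_error(err: HTTPError) -> str:
    """Summarize an HTTPError in one line, including the response body."""
    # HTTPError wraps the server response, so read() returns the body bytes.
    body = err.read().decode("utf-8", errors="replace").strip()
    return f"HTTP {err.code} {err.reason} from {err.filename}: {body or '<empty body>'}"


# Simulated error so the sketch runs without a network connection.
# The URL is the reporter URL from the log; the body is made up.
example = HTTPError(
    "https://10.250.2.12:7150/node-proxy/data",
    500,
    "Internal Server Error",
    {},
    io.BytesIO(b"illustrative error body"),
)
summary = describe_http_error(example)
print(summary)
```

In a real reporter loop, the same helper could be called from an `except HTTPError` clause around the `urlopen` call, so the server-side reason ends up in the daemon log.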

Related issues 2 (0 open, 2 closed)

Copied to Orchestrator - Backport #64749: squid: the node-proxy daemon fails to send data to the mgr endpoint (Resolved, Guillaume Abrioux)
Copied to Orchestrator - Backport #64750: reef: the node-proxy daemon fails to send data to the mgr endpoint (Resolved, Guillaume Abrioux)
Actions #1

Updated by Guillaume Abrioux 2 months ago

It turns out this happens because the Redfish API returns unexpected data.

For instance:

[root@ceph-node-01 ~]# curl -s -k -X GET https://169.254.1.1/redfish/v1/Systems/System.Embedded.1/Storage/AHCI.SL.6-1/Drives/Disk.Direct.0-0:AHCI.SL.6-1 -H "X-Auth-Token: 3264251c28191fa5e7c9ebec49ef90fc"  | jq .Status
{
  "Health": "OK",
  "HealthRollup": "OK",
  "State": "Enabled" 
}
[root@ceph-node-01 ~]# curl -s -k -X GET https://169.254.1.1/redfish/v1/Systems/System.Embedded.1/Storage/NonRAID.Slot.2-1/Drives/Disk.Bay.0:Enclosure.Internal.0-1:NonRAID.Slot.2-1 -H "X-Auth-Token: 3264251c28191fa5e7c9ebec49ef90fc" | jq .Status
{
  "Health": null,
  "HealthRollup": null,
  "State": "Enabled" 
}
[root@ceph-node-01 ~]#

It is unclear to me why it returns null here for some devices, but this is most likely a bug in the hardware or its Redfish implementation.
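
One defensive way to cope with such responses is to normalize the Status object before passing it on, so that null fields never reach the consumer. The sketch below is only an illustration of that idea and is not necessarily what the actual fix does; `normalize_status` and the "Unknown" placeholder are assumptions made for this example.

```python
from typing import Any, Dict, Optional


def normalize_status(status: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Replace missing or null Redfish Status fields with "Unknown"."""
    status = status or {}
    fields = ("Health", "HealthRollup", "State")
    # `or "Unknown"` covers both an absent key and an explicit JSON null.
    return {name: status.get(name) or "Unknown" for name in fields}


# The two Status payloads shown in the curl output above.
healthy = normalize_status({"Health": "OK", "HealthRollup": "OK", "State": "Enabled"})
degraded = normalize_status({"Health": None, "HealthRollup": None, "State": "Enabled"})

print(healthy)
print(degraded)
```

With this approach the second drive would be reported with Health and HealthRollup set to "Unknown" instead of null, while well-formed responses pass through unchanged.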

Actions #2

Updated by Guillaume Abrioux 2 months ago

  • Status changed from In Progress to Fix Under Review
Actions #3

Updated by Guillaume Abrioux 2 months ago

  • Pull request ID set to 55955
Actions #4

Updated by Adam King 2 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Backport Bot 2 months ago

  • Copied to Backport #64749: squid: the node-proxy daemon fails to send data to the mgr endpoint added
Actions #6

Updated by Backport Bot 2 months ago

  • Copied to Backport #64750: reef: the node-proxy daemon fails to send data to the mgr endpoint added
Actions #7

Updated by Backport Bot 2 months ago

  • Tags set to backport_processed
Actions #8

Updated by Adam King 2 months ago

  • Status changed from Pending Backport to Resolved