Bug #56391

Teuthology jobs scheduled with rhel result in "the output has been hidden due to the fact that 'no_log: true' was specified for this result"

Added by Laura Flores over 1 year ago. Updated over 1 year ago.

Status: In Progress
Priority: Normal
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-06-24_15:13:42-rados-wip-yuri4-testing-2022-06-22-1415-pacific-distro-default-smithi/teuthology.log

2022-06-24T15:59:07.388 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main to origin/main
2022-06-24T15:59:07.410 INFO:teuthology.task.ansible:Playbook: [{'import_playbook': 'ansible_managed.yml'}, {'import_playbook': 'teuthology.yml'}, {'hosts': 'testnodes', 'tasks': [{'set_fact': {'ran_from_cephlab_playbook': True}}]}, {'import_playbook': 'testnodes.yml'}, {'import_playbook': 'container-host.yml'}, {'import_playbook': 'cobbler.yml'}, {'import_playbook': 'paddles.yml'}, {'import_playbook': 'pulpito.yml'}, {'hosts': 'testnodes', 'become': True, 'tasks': [{'name': 'Touch /ceph-qa-ready', 'file': {'path': '/ceph-qa-ready', 'state': 'touch'}, 'when': 'ran_from_cephlab_playbook|bool'}]}]
2022-06-24T15:59:07.412 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi114.front.sepia.ceph.com,smithi175.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-06-24T16:01:50.160 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/yuriw-2022-06-24_15:13:42-rados-wip-yuri4-testing-2022-06-22-1415-pacific-distro-default-smithi/6897479/ansible_failures.yaml
2022-06-24T16:01:50.163 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_082ae7ef4302fa54665ed0a2535e8e254118dcfd/teuthology/run_tasks.py", line 106, in run_tasks
    manager.__enter__()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_082ae7ef4302fa54665ed0a2535e8e254118dcfd/teuthology/task/__init__.py", line 123, in __enter__
    self.begin()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_082ae7ef4302fa54665ed0a2535e8e254118dcfd/teuthology/task/ansible.py", line 422, in begin
    super(CephLab, self).begin()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_082ae7ef4302fa54665ed0a2535e8e254118dcfd/teuthology/task/ansible.py", line 265, in begin
    self.execute_playbook()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_082ae7ef4302fa54665ed0a2535e8e254118dcfd/teuthology/task/ansible.py", line 291, in execute_playbook
    self._handle_failure(command, status)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_082ae7ef4302fa54665ed0a2535e8e254118dcfd/teuthology/task/ansible.py", line 316, in _handle_failure
    raise AnsibleFailedError(failures)
teuthology.exceptions.AnsibleFailedError: {'smithi175.front.sepia.ceph.com': {'attempts': 12, 'censored': "the output has been hidden due to the fact that 'no_log: true' was specified for this result", 'changed': True}, 'smithi114.front.sepia.ceph.com': {'attempts': 12, 'censored': "the output has been hidden due to the fact that 'no_log: true' was specified for this result", 'changed': True}}
2022-06-24T16:01:50.332 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=53b2d09a94b44934b3b4565b4e7a2b17

Seen in Pacific and Octopus:
https://pulpito.ceph.com/yuriw-2022-06-24_15:13:42-rados-wip-yuri4-testing-2022-06-22-1415-pacific-distro-default-smithi/
https://pulpito.ceph.com/yuriw-2022-06-24_16:58:58-rados-wip-yuri-testing-2022-06-24-0817-octopus-distro-default-smithi/

There seems to be an old yaml fragment somewhere that needs to be updated.

/a/yuriw-2022-06-24_15:13:42-rados-wip-yuri4-testing-2022-06-22-1415-pacific-distro-default-smithi/6897479/ansible_failures.yaml

smithi175.front.sepia.ceph.com:
  attempts: 12
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was
    specified for this result'
  changed: true

smithi114.front.sepia.ceph.com:
  attempts: 12
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was
    specified for this result'
  changed: true
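
The attempts: 12 fields above are the signature of an Ansible task retried in a loop whose registered result is hidden by no_log. A minimal, hypothetical sketch of that pattern (not the actual ceph-cm-ansible task, which also supplies real credentials) looks like:

- name: Register with subscription-manager.
  # Simplified for illustration; the real task passes an activation key/credentials.
  command: subscription-manager register
  register: reg_result
  retries: 12
  delay: 10          # illustrative delay, not taken from the playbook
  until: reg_result.rc == 0
  # Because of no_log, a failure surfaces only as the "censored" message seen above.
  no_log: true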


Related issues

Related to sepia - Bug #41387: "Register with subscription-manager." fails after 5 retries (Resolved, 08/22/2019)

History

#1 Updated by Laura Flores over 1 year ago

  • Subject changed from Teuthology jobs scheduled with rhel 8.3 result in "the output has been hidden due to the fact that 'no_log: true' was specified for this result" to Teuthology jobs scheduled with rhel 8.3 and 8.4 result in "the output has been hidden due to the fact that 'no_log: true' was specified for this result"

#2 Updated by Laura Flores over 1 year ago

Sentry reports this failure happening across the board: https://sentry.ceph.com/organizations/ceph/issues/36/events/53b2d09a94b44934b3b4565b4e7a2b17/events/?project=2

All occurrences started today, June 24th. The main branch is also affected, on rhel 8.6.

#3 Updated by Neha Ojha over 1 year ago

I wonder if this has something to do with the issue:

/a/prsrivas-2022-06-24_15:53:46-rgw-wip-rgw-gc-break-chain-distro-default-smithi/6897697/ansible.log

Friday 24 June 2022  21:52:37 +0000 (0:00:00.043)       0:03:04.413 ***********

TASK [common : Determine if node is registered with subscription-manager.] *****
fatal: [smithi107.front.sepia.ceph.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
...ignoring

Friday 24 June 2022  21:52:37 +0000 (0:00:00.922)       0:03:05.336 ***********
fatal: [smithi165.front.sepia.ceph.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
...ignoring
...
FAILED - RETRYING: Register with subscription-manager. (12 retries left).
FAILED - RETRYING: Register with subscription-manager. (12 retries left).
FAILED - RETRYING: Register with subscription-manager. (11 retries left).
FAILED - RETRYING: Register with subscription-manager. (11 retries left).
FAILED - RETRYING: Register with subscription-manager. (10 retries left).
FAILED - RETRYING: Register with subscription-manager. (10 retries left).
FAILED - RETRYING: Register with subscription-manager. (9 retries left).
FAILED - RETRYING: Register with subscription-manager. (9 retries left).
FAILED - RETRYING: Register with subscription-manager. (8 retries left).
FAILED - RETRYING: Register with subscription-manager. (8 retries left).
FAILED - RETRYING: Register with subscription-manager. (7 retries left).
FAILED - RETRYING: Register with subscription-manager. (7 retries left).
FAILED - RETRYING: Register with subscription-manager. (6 retries left).
FAILED - RETRYING: Register with subscription-manager. (6 retries left).
FAILED - RETRYING: Register with subscription-manager. (5 retries left).
FAILED - RETRYING: Register with subscription-manager. (5 retries left).
FAILED - RETRYING: Register with subscription-manager. (4 retries left).

#4 Updated by Laura Flores over 1 year ago

Yeah, that's coming from this task here:

https://github.com/ceph/ceph-cm-ansible/blob/main/roles/common/tasks/rhel-entitlements.yml#L38-L43

- name: Determine if node is registered with subscription-manager.
  command: subscription-manager identity
  register: subscription
  ignore_errors: true
  changed_when: false
  no_log: true

Doesn't seem like there's anything wrong with the task itself, and this file hasn't undergone any recent changes. The problem seems rather to be that we can't successfully register a rhel node with subscription-manager.
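
For anyone who needs to see what subscription-manager actually reported on the node, one hypothetical debugging tweak (not something currently in ceph-cm-ansible) is to make no_log conditional on an extra variable passed at run time, e.g. -e debug_subscription=true:

- name: Determine if node is registered with subscription-manager.
  command: subscription-manager identity
  register: subscription
  ignore_errors: true
  changed_when: false
  # Hypothetical: only hide the output when the debug flag is not set.
  no_log: "{{ not (debug_subscription | default(false) | bool) }}"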

#5 Updated by Laura Flores over 1 year ago

  • Related to Bug #41387: "Register with subscription-manager." fails after 5 retries added

#6 Updated by Laura Flores over 1 year ago

The same issue was fixed before in the related Tracker I linked.

#7 Updated by David Galloway over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to David Galloway

So the Satellite VM's disk filled up. I suspect there was already filesystem corruption, but growing the partition, logical volume, and filesystem caused xfs to be very unhappy. The postgres DB won't start anymore.

I am restoring the Satellite server from a backup, but I anticipate it will take days to sync and stabilize, as the most recent snapshot was from 2019.

#8 Updated by Laura Flores over 1 year ago

I see. Thanks for looking into it, David.

#9 Updated by Laura Flores over 1 year ago

  • Subject changed from Teuthology jobs scheduled with rhel 8.3 and 8.4 result in "the output has been hidden due to the fact that 'no_log: true' was specified for this result" to Teuthology jobs scheduled with rhel result in "the output has been hidden due to the fact that 'no_log: true' was specified for this result"

#11 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6907408

#12 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6905506/

#13 Updated by Matan Breizman over 1 year ago

/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973864

#14 Updated by Matan Breizman over 1 year ago

Not the same error message, but it's also a failure at the same ansible stage on a rhel node:

2022-08-15T23:56:29.852 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi078.front.sepia.ceph.com,smithi165.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-08-16T00:05:36.030 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973866/ansible_failures.yaml

/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973866

From ansible.log (/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973866/ansible.log):

TASK [common : Determine if node is registered with subscription-manager.] *****
fatal: [smithi165.front.sepia.ceph.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
...ignoring

FAILED - RETRYING: Register with subscription-manager. (12 retries left).

#15 Updated by David Galloway over 1 year ago

Matan Breizman wrote:

Not the same error message but it's also a failure with same ansible stage on a rhel node:
[...]

/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973866

from ansible.log (/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973866/ansible.log) :

[...]

That is a non-fatal error, and the system did go on to register successfully. The real error was:

failed: [smithi165.front.sepia.ceph.com] (item={'key': 'nvme0n1', 'value': {'virtual': 1, 'links': {'ids': ['lvm-pv-uuid-O1eT7A-94aA-KB2Q-NwlE-Lq6O-L8F2-03YTlQ', 'nvme-INTEL_SSDPEDMD400G4_CVFT623300CE400BGN', 'nvme-nvme.8086-43564654363233333030434534303042474e-494e54454c205353445045444d443430304734-00000001'], 'uuids': [], 'labels': [], 'masters': ['dm-0', 'dm-1', 'dm-2', 'dm-3', 'dm-4']}, 'vendor': None, 'model': 'INTEL SSDPEDMD400G4', 'sas_address': None, 'sas_device_handle': None, 'removable': '0', 'support_discard': '512', 'partitions': {}, 'rotational': '0', 'scheduler_mode': 'none', 'sectors': '781422768', 'sectorsize': '512', 'size': '372.61 GB', 'host': 'Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)', 'holders': ['vg_nvme-lv_2', 'vg_nvme-lv_5', 'vg_nvme-lv_3', 'vg_nvme-lv_1', 'vg_nvme-lv_4']}}) => {"ansible_loop_var": "item", "item": {"key": "nvme0n1", "value": {"holders": ["vg_nvme-lv_2", "vg_nvme-lv_5", "vg_nvme-lv_3", "vg_nvme-lv_1", "vg_nvme-lv_4"], "host": "Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)", "links": {"ids": ["lvm-pv-uuid-O1eT7A-94aA-KB2Q-NwlE-Lq6O-L8F2-03YTlQ", "nvme-INTEL_SSDPEDMD400G4_CVFT623300CE400BGN", "nvme-nvme.8086-43564654363233333030434534303042474e-494e54454c205353445045444d443430304734-00000001"], "labels": [], "masters": ["dm-0", "dm-1", "dm-2", "dm-3", "dm-4"], "uuids": []}, "model": "INTEL SSDPEDMD400G4", "partitions": {}, "removable": "0", "rotational": "0", "sas_address": null, "sas_device_handle": null, "scheduler_mode": "none", "sectors": "781422768", "sectorsize": "512", "size": "372.61 GB", "support_discard": "512", "vendor": null, "virtual": 1}}, "msg": "Failed to connect to the host via ssh: ssh: connect to host smithi165.front.sepia.ceph.com port 22: No route to host", "unreachable": true}

I suspect this was caused by https://tracker.ceph.com/issues/45512
