Bug #53582

common: ceph_argparse: python error when running vstart.sh after build including tests

Added by Alfonso Martínez 12 months ago. Updated 4 months ago.

Status: New
Priority: Urgent
% Done: 0%
Source:
Tags: test-failure
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A recently recurring error shows up in all "https://jenkins.ceph.com/job/ceph-dashboard-pull-requests" job runs:
"waiting for mgr dashboard module to start"

Example of a failed E2E job:
https://jenkins.ceph.com/job/ceph-dashboard-pull-requests/9539/consoleText

I logged in to braggi02 to check the mgr.x.log file and found these errors:

2021-12-10T11:27:01.531+0000 7fe36de73980  1 mgr[py] Loading python module 'prometheus'
2021-12-10T11:27:01.647+0000 7fe36de73980 10 mgr[py] Computed sys.path '/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr:/usr/local/lib/python3.8/dist-packages:/usr/lib/python3/dist-packages:/usr/lib/python3.8/dist-packages:/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind:/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/build/lib/cython_modules/lib.3:/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/python-common::/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload'
2021-12-10T11:27:01.859+0000 7fe36de73980 -1 mgr[py] Module not found: 'prometheus'
2021-12-10T11:27:01.859+0000 7fe36de73980 -1 mgr[py] Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/prometheus/__init__.py", line 2, in <module>
    from .module import Module, StandbyModule
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/prometheus/module.py", line 451, in <module>
    class Module(MgrModule):
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/prometheus/module.py", line 1645, in Module
    def _list_healthchecks(self, format: Format = Format.plain) -> HandleCommandResult:
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/mgr_module.py", line 387, in __call__
    self.store_func_metadata(func)
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/mgr_module.py", line 384, in store_func_metadata
    self.load_func_metadata(f)
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/mgr_module.py", line 376, in load_func_metadata
    args.append(CephArgtype.to_argdesc(arg_spec[arg],
TypeError: to_argdesc() takes from 2 to 3 positional arguments but 4 were given

[...]

2021-12-10T11:26:59.791+0000 7fe36de73980  1 mgr[py] Loading python module 'dashboard'
2021-12-10T11:26:59.803+0000 7fe36de73980 10 mgr[py] Computed sys.path '/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr:/usr/local/lib/python3.8/dist-packages:/usr/lib/python3/dist-packages:/usr/lib/python3.8/dist-packages:/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind:/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/build/lib/cython_modules/lib.3:/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/python-common::/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload'
2021-12-10T11:27:00.007+0000 7fe36de73980 -1 mgr[py] Module not found: 'dashboard'
2021-12-10T11:27:00.007+0000 7fe36de73980 -1 mgr[py] Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/__init__.py", line 52, in <module>
    from .module import Module, StandbyModule  # noqa: F401
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/module.py", line 28, in <module>
    from .controllers import Router, json_error_page
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/controllers/__init__.py", line 1, in <module>
    from ._api_router import APIRouter
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/controllers/_api_router.py", line 1, in <module>
    from ._router import Router
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/controllers/_router.py", line 7, in <module>
    from ._base_controller import BaseController
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/controllers/_base_controller.py", line 11, in <module>
    from ..services.auth import AuthManager, JwtManager
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/services/auth.py", line 15, in <module>
    from .access_control import LocalAuthenticator, UserDoesNotExist
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/services/access_control.py", line 579, in <module>
    def set_login_credentials_cmd(_, username: str, inbuf: str):
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/mgr_module.py", line 387, in __call__
    self.store_func_metadata(func)
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/mgr_module.py", line 384, in store_func_metadata
    self.load_func_metadata(f)
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/mgr_module.py", line 376, in load_func_metadata
    args.append(CephArgtype.to_argdesc(arg_spec[arg],
TypeError: to_argdesc() takes from 2 to 3 positional arguments but 4 were given
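
Both tracebacks end in the same TypeError. A minimal sketch of how a message of this shape arises, assuming the caller passes one more positional argument than the loaded to_argdesc accepts. The function and parameter names below are hypothetical stand-ins, not Ceph's actual signatures:

```python
# Hypothetical stand-in for an older callee: two required positional
# parameters plus one optional one (names are illustrative only).
def to_argdesc(tp, attrs, has_default=False):
    return f"type={tp.__name__},name={attrs['name']},has_default={has_default}"


def call_like_newer_caller():
    """Call with four positional arguments, one more than the callee accepts."""
    try:
        to_argdesc(str, {"name": "username"}, False, True)
    except TypeError as exc:
        return str(exc)
    return None


message = call_like_newer_caller()
print(message)
```

The printed message has the same shape as the one in the logs, which would be consistent with mgr_module.py and the imported ceph_argparse coming from different versions of the code.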

[...]

2021-12-10T11:27:33.388+0000 7fe354318700 20 mgr.server operator() health checks:
{
    "MGR_MODULE_DEPENDENCY": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "12 mgr modules have failed dependencies",
            "count": 12
        },
        "detail": [
            {
                "message": "Module 'balancer' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'crash' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'dashboard' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'devicehealth' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'iostat' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'nfs' has failed dependency: cannot import name 'OSDMethod' from 'ceph.deployment.drive_group' (/usr/lib/python3/dist-packages/ceph/deployment/drive_group.py)" 
            },
            {
                "message": "Module 'orchestrator' has failed dependency: cannot import name 'OSDMethod' from 'ceph.deployment.drive_group' (/usr/lib/python3/dist-packages/ceph/deployment/drive_group.py)" 
            },
            {
                "message": "Module 'pg_autoscaler' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'rbd_support' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'status' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'telemetry' has failed dependency: to_argdesc() takes from 2 to 3 positional arguments but 4 were given" 
            },
            {
                "message": "Module 'volumes' has failed dependency: cannot import name 'OSDMethod' from 'ceph.deployment.drive_group' (/usr/lib/python3/dist-packages/ceph/deployment/drive_group.py)" 
            }
        ]
    },
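
The detail list mixes two distinct failure signatures. A minimal sketch for tallying them, assuming the detail entries are available as a list of dicts with a "message" key as in the output above. The module lists are copied from that output; the names group_by_cause, ARGDESC_ERR, and OSDMETHOD_ERR are made up for illustration:

```python
import re
from collections import defaultdict

ARGDESC_ERR = "to_argdesc() takes from 2 to 3 positional arguments but 4 were given"
OSDMETHOD_ERR = (
    "cannot import name 'OSDMethod' from 'ceph.deployment.drive_group' "
    "(/usr/lib/python3/dist-packages/ceph/deployment/drive_group.py)"
)

# Module names copied from the health check detail, keyed by error text.
failed = {
    ARGDESC_ERR: ["balancer", "crash", "dashboard", "devicehealth", "iostat",
                  "pg_autoscaler", "rbd_support", "status", "telemetry"],
    OSDMETHOD_ERR: ["nfs", "orchestrator", "volumes"],
}

# Rebuild the detail entries in the same shape as the logged JSON.
detail = [
    {"message": f"Module '{name}' has failed dependency: {err}"}
    for err, names in failed.items()
    for name in names
]

PATTERN = re.compile(r"Module '([^']+)' has failed dependency: (.*)")


def group_by_cause(entries):
    """Map each distinct error text to the list of modules reporting it."""
    groups = defaultdict(list)
    for entry in entries:
        match = PATTERN.match(entry["message"].strip())
        if match:
            module, cause = match.groups()
            groups[cause].append(module)
    return dict(groups)


groups = group_by_cause(detail)
for cause, modules in groups.items():
    print(len(modules), cause)
```

On the data above this gives nine modules failing on to_argdesc() and three failing on the OSDMethod import, which adds up to the 12 reported.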

It seems related to ceph_argparse, and I noticed that the Ceph E2E nightly job for master was successful.
Example of a successful E2E nightly run:
https://jenkins.ceph.com/view/all/job/ceph-api-nightly-master-e2e/770/consoleText

The only difference that I found is that the nightly job does not build the tests:

export FOR_MAKE_CHECK=1; timeout 2h ./src/script/run-make.sh --cmake-args '-DWITH_TESTS=OFF -DENABLE_GIT_VERSION=OFF'

But in the E2E PR job we do:
export NPROC=$(nproc) CHECK_MAKEOPTS='-j$(nproc) -N -Q'; timeout 7200 ./run-make-check.sh

After manually editing the PR E2E job in Jenkins to build the same way as the nightly job, the dashboard module started correctly,
so it seems that building the tests has some side effect that impacts ceph_argparse.
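
One way to narrow this down: the sys.path logged above lists /usr/lib/python3/dist-packages before .../src/pybind, and the OSDMethod error names a file under /usr/lib/python3/dist-packages. Whether a stale system-wide copy of ceph_argparse is involved is an assumption here, not something this report confirms. A minimal sketch that checks which path entry would provide a module first (first_provider is a hypothetical helper, and the lookup is simplified):

```python
import os
import tempfile


def first_provider(path_string, module_name):
    """Return the first path entry that holds module_name as a .py file or package.

    Simplified: ignores extension modules, zip imports and namespace packages.
    """
    for entry in path_string.split(":"):
        if not entry:
            continue  # an empty entry means the current directory; skipped here
        candidates = (
            os.path.join(entry, module_name + ".py"),
            os.path.join(entry, module_name, "__init__.py"),
        )
        if any(os.path.isfile(c) for c in candidates):
            return entry
    return None


# Demo with throwaway directories standing in for a system-wide
# dist-packages directory and the source tree's src/pybind.
with tempfile.TemporaryDirectory() as system_dir, \
        tempfile.TemporaryDirectory() as source_dir:
    for d in (system_dir, source_dir):
        with open(os.path.join(d, "ceph_argparse.py"), "w") as fh:
            fh.write("# placeholder\n")
    winner = first_provider(f"{system_dir}:{source_dir}", "ceph_argparse")
    shadowed = winner == system_dir
    print("system copy shadows source copy:", shadowed)
```

On a build node, calling first_provider with the "Computed sys.path" string from mgr.x.log and "ceph_argparse" would show whether a copy outside the workspace is picked up ahead of src/pybind.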

History

#1 Updated by Alfonso Martínez 12 months ago

  • Description updated (diff)

#2 Updated by Alfonso Martínez 12 months ago

  • Description updated (diff)

#3 Updated by Alfonso Martínez 12 months ago

ceph-build PR that updates the job definition in order to avoid building tests:
https://github.com/ceph/ceph-build/pull/1940

#4 Updated by Ernesto Puerta 8 months ago

This is now affecting Ceph API tests too (e.g. https://jenkins.ceph.com/job/ceph-api/34315/), but in those jobs the tests are not built:

timeout 2h ./src/script/run-make.sh \
        --cmake-args '-DWITH_TESTS=OFF -DENABLE_GIT_VERSION=OFF'

#5 Updated by Radoslaw Zarzynski 8 months ago

I just grepped the full log of https://jenkins.ceph.com/job/ceph-api/34315/ for TypeError (as in the original report) but couldn't find a match. Which exact error are you seeing there?

#6 Updated by Ernesto Puerta 6 months ago

Recorded in this API run in 172.21.2.11+braggi11

#7 Updated by Ernesto Puerta 6 months ago

Now the failure was in braggi12

#8 Updated by Ernesto Puerta 6 months ago

  • Tags set to test-failure

#10 Updated by Ernesto Puerta 6 months ago

  • Assignee set to David Galloway
  • Priority changed from Normal to Urgent
  • Tags set to test-failure
  • Severity changed from 3 - minor to 2 - major

#11 Updated by Radoslaw Zarzynski 6 months ago

Hello Ernesto!

The links in #6 and #7 are leading to 404, unfortunately :-(.

Anyway, what's the reason behind the reprioritization?

#12 Updated by Ernesto Puerta 6 months ago

Radoslaw Zarzynski wrote:

Hello Ernesto!

The links in #6 and #7 are leading to 404, unfortunately :-(.

Anyway, what's the reason behind the reprioritization?

Hi Radoslaw, I changed the project to CI (since it's an issue with the Jenkins-Sepia nodes) and raised the priority, but it seems that the project change failed and this remained in RADOS. I tried again with no better luck. Anyway, it's assigned to David, so it's nothing strictly RADOS.

#13 Updated by Radoslaw Zarzynski 5 months ago

  • Project changed from RADOS to CI
  • Category deleted (Tests)
  • Target version deleted (v17.0.0)

Moving from RADOS to CI per the previous comment.
