Bug #42764

Test failure: test_diskprediction_local (tasks.mgr.test_module_selftest.TestModuleSelftest)

Added by Kefu Chai 9 months ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

2019-11-12T04:51:42.274 INFO:teuthology.orchestra.run.smithi204.stderr:2019-11-12T04:51:42.246+0000 7efec11ac700 20 mgrc start_command cmd: [{"prefix": "mgr self-test module", "target": ["mon-mgr", ""], "module": "diskprediction_local"}]
2019-11-12T04:51:42.274 INFO:teuthology.orchestra.run.smithi204.stderr:2019-11-12T04:51:42.246+0000 7efec11ac700  1 -- 172.21.15.204:0/264776426 --> [v2:172.21.15.204:6812/21695,v1:172.21.15.204:6813/21695] -- mgr_command(tid 0: {"prefix": "mgr self-test module", "target": ["mon-mgr", ""], "module": "diskprediction_local"}) v1 -- 0x7efe94001cf0 con 0x7efea00254e0
2019-11-12T04:51:51.972 INFO:teuthology.orchestra.run.smithi204.stderr:2019-11-12T04:51:51.945+0000 7efec09ab700  1 --2- 172.21.15.204:0/264776426 >> [v2:172.21.15.204:3301/0,v1:172.21.15.204:6790/0] conn(0x7efea0004a00 0x7efea0004ea0 secure :-1 s=READY pgs=206 cs=0 l=1 rx=0x7efeb40107b0 tx=0x7efeb4010b50).handle_read_frame_epilogue_main read frame epilogue bytes=32
2019-11-12T04:52:01.973 INFO:teuthology.orchestra.run.smithi204.stderr:2019-11-12T04:52:01.945+0000 7efec09ab700  1 --2- 172.21.15.204:0/264776426 >> [v2:172.21.15.204:3301/0,v1:172.21.15.204:6790/0] conn(0x7efea0004a00 0x7efea0004ea0 secure :-1 s=READY pgs=206 cs=0 l=1 rx=0x7efeb40107b0 tx=0x7efeb4010b50).handle_read_frame_epilogue_main read frame epilogue bytes=32
...
2019-11-12T04:53:41.847 DEBUG:teuthology.orchestra.run:got remote process result: 124
2019-11-12T04:53:41.854 INFO:tasks.cephfs_test_runner:test_diskprediction_local (tasks.mgr.test_module_selftest.TestModuleSelftest) ... ERROR

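The client timed out (124 is the exit status timeout returns when the time limit is hit). On the mgr side, the self-test was dispatched but never returned: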

2019-11-12T04:51:42.248+0000 7f7c26781700 20 mgr dispatch_remote Calling diskprediction_local.self_test...
2019-11-12T04:51:42.248+0000 7f7c26781700 20 mgr Gil Switched to new thread state 0x558000aa13f0
2019-11-12T04:51:42.248+0000 7f7c26781700 20 mgr ~Gil Destroying new thread state 0x558000aa13f0
2019-11-12T04:51:42.248+0000 7f7c26781700 20 mgr Gil Switched to new thread state 0x558000aa13f0
...

/a/kchai-2019-11-12_04:19:53-rados-master-distro-basic-smithi/4499185/


2019-11-21T16:35:53.540 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 --2- 172.21.15.47:0/66082515 >> [v2:172.21.15.47:6800/44185,v1:172.21.15.47:6801/44185] conn(0x7f18c00227c0 0x7f18c0024c50 crc :-1 s=READY pgs=19 cs=0 l=1 rx=0 tx=0).stop
2019-11-21T16:35:53.540 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 -- 172.21.15.47:0/66082515 >> [v2:172.21.15.47:3300/0,v1:172.21.15.47:6789/0] conn(0x7f18e810da50 msgr2=0x7f18e8110b70 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2019-11-21T16:35:53.540 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 --2- 172.21.15.47:0/66082515 >> [v2:172.21.15.47:3300/0,v1:172.21.15.47:6789/0] conn(0x7f18e810da50 0x7f18e8110b70 secure :-1 s=READY pgs=508 cs=0 l=1 rx=0x7f18e4010460 tx=0x7f18e4010260).stop
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 -- 172.21.15.47:0/66082515 shutdown_connections
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 --2- 172.21.15.47:0/66082515 >> [v2:172.21.15.47:6800/44185,v1:172.21.15.47:6801/44185] conn(0x7f18c00227c0 0x7f18c0024c50 unknown :-1 s=CLOSED pgs=19 cs=0 l=1 rx=0 tx=0).stop
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 --2- 172.21.15.47:0/66082515 >> [v2:172.21.15.47:3301/0,v1:172.21.15.47:6790/0] conn(0x7f18e810ffb0 0x7f18e8110440 unknown :-1 s=CLOSED pgs=0 cs=0 l=1 rx=0 tx=0).stop
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 --2- 172.21.15.47:0/66082515 >> [v2:172.21.15.47:3300/0,v1:172.21.15.47:6789/0] conn(0x7f18e810da50 0x7f18e8110b70 unknown :-1 s=CLOSED pgs=508 cs=0 l=1 rx=0 tx=0).stop
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 -- 172.21.15.47:0/66082515 shutdown_connections
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 -- 172.21.15.47:0/66082515 wait complete.
2019-11-21T16:35:53.541 INFO:teuthology.orchestra.run.smithi047.stderr:2019-11-21T16:35:53.545+0000 7f18ef212700  1 -- 172.21.15.47:0/66082515 >> 172.21.15.47:0/66082515 conn(0x7f18e81494f0 msgr2=0x7f18e8107080 unknown :-1 s=STATE_NONE l=0).mark_down
2019-11-21T16:35:53.550 INFO:tasks.cephfs_test_runner:
2019-11-21T16:35:53.551 INFO:tasks.cephfs_test_runner:======================================================================
2019-11-21T16:35:53.551 INFO:tasks.cephfs_test_runner:ERROR: test_diskprediction_local (tasks.mgr.test_module_selftest.TestModuleSelftest)
2019-11-21T16:35:53.551 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-11-21T16:35:53.551 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2019-11-21T16:35:53.551 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_baselline_11.20.19/qa/tasks/mgr/test_module_selftest.py", line 51, in test_diskprediction_local
2019-11-21T16:35:53.552 INFO:tasks.cephfs_test_runner:    self._selftest_plugin("diskprediction_local")
2019-11-21T16:35:53.552 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_baselline_11.20.19/qa/tasks/mgr/test_module_selftest.py", line 34, in _selftest_plugin
2019-11-21T16:35:53.553 INFO:tasks.cephfs_test_runner:    "mgr", "self-test", "module", module_name)
2019-11-21T16:35:53.553 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_baselline_11.20.19/qa/tasks/ceph_manager.py", line 1249, in raw_cluster_cmd
2019-11-21T16:35:53.553 INFO:tasks.cephfs_test_runner:    stdout=StringIO(),
2019-11-21T16:35:53.553 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 199, in run
2019-11-21T16:35:53.553 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2019-11-21T16:35:53.554 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
2019-11-21T16:35:53.554 INFO:tasks.cephfs_test_runner:    r.wait()
2019-11-21T16:35:53.554 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
2019-11-21T16:35:53.554 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2019-11-21T16:35:53.554 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
2019-11-21T16:35:53.554 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2019-11-21T16:35:53.555 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on smithi047 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr self-test module diskprediction_local'
2019-11-21T16:35:53.555 INFO:tasks.cephfs_test_runner:
2019-11-21T16:35:53.555 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-11-21T16:35:53.556 INFO:tasks.cephfs_test_runner:Ran 4 tests in 278.791s
2019-11-21T16:35:53.556 INFO:tasks.cephfs_test_runner:
2019-11-21T16:35:53.556 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2019-11-21T16:35:53.556 INFO:tasks.cephfs_test_runner:
2019-11-21T16:35:53.556 INFO:tasks.cephfs_test_runner:======================================================================
2019-11-21T16:35:53.557 INFO:tasks.cephfs_test_runner:ERROR: test_diskprediction_local (tasks.mgr.test_module_selftest.TestModuleSelftest)
2019-11-21T16:35:53.557 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-11-21T16:35:53.557 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2019-11-21T16:35:53.557 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_baselline_11.20.19/qa/tasks/mgr/test_module_selftest.py", line 51, in test_diskprediction_local
2019-11-21T16:35:53.557 INFO:tasks.cephfs_test_runner:    self._selftest_plugin("diskprediction_local")
2019-11-21T16:35:53.557 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_baselline_11.20.19/qa/tasks/mgr/test_module_selftest.py", line 34, in _selftest_plugin
2019-11-21T16:35:53.558 INFO:tasks.cephfs_test_runner:    "mgr", "self-test", "module", module_name)
2019-11-21T16:35:53.558 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-master_baselline_11.20.19/qa/tasks/ceph_manager.py", line 1249, in raw_cluster_cmd
2019-11-21T16:35:53.558 INFO:tasks.cephfs_test_runner:    stdout=StringIO(),
2019-11-21T16:35:53.558 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 199, in run
2019-11-21T16:35:53.558 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2019-11-21T16:35:53.559 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
2019-11-21T16:35:53.560 INFO:tasks.cephfs_test_runner:    r.wait()
2019-11-21T16:35:53.560 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
2019-11-21T16:35:53.560 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2019-11-21T16:35:53.560 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
2019-11-21T16:35:53.560 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2019-11-21T16:35:53.560 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on smithi047 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr self-test module diskprediction_local'
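When triaging failures like the one above, it can help to separate "the command hit its time limit" from "the command itself failed". The following is a minimal sketch, not part of teuthology: the helper name summarize_failure and the regular expression are hypothetical and only assume the message shape shown in the traceback, plus the fact that timeout exits with status 124 when the limit is reached.

```python
import re

# Exit status that the coreutils `timeout` command returns when the limit is hit.
TIMEOUT_EXIT_STATUS = 124

# Hypothetical pattern: matches the "Command failed on <node> with status <n>: '<cmd>'"
# shape seen in the traceback above. It is an assumption, not a teuthology API.
_FAILURE_RE = re.compile(
    r"Command failed on (?P<node>\S+) with status (?P<status>\d+): '(?P<command>.*)'"
)


def summarize_failure(message):
    """Return node, status, and timeout details for a failure message, or None."""
    match = _FAILURE_RE.search(message)
    if match is None:
        return None
    status = int(match.group("status"))
    command = match.group("command")
    # Look for a "timeout <seconds>" wrapper inside the failed command line.
    limit = re.search(r"\btimeout (\d+)\b", command)
    return {
        "node": match.group("node"),
        "status": status,
        "timed_out": status == TIMEOUT_EXIT_STATUS,
        "limit_seconds": int(limit.group(1)) if limit else None,
    }


message = (
    "CommandFailedError: Command failed on smithi047 with status 124: "
    "'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage "
    "timeout 120 ceph --cluster ceph mgr self-test module diskprediction_local'"
)
summary = summarize_failure(message)
print(summary)
```

For the message above, the sketch reports a timeout on smithi047 after a 120 second limit, which matches the roughly two minute gap between the command being sent and the ERROR in the first log excerpt.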


Related issues

Related to mgr - Bug #43230: mgr/dashboard: test_diskprediction_local failure when running the backend API tests (vstart) (status: Resolved)
Related to mgr - Bug #45147: Module 'diskprediction_local' takes forever to load (status: New)

History

#2 Updated by Laura Paduano 9 months ago

  • Description updated (diff)

#3 Updated by Sage Weil 9 months ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent

/a/sage-2019-11-22_21:29:00-rados-wip-sage-testing-2019-11-22-1122-distro-basic-smithi/4534269

#4 Updated by Sage Weil 8 months ago

/a/sage-2019-12-04_19:33:15-rados-wip-sage2-testing-2019-12-04-0856-distro-basic-smithi/4567007

#5 Updated by Sage Weil 8 months ago

I think this might be a CentOS thing.

#7 Updated by Patrick Donnelly 8 months ago

  • Status changed from 12 to New

#8 Updated by Sage Weil 8 months ago

OK, it hangs when predictor.py imports scipy. This change hangs as soon as the module loads:

diff --git a/src/pybind/mgr/diskprediction_local/module.py b/src/pybind/mgr/diskprediction_local/module.py
index 24f92c2c28a..5e34bdea983 100644
--- a/src/pybind/mgr/diskprediction_local/module.py
+++ b/src/pybind/mgr/diskprediction_local/module.py
@@ -9,6 +9,7 @@ import time

 from mgr_module import MgrModule, CommandResult

+from .predictor import get_diskfailurepredictor_path

 TIME_FORMAT = '%Y%m%d-%H%M%S'
 TIME_DAYS = 24*60*60

But this one, with the scipy import in predictor.py commented out, does not:

diff --git a/src/pybind/mgr/diskprediction_local/module.py b/src/pybind/mgr/diskprediction_local/module.py
index 24f92c2c28a..5e34bdea983 100644
--- a/src/pybind/mgr/diskprediction_local/module.py
+++ b/src/pybind/mgr/diskprediction_local/module.py
@@ -9,6 +9,7 @@ import time

 from mgr_module import MgrModule, CommandResult

+from .predictor import get_diskfailurepredictor_path

 TIME_FORMAT = '%Y%m%d-%H%M%S'
 TIME_DAYS = 24*60*60
diff --git a/src/pybind/mgr/diskprediction_local/predictor.py b/src/pybind/mgr/diskprediction_local/predictor.py
index 548454145ce..c61d739d0fd 100644
--- a/src/pybind/mgr/diskprediction_local/predictor.py
+++ b/src/pybind/mgr/diskprediction_local/predictor.py
@@ -27,7 +27,7 @@ import pickle
 import logging

 import numpy as np
-from scipy import stats
+#from scipy import stats

 def get_diskfailurepredictor_path():

Also, importing scipy directly at the top of module.py does not hang, and the self-test works:

diff --git a/src/pybind/mgr/diskprediction_local/module.py b/src/pybind/mgr/diskprediction_local/module.py
index 24f92c2c28a..fa48c9e15b4 100644
--- a/src/pybind/mgr/diskprediction_local/module.py
+++ b/src/pybind/mgr/diskprediction_local/module.py
@@ -9,6 +9,7 @@ import time

 from mgr_module import MgrModule, CommandResult

+import scipy

 TIME_FORMAT = '%Y%m%d-%H%M%S'
 TIME_DAYS = 24*60*60

Maybe there is some hidden startup going on with scipy, and doing the import later in the command thread leads to some weird problem?
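The pattern that worked in the last experiment (import the library when the module loads, then only use it from the command thread) can be sketched outside of ceph-mgr. This is an illustration under stated assumptions, not the change that was merged: decimal stands in for scipy so the sketch runs anywhere, and handle_command and run_command_in_thread are hypothetical names, not mgr APIs.

```python
import sys
import threading

# Stand-in for scipy so the sketch runs without extra packages (assumption);
# in the module discussed above this line would be `import scipy`.
# The import runs eagerly, on whichever thread loads this module.
import decimal as heavy_library


def handle_command(results):
    # Runs on a worker thread. No import statement executes here, so any
    # one-time initialization of the library has already happened.
    total = heavy_library.Decimal("1.5") + heavy_library.Decimal("2.5")
    results["value"] = str(total)


def run_command_in_thread():
    # Hypothetical stand-in for the mgr dispatching a command to a module.
    results = {}
    worker = threading.Thread(target=handle_command, args=(results,))
    worker.start()
    worker.join(timeout=10)
    # If the worker were stuck (as in the hang above), it would still be alive.
    results["finished"] = not worker.is_alive()
    return results


loaded_before_thread = "decimal" in sys.modules
outcome = run_command_in_thread()
print(loaded_before_thread, outcome)
```

The sketch does not reproduce the hang; it only shows the ordering that avoided it in the experiments above, where the library is already present in sys.modules before any command thread starts.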

#9 Updated by Sage Weil 8 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32102

#10 Updated by Sage Weil 8 months ago

  • Status changed from Fix Under Review to Resolved

#11 Updated by Laura Paduano 8 months ago

  • Related to Bug #43230: mgr/dashboard: test_diskprediction_local failure when running the backend API tests (vstart) added

#12 Updated by Kefu Chai 4 months ago

  • Related to Bug #45147: Module 'diskprediction_local' takes forever to load added
