Bug #45345

closed

tasks/rados.py fails with "psutil.NoSuchProcess: psutil.NoSuchProcess process no longer exists (pid=17392)"

Added by Brad Hubbard almost 4 years ago. Updated over 2 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2020-04-28_21:58:13-rados-wip-yuri-testing-2020-04-24-1941-master-distro-basic-smithi/4995279

Looking at what rados.py is doing, we see it launch 'ceph_test_rados' at 22:58.

2020-04-28T22:58:26.786 INFO:teuthology.run_tasks:Running task rados...
2020-04-28T22:58:26.802 INFO:tasks.rados:Beginning rados...
2020-04-28T22:58:26.803 DEBUG:teuthology.run_tasks:Unwinding manager rados
2020-04-28T22:58:26.815 INFO:tasks.rados:joining rados
2020-04-28T22:58:26.816 INFO:tasks.rados:clients are ['client.0', 'client.1', 'client.2']
2020-04-28T22:58:26.816 INFO:tasks.rados:starting run 0 out of 1
2020-04-28T22:58:26.816 INFO:teuthology.orchestra.run.smithi183:> true
2020-04-28T22:58:26.916 INFO:teuthology.orchestra.run.smithi183:> CEPH_CLIENT_ID=2 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op read 100 --op write 50 --op delete 50 --op snap_create 50 --op snap_remove 50 --op rollback 50 --op copy_from 50 --op cache_flush 50 --op cache_try_flush 50 --op cache_evict 50 --op write_excl 50 --pool base

The last output from 'ceph_test_rados' is recorded at 22:58.

$ grep "tasks\.rados" teuthology.log|tail -5
2020-04-28T22:58:29.468 INFO:tasks.rados.rados.2.smithi183.stdout:102:  seq_num 101 ranges {497507=712738,1710446=661696,3140781=556726}
2020-04-28T22:58:29.475 INFO:tasks.rados.rados.2.smithi183.stdout:102:  writing smithi18317443-102 from 497507 to 1210245 tid 1
2020-04-28T22:58:29.481 INFO:tasks.rados.rados.2.smithi183.stdout:102:  writing smithi18317443-102 from 1710446 to 2372142 tid 2
2020-04-28T22:58:29.489 INFO:tasks.rados.rados.2.smithi183.stdout:102:  writing smithi18317443-102 from 3140781 to 3697507 tid 3
2020-04-28T22:58:29.490 INFO:tasks.rados.rados.2.smithi183.stdout: waiting on 16

When rados.py finally tries to reap the process at 10:30 the next day, the pid (presumably that of 'ceph_test_rados') no longer exists.

2020-04-29T10:30:51.937 ERROR:teuthology.run_tasks:Manager failed: rados
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 167, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-04-24-1941-master/qa/tasks/rados.py", line 278, in task
    running.get()
  File "src/gevent/greenlet.py", line 704, in gevent._greenlet.Greenlet.get
  File "src/gevent/greenlet.py", line 692, in gevent._greenlet.Greenlet.get
  File "src/gevent/_greenlet_primitives.py", line 60, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 64, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
psutil.NoSuchProcess: psutil.NoSuchProcess process no longer exists (pid=17392)
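
For reference, the gap between the last 'ceph_test_rados' output and the "Manager failed" error can be worked out from the two timestamps quoted above. A minimal Python sketch, assuming the timestamp format seen in teuthology.log (the helper name `log_gap` is illustrative and not part of teuthology):

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the quoted teuthology.log lines.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def log_gap(earlier: str, later: str) -> timedelta:
    """Return the elapsed time between two log timestamps."""
    return datetime.strptime(later, TS_FORMAT) - datetime.strptime(earlier, TS_FORMAT)


# Last stdout line from ceph_test_rados and the "Manager failed" error.
last_output = "2020-04-28T22:58:29.490"
manager_failed = "2020-04-29T10:30:51.937"

gap = log_gap(last_output, manager_failed)
print(gap)
```

That is a little over eleven and a half hours with no output from the test before the failure was reported.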

I strongly suspect 'ceph_test_rados' crashed, but there are no logs to prove it.
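
One kind of evidence that would settle this is the child's exit status. The sketch below is generic Python on a POSIX host, not what rados.py or teuthology actually does; `describe_exit` is a hypothetical helper, and the child is a stand-in that aborts itself rather than 'ceph_test_rados'. It relies on `subprocess` reporting a negative return code when the child is killed by a signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode < 0:
        # On POSIX, a negative value means the child died from that signal.
        return "killed by signal " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited cleanly"
    return "exited with status %d" % returncode


# Stand-in child (assumption: not the real test binary) that aborts itself.
child = subprocess.run(
    [sys.executable, "-c", "import os; os.abort()"],
    stderr=subprocess.DEVNULL,
)
status = describe_exit(child.returncode)
print(status)
```

Logging a status like this when the process is reaped would distinguish a crash from a clean exit even when no core or log file survives.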

Actions #1

Updated by Brad Hubbard almost 4 years ago

/a/teuthology-2020-04-26_07:01:02-rados-master-distro-basic-smithi/4985956

Actions #2

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Can't reproduce