Bug #19901

closed

LibRadosMiscConnectFailure.ConnectFailure hang

Added by Sage Weil almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel,kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have been seeing test.sh failures for a while where the only clue (that I see) is

2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ wait 269197
2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ for t in '"${!pids[@]}"'
2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ pid=269342
2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ wait 269342
2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ for t in '"${!pids[@]}"'
2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ pid=269169
2017-05-10T05:15:33.027 INFO:tasks.workunit.client.0.smithi183.stderr:+ wait 269169
2017-05-10T08:05:35.700 INFO:tasks.workunit.client.0.smithi183.stderr:++ cleanup
2017-05-10T08:05:35.840 INFO:tasks.workunit.client.0.smithi183.stderr:++ pkill -P 269150
2017-05-10T08:05:35.844 INFO:tasks.workunit.client.0.smithi183.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh: line 9: 269169 Terminated              bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$f.log | sed \
"s/^/$r: /\"" 
2017-05-10T08:05:35.846 INFO:tasks.workunit.client.0.smithi183.stderr:++ true
2017-05-10T08:05:35.849 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0...
2017-05-10T08:05:35.852 INFO:teuthology.orchestra.run.smithi183:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2017-05-10T08:05:35.972 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-sage-testing2/qa/tasks/workunit.py", line 176, in task
    config.get('env'), timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:

I can't tell which child is being killed. This has been happening for a while, but I haven't been tracking it. Time to start!

/a/sage-2017-05-10_03:08:19-rados-wip-sage-testing2---basic-smithi/1119773
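
For anyone trying to attribute a hang like this to a specific child: the xtrace output above shows only the pid being waited on, not the test name it belongs to. Below is a minimal sketch, not the actual test.sh, of a wait loop that logs the name alongside the pid. The `run_job` helper and the job names `fast_ok` and `fast_fail` are hypothetical placeholders for the real ceph_test_rados_* invocations.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: track background jobs by name so that a hang or
# failure can be attributed to a specific child. Not the real test.sh.

declare -A pids   # job name -> pid

# Start a command in the background and record its pid under a name.
run_job() {
    local name=$1; shift
    "$@" &
    pids[$name]=$!
    echo "started $name (pid ${pids[$name]})"
}

# Placeholder jobs standing in for the real test binaries.
run_job fast_ok   sleep 0
run_job fast_fail false

ret=0
for t in "${!pids[@]}"; do
    pid=${pids[$t]}
    # Print the name before blocking, so the last line in the log
    # identifies the child that never returned.
    echo "waiting for $t (pid $pid)"
    if ! wait "$pid"; then
        echo "error in $t (pid $pid)"
        ret=1
    fi
done
echo "ret=$ret"
```

With a message like "waiting for ..." emitted before each `wait`, the last such line before the timeout would name the stuck test rather than just its pid.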


Related issues 2 (0 open, 2 closed)

Copied to Ceph - Backport #20270: jewel: LibRadosMiscConnectFailure.ConnectFailure hang (Resolved, Nathan Cutler)
Copied to Ceph - Backport #20271: kraken: LibRadosMiscConnectFailure.ConnectFailure hang (Resolved, Nathan Cutler)