Bug #8327

hang on SFTP file transfers

Added by Sage Weil almost 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2014-05-09T17:43:15.577 DEBUG:teuthology.misc:Transferring archived files from ubuntu@plana52.front.sepia.ceph.com:/var/log/ceph to /var/lib/teuthworker/archive/ubuntu-2014-05-09_08:59:48-fs-wip-5021-testing-basic-plana/245245/remote/ubuntu@plana52.front.sepia.ceph.com/log
2014-05-09T17:43:15.577 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2014-05-09T17:43:15.625 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: 'sudo tar cz -f /tmp/tmpRPiw0d -C /var/log/ceph -- .'
2014-05-09T17:44:22.032 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: 'sudo chmod 0666 /tmp/tmpRPiw0d'
2014-05-09T17:45:22.147 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: 'rm -fr /tmp/tmpRPiw0d'
2014-05-09T17:46:15.311 DEBUG:teuthology.misc:Transferring archived files from ubuntu@plana29.front.sepia.ceph.com:/var/log/ceph to /var/lib/teuthworker/archive/ubuntu-2014-05-09_08:59:48-fs-wip-5021-testing-basic-plana/245245/remote/ubuntu@plana29.front.sepia.ceph.com/log
2014-05-09T17:46:15.311 DEBUG:teuthology.orchestra.run:Running [10.214.131.11]: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2014-05-09T17:46:15.622 DEBUG:teuthology.orchestra.run:Running [10.214.131.11]: 'sudo tar cz -f /tmp/tmpZCpr8w -C /var/log/ceph -- .'
2014-05-09T17:53:50.925 DEBUG:teuthology.orchestra.run:Running [10.214.131.11]: 'sudo chmod 0666 /tmp/tmpZCpr8w'
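The gap between consecutive log lines shows how long each step ran before the next one was logged. The sketch below is only an illustration: the helper name `step_gaps` is hypothetical and is not part of teuthology, and it assumes the `YYYY-MM-DDTHH:MM:SS.mmm` timestamp format used in the lines above.

```python
from datetime import datetime

# Timestamp layout of the teuthology lines in this description (assumed fixed).
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def step_gaps(lines):
    """Return (seconds_until_next_line, message) for every line but the last.

    Each gap is attributed to the earlier line, i.e. the command that was
    running while nothing else was logged.
    """
    parsed = []
    for line in lines:
        stamp, _, message = line.partition(" ")
        parsed.append((datetime.strptime(stamp, STAMP_FORMAT), message))
    return [
        ((nxt[0] - cur[0]).total_seconds(), cur[1])
        for cur, nxt in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    # The last two lines logged for plana29 before the job went quiet.
    sample = [
        "2014-05-09T17:46:15.622 DEBUG:teuthology.orchestra.run:Running "
        "[10.214.131.11]: 'sudo tar cz -f /tmp/tmpZCpr8w -C /var/log/ceph -- .'",
        "2014-05-09T17:53:50.925 DEBUG:teuthology.orchestra.run:Running "
        "[10.214.131.11]: 'sudo chmod 0666 /tmp/tmpZCpr8w'",
    ]
    for seconds, message in step_gaps(sample):
        print(f"{seconds:.3f}s  {message}")
```

For the plana29 lines this gives roughly 455 seconds for the tar step. The last line has no gap to report, which is what a hang looks like from the log alone: nothing follows the chmod.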

Lots of runs seem to be hanging in this way. If I ssh into the host, I see nothing running. One example:

ubuntu@teuthology:/var/lib/teuthworker/archive/ubuntu-2014-05-09_08:59:48-fs-wip-5021-testing-basic-plana/245245

History

#1 Updated by Sage Weil almost 10 years ago

More of these are still happening:

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-05-09_23:02:16-kcephfs-master-testing-basic-plana/247463

#2 Updated by Ian Colle almost 10 years ago

  • Assignee set to Zack Cerza
  • Target version set to sprint6

#3 Updated by Zack Cerza almost 10 years ago

I don't think this is just hanging on a chmod call. I just noticed this job had been running for a long time:
http://pulpito.front.sepia.ceph.com/teuthology-2014-05-12_23:00:44-rgw-master-testing-basic-plana/251859/

On plana74 I had to kill -9 a ceph-osd process, at which point the job resumed.

#4 Updated by Zack Cerza almost 10 years ago

This is almost certainly due to the way paramiko handles sftp in newer versions. I added some debug output:

2014-05-14 18:38:03,061.061 DEBUG:teuthology.misc:Transferring archived files from plana59:/var/log/ceph to 8327/remote/plana59/log
2014-05-14 18:38:03,061.061 INFO:teuthology.orchestra.run.plana59:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2014-05-14 18:38:03,381.381 INFO:teuthology.orchestra.run.plana59:Running: 'sudo tar cz -f /tmp/tmpc4XBsD -C /var/log/ceph -- .'
2014-05-14 18:45:48,811.811 INFO:teuthology.orchestra.run.plana59:Running: 'sudo chmod 0666 /tmp/tmpc4XBsD'
2014-05-14 18:45:49,418.418 DEBUG:teuthology.orchestra.remote:SFTP session start on Remote(name=u'ubuntu@plana59.front.sepia.ceph.com')
2014-05-14 18:45:49,436.436 DEBUG:teuthology.orchestra.remote:SFTP get: plana59:/tmp/tmpc4XBsD -> /home/ubuntu/zack/teuthology/8327/remote/plana59/log/tmpSiE7OC

That was almost 2 hours ago, and though the file in question is 7.6GB, iftop on plana59 isn't showing any significant network traffic.
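A stalled transfer like this could also be spotted from the client side instead of with iftop. The sketch below is not what teuthology does: `ProgressWatch` is a hypothetical helper, and it relies on the assumption that the object is passed as the `callback` argument of paramiko's `SFTPClient.get()`, which calls it with the bytes transferred so far and the total size.

```python
import time


class ProgressWatch:
    """Remember when a transfer callback last reported new bytes."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self.transferred = 0
        self.total = None
        self.last_progress = clock()

    def __call__(self, transferred, total):
        # Intended to be invoked as callback(bytes_so_far, total_bytes).
        self.total = total
        if transferred > self.transferred:
            self.transferred = transferred
            self.last_progress = self._clock()

    def stalled(self, timeout):
        """True if no new bytes have been reported for more than `timeout` seconds."""
        return self._clock() - self.last_progress > timeout
```

Usage would look something like `sftp.get(remote_path, local_path, callback=watch)`. Because `get()` blocks, `watch.stalled(...)` has to be polled from another thread, which can then log the stall or close the connection.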

I ran the exact same test under:
  • teuthology's master branch
  • teuthology's master branch with the debug output I showed above
  • teuthology's firefly branch
  • teuthology's master branch with an old copy of paramiko (1.7.7.1-2ubuntu1, yes that is the system package)

I see the hang on the two master branch runs that each use newer, non-distro versions of paramiko. I do not see it on the firefly branch run, or the master branch run that I forced to use the distro paramiko.

I am going to work around this for now by downgrading paramiko again, but to me this clearly is not a real solution; that version is several years old and we can't be stuck with it forever.

#6 Updated by Zack Cerza almost 10 years ago

  • Subject changed from hang on sudo chmod 0666 /tmp/tmpZCpr8w to hang on SFTP file transfers
  • Status changed from New to Resolved

Marking as resolved. Opened #8365 for the future paramiko issue.
