Bug #8327
hang on SFTP file transfers
Description
2014-05-09T17:43:15.577 DEBUG:teuthology.misc:Transferring archived files from ubuntu@plana52.front.sepia.ceph.com:/var/log/ceph to /var/lib/teuthworker/archive/ubuntu-2014-05-09_08:59:48-fs-wip-5021-testing-basic-plana/245245/remote/ubuntu@plana52.front.sepia.ceph.com/log
2014-05-09T17:43:15.577 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2014-05-09T17:43:15.625 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: 'sudo tar cz -f /tmp/tmpRPiw0d -C /var/log/ceph -- .'
2014-05-09T17:44:22.032 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: 'sudo chmod 0666 /tmp/tmpRPiw0d'
2014-05-09T17:45:22.147 DEBUG:teuthology.orchestra.run:Running [10.214.132.26]: 'rm -fr /tmp/tmpRPiw0d'
2014-05-09T17:46:15.311 DEBUG:teuthology.misc:Transferring archived files from ubuntu@plana29.front.sepia.ceph.com:/var/log/ceph to /var/lib/teuthworker/archive/ubuntu-2014-05-09_08:59:48-fs-wip-5021-testing-basic-plana/245245/remote/ubuntu@plana29.front.sepia.ceph.com/log
2014-05-09T17:46:15.311 DEBUG:teuthology.orchestra.run:Running [10.214.131.11]: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2014-05-09T17:46:15.622 DEBUG:teuthology.orchestra.run:Running [10.214.131.11]: 'sudo tar cz -f /tmp/tmpZCpr8w -C /var/log/ceph -- .'
2014-05-09T17:53:50.925 DEBUG:teuthology.orchestra.run:Running [10.214.131.11]: 'sudo chmod 0666 /tmp/tmpZCpr8w'
Lots of runs seem to be hanging in this way. If I ssh into the host, I see nothing running. One example:
ubuntu@teuthology:/var/lib/teuthworker/archive/ubuntu-2014-05-09_08:59:48-fs-wip-5021-testing-basic-plana/245245
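When looking through a log like the one above, it can help to measure how long each step ran before the next entry appeared. The following is a rough sketch only, not part of teuthology: the function name find_gaps and the 60-second threshold are made up for illustration, and it assumes each entry starts with a timestamp in the ISO-like format shown in the log. Note that it cannot measure the final entry, which is where a hang would show up as the log simply stopping.

```python
from datetime import datetime


def find_gaps(lines, threshold_s=60.0):
    """Return (seconds, line) pairs for entries followed by a long silence.

    Hypothetical helper: 'seconds' is the time between an entry and the
    next timestamped entry, which roughly bounds how long that step ran.
    """
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta >= threshold_s:
                gaps.append((delta, prev_line))
        prev_ts, prev_line = ts, line
    return gaps


# Two entries copied from the log in this report.
sample = [
    "2014-05-09T17:46:15.622 DEBUG:teuthology.orchestra.run:Running "
    "[10.214.131.11]: 'sudo tar cz -f /tmp/tmpZCpr8w -C /var/log/ceph -- .'",
    "2014-05-09T17:53:50.925 DEBUG:teuthology.orchestra.run:Running "
    "[10.214.131.11]: 'sudo chmod 0666 /tmp/tmpZCpr8w'",
]

for seconds, entry in find_gaps(sample):
    print("%.1fs before next entry: %s" % (seconds, entry))
```

On the two sample entries this reports the tar step, which ran for several minutes before the chmod was logged.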
History
#1 Updated by Sage Weil almost 10 years ago
more of these still happening...
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-05-09_23:02:16-kcephfs-master-testing-basic-plana/247463
#2 Updated by Ian Colle almost 10 years ago
- Assignee set to Zack Cerza
- Target version set to sprint6
#3 Updated by Zack Cerza almost 10 years ago
I don't think this is just hanging on a chmod call. I just noticed this job had been running for a long time:
http://pulpito.front.sepia.ceph.com/teuthology-2014-05-12_23:00:44-rgw-master-testing-basic-plana/251859/
On plana74 I had to kill -9 a ceph-osd process, at which point the job resumed.
#4 Updated by Zack Cerza almost 10 years ago
This is almost certainly due to the way paramiko handles sftp in newer versions. I added some debug output:
2014-05-14 18:38:03,061.061 DEBUG:teuthology.misc:Transferring archived files from plana59:/var/log/ceph to 8327/remote/plana59/log
2014-05-14 18:38:03,061.061 INFO:teuthology.orchestra.run.plana59:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2014-05-14 18:38:03,381.381 INFO:teuthology.orchestra.run.plana59:Running: 'sudo tar cz -f /tmp/tmpc4XBsD -C /var/log/ceph -- .'
2014-05-14 18:45:48,811.811 INFO:teuthology.orchestra.run.plana59:Running: 'sudo chmod 0666 /tmp/tmpc4XBsD'
2014-05-14 18:45:49,418.418 DEBUG:teuthology.orchestra.remote:SFTP session start on Remote(name=u'ubuntu@plana59.front.sepia.ceph.com')
2014-05-14 18:45:49,436.436 DEBUG:teuthology.orchestra.remote:SFTP get: plana59:/tmp/tmpc4XBsD -> /home/ubuntu/zack/teuthology/8327/remote/plana59/log/tmpSiE7OC
That was almost 2 hours ago, and though the file in question is 7.6GB, iftop on plana59 isn't showing any significant network traffic.
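For what it's worth, a stall like this could in principle be detected from the transfer side instead of by watching iftop. Below is a minimal sketch, not what teuthology does: TransferProgress and stall_after_s are hypothetical names, and the only paramiko behavior assumed is that SFTPClient.get() accepts a callback that receives the bytes transferred so far and the total size. The check itself would have to be polled from somewhere other than the blocked get() call, for example a separate thread.

```python
import time


class TransferProgress:
    """Tracks transfer progress and reports when it has stopped advancing.

    Hypothetical helper. An instance could be passed as the callback to
    paramiko's SFTPClient.get(), e.g. sftp.get(remote, local, callback=p).
    """

    def __init__(self, stall_after_s=300.0, clock=time.monotonic):
        self.stall_after_s = stall_after_s
        self._clock = clock  # injectable so the logic can be tested
        self.transferred = 0
        self.total = None
        self._last_change = clock()

    def __call__(self, transferred, total):
        # Only reset the timer when the byte count actually moves.
        if transferred != self.transferred:
            self.transferred = transferred
            self._last_change = self._clock()
        self.total = total

    def stalled(self):
        """True if no bytes have arrived for at least stall_after_s seconds."""
        return self._clock() - self._last_change >= self.stall_after_s


# Usage with a fake clock, so no real waiting or network is involved.
now = [0.0]
progress = TransferProgress(stall_after_s=10.0, clock=lambda: now[0])
progress(1024, 4096)
now[0] = 20.0
print(progress.stalled())
```

This only flags the stall; deciding what to do about it (abort, retry, fall back to another transfer method) is a separate question.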
I ran the exact same test under:
- teuthology's master branch
- teuthology's master branch with the debug output I showed above
- teuthology's firefly branch
- teuthology's master branch with an old copy of paramiko (1.7.7.1-2ubuntu1, yes that is the system package)
I see the hang on the two master branch runs that each use newer, non-distro versions of paramiko. I do not see it on the firefly branch run, or the master branch run that I forced to use the distro paramiko.
I am going to work around this for now by downgrading paramiko again, but to me this clearly is not a real solution; that version is several years old and we can't be stuck with it forever.
#6 Updated by Zack Cerza almost 10 years ago
- Subject changed from hang on sudo chmod 0666 /tmp/tmpZCpr8w to hang on SFTP file transfers
- Status changed from New to Resolved
Marking as resolved. Opened #8365 for the future paramiko issue.