Bug #6609
closed
teuthology rsync workunit failure
Added by Greg Farnum over 10 years ago.
Updated almost 10 years ago.
Description
2013-10-20T07:24:04.860 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: sent 3440408971 bytes received 976712 bytes 2308880.03 bytes/sec
2013-10-20T07:24:04.860 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: total size is 3436620004 speedup is 1.00
2013-10-20T07:24:04.863 INFO:teuthology.task.workunit.client.0.err:[10.214.132.13]: + echo we should get 4 here if no additional files are transfered
2013-10-20T07:24:04.863 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: we should get 4 here if no additional files are transfered
2013-10-20T07:24:04.864 INFO:teuthology.task.workunit.client.0.err:[10.214.132.13]: + rsync -auv --exclude local/ /usr/ usr.1
2013-10-20T07:24:04.864 INFO:teuthology.task.workunit.client.0.err:[10.214.132.13]: + tee /tmp/9567
2013-10-20T07:24:04.897 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: sending incremental file list
2013-10-20T07:24:53.855 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: share/doc/
We clearly lost a file somewhere, but it looks like kernel logging wasn't enabled, so this ticket can serve as a placeholder for any similar failures.
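The check around the second rsync can be sketched as follows. Per the `echo` in the script, a no-op `rsync -auv` run should print only four lines (the `sending incremental file list` header, a blank line, and the `sent ...` and `total size is ...` summary lines), so any extra line is an entry rsync acted on, like `share/doc/` above. This is a minimal Python sketch assuming that output shape. `transferred_entries` and `second_run_was_noop` are hypothetical helpers, not part of the actual workunit script.

```python
from typing import List


def transferred_entries(output: str) -> List[str]:
    """Return the item lines from `rsync -v` output: everything except the
    header, blank lines, and the two trailing summary lines.

    Hypothetical helper. A file literally named "sent ..." would be
    misclassified, which is acceptable for a quick log check.
    """
    entries = []
    for line in output.splitlines():
        text = line.strip()
        # Skip blank lines and rsync's fixed header line.
        if not text or text == "sending incremental file list":
            continue
        # Skip the summary lines printed at the end of every -v run.
        if text.startswith("sent ") or text.startswith("total size is "):
            continue
        entries.append(text)
    return entries


def second_run_was_noop(output: str) -> bool:
    """True if rsync listed no files or directories on this run."""
    return not transferred_entries(output)


if __name__ == "__main__":
    # Placeholder numbers; only the shape of the output matters here.
    sample = (
        "sending incremental file list\n"
        "share/doc/\n"
        "\n"
        "sent <n> bytes  received <m> bytes  <r> bytes/sec\n"
        "total size is <t>  speedup is <s>\n"
    )
    print(transferred_entries(sample))  # ['share/doc/']
```

Filtering by line content instead of only counting lines has an advantage: when the check fails, it reports which entry rsync touched.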
/a/teuthology-2013-10-20_02:13:31-kcephfs-next-testing-basic-plana/60994
/a/teuthology-2013-10-20_02:13:10-fs-next-testing-basic-plana/60899/
This second one is all userspace and so has plenty of logs available.
Both tests only sent the directory share/doc/ (but none of the files in it) when rsync was executed the second time. That sounds like a timestamp issue, though I have no idea how it could happen.
I didn't look at the details much (even to figure out what the file transfer issues were). What kind of timestamp issue could have caused it to not sync the files appropriately?
The files were synced appropriately. On its second run rsync only updated the timestamp or mode of the directory share/doc/. Maybe something else modified share/doc/ while rsync was running. I think we should re-run the test and check how reliably the issue reproduces.
/a/teuthology-2013-10-31_23:01:45-kcephfs-next-testing-basic-plana/78406
I haven't checked what this is doing any more, but if it's because the timestamps are different, could this just be a new incarnation of the issue where the client sets a timestamp and the server doesn't take it because their clocks aren't synchronized?
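The timestamp hypotheses can be made concrete with a simplified model of when `rsync -auv` acts on an existing destination entry. It relies only on rsync's documented behavior: the quick check compares size and modification time (treated as equal within `--modify-window`), `-u` skips regular files that are newer on the receiver but does not apply to directories, and `-a` (via `-t` and `-p`) re-applies a directory's mtime and permissions. This is a sketch, not rsync's code. `Entry`, `rsync_acts_on`, and `server_applied_mtime` are hypothetical names. The last one models the suspected case where the server drops the client-supplied mtime and uses its own, unsynchronized clock.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a file or directory's metadata."""
    is_dir: bool
    size: int
    mtime: float  # seconds since the epoch
    mode: int


def rsync_acts_on(src: Entry, dst: Entry, modify_window: float = 0.0) -> bool:
    """Simplified model of `rsync -auv` against an existing destination entry.

    Directories: whether rsync re-applies mtime/permissions.
    Regular files: whether rsync re-sends the file data.
    """
    times_differ = abs(src.mtime - dst.mtime) > modify_window
    if src.is_dir:
        # -u does not apply to directories; -a re-applies the source's
        # mtime and permissions whenever they differ.
        return times_differ or src.mode != dst.mode
    if dst.mtime > src.mtime + modify_window:
        # -u: skip files that are newer on the receiver.
        return False
    # Quick check: re-send if size or mtime differs.
    return times_differ or src.size != dst.size


def server_applied_mtime(requested: float, server_now: float,
                         honours_request: bool) -> float:
    """Hypothetical model of the suspected bug: if the server ignores the
    client's setattr, the stored mtime comes from the server's clock."""
    return requested if honours_request else server_now


if __name__ == "__main__":
    src_dir = Entry(is_dir=True, size=4096, mtime=1000.0, mode=0o755)
    # The server stamps its own clock, which runs 3.5 s ahead of the client.
    dst_dir = Entry(is_dir=True, size=4096,
                    mtime=server_applied_mtime(1000.0, 1003.5, False),
                    mode=0o755)
    print(rsync_acts_on(src_dir, dst_dir))  # True
```

In this model, a few seconds of skew on a directory is enough to get the directory listed on the second run even though none of its files are re-sent. That matches the `share/doc/` output. A `--modify-window` larger than the skew would hide it.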
2013-11-01T13:37:12.841 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2013-11-01T13:37:12.994 INFO:teuthology.task.workunit:Stopping misc on client.0...
2013-11-01T13:37:12.994 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2013-11-01T13:37:13.009 DEBUG:teuthology.parallel:result is None
2013-11-01T13:37:13.010 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2013-11-01T13:37:13.196 INFO:teuthology.orchestra.run.err:[10.214.133.35]: rm: cannot remove `/home/ubuntu/cephtest/mnt.0/client.0': Permission denied
2013-11-01T13:37:13.196 ERROR:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/mnt.0/client.0
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-next/teuthology/task/workunit.py", line 132, in _delete_dir
    client,
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 267, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.133.35 with status 1: 'rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2013-11-01T13:37:13.197 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2013-11-01T13:37:13.204 INFO:teuthology.orchestra.run.err:[10.214.133.35]: rmdir: failed to remove `/home/ubuntu/cephtest/mnt.0': Device or resource busy
2013-11-01T13:37:13.204 ERROR:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/mnt.0
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-next/teuthology/task/workunit.py", line 144, in _delete_dir
    mnt,
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 267, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.133.35 with status 1: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
The failure in 78406 is not the same as the previous ones; it looks more like a test script or environment issue.
- Priority changed from Normal to High
I haven't noticed this in a while, but I'm raising the priority since it failed on both clients.
- Status changed from New to Can't reproduce