Bug #6609

teuthology rsync workunit failure

Added by Greg Farnum almost 6 years ago. Updated about 5 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
Start date:
10/21/2013
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

2013-10-20T07:24:04.860 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: sent 3440408971 bytes  received 976712 bytes  2308880.03 bytes/sec
2013-10-20T07:24:04.860 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: total size is 3436620004  speedup is 1.00
2013-10-20T07:24:04.863 INFO:teuthology.task.workunit.client.0.err:[10.214.132.13]: + echo we should get 4 here if no additional files are transfered
2013-10-20T07:24:04.863 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: we should get 4 here if no additional files are transfered
2013-10-20T07:24:04.864 INFO:teuthology.task.workunit.client.0.err:[10.214.132.13]: + rsync -auv --exclude local/ /usr/ usr.1
2013-10-20T07:24:04.864 INFO:teuthology.task.workunit.client.0.err:[10.214.132.13]: + tee /tmp/9567
2013-10-20T07:24:04.897 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: sending incremental file list
2013-10-20T07:24:53.855 INFO:teuthology.task.workunit.client.0.out:[10.214.132.13]: share/doc/

We clearly lost some files, but it looks like kernel logging wasn't enabled, so this ticket can serve as a placeholder for any similar failures.
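For reference, a rough sketch of the check the workunit appears to be performing (reconstructed from the log above, not the actual qa/workunits script; the scratch file name is illustrative): a second rsync pass over an unchanged tree should print only the "sending incremental file list" header, a blank line, and the two summary lines, i.e. 4 lines total.

    rsync -auv --exclude local/ /usr/ usr.1 | tee /tmp/out.$$
    # an unchanged tree yields exactly 4 lines of output; anything more means
    # some entry was re-sent, as share/doc/ was in the failure above
    lines=$(wc -l < /tmp/out.$$)
    if [ "$lines" -ne 4 ]; then
        echo "FAIL: rsync re-sent entries ($lines lines of output)"
        exit 1
    fi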

History

#1 Updated by Greg Farnum almost 6 years ago

  • Category deleted (53)

/a/teuthology-2013-10-20_02:13:31-kcephfs-next-testing-basic-plana/60994
/a/teuthology-2013-10-20_02:13:10-fs-next-testing-basic-plana/60899/
This second one is all userspace and so has plenty of logs available.

#2 Updated by Zheng Yan almost 6 years ago

Both tests only sent the directory share/doc (but didn't send the files in share/doc) when rsync was executed the second time. Sounds like a timestamp issue; no idea how this can happen.

#3 Updated by Greg Farnum almost 6 years ago

I didn't look at the details much (even to figure out what the file transfer issues were). What kind of timestamp issue could have caused it to not sync the files appropriately?

#4 Updated by Zheng Yan almost 6 years ago

The files were synced appropriately; rsync only synced the timestamp or mode of the directory share/doc/ when it was executed the second time. Maybe someone else modified share/doc/ while rsync was running. I think we should re-run the test and check how reliably the issue can be reproduced.
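A minimal local illustration of that hypothesis (made-up paths, not the test environment): if only a directory's mtime changes between two runs, the second rsync lists the directory itself but none of the files under it, which matches the lone share/doc/ line in the logs.

    mkdir -p src/share/doc dst
    echo hello > src/share/doc/readme
    rsync -au src/ dst/          # first pass copies everything
    touch src/share/doc          # something bumps the directory mtime afterwards
    rsync -auv src/ dst/         # second pass prints only "share/doc/"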

#5 Updated by Greg Farnum almost 6 years ago

/a/teuthology-2013-10-31_23:01:45-kcephfs-next-testing-basic-plana/78406

I haven't checked what this is doing any more, but if it's because the timestamps are different, could this just be a new incarnation of the issue where the client sets a timestamp and the server doesn't take it because their clocks aren't synchronized?
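If this reproduces again, a dry-run, itemized pass (standard rsync flags; the paths are the ones from the failing run) would show which attribute actually differs without transferring anything, so a clock-skew-induced mtime mismatch would be visible directly.

    rsync -aun --itemize-changes --exclude local/ /usr/ usr.1
    # output like ".d..t...... share/doc/" would mean only the mtime of
    # share/doc/ differs between the source and the destination copy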

#6 Updated by Zheng Yan almost 6 years ago

2013-11-01T13:37:12.841 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2013-11-01T13:37:12.994 INFO:teuthology.task.workunit:Stopping misc on client.0...
2013-11-01T13:37:12.994 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2013-11-01T13:37:13.009 DEBUG:teuthology.parallel:result is None
2013-11-01T13:37:13.010 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2013-11-01T13:37:13.196 INFO:teuthology.orchestra.run.err:[10.214.133.35]: rm: cannot remove `/home/ubuntu/cephtest/mnt.0/client.0': Permission denied
2013-11-01T13:37:13.196 ERROR:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/mnt.0/client.0
Traceback (most recent call last):
File "/home/teuthworker/teuthology-next/teuthology/task/workunit.py", line 132, in _delete_dir
client,
File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
r = self._runner(client=self.ssh, **kwargs)
File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
r.exitstatus = _check_status(r.exitstatus)
File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 267, in _check_status
raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.133.35 with status 1: 'rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2013-11-01T13:37:13.197 DEBUG:teuthology.orchestra.run:Running [10.214.133.35]: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2013-11-01T13:37:13.204 INFO:teuthology.orchestra.run.err:[10.214.133.35]: rmdir: failed to remove `/home/ubuntu/cephtest/mnt.0': Device or resource busy
2013-11-01T13:37:13.204 ERROR:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/mnt.0
Traceback (most recent call last):
File "/home/teuthworker/teuthology-next/teuthology/task/workunit.py", line 144, in _delete_dir
mnt,
File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
r = self._runner(client=self.ssh, **kwargs)
File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
r.exitstatus = _check_status(r.exitstatus)
File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 267, in _check_status
raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.133.35 with status 1: 'rmdir -- /home/ubuntu/cephtest/mnt.0'

The issue in 78406 is not the same as the previous ones; it looks more like a test script/environment issue.
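As a hedged debugging aid (not part of the workunit): the rmdir failure just means mnt.0 was still a live mount when the cleanup ran, and the unprivileged rm hitting Permission denied suggests it ran against entries it didn't own, so checks along these lines on the test node would confirm that.

    mount | grep /home/ubuntu/cephtest/mnt.0      # is the ceph mount still up?
    ls -ld /home/ubuntu/cephtest/mnt.0/client.0   # who owns what the cleanup tried to delete?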

#7 Updated by Greg Farnum over 5 years ago

  • Priority changed from Normal to High

I haven't noticed this in a while, but upgrading the priority since it was a failure across both clients.

#8 Updated by Sage Weil about 5 years ago

  • Status changed from New to Can't reproduce
