Project

General

Profile

Bug #5453

kclient: multiple_rsync tee output partially zeroed

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
06/25/2013
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature:

Description

latest run:

2013-06-25T10:29:15.811 INFO:teuthology.task.workunit.client.0.err:+ rsync -auv --exclude local/ /usr/ usr.2
2013-06-25T10:29:15.811 INFO:teuthology.task.workunit.client.0.err:+ tee a
2013-06-25T10:29:15.902 INFO:teuthology.task.workunit.client.0.out:sending incremental file list
2013-06-25T10:29:48.738 INFO:teuthology.task.workunit.client.0.out:
2013-06-25T10:29:48.740 INFO:teuthology.task.workunit.client.0.out:sent 1449972 bytes  received 7477 bytes  43505.94 bytes/sec
2013-06-25T10:29:48.740 INFO:teuthology.task.workunit.client.0.out:total size is 3205268241  speedup is 2199.23
2013-06-25T10:29:48.740 INFO:teuthology.task.workunit.client.0.err:+ hexdump -C a
2013-06-25T10:29:48.741 INFO:teuthology.task.workunit.client.0.out:00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
2013-06-25T10:29:48.741 INFO:teuthology.task.workunit.client.0.out:00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 0a 73  |...............s|
2013-06-25T10:29:48.742 INFO:teuthology.task.workunit.client.0.out:00000020  65 6e 74 20 31 34 34 39  39 37 32 20 62 79 74 65  |ent 1449972 byte|
2013-06-25T10:29:48.742 INFO:teuthology.task.workunit.client.0.out:00000030  73 20 20 72 65 63 65 69  76 65 64 20 37 34 37 37  |s  received 7477|
2013-06-25T10:29:48.742 INFO:teuthology.task.workunit.client.0.out:00000040  20 62 79 74 65 73 20 20  34 33 35 30 35 2e 39 34  | bytes  43505.94|
2013-06-25T10:29:48.742 INFO:teuthology.task.workunit.client.0.out:00000050  20 62 79 74 65 73 2f 73  65 63 0a 74 6f 74 61 6c  | bytes/sec.total|
2013-06-25T10:29:48.742 INFO:teuthology.task.workunit.client.0.out:00000060  20 73 69 7a 65 20 69 73  20 33 32 30 35 32 36 38  | size is 3205268|
2013-06-25T10:29:48.742 INFO:teuthology.task.workunit.client.0.out:00000070  32 34 31 20 20 73 70 65  65 64 75 70 20 69 73 20  |241  speedup is |
2013-06-25T10:29:48.743 INFO:teuthology.task.workunit.client.0.out:00000080  32 31 39 39 2e 32 33 0a                           |2199.23.|
2013-06-25T10:29:48.743 INFO:teuthology.task.workunit.client.0.out:00000088
2013-06-25T10:29:48.744 INFO:teuthology.task.workunit.client.0.err:+ wc -l a
2013-06-25T10:29:48.744 INFO:teuthology.task.workunit.client.0.err:+ grep 4

no idea why teh first line is turning into 0's...

0001-mds-fix-O_TRUNC-locking.patch View (900 Bytes) Zheng Yan, 06/29/2013 05:35 PM

History

#1 Updated by Sage Weil over 6 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil

#2 Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to fs
  • Subject changed from multiple_rsync test fails to count the number of lines to kclient: multiple_rsync tee output partially zeroed
  • Status changed from In Progress to Verified
  • Priority changed from Urgent to High

putting the tee'd file in /tmp fixes the problem, implying this is a kclient/cephfs bug of some sort. moving this into that project and fixing the test in master so we stop tripping over it in the nightlies.

#3 Updated by Zheng Yan over 6 years ago

I can't reproduce this locally. how difficult to reproduce this? what's the backend fs for osd?

#4 Updated by Sage Weil over 6 years ago

i hit it after just a couple iterations of the teuthology test. i'll capture the osd log...

#5 Updated by Zheng Yan over 6 years ago

please check if the attached patch solves this issue

#6 Updated by Zheng Yan over 6 years ago

patch "ceph: fix pending vmtruncate race" should fix the issue.

#7 Updated by Sage Weil over 6 years ago

in combination with the mds s/wrlock/xlock/ change?

#8 Updated by Sage Weil over 6 years ago

btw the mds change passed 11 iterations before it stopped because of a chef/network hiccup.

#9 Updated by Zheng Yan over 6 years ago

Sage Weil wrote:

in combination with the mds s/wrlock/xlock/ change?

the kclient patch fixes this issue alone. the mds patch is for a potential issue.

#10 Updated by Sage Weil over 6 years ago

patch is in testing branch (tho i'm tracking down a different regression in that branch).

btw, yan, were you able to come up with a better reproducer here? would be nice to add something more targetted into the test suite.

#11 Updated by Zheng Yan over 6 years ago

my reproducer

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char *argv[])
{
        int fd, ret;
        char obuf[32], ibuf[1024];

        memset(obuf, 0xff, sizeof(obuf));

        while(1) {
                fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0644);
                printf("open fd = %d\n", fd);

                ret = write(fd, obuf, sizeof(obuf));
                printf("write ret = %d\n", ret);

                sleep(1);

                ret = write(fd, obuf, sizeof(obuf));
                printf("write ret = %d\n", ret);

                ret = pread(fd, ibuf, sizeof(ibuf), 0);
                printf("pread ret = %d\n", ret);

                if (memcmp(obuf, ibuf, sizeof(obuf))) {
                        printf("mismatch\n");
                        break;
                }
                close(fd);
        }
        return 0;
}

#12 Updated by Sage Weil over 6 years ago

  • Status changed from Verified to Resolved

thanks, added this to the test suite

Also available in: Atom PDF