Subtask #10489
teuthology - Feature #1398: qa: multiclient file io test
MPI tests fail on both ceph-fuse and kclient
Description
MPI tests fail on both ceph-fuse and kclient. I will post some tests and results that demonstrate this failure.
History
#1 Updated by Warren Usui over 8 years ago
Running teuthology using the following yaml demonstrates this problem:
interactive-on-error: true
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2]
- [client.2]
- [client.1]
- [client.0]
overrides:
  ceph:
    conf:
      mds:
        debug ms: 1
        debug mds: 20
      client:
        debug ms: 1
        debug client: 20
overrides:
  ceph:
    fs: xfs
    conf:
      osd:
        osd sloppy crc: true
        osd op thread timeout: 60
# make sure we get the same MPI version on all hosts
os_type: ubuntu
os_version: "14.04"
tasks:
- chef: null
- install:
- ceph:
- kclient:
- pexec:
    clients:
    - cd $TESTDIR
    - wget http://ceph.com/qa/fsx-mpi.c
    - mpicc fsx-mpi.c -o fsx-mpi
    - rm fsx-mpi.c
    - ln -s $TESTDIR/mnt.* $TESTDIR/gmnt
- ssh_keys:
- mpi:
    exec: sudo $TESTDIR/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 $TESTDIR/gmnt
    workdir: $TESTDIR/gmnt
- pexec:
    all:
    - rm -rf $TESTDIR/gmnt
    - rm -rf $TESTDIR/fsx-mpi
At the point where the teuthology task goes interactive, ssh to one of the
clients. There, df gmnt shows:
10.214.138.154:6789,10.214.138.154:6791,10.214.138.154:6790:/ 628838400 413696 628424704 1% /home/ubuntu/cephtest/mnt.1
Running the following command (from the cephtest directory) produces the following failure:
COMMAND:
mpiexec -f /home/ubuntu/cephtest/mpi-hosts -wdir /home/ubuntu/cephtest/gmnt sudo /home/ubuntu/cephtest/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 /home/ubuntu/cephtest/gmnt
FAILURE:
Warning: Permanently added '10.214.138.156' (ECDSA) to the list of known hosts.
Warning: Permanently added '10.214.138.142' (ECDSA) to the list of known hosts.
skipping zero size read
truncating to largest ever: 0x7cccb
READ BAD DATA: offset = 0x2568c, size = 0xe8a1
OFFSET  GOOD    BAD     RANGE
0x2568c 0x068f  0x0000  0xe838
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (8 total operations):
1(1 mod 256): SKIPPED (no operation)
2(2 mod 256): WRITE    0x1c748 thru 0xa6f7b (0x8a834 bytes) HOLE ***WWWW
3(3 mod 256): WRITE    0x7ea33 thru 0x7fc14 (0x11e2 bytes)
4(4 mod 256): READ     0x21437 thru 0x42e18 (0x219e2 bytes) ***RRRR***
5(5 mod 256): MAPREAD  0x6f92f thru 0x77b31 (0x8203 bytes)
6(6 mod 256): WRITE    0x1860 thru 0x8dc6f (0x8c410 bytes) ***WWWW
7(7 mod 256): TRUNCATE DOWN from 0xa6f7c to 0x7cccb
8(8 mod 256): READ     0x2568c thru 0x33f2c (0xe8a1 bytes) ***RRRR***
Correct content saved for comparison
(maybe hexdump "mnt.1" vs "mnt.1.fsxgood")
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 110
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@vpm152] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@vpm152] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@vpm152] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2@vpm067] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:2@vpm067] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@vpm067] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
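As the log's last lines suggest, the corrupt file can be compared byte-by-byte against the saved good copy with standard tools, for example:

cmp -l mnt.1 mnt.1.fsxgood | head   # prints offset and the differing byte values from each file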
The test completes successfully when run against a non-Ceph device:
mpiexec -f /home/ubuntu/cephtest/mpi-hosts -wdir /home/ubuntu/cephtest/gmnt sudo /home/ubuntu/cephtest/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 /tmp/foobar
Similar problems can be seen if one uses ceph-fuse instead of kclient.
#2 Updated by Warren Usui over 8 years ago
- Source changed from other to Q/A
#3 Updated by Warren Usui over 8 years ago
- Parent task set to #1398
#4 Updated by John Spray over 8 years ago
Is this a new failure on the existing cluster, or something that's come up on the new cluster?
#5 Updated by Greg Farnum over 8 years ago
If I'm understanding my quick skim correctly, this is not "MPI tests are failing" but "this mpi-fsx test is failing", right? The suites/fs/multiclient/tasks/mdtest.yaml fragment is still being executed regularly and seems to be doing fine, whereas the suites/fs/multiclient/tasks/fsx-mpi.yaml.disabled fragment that I think you're trying to get going here was disabled a long time ago (and maybe it really is turning up a bug in CephFS).
#6 Updated by Zheng Yan over 8 years ago
Please remove the sudo before /home/ubuntu/cephtest/fsx-mpi. Otherwise the rank of all processes will be zero.
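For background: an MPI process learns its rank from the environment the launcher sets up, and sudo typically does not pass that environment through to the child, so MPI_Init falls back to singleton mode and every process believes it is rank 0 of 1. A minimal sketch of the standard calls involved (not the fsx-mpi source):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = 0, nprocs = 0;

    MPI_Init(&argc, &argv);                  /* reads the PMI environment set up by mpiexec */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id within the job */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes */
    printf("rank %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}

If the PMI variables are stripped, every process prints "rank 0 of 1" and all of them operate on the same file regions instead of partitioning the work, which is enough to make fsx-mpi report bad data.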
#7 Updated by Zheng Yan over 8 years ago
--- fsx-mpi.c.old	2015-01-13 11:57:51.703656062 +0800
+++ fsx-mpi.c	2015-01-13 11:58:51.521022162 +0800
@@ -1212,6 +1212,9 @@
 	}
 	else
 	{
+		for (i = 0; i < maxfilelen; i++)
+			random();
+
 	original_buf = (char *) malloc(maxfilelen);
 	good_buf = (char *) malloc(maxfilelen);
 	memset(good_buf, '\0', maxfilelen);
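As far as I can tell, the point of the extra random() loop is that all ranks drive their operation sequence from the same seeded PRNG, but only one rank actually fills original_buf with random bytes; the other ranks must consume the same number of draws or their streams diverge. A self-contained demonstration of that idea (hypothetical names, not the actual fsx-mpi code):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Two simulated "ranks" start from the same seed; the one that
     * skips filling the buffer must still burn the same number of
     * draws, or every later random() value diverges between ranks. */
    enum { MAXFILELEN = 8 };
    char buf[MAXFILELEN];
    long next_op[2];
    int rank, i;

    for (rank = 0; rank < 2; rank++) {
        srandom(4242);                     /* shared seed across ranks */
        if (rank == 0) {
            for (i = 0; i < MAXFILELEN; i++)
                buf[i] = random() % 256;   /* rank 0 generates the reference data */
        } else {
            for (i = 0; i < MAXFILELEN; i++)
                random();                  /* the patch: discard the same draws */
        }
        next_op[rank] = random();          /* value that drives the next operation */
    }
    /* identical values -> both ranks pick the same operation sequence */
    printf("%ld %ld\n", next_op[0], next_op[1]);
    return 0;
}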
The test passes after applying the above patch and adding the "-W -R" options to fsx-mpi; memory-mapped IO is disabled because CephFS does not synchronize memory-mapped data among different hosts.
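Concretely, the mpi task's exec line from the yaml in note #1 would then presumably become (sudo dropped per note #6, the new options added):

$TESTDIR/fsx-mpi -W -R -o 1MB -N 50000 -p 10000 -l 1048576 $TESTDIR/gmnt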
#8 Updated by Zack Cerza over 7 years ago
- Target version deleted (sprint30)
#9 Updated by Greg Farnum about 7 years ago
- Category set to Correctness/Safety
#10 Updated by Ivan Guan over 6 years ago
Zheng Yan wrote:
[...]
The test passes after applying the above patch and adding the "-W -R" options to fsx-mpi; memory-mapped IO is disabled because CephFS does not synchronize memory-mapped data among different hosts.
I applied the above patch and added the "-W -R" options, but it still reports the failures below. Could you give me some suggestions? Thanks...
My environment:
CentOS 7
Ceph Jewel
mpi:
[root@xt2 cephtest]# rpm -qa |grep mpi
mpich-3.2-devel-3.2-2.el7.x86_64
mpich-3.2-3.2-2.el7.x86_64
mapped writes DISABLED
mapped reads DISABLED
mapped writes DISABLED
mapped reads DISABLED
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
truncating to largest ever: 0x8c4d6
truncating to largest ever: 0xdfb00
READ BAD DATA: offset = 0xaa620, size = 0x559e0
OFFSET GOOD BAD RANGE
0xaa620 0x1921 0x1820 0x559e0
operation# (mod 256) for the bad data may be 24
LOG DUMP (31 total operations):
1(1 mod 256): SKIPPED (no operation)
2(2 mod 256): SKIPPED (no operation)
3(3 mod 256): SKIPPED (no operation)
4(4 mod 256): SKIPPED (no operation)
5(5 mod 256): SKIPPED (no operation)
6(6 mod 256): TRUNCATE UP from 0x0 to 0x8c4d6
7(7 mod 256): READ 0x4c995 thru 0x8c4d5 (0x3fb41 bytes)
8(8 mod 256): TRUNCATE DOWN from 0x8c4d6 to 0xed51
9(9 mod 256): WRITE 0x5e672 thru 0xd9905 (0x7b294 bytes) HOLE
10(10 mod 256): READ 0x5e672 thru 0xd9905 (0x7b294 bytes)
11(11 mod 256): READ 0x8c27f thru 0xd9905 (0x4d687 bytes)
12(12 mod 256): READ 0xa3dae thru 0xd9905 (0x35b58 bytes)
13(13 mod 256): READ 0xcdb26 thru 0xd9905 (0xbde0 bytes)
14(14 mod 256): READ 0x63979 thru 0xd9905 (0x75f8d bytes)
15(15 mod 256): READ 0x66b3c thru 0xd9905 (0x72dca bytes)
16(16 mod 256): READ 0x33679 thru 0xd9905 (0xa628d bytes)
17(17 mod 256): TRUNCATE UP from 0xd9906 to 0xdfb00
18(18 mod 256): WRITE 0xdbf9 thru 0x32548 (0x24950 bytes)
19(19 mod 256): READ 0xdbf9 thru 0x32548 (0x24950 bytes)
20(20 mod 256): WRITE 0xb3fc0 thru 0xfffff (0x4c040 bytes) EXTEND WWWW
21(21 mod 256): READ 0xb3fc0 thru 0xfffff (0x4c040 bytes) **RRRR
22(22 mod 256): TRUNCATE DOWN from 0x100000 to 0x71fec **WWWW
23(23 mod 256): TRUNCATE DOWN from 0x71fec to 0x5c60a
24(24 mod 256): TRUNCATE UP from 0x5c60a to 0xb4cb3
25(25 mod 256): READ 0x39cd1 thru 0x97911 (0x5dc41 bytes)
26(26 mod 256): WRITE 0x21b75 thru 0xedde7 (0xcc273 bytes) EXTEND
27(27 mod 256): READ 0x21b75 thru 0xedde7 (0xcc273 bytes)
28(28 mod 256): WRITE 0x7fc0c thru 0xfffff (0x803f4 bytes) EXTEND **WWWW
29(29 mod 256): READ 0x7fc0c thru 0xfffff (0x803f4 bytes) **RRRR
30(30 mod 256): WRITE 0xaa620 thru 0xfffff (0x559e0 bytes) WWWW
31(31 mod 256): READ 0xaa620 thru 0xfffff (0x559e0 bytes) **RRRR*
Correct content saved for comparison
(maybe hexdump "/mnt/cephtest/gmnt/test" vs "/mnt/cephtest/gmnt/test.fsxgood")
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(425).....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(332)................: Failure during collective
MPIR_Barrier_impl(327)................:
MPIR_Barrier(292).....................:
MPIR_Barrier_intra(169)...............:
MPIDU_Complete_posted_with_error(1137): Process failed
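(The PMPI_Barrier stack at the end is most likely a secondary symptom: once the rank that detected the bad read exits without calling MPI_Finalize, the surviving ranks' next collective cannot complete and the launcher reports "Process failed". A minimal program reproducing that pattern, not the fsx-mpi source:)

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Simulate one rank hitting bad data and leaving the job
     * without MPI_Finalize, as fsx does on a failed comparison. */
    if (rank == 0)
        exit(110);

    MPI_Barrier(MPI_COMM_WORLD);  /* surviving ranks fail here */
    MPI_Finalize();
    return 0;
}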