Subtask #10489
teuthology - Feature #1398: qa: multiclient file io test
MPI tests fail on both ceph-fuse and kclient
Description
MPI tests fail on both ceph-fuse and kclient. I will post some tests and results that demonstrate this failure.
History
#1 Updated by Warren Usui over 8 years ago
Running teuthology using the following yaml demonstrates this problem:
interactive-on-error: true
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2]
- [client.2]
- [client.1]
- [client.0]
overrides:
  ceph:
    conf:
      mds:
        debug ms: 1
        debug mds: 20
      client:
        debug ms: 1
        debug client: 20
overrides:
  ceph:
    fs: xfs
    conf:
      osd:
        osd sloppy crc: true
        osd op thread timeout: 60
# make sure we get the same MPI version on all hosts
os_type: ubuntu
os_version: "14.04"
tasks:
- chef: null
- install:
- ceph:
- kclient:
- pexec:
    clients:
    - cd $TESTDIR
    - wget http://ceph.com/qa/fsx-mpi.c
    - mpicc fsx-mpi.c -o fsx-mpi
    - rm fsx-mpi.c
    - ln -s $TESTDIR/mnt.* $TESTDIR/gmnt
- ssh_keys:
- mpi:
    exec: sudo $TESTDIR/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 $TESTDIR/gmnt
    workdir: $TESTDIR/gmnt
- pexec:
    all:
    - rm -rf $TESTDIR/gmnt
    - rm -rf $TESTDIR/fsx-mpi
At the point where the teuthology task goes interactive, ssh to one of the
clients. There, df gmnt shows:
10.214.138.154:6789,10.214.138.154:6791,10.214.138.154:6790:/ 628838400 413696 628424704 1% /home/ubuntu/cephtest/mnt.1
Running the following command (from the cephtest directory) produces the following failure:
COMMAND:
mpiexec -f /home/ubuntu/cephtest/mpi-hosts -wdir /home/ubuntu/cephtest/gmnt sudo /home/ubuntu/cephtest/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 /home/ubuntu/cephtest/gmnt
FAILURE:
Warning: Permanently added '10.214.138.156' (ECDSA) to the list of known hosts.
Warning: Permanently added '10.214.138.142' (ECDSA) to the list of known hosts.
skipping zero size read
truncating to largest ever: 0x7cccb
READ BAD DATA: offset = 0x2568c, size = 0xe8a1
OFFSET  GOOD    BAD     RANGE
0x2568c 0x068f  0x0000  0xe838
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (8 total operations):
1(1 mod 256): SKIPPED (no operation)
2(2 mod 256): WRITE    0x1c748 thru 0xa6f7b (0x8a834 bytes) HOLE ***WWWW
3(3 mod 256): WRITE    0x7ea33 thru 0x7fc14 (0x11e2 bytes)
4(4 mod 256): READ     0x21437 thru 0x42e18 (0x219e2 bytes) ***RRRR***
5(5 mod 256): MAPREAD  0x6f92f thru 0x77b31 (0x8203 bytes)
6(6 mod 256): WRITE    0x1860 thru 0x8dc6f (0x8c410 bytes) ***WWWW
7(7 mod 256): TRUNCATE DOWN from 0xa6f7c to 0x7cccb
8(8 mod 256): READ     0x2568c thru 0x33f2c (0xe8a1 bytes) ***RRRR***
Correct content saved for comparison
(maybe hexdump "mnt.1" vs "mnt.1.fsxgood")
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 110
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@vpm152] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@vpm152] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@vpm152] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2@vpm067] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:2@vpm067] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@vpm067] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
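As the log's last lines suggest, the corrupt file can be compared byte-by-byte against the saved good copy with standard tools, for example:

cmp -l mnt.1 mnt.1.fsxgood | head   # prints offset and the differing byte values from each file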
The test completes successfully when run against a non-Ceph device:
mpiexec -f /home/ubuntu/cephtest/mpi-hosts -wdir /home/ubuntu/cephtest/gmnt sudo /home/ubuntu/cephtest/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 /tmp/foobar
Similar problems can be seen if one uses ceph-fuse instead of kclient.
#2 Updated by Warren Usui over 8 years ago
- Source changed from other to Q/A
#3 Updated by Warren Usui over 8 years ago
- Parent task set to #1398
#4 Updated by John Spray over 8 years ago
Is this a new failure on the existing cluster, or something that's come up on the new cluster?
#5 Updated by Greg Farnum over 8 years ago
If I'm understanding my quick skim correctly, this is not "MPI tests are failing" but "this mpi-fsx test is failing", right? The suites/fs/multiclient/tasks/mdtest.yaml fragment is still being executed regularly and seems to be doing fine, whereas the suites/fs/multiclient/tasks/fsx-mpi.yaml.disabled fragment that I think you're trying to get going here was disabled a long time ago (and maybe it really is turning up a bug in CephFS).
#6 Updated by Zheng Yan over 8 years ago
Please remove the sudo before /home/ubuntu/cephtest/fsx-mpi. Otherwise the rank of all processes will be zero.
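For background: an MPI process learns its rank from the environment the launcher sets up, and sudo typically does not pass that environment through to the child, so MPI_Init falls back to singleton mode and every process believes it is rank 0 of 1. A minimal sketch of the standard calls involved (not the fsx-mpi source):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = 0, nprocs = 0;

    MPI_Init(&argc, &argv);                  /* reads the PMI environment set up by mpiexec */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id within the job */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes */
    printf("rank %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}

If the PMI variables are stripped, every process prints "rank 0 of 1" and all of them operate on the same file regions instead of partitioning the work, which is enough to make fsx-mpi report bad data.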
#7 Updated by Zheng Yan over 8 years ago
--- fsx-mpi.c.old	2015-01-13 11:57:51.703656062 +0800
+++ fsx-mpi.c	2015-01-13 11:58:51.521022162 +0800
@@ -1212,6 +1212,9 @@
 	}
 	else
 	{
+		for (i = 0; i < maxfilelen; i++)
+			random();
+
 	original_buf = (char *) malloc(maxfilelen);
 	good_buf = (char *) malloc(maxfilelen);
 	memset(good_buf, '\0', maxfilelen);
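As far as I can tell, the point of the extra random() loop is that all ranks drive their operation sequence from the same seeded PRNG, but only one rank actually fills original_buf with random bytes; the other ranks must consume the same number of draws or their streams diverge. A self-contained demonstration of that idea (hypothetical names, not the actual fsx-mpi code):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Two simulated "ranks" start from the same seed; the one that
     * skips filling the buffer must still burn the same number of
     * draws, or every later random() value diverges between ranks. */
    enum { MAXFILELEN = 8 };
    char buf[MAXFILELEN];
    long next_op[2];
    int rank, i;

    for (rank = 0; rank < 2; rank++) {
        srandom(4242);                     /* shared seed across ranks */
        if (rank == 0) {
            for (i = 0; i < MAXFILELEN; i++)
                buf[i] = random() % 256;   /* rank 0 generates the reference data */
        } else {
            for (i = 0; i < MAXFILELEN; i++)
                random();                  /* the patch: discard the same draws */
        }
        next_op[rank] = random();          /* value that drives the next operation */
    }
    /* identical values -> both ranks pick the same operation sequence */
    printf("%ld %ld\n", next_op[0], next_op[1]);
    return 0;
}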
The test passes after applying the above patch and adding the "-W -R" options to fsx-mpi; memory-mapped IO is disabled because CephFS does not synchronize memory-mapped data among different hosts.
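Concretely, the mpi task's exec line from the yaml in note #1 would then presumably become (sudo dropped per note #6, the new options added):

$TESTDIR/fsx-mpi -W -R -o 1MB -N 50000 -p 10000 -l 1048576 $TESTDIR/gmnt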
#8 Updated by Zack Cerza over 7 years ago
- Target version deleted (sprint30)
#9 Updated by Greg Farnum about 7 years ago
- Category set to Correctness/Safety
#10 Updated by Ivan Guan over 6 years ago
Zheng Yan wrote:
[...]
The test passes after applying the above patch and adding the "-W -R" options to fsx-mpi; memory-mapped IO is disabled because CephFS does not synchronize memory-mapped data among different hosts.
I applied the above patch and added the "-W -R" options, but it still reports the failures below. Could you give me some suggestions? Thanks...
My environment:
CentOS 7
Ceph Jewel
mpi:
[root@xt2 cephtest]# rpm -qa |grep mpi
mpich-3.2-devel-3.2-2.el7.x86_64
mpich-3.2-3.2-2.el7.x86_64
mapped writes DISABLED
mapped reads DISABLED
mapped writes DISABLED
mapped reads DISABLED
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
truncating to largest ever: 0x8c4d6
truncating to largest ever: 0xdfb00
READ BAD DATA: offset = 0xaa620, size = 0x559e0
OFFSET GOOD BAD RANGE
0xaa620 0x1921 0x1820 0x559e0
operation# (mod 256) for the bad data may be 24
LOG DUMP (31 total operations):
1(1 mod 256): SKIPPED (no operation)
2(2 mod 256): SKIPPED (no operation)
3(3 mod 256): SKIPPED (no operation)
4(4 mod 256): SKIPPED (no operation)
5(5 mod 256): SKIPPED (no operation)
6(6 mod 256): TRUNCATE UP from 0x0 to 0x8c4d6
7(7 mod 256): READ 0x4c995 thru 0x8c4d5 (0x3fb41 bytes)
8(8 mod 256): TRUNCATE DOWN from 0x8c4d6 to 0xed51
9(9 mod 256): WRITE 0x5e672 thru 0xd9905 (0x7b294 bytes) HOLE
10(10 mod 256): READ 0x5e672 thru 0xd9905 (0x7b294 bytes)
11(11 mod 256): READ 0x8c27f thru 0xd9905 (0x4d687 bytes)
12(12 mod 256): READ 0xa3dae thru 0xd9905 (0x35b58 bytes)
13(13 mod 256): READ 0xcdb26 thru 0xd9905 (0xbde0 bytes)
14(14 mod 256): READ 0x63979 thru 0xd9905 (0x75f8d bytes)
15(15 mod 256): READ 0x66b3c thru 0xd9905 (0x72dca bytes)
16(16 mod 256): READ 0x33679 thru 0xd9905 (0xa628d bytes)
17(17 mod 256): TRUNCATE UP from 0xd9906 to 0xdfb00
18(18 mod 256): WRITE 0xdbf9 thru 0x32548 (0x24950 bytes)
19(19 mod 256): READ 0xdbf9 thru 0x32548 (0x24950 bytes)
20(20 mod 256): WRITE 0xb3fc0 thru 0xfffff (0x4c040 bytes) EXTEND WWWW
21(21 mod 256): READ 0xb3fc0 thru 0xfffff (0x4c040 bytes) **RRRR
22(22 mod 256): TRUNCATE DOWN from 0x100000 to 0x71fec **WWWW
23(23 mod 256): TRUNCATE DOWN from 0x71fec to 0x5c60a
24(24 mod 256): TRUNCATE UP from 0x5c60a to 0xb4cb3
25(25 mod 256): READ 0x39cd1 thru 0x97911 (0x5dc41 bytes)
26(26 mod 256): WRITE 0x21b75 thru 0xedde7 (0xcc273 bytes) EXTEND
27(27 mod 256): READ 0x21b75 thru 0xedde7 (0xcc273 bytes)
28(28 mod 256): WRITE 0x7fc0c thru 0xfffff (0x803f4 bytes) EXTEND **WWWW
29(29 mod 256): READ 0x7fc0c thru 0xfffff (0x803f4 bytes) **RRRR
30(30 mod 256): WRITE 0xaa620 thru 0xfffff (0x559e0 bytes) WWWW
31(31 mod 256): READ 0xaa620 thru 0xfffff (0x559e0 bytes) **RRRR*
Correct content saved for comparison
(maybe hexdump "/mnt/cephtest/gmnt/test" vs "/mnt/cephtest/gmnt/test.fsxgood")
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(425).....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(332)................: Failure during collective
MPIR_Barrier_impl(327)................:
MPIR_Barrier(292).....................:
MPIR_Barrier_intra(169)...............:
MPIDU_Complete_posted_with_error(1137): Process failed
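(The PMPI_Barrier stack at the end is most likely a secondary symptom: once the rank that detected the bad read exits without calling MPI_Finalize, the surviving ranks' next collective cannot complete and the launcher reports "Process failed". A minimal program reproducing that pattern, not the fsx-mpi source:)

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Simulate one rank hitting bad data and leaving the job
     * without MPI_Finalize, as fsx does on a failed comparison. */
    if (rank == 0)
        exit(110);

    MPI_Barrier(MPI_COMM_WORLD);  /* surviving ranks fail here */
    MPI_Finalize();
    return 0;
}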