Bug #9177

ceph-fuse: failing MPI mdtest runs

Added by Greg Farnum almost 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 08/20/2014
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

2014-08-19T05:06:15.018 INFO:teuthology.orchestra.run.burnupi08.stdout:mdtest-1.8.3 was launched with 3 total task(s) on 3 nodes
2014-08-19T05:06:15.018 INFO:teuthology.orchestra.run.burnupi08.stdout:Command line used: /home/ubuntu/cephtest/mdtest-1.8.4/mdtest -d /home/ubuntu/cephtest/gmnt -I 20 -z 5 -b 2 -R
2014-08-19T05:06:15.019 INFO:teuthology.orchestra.run.burnupi08.stderr:*** buffer overflow detected ***: /home/ubuntu/cephtest/mdtest-1.8.4/mdtest terminated
2014-08-19T05:06:15.019 INFO:teuthology.orchestra.run.burnupi08.stderr:======= Backtrace: =========
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/lib/x86_64-linux-gnu/libc.so.6(+0x741df)[0x7f92d81ad1df]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f92d8244bac]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/lib/x86_64-linux-gnu/libc.so.6(+0x10aa70)[0x7f92d8243a70]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/lib/x86_64-linux-gnu/libc.so.6(+0x10b004)[0x7f92d8244004]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/home/ubuntu/cephtest/mdtest-1.8.4/mdtest[0x407306]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/home/ubuntu/cephtest/mdtest-1.8.4/mdtest[0x407584]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/home/ubuntu/cephtest/mdtest-1.8.4/mdtest[0x402bb3]
2014-08-19T05:06:15.020 INFO:teuthology.orchestra.run.burnupi08.stderr:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f92d815aec5]
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:/home/ubuntu/cephtest/mdtest-1.8.4/mdtest[0x402e06]
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:======= Memory map: ========
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:00400000-0040a000 r-xp 00000000 08:01 22152126                           /home/ubuntu/cephtest/mdtest-1.8.4/mdtest
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:00609000-0060a000 r--p 00009000 08:01 22152126                           /home/ubuntu/cephtest/mdtest-1.8.4/mdtest
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:0060a000-0060b000 rw-p 0000a000 08:01 22152126                           /home/ubuntu/cephtest/mdtest-1.8.4/mdtest
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:0060b000-0060f000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:01e67000-01e88000 rw-p 00000000 00:00 0                                  [heap]
2014-08-19T05:06:15.021 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7306000-7f92d7707000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7707000-7f92d770a000 r-xp 00000000 08:01 57147596                   /lib/x86_64-linux-gnu/libdl-2.19.so
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d770a000-7f92d7909000 ---p 00003000 08:01 57147596                   /lib/x86_64-linux-gnu/libdl-2.19.so
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7909000-7f92d790a000 r--p 00002000 08:01 57147596                   /lib/x86_64-linux-gnu/libdl-2.19.so
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d790a000-7f92d790b000 rw-p 00003000 08:01 57147596                   /lib/x86_64-linux-gnu/libdl-2.19.so
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d790b000-7f92d7921000 r-xp 00000000 08:01 57147455                   /lib/x86_64-linux-gnu/libgcc_s.so.1
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7921000-7f92d7b20000 ---p 00016000 08:01 57147455                   /lib/x86_64-linux-gnu/libgcc_s.so.1
2014-08-19T05:06:15.022 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7b20000-7f92d7b21000 rw-p 00015000 08:01 57147455                   /lib/x86_64-linux-gnu/libgcc_s.so.1
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7b21000-7f92d7b29000 r-xp 00000000 08:01 47199200                   /usr/lib/libcr.so.0.5.5
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7b29000-7f92d7d28000 ---p 00008000 08:01 47199200                   /usr/lib/libcr.so.0.5.5
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7d28000-7f92d7d29000 r--p 00007000 08:01 47199200                   /usr/lib/libcr.so.0.5.5
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7d29000-7f92d7d2a000 rw-p 00008000 08:01 47199200                   /usr/lib/libcr.so.0.5.5
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7d2a000-7f92d7d2b000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7d2b000-7f92d7d32000 r-xp 00000000 08:01 57147603                   /lib/x86_64-linux-gnu/librt-2.19.so
2014-08-19T05:06:15.023 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7d32000-7f92d7f31000 ---p 00007000 08:01 57147603                   /lib/x86_64-linux-gnu/librt-2.19.so
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7f31000-7f92d7f32000 r--p 00006000 08:01 57147603                   /lib/x86_64-linux-gnu/librt-2.19.so
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7f32000-7f92d7f33000 rw-p 00007000 08:01 57147603                   /lib/x86_64-linux-gnu/librt-2.19.so
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7f33000-7f92d7f38000 r-xp 00000000 08:01 47199206                   /usr/lib/x86_64-linux-gnu/libmpl.so.1.0.0
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d7f38000-7f92d8137000 ---p 00005000 08:01 47199206                   /usr/lib/x86_64-linux-gnu/libmpl.so.1.0.0
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8137000-7f92d8138000 r--p 00004000 08:01 47199206                   /usr/lib/x86_64-linux-gnu/libmpl.so.1.0.0
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8138000-7f92d8139000 rw-p 00005000 08:01 47199206                   /usr/lib/x86_64-linux-gnu/libmpl.so.1.0.0
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8139000-7f92d82f5000 r-xp 00000000 08:01 57147610                   /lib/x86_64-linux-gnu/libc-2.19.so
2014-08-19T05:06:15.024 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d82f5000-7f92d84f4000 ---p 001bc000 08:01 57147610                   /lib/x86_64-linux-gnu/libc-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d84f4000-7f92d84f8000 r--p 001bb000 08:01 57147610                   /lib/x86_64-linux-gnu/libc-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d84f8000-7f92d84fa000 rw-p 001bf000 08:01 57147610                   /lib/x86_64-linux-gnu/libc-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d84fa000-7f92d84ff000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d84ff000-7f92d8518000 r-xp 00000000 08:01 57147612                   /lib/x86_64-linux-gnu/libpthread-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8518000-7f92d8717000 ---p 00019000 08:01 57147612                   /lib/x86_64-linux-gnu/libpthread-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8717000-7f92d8718000 r--p 00018000 08:01 57147612                   /lib/x86_64-linux-gnu/libpthread-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8718000-7f92d8719000 rw-p 00019000 08:01 57147612                   /lib/x86_64-linux-gnu/libpthread-2.19.so
2014-08-19T05:06:15.025 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8719000-7f92d871d000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d871d000-7f92d892f000 r-xp 00000000 08:01 47199209                   /usr/lib/x86_64-linux-gnu/libmpich.so.10.0.4
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d892f000-7f92d8b2e000 ---p 00212000 08:01 47199209                   /usr/lib/x86_64-linux-gnu/libmpich.so.10.0.4
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8b2e000-7f92d8b3b000 r--p 00211000 08:01 47199209                   /usr/lib/x86_64-linux-gnu/libmpich.so.10.0.4
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8b3b000-7f92d8b3e000 rw-p 0021e000 08:01 47199209                   /usr/lib/x86_64-linux-gnu/libmpich.so.10.0.4
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8b3e000-7f92d8b77000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8b77000-7f92d8c7c000 r-xp 00000000 08:01 57147604                   /lib/x86_64-linux-gnu/libm-2.19.so
2014-08-19T05:06:15.026 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8c7c000-7f92d8e7b000 ---p 00105000 08:01 57147604                   /lib/x86_64-linux-gnu/libm-2.19.so
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8e7b000-7f92d8e7c000 r--p 00104000 08:01 57147604                   /lib/x86_64-linux-gnu/libm-2.19.so
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8e7c000-7f92d8e7d000 rw-p 00105000 08:01 57147604                   /lib/x86_64-linux-gnu/libm-2.19.so
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d8e7d000-7f92d8ea0000 r-xp 00000000 08:01 57147595                   /lib/x86_64-linux-gnu/ld-2.19.so
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d908a000-7f92d9090000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d909c000-7f92d909f000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d909f000-7f92d90a0000 r--p 00022000 08:01 57147595                   /lib/x86_64-linux-gnu/ld-2.19.so
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d90a0000-7f92d90a1000 rw-p 00023000 08:01 57147595                   /lib/x86_64-linux-gnu/ld-2.19.so
2014-08-19T05:06:15.027 INFO:teuthology.orchestra.run.burnupi08.stderr:7f92d90a1000-7f92d90a2000 rw-p 00000000 00:00 0
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:7fff8c9c9000-7fff8c9ea000 rw-p 00000000 00:00 0                          [stack]
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:7fff8c9fd000-7fff8c9fe000 r-xp 00000000 00:00 0                          [vdso]
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:7fff8c9fe000-7fff8ca00000 r--p 00000000 00:00 0                          [vvar]
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:[proxy:0:1@burnupi61] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:[proxy:0:1@burnupi61] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:[proxy:0:1@burnupi61] [proxy:0:2@burnupi29] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
2014-08-19T05:06:15.028 INFO:teuthology.orchestra.run.burnupi08.stderr:[proxy:0:2@burnupi29] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stderr:[proxy:0:2@burnupi29] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stderr:main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stdout:
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stdout:===================================================================================
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stdout:=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stdout:=   EXIT CODE: 6
2014-08-19T05:06:15.029 INFO:teuthology.orchestra.run.burnupi08.stdout:=   CLEANING UP REMAINING PROCESSES
2014-08-19T05:06:15.030 INFO:teuthology.orchestra.run.burnupi08.stdout:=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
2014-08-19T05:06:15.030 INFO:teuthology.orchestra.run.burnupi08.stdout:===================================================================================
2014-08-19T05:06:15.030 INFO:teuthology.orchestra.run.burnupi08.stderr:[mpiexec@burnupi08] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
2014-08-19T05:06:15.031 INFO:teuthology.orchestra.run.burnupi08.stderr:[mpiexec@burnupi08] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
2014-08-19T05:06:15.031 INFO:teuthology.orchestra.run.burnupi08.stderr:[mpiexec@burnupi08] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
2014-08-19T05:06:15.031 INFO:teuthology.orchestra.run.burnupi08.stderr:[mpiexec@burnupi08] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
2014-08-19T05:06:15.031 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 51, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 39, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/mpi.py", line 136, in task
    master_remote.run(args=args, )
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 114, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 401, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 102, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on burnupi08 with status 255: 'mpiexec -f /home/ubuntu/cephtest/mpi-hosts /home/ubuntu/cephtest/mdtest-1.8.4/mdtest -d /home/ubuntu/cephtest/gmnt -I 20 -z 5 -b 2 -R'

http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-18_23:04:01-fs-master-testing-basic-multi/434755/
http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-18_23:04:01-fs-master-testing-basic-multi/434753/
http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-19_19:06:01-fs-dumpling-testing-basic-multi/436313/

This is happening on both the dumpling and master branches, which makes me think it may be a new kind of infrastructure issue, but it could also be a bad backport or something...

Associated revisions

Revision bda325b3 (diff)
Added by John Spray almost 5 years ago

suites/fs: update to latest mdtest

They appear to have (accidentally?) fixed whatever
was crashing.

Fixes: #9177

Signed-off-by: John Spray <>

Revision b07abf5b (diff)
Added by John Spray almost 5 years ago

suites/fs: update to latest mdtest

They appear to have (accidentally?) fixed whatever
was crashing.

Fixes: #9177

Signed-off-by: John Spray <>
(cherry picked from commit bda325b33b036ac110bbc65f62019aa76d05ae3b)

History

#1 Updated by John Spray almost 5 years ago

mdtest calls getcwd into an unzeroed buffer and doesn't check the return value. If fuse is failing the getcwd for any reason (permissions?), mdtest would go on to use whatever happens to be in that buffer, and that could make it crash. That doesn't explain why it would spontaneously start failing, though.

#2 Updated by Greg Farnum almost 5 years ago

How did you track it down to getcwd? If that is the issue, there are a bunch of avenues of attack here, and we should check how long it's been since we actually saw this test pass (I dismissed many failures, but I think there have been days when everything, including this, passed...).

#4 Updated by John Spray almost 5 years ago

The compiler is spitting out a warning about getcwd, but there's no evidence that that's what it's actually hitting in this instance.

#6 Updated by Greg Farnum almost 5 years ago

/teuthology-2014-09-08_19:06:01-fs-dumpling-testing-basic-multi/473897/
/teuthology-2014-09-08_19:06:01-fs-dumpling-testing-basic-multi/473899
teuthology-2014-09-08_23:04:01-fs-master-testing-basic-multi/474438/

#7 Updated by John Spray almost 5 years ago

  • Status changed from Verified to Testing

#9 Updated by Greg Farnum almost 5 years ago

  • Status changed from Testing to Resolved
  • Assignee set to John Spray

John fixed this by updating mdtest in ceph-qa-suite as of commit:b1365a80982dba4160e861c28d887b066ca451b6.
