Bug #17762

transient jerasure unit test failures

Added by Kefu Chai about 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 10/13/2016
Due date:
% Done: 0%
Source: Development
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

The test run for https://github.com/ceph/ceph/pull/11426 hit this failure, with the following output:

 79/146 Test  #86: unittest_erasure_code_plugin_jerasure ...***Exception: SegFault  0.10 sec
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ErasureCodePlugin
[ RUN      ] ErasureCodePlugin.factory
[       OK ] ErasureCodePlugin.factory (3 ms)
[----------] 1 test from ErasureCodePlugin (3 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3 ms total)
[  PASSED  ] 1 test.
2016-10-11 12:42:37.598277 7f5762be7d40 -1 did not load config file, using default settings.
load: jerasure *** Caught signal (Segmentation fault) **
 in thread 7f5762be7d40 thread_name:unittest_erasur

An attempt to reproduce it on master as of 61310d41f307ac07ff81f19ec1926434a0ced713 did not trigger the problem; it may be necessary to run the test in a loop.

loic@63fd9120625d:~/ceph--loic/build$ ctest -R unittest_erasure_code_plugin_jerasure
Test project /home/loic/ceph--loic/build
    Start 86: unittest_erasure_code_plugin_jerasure
1/1 Test #86: unittest_erasure_code_plugin_jerasure ...   Passed    0.01 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.03 sec
loic@63fd9120625d:~/ceph--loic/build$ git rev-parse HEAD
61310d41f307ac07ff81f19ec1926434a0ced713
loic@63fd9120625d:~/ceph--loic/build$
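Because the failure is transient, a single passing run shows little. The following is a minimal sketch, assuming bash, of running a command repeatedly and stopping at the first failure; `repeat_until_fail` is a hypothetical helper written for illustration, not part of Ceph's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): run a command up to MAX times,
# stopping at the first non-zero exit status so a transient failure is caught.
repeat_until_fail() {
  local max=$1
  shift
  local i
  for ((i = 1; i <= max; i++)); do
    if ! "$@"; then
      echo "failed on iteration $i"
      return 1
    fi
  done
  echo "passed $max iterations"
}
```

For example, from the build directory: `repeat_until_fail 1000 ctest -R unittest_erasure_code_plugin_jerasure --output-on-failure`. The iteration count of 1000 is an arbitrary choice; ctest exits with a non-zero status when a test fails, which is what stops the loop.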

Related issues

Copied from Ceph - Bug #17561: transient jerasure unit test failures Resolved 10/13/2016
Copied to Ceph - Backport #18193: jewel: transient jerasure unit test failures Resolved

History

#1 Updated by Kefu Chai about 1 year ago

  • Copied from Bug #17561: transient jerasure unit test failures added

#2 Updated by Samuel Just about 1 year ago

  • Priority changed from Urgent to Immediate

#3 Updated by Kefu Chai about 1 year ago

  • Status changed from Verified to Need Review

#4 Updated by Loic Dachary about 1 year ago

The reproducer (http://tracker.ceph.com/issues/17561#note-6) fails on https://github.com/ceph/ceph/pull/11733/commits/93cfb3d550a70ce02208c7cb0e8c4e9cfdfdc38a with:

Thread 3 (Thread 0x7f9c0cb7bd40 (LWP 28483)):
#0  0x00007f9c0c2bf65b in pthread_join (threadid=140308148987648,
    thread_return=0x0) at pthread_join.c:92
#1  0x00007f9c0c82f700 in Thread::join (this=0x7f9c17536000,
    prval=prval@entry=0x0)
    at /home/loic/ceph-ubuntu-14.04-loic/src/common/Thread.cc:173
#2  0x00007f9c0c823477 in ceph::logging::Log::stop (this=<optimized out>)
    at /home/loic/ceph-ubuntu-14.04-loic/src/log/Log.cc:453
#3  0x00007f9c0c8234c8 in ceph::logging::log_on_exit (pp=0x7f9c17506048)
    at /home/loic/ceph-ubuntu-14.04-loic/src/log/Log.cc:48
#4  0x00007f9c0c823a16 in OnExitManager::~OnExitManager (
    this=0x7f9c0cba0a00 <ceph::logging::exit_callbacks>,
    __in_chrg=<optimized out>)
    at /home/loic/ceph-ubuntu-14.04-loic/src/include/on_exit.h:26
#5  0x00007f9c0a4ea1a9 in __run_exit_handlers (status=0,
    listp=0x7f9c0a86c6c8 <__exit_funcs>,
    run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#6  0x00007f9c0a4ea1f5 in __GI_exit (status=<optimized out>) at exit.c:104
#7  0x00007f9c0a4cff4c in __libc_start_main (
    main=0x7f9c0c7e91a0 <main(int, char**)>, argc=1, argv=0x7fff0ad8e108,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fff0ad8e0f8) at libc-start.c:321
#8  0x00007f9c0c7eacd3 in _start ()

Thread 2 (Thread 0x7f9c08695700 (LWP 28485)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f9c0c82b8ca in WaitUntil (when=..., mutex=..., this=0x7f9c1758c080)
    at /home/loic/ceph-ubuntu-14.04-loic/src/common/Cond.h:72
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>,
    this=0x7f9c1758c080)
    at /home/loic/ceph-ubuntu-14.04-loic/src/common/Cond.h:81
#3  CephContextServiceThread::entry (this=0x7f9c1758c000)
    at /home/loic/ceph-ubuntu-14.04-loic/src/common/ceph_context.cc:99
#4  0x00007f9c0c2be184 in start_thread (arg=0x7f9c08695700)
    at pthread_create.c:312
#5  0x00007f9c0a5a837d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f9c09612700 (LWP 28484)):
#0  0x00007f9c0c2c61fb in raise (sig=11)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x00007f9c0c818b52 in reraise_fatal (signum=11)
    at /home/loic/ceph-ubuntu-14.04-loic/src/global/signal_handler.cc:72
#2  handle_fatal_signal (signum=11)
    at /home/loic/ceph-ubuntu-14.04-loic/src/global/signal_handler.cc:134
#3  <signal handler called>
#4  0x00007f9c0c82268e in hint_size (this=0x7f9c175330a0)
    at /home/loic/ceph-ubuntu-14.04-loic/src/log/Entry.h:65
#5  ceph::logging::Log::_flush (this=this@entry=0x7f9c17536000,
    t=t@entry=0x7f9c09611890, requeue=requeue@entry=0x7f9c17536110,
    crash=crash@entry=false)
    at /home/loic/ceph-ubuntu-14.04-loic/src/log/Log.cc:311
#6  0x00007f9c0c822a09 in ceph::logging::Log::flush (
    this=this@entry=0x7f9c17536000)
    at /home/loic/ceph-ubuntu-14.04-loic/src/log/Log.cc:287
#7  0x00007f9c0c822c3e in ceph::logging::Log::entry (this=0x7f9c17536000)
    at /home/loic/ceph-ubuntu-14.04-loic/src/log/Log.cc:464
#8  0x00007f9c0c2be184 in start_thread (arg=0x7f9c09612700)
    at pthread_create.c:312
#9  0x00007f9c0a5a837d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

#5 Updated by Kefu Chai 12 months ago

  • Status changed from Need Review to Pending Backport
  • Source changed from other to Development

#7 Updated by Loic Dachary 12 months ago

  • Copied to Backport #18193: jewel: transient jerasure unit test failures added

#8 Updated by Sage Weil 11 months ago

  • Priority changed from Immediate to Urgent

#9 Updated by Nathan Cutler 7 months ago

  • Status changed from Pending Backport to Resolved
