Bug #20249
Closed
ceph-mgr crashes in cephtool-test-mon.sh constantly on jenkins
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2017-06-10 17:34:06.081731 7fddbf7fe700 0 -- 127.0.0.1:0/3150684206 >> 127.0.0.1:6846/3538 conn(0x7fdda800a670 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:2462: main: test_tiering_1
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:318: test_tiering_1: ceph osd pool create slow 2
2017-06-10 17:34:06.321108 7effe8a74700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:06.339587 7effe8a74700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:06.344522 7effe1a9f700 0 -- 127.0.0.1:0/4054448205 >> 127.0.0.1:6846/3538 conn(0x7effbc009d80 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
2017-06-10 17:34:06.545991 7effe1a9f700 0 -- 127.0.0.1:0/4054448205 >> 127.0.0.1:6846/3538 conn(0x7effbc009d80 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
pool 'slow' created
...
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:301: flush_pg_stats: for osd in '$ids'
//home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:303: flush_pg_stats: ceph tell osd.3 flush_pg_stats
2017-06-10 17:34:13.026878 7f1376708700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:13.064418 7f1376708700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:13.101045 7f1366ffd700 0 -- 127.0.0.1:0/2329738269 >> 127.0.0.1:6846/3538 conn(0x7f135400c2c0 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/13057 not 127.0.0.1:6846/3538 - wrong node!
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:303: flush_pg_stats: seq=111669149707
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:304: flush_pg_stats: seqs=' 0-17179869196 1-38654705677 2-77309411343 3-111669149707'
..
//home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:311: flush_pg_stats: ceph osd last-stat-seq 0
2017-06-10 17:34:14.612494 7f1042bbc700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:14.630788 7f1042bbc700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:14.636482 7f1032ffd700 0 -- 127.0.0.1:0/3310316884 >> 127.0.0.1:6846/3538 conn(0x7f101800ccc0 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/13057 not 127.0.0.1:6846/3538 - wrong node!
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:311: flush_pg_stats: test 17179869193 -lt 17179869196
Updated by Kefu Chai almost 7 years ago
The failed test was test_tiering_1. It timed out while waiting for osd.0 to report a stat seq of at least 17179869196; `ceph osd last-stat-seq 0` was still returning 17179869193.
Updated by Kefu Chai almost 7 years ago
- File consoleText.gz added
Updated by Kefu Chai almost 7 years ago
- Subject changed from cephtool-test-mon.sh constantly timesout on jenkins to cephtool-test-mon.sh constantly times out on jenkins
Updated by Chang Liu almost 7 years ago
Sorry, `ceph tell osd.3 flush_pg_stats` does get a response on Jenkins. What I pasted above is not relevant.
Updated by Kefu Chai almost 7 years ago
https://github.com/ceph/ceph/pull/15620 was posted to collect more logs on timeout.
Updated by Kefu Chai almost 7 years ago
ceph version 12.0.3-1446-gae45b81 (ae45b8155978e8905d7afc79d111fbea83a420ce) luminous (dev)
 1: (()+0x3a7e62) [0x560efa9a4e62]
 2: (()+0x11390) [0x7f2718bc2390]
 3: (gsignal()+0x38) [0x7f2717b53428]
 4: (abort()+0x16a) [0x7f2717b5502a]
 5: (()+0x298b79) [0x560efa895b79]
 6: (ceph::Formatter::dump_format_unquoted(char const*, char const*, ...)+0x9a) [0x560efab4479a]
 7: (PGMapDigest::dump_object_stat_sum(TextTable&, ceph::Formatter*, object_stat_sum_t const&, unsigned long, float, bool, pg_pool_t const*)+0x10e) [0x560efa82a6ee]
 8: (PGMapDigest::dump_pool_stats_full(OSDMap const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, ceph::Formatter*, bool) const+0x70e) [0x560efa830bae]
 9: (PyModules::get_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x14e2) [0x560efa8922e2]
 10: (()+0x29a47b) [0x560efa89747b]
 11: (PyEval_EvalFrameEx()+0x8a51) [0x7f2719cab751]
 12: (PyEval_EvalFrameEx()+0x7124) [0x7f2719ca9e24]
 13: (PyEval_EvalFrameEx()+0x7124) [0x7f2719ca9e24]
 14: (PyEval_EvalCodeEx()+0x85c) [0x7f2719dd401c]
 15: (()+0x13e2e0) [0x7f2719d2a2e0]
 16: (PyObject_Call()+0x43) [0x7f2719cfd1e3]
 17: (()+0x18531c) [0x7f2719d7131c]
 18: (PyObject_Call()+0x43) [0x7f2719cfd1e3]
 19: (PyObject_CallMethod()+0xf4) [0x7f2719cfe3b4]
 20: (MgrPyModule::notify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x217) [0x560efa89c557]
 21: (FunctionContext::finish(int)+0x2a) [0x560efa87cd1a]
 22: (Context::complete(int)+0x9) [0x560efa879839]
 23: (Finisher::finisher_thread_entry()+0x460) [0x560efa9e4180]
 24: (()+0x76ba) [0x7f2718bb86ba]
 25: (clone()+0x6d) [0x7f2717c2482d]
The full log is at teuthology:/home/kchai/20249/consoleText.
Updated by Kefu Chai almost 7 years ago
- Subject changed from cephtool-test-mon.sh constantly times out on jenkins to ceph-mgr crashes in cephtool-test-mon.sh constantly on jenkins
Updated by Kefu Chai almost 7 years ago
Anyone interested in this ticket, please feel free to grab it. I will continue working on it tomorrow if it has not been root-caused or fixed by then.
Updated by Chang Liu almost 7 years ago
Updated by Kefu Chai almost 7 years ago
- Status changed from New to Resolved
- Assignee set to Sage Weil
Updated by Nathan Cutler almost 7 years ago
- Has duplicate Bug #20245: PyFormatter aborts dumping PGMapDigest (missing dump_format_va) added