Bug #20249

closed

ceph-mgr crashes in cephtool-test-mon.sh constantly on jenkins

Added by Kefu Chai almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-06-10 17:34:06.081731 7fddbf7fe700  0 -- 127.0.0.1:0/3150684206 >> 127.0.0.1:6846/3538 conn(0x7fdda800a670 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:2462: main:  test_tiering_1
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:318: test_tiering_1:  ceph osd pool create slow 2
2017-06-10 17:34:06.321108 7effe8a74700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:06.339587 7effe8a74700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:06.344522 7effe1a9f700  0 -- 127.0.0.1:0/4054448205 >> 127.0.0.1:6846/3538 conn(0x7effbc009d80 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
2017-06-10 17:34:06.545991 7effe1a9f700  0 -- 127.0.0.1:0/4054448205 >> 127.0.0.1:6846/3538 conn(0x7effbc009d80 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
pool 'slow' created
...
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:301: flush_pg_stats:  for osd in '$ids'
//home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:303: flush_pg_stats:  ceph tell osd.3 flush_pg_stats
2017-06-10 17:34:13.026878 7f1376708700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:13.064418 7f1376708700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:13.101045 7f1366ffd700  0 -- 127.0.0.1:0/2329738269 >> 127.0.0.1:6846/3538 conn(0x7f135400c2c0 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/13057 not 127.0.0.1:6846/3538 - wrong node!
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:303: flush_pg_stats:  seq=111669149707
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:304: flush_pg_stats:  seqs=' 0-17179869196 1-38654705677 2-77309411343 3-111669149707'
..
//home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:311: flush_pg_stats:  ceph osd last-stat-seq 0
2017-06-10 17:34:14.612494 7f1042bbc700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:14.630788 7f1042bbc700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-10 17:34:14.636482 7f1032ffd700  0 -- 127.0.0.1:0/3310316884 >> 127.0.0.1:6846/3538 conn(0x7f101800ccc0 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be 127.0.0.1:6846/13057 not 127.0.0.1:6846/3538 - wrong node!
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:311: flush_pg_stats:  test 17179869193 -lt 17179869196

see https://jenkins.ceph.com/job/ceph-pull-requests/25786/consoleFull#-14132878204b0cfa0c-a892-49e0-a359-82544e1a192e
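
The "wrong node!" lines in the excerpt above can be summarized mechanically. The following is a minimal sketch, assuming the number after the slash in each address is a per-process nonce; the helper name `claimed_nonces` and the abbreviated `SAMPLE` lines (timestamps and connection details replaced by `...`) are illustrative only. A claimed nonce that changes between 17:34:06 and 17:34:13 would be consistent with the process behind port 6846 having been restarted, but the log alone does not prove that.

```python
import re

# Matches the tail of a messenger "wrong node!" line, for example:
#   connect claims to be 127.0.0.1:6846/13057 not 127.0.0.1:6846/3538 - wrong node!
WRONG_NODE = re.compile(
    r"connect claims to be (?P<claimed>\S+)/(?P<claimed_nonce>\d+) "
    r"not (?P<expected>\S+)/(?P<expected_nonce>\d+) - wrong node!"
)


def claimed_nonces(log_text):
    """Return the distinct (address, nonce) pairs the peer claimed, in first-seen order."""
    seen = []
    for match in WRONG_NODE.finditer(log_text):
        pair = (match.group("claimed"), int(match.group("claimed_nonce")))
        if pair not in seen:
            seen.append(pair)
    return seen


# Abbreviated copies of two lines from the description (prefix elided with "...").
SAMPLE = """\
... connect claims to be 127.0.0.1:6846/2690030846 not 127.0.0.1:6846/3538 - wrong node!
... connect claims to be 127.0.0.1:6846/13057 not 127.0.0.1:6846/3538 - wrong node!
"""

if __name__ == "__main__":
    for addr, nonce in claimed_nonces(SAMPLE):
        print(addr, nonce)
```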


Files

consoleText.gz (130 KB) - "make check" log file, Kefu Chai, 06/12/2017 02:43 AM

Related issues 1 (0 open, 1 closed)

Has duplicate mgr - Bug #20245: PyFormatter aborts dumping PGMapDigest (missing dump_format_va), Duplicate, 06/11/2017

Actions #1

Updated by Kefu Chai almost 7 years ago

The failed test was test_tiering_1. It timed out in flush_pg_stats while waiting for osd.0 to report a last-stat-seq of at least 17179869196; the trace above shows it still at 17179869193.
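
The wait that timed out can be sketched as a simple poll loop. This is a minimal sketch, not the code in test.sh: `get_last_seq` is a hypothetical callable standing in for `ceph osd last-stat-seq <id>`, and the timeout handling is an assumption. `split_seq` only illustrates an arithmetic observation: the values in the trace split cleanly into a small high 32-bit part and a small low 32-bit counter (17179869196 = 4 * 2^32 + 12). Whether that matches the actual encoding of the seq is an assumption here.

```python
import time


def split_seq(seq):
    """Split a 64-bit stat seq into (high 32 bits, low 32 bits)."""
    return seq >> 32, seq & 0xFFFFFFFF


def wait_for_seq(get_last_seq, target, timeout=10, sleep=time.sleep):
    """Poll get_last_seq() until it reaches target.

    Returns True once the reported seq is >= target, or False after
    `timeout` one-second waits. get_last_seq is a hypothetical stand-in
    for running `ceph osd last-stat-seq <id>`.
    """
    for _ in range(timeout):
        if get_last_seq() >= target:
            return True
        sleep(1)
    # One final check after the last sleep.
    return get_last_seq() >= target


if __name__ == "__main__":
    # Values taken from the trace in the description.
    print(split_seq(17179869196))  # target seq for osd.0
    print(split_seq(17179869193))  # value osd.0 was still reporting
    # A source that never advances reproduces the timeout.
    stuck = lambda: 17179869193
    print(wait_for_seq(stuck, 17179869196, timeout=3, sleep=lambda s: None))
```

Under this reading, osd.0 was three stat reports short of the target and never caught up within the timeout.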

Actions #3

Updated by Kefu Chai almost 7 years ago

  • Subject changed from cephtool-test-mon.sh constantly timesout on jenkins to cephtool-test-mon.sh constantly times out on jenkins
Actions #4

Updated by Kefu Chai almost 7 years ago

  • Description updated (diff)
Actions #5

Updated by Chang Liu almost 7 years ago

Sorry, `ceph tell osd.3 flush_pg_stats` does get a response on Jenkins. What I pasted above is not relevant.

Actions #6

Updated by Kefu Chai almost 7 years ago

https://github.com/ceph/ceph/pull/15620 was posted to add more logging on timeout.

Actions #7

Updated by Kefu Chai almost 7 years ago

 ceph version 12.0.3-1446-gae45b81 (ae45b8155978e8905d7afc79d111fbea83a420ce) luminous (dev)
 1: (()+0x3a7e62) [0x560efa9a4e62]
 2: (()+0x11390) [0x7f2718bc2390]
 3: (gsignal()+0x38) [0x7f2717b53428]
 4: (abort()+0x16a) [0x7f2717b5502a]
 5: (()+0x298b79) [0x560efa895b79]
 6: (ceph::Formatter::dump_format_unquoted(char const*, char const*, ...)+0x9a) [0x560efab4479a]
 7: (PGMapDigest::dump_object_stat_sum(TextTable&, ceph::Formatter*, object_stat_sum_t const&, unsigned long, float, bool, pg_pool_t const*)+0x10e) [0x560efa82a6ee]
 8: (PGMapDigest::dump_pool_stats_full(OSDMap const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, ceph::Formatter*, bool) const+0x70e) [0x560efa830bae]
 9: (PyModules::get_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x14e2) [0x560efa8922e2]
 10: (()+0x29a47b) [0x560efa89747b]
 11: (PyEval_EvalFrameEx()+0x8a51) [0x7f2719cab751]
 12: (PyEval_EvalFrameEx()+0x7124) [0x7f2719ca9e24]
 13: (PyEval_EvalFrameEx()+0x7124) [0x7f2719ca9e24]
 14: (PyEval_EvalCodeEx()+0x85c) [0x7f2719dd401c]
 15: (()+0x13e2e0) [0x7f2719d2a2e0]
 16: (PyObject_Call()+0x43) [0x7f2719cfd1e3]
 17: (()+0x18531c) [0x7f2719d7131c]
 18: (PyObject_Call()+0x43) [0x7f2719cfd1e3]
 19: (PyObject_CallMethod()+0xf4) [0x7f2719cfe3b4]
 20: (MgrPyModule::notify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x217) [0x560efa89c557]
 21: (FunctionContext::finish(int)+0x2a) [0x560efa87cd1a]
 22: (Context::complete(int)+0x9) [0x560efa879839]
 23: (Finisher::finisher_thread_entry()+0x460) [0x560efa9e4180]
 24: (()+0x76ba) [0x7f2718bb86ba]
 25: (clone()+0x6d) [0x7f2717c2482d]

the full log is at teuthology:/home/kchai/20249/consoleText.
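
To find the caller of interest in a backtrace like the one above, it helps to skip the unresolved frames printed as `()`. The following is a minimal sketch; the regular expression is written against the frame layout shown in this comment only, and the helper names are illustrative. `SAMPLE` copies frames 3 to 6 verbatim.

```python
import re

# One frame looks like:
#  6: (ceph::Formatter::dump_format_unquoted(char const*, char const*, ...)+0x9a) [0x560efab4479a]
FRAME = re.compile(r"^\s*(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")

UNRESOLVED = "()"  # how frames without a symbol name appear


def parse_frames(text):
    """Return a list of (frame_number, symbol) tuples for each backtrace line."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def first_named_after(frames, marker="abort()"):
    """Return the first frame with a resolved symbol that follows `marker`, or None."""
    seen_marker = False
    for number, symbol in frames:
        if seen_marker and symbol != UNRESOLVED:
            return number, symbol
        if symbol == marker:
            seen_marker = True
    return None


SAMPLE = """\
 3: (gsignal()+0x38) [0x7f2717b53428]
 4: (abort()+0x16a) [0x7f2717b5502a]
 5: (()+0x298b79) [0x560efa895b79]
 6: (ceph::Formatter::dump_format_unquoted(char const*, char const*, ...)+0x9a) [0x560efab4479a]
"""

if __name__ == "__main__":
    print(first_named_after(parse_frames(SAMPLE)))
```

On the sample, the first resolved caller after `abort()` is frame 6, `ceph::Formatter::dump_format_unquoted`; frame 5, which called `abort()` directly, has no symbol in this trace.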

Actions #8

Updated by Kefu Chai almost 7 years ago

  • Subject changed from cephtool-test-mon.sh constantly times out on jenkins to ceph-mgr crashes in cephtool-test-mon.sh constantly on jenkins
Actions #9

Updated by Kefu Chai almost 7 years ago

Anyone interested in this ticket, please feel free to grab it. I will continue working on it tomorrow if it's not RCA'ed / fixed by then.

Actions #11

Updated by Kefu Chai almost 7 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil
Actions #12

Updated by Nathan Cutler almost 7 years ago

  • Has duplicate Bug #20245: PyFormatter aborts dumping PGMapDigest (missing dump_format_va) added
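
The duplicate's title points at a formatter that lacks a working `dump_format_va`. The general failure mode can be illustrated with a toy analogy in Python. This is not Ceph's code: every class name below is hypothetical, and where the C++ daemon aborts, the sketch raises `NotImplementedError`.

```python
class Formatter:
    """Toy base class: the convenience method delegates to a hook."""

    def dump_format_unquoted(self, name, fmt, *args):
        # Callers use this helper; the real work happens in the hook below.
        self.dump_format_va(name, fmt, args, quoted=False)

    def dump_format_va(self, name, fmt, args, quoted):
        # Stand-in for a hook that a subclass never implemented.
        raise NotImplementedError("dump_format_va is not implemented")


class StubFormatter(Formatter):
    """Hypothetical subclass that forgot to override the hook."""


class FixedFormatter(Formatter):
    """Hypothetical subclass that implements the hook."""

    def __init__(self):
        self.out = {}

    def dump_format_va(self, name, fmt, args, quoted):
        # Format the value and store it under its field name.
        self.out[name] = fmt % args


if __name__ == "__main__":
    fixed = FixedFormatter()
    fixed.dump_format_unquoted("percent_used", "%.2f", 12.5)
    print(fixed.out)

    try:
        StubFormatter().dump_format_unquoted("percent_used", "%.2f", 12.5)
    except NotImplementedError as exc:
        print("failed:", exc)
```

The point of the analogy is that the failure only appears when some caller reaches the unimplemented hook through the base-class helper, which would fit the `dump_format_unquoted` frame in the backtrace above.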