Bug #21891

open

Bug #38094: mgr: crash list

Stopping ceph-mgr results in a segfault

Added by Марк Коренберг over 6 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
ceph-mgr
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

root@node2:~# /usr/bin/ceph-mgr -d --cluster ceph --id node2 --setuser ceph --setgroup ceph
2017-10-23 01:01:27.287102 7f0a611a1540  0 set uid:gid to 64045:64045 (ceph:ceph)
2017-10-23 01:01:27.287118 7f0a611a1540  0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 4070
2017-10-23 01:01:27.289131 7f0a611a1540  0 pidfile_write: ignore empty --pid-file
2017-10-23 01:01:27.295677 7f0a611a1540  1 mgr send_beacon standby
2017-10-23 01:01:29.296087 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:31.296464 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:33.296791 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:35.297149 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:37.297501 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:39.297849 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:41.298163 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:43.298521 7f0a5476e700  1 mgr send_beacon standby
2017-10-23 01:01:43.385259 7f0a57774700  1 mgr handle_mgr_map Activating!
2017-10-23 01:01:43.385458 7f0a57774700  1 mgr handle_mgr_map I am now activating
2017-10-23 01:01:43.403458 7f0a5376c700  1 mgr init Loading python module 'dashboard'
2017-10-23 01:01:43.542354 7f0a5376c700  1 mgr load Constructed class from module: dashboard
2017-10-23 01:01:43.542366 7f0a5376c700  1 mgr init Loading python module 'status'
2017-10-23 01:01:43.573417 7f0a5376c700  1 mgr load Constructed class from module: status
2017-10-23 01:01:43.573441 7f0a5376c700  1 mgr start Creating threads for 2 modules
2017-10-23 01:01:43.573534 7f0a5376c700  1 mgr send_beacon active
[23/Oct/2017:01:01:43] ENGINE Bus STARTING
[23/Oct/2017:01:01:43] ENGINE Started monitor thread '_TimeoutMonitor'.
[23/Oct/2017:01:01:43] ENGINE Serving on http://:::7000
[23/Oct/2017:01:01:43] ENGINE Bus STARTED
2017-10-23 01:01:45.298878 7f0a5476e700  1 mgr send_beacon active
2017-10-23 01:01:47.299386 7f0a5476e700  1 mgr send_beacon active
2017-10-23 01:01:49.300138 7f0a5476e700  1 mgr send_beacon active
2017-10-23 01:01:51.300620 7f0a5476e700  1 mgr send_beacon active
[23/Oct/2017:01:01:52] HTTP Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/ceph/mgr/dashboard/module.py", line 888, in perf
    content_data=json.dumps(self._osd(osd_id), indent=2)
  File "/usr/lib/ceph/mgr/dashboard/module.py", line 870, in _osd
    assert r == 0
AssertionError

[23/Oct/2017:01:01:52] HTTP 
Request Headers:
  CONNECTION: keep-alive
  HOST: 10.80.20.100:7000
  UPGRADE-INSECURE-REQUESTS: 1
  Remote-Addr: ::ffff:10.130.0.21
  ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
  USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
  ACCEPT-LANGUAGE: ru,en-US;q=0.8,en;q=0.6
  ACCEPT-ENCODING: gzip, deflate, sdch
::ffff:10.130.0.21 - - [23/Oct/2017:01:01:52] "GET /osd/perf/6 HTTP/1.1" 500 1481 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36" 
2017-10-23 01:01:53.302373 7f0a5476e700  1 mgr send_beacon active
::ffff:10.130.0.21 - - [23/Oct/2017:01:01:53] "GET /favicon.ico HTTP/1.1" 200 1406 "http://10.80.20.100:7000/osd/perf/6" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36" 
2017-10-23 01:01:55.302842 7f0a5476e700  1 mgr send_beacon active
2017-10-23 01:01:57.304530 7f0a5476e700  1 mgr send_beacon active

^C

2017-10-23 01:01:58.210228 7f0a53f6d700 -1 Fail to open '/proc/0/cmdline' error = (2) No such file or directory
2017-10-23 01:01:58.210246 7f0a53f6d700 -1 received  signal: Interrupt from  PID: 0 task name: <unknown> UID: 0
2017-10-23 01:01:58.210248 7f0a53f6d700 -1 mgr handle_signal *** Got signal Interrupt ***
[23/Oct/2017:01:01:58] ENGINE Bus STOPPING

[23/Oct/2017:01:02:03] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) shut down
[23/Oct/2017:01:02:03] ENGINE Stopped thread '_TimeoutMonitor'.
[23/Oct/2017:01:02:03] ENGINE Bus STOPPED
[23/Oct/2017:01:02:03] ENGINE Bus EXITING
[23/Oct/2017:01:02:03] ENGINE Bus EXITED
[23/Oct/2017:01:02:03] ENGINE Waiting for child threads to terminate...
*** Caught signal (Segmentation fault) **
 in thread 7f0a611a1540 thread_name:ceph-mgr
 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
 1: (()+0x3b4a84) [0x55cf56507a84]
 2: (()+0x110c0) [0x7f0a5ecd60c0]
 3: (std::vector<Option, std::allocator<Option> >::~vector()+0x1b8) [0x55cf567370c8]
 4: (()+0x35910) [0x7f0a5dc91910]
 5: (()+0x3596a) [0x7f0a5dc9196a]
 6: (__libc_start_main()+0xf8) [0x7f0a5dc7c2b8]
 7: (_start()+0x2a) [0x55cf5636216a]
Segmentation fault
Actions #1

Updated by Patrick Donnelly over 6 years ago

  • Project changed from Ceph to mgr
Actions #2

Updated by Chang Liu over 6 years ago

  • Status changed from New to In Progress
  • Assignee set to Chang Liu
Actions #4

Updated by John Spray over 6 years ago

The assertion in the dashboard module is related to that PR, but the actual segfault is not.

I suspect the segfault is related to the way ceph-mgr modules can load libraries such as libcephfs and librados, which are linked with their own copies of the global options objects. That would lead to harmless crashes when the globals are destroyed at exit.
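
On the dashboard assertion: the traceback in the description fails at `assert r == 0` in `_osd`, which the browser sees only as an HTTP 500 with no detail. Below is a minimal sketch of reporting such a failure explicitly. It assumes `r` is a return code from a command sent to the OSD. `OsdCommandError`, `osd_perf_payload`, and the `send_command` callable are hypothetical names for illustration, not the dashboard module's actual code.

```python
import json


class OsdCommandError(Exception):
    """Hypothetical error type: a command sent to an OSD returned non-zero."""

    def __init__(self, osd_id, retcode, message):
        super().__init__(
            "osd.{} command failed with code {}: {}".format(osd_id, retcode, message)
        )
        self.osd_id = osd_id
        self.retcode = retcode


def osd_perf_payload(osd_id, send_command):
    """Return a pretty-printed JSON body for an OSD perf page.

    `send_command` is an assumed callable returning (retcode, stdout, stderr);
    it stands in for whatever the real module uses to query the OSD.
    """
    retcode, out, err = send_command(osd_id)
    if retcode != 0:
        # Raise a descriptive error instead of relying on a bare assert,
        # so the caller can decide how to report the failure.
        raise OsdCommandError(osd_id, retcode, err)
    return json.dumps(json.loads(out), indent=2)
```

A caller could catch `OsdCommandError` and return an error page that names the OSD and the return code, rather than a bare `AssertionError`.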

Actions #5

Updated by John Spray over 6 years ago

  • Status changed from In Progress to 12
Actions #6

Updated by John Spray over 6 years ago

  • Category set to ceph-mgr
Actions #7

Updated by Fabian Grünbichler over 6 years ago

Just for completeness' sake, here is a backtrace with debugging symbols (still affects 12.2.2):

Thread 2 (Thread 0x7f54b972e700 (LWP 25250)):
#0  0x00007f54d3e0f3f3 in select () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f54d6b31ed0 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#2  0x00007f54d6b5a091 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#3  0x00007f54d6b58390 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#4  0x00007f54d6b58390 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5  0x00007f54d6cc115c in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6  0x00007f54d6c155b0 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7  0x00007f54d6bad543 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#8  0x00007f54d6c6acbc in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#9  0x00007f54d6bad543 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#10 0x00007f54d6cc0587 in PyEval_CallObjectWithKeywords () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#11 0x00007f54d6b325f2 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007f54d4d9e494 in start_thread (arg=0x7f54b972e700) at pthread_create.c:333
#13 0x00007f54d3e16aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 1 (Thread 0x7f54d72716c0 (LWP 25170)):
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x000055794f19d30e in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=11) at ./src/global/signal_handler.cc:138
#3  <signal handler called>
#4  std::__cxx11::_List_base<char const*, std::allocator<char const*> >::_M_clear (this=<optimized out>)
    at /usr/include/c++/6/bits/list.tcc:73
#5  std::__cxx11::_List_base<char const*, std::allocator<char const*> >::~_List_base (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/c++/6/bits/stl_list.h:442
#6  std::__cxx11::list<char const*, std::allocator<char const*> >::~list (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/c++/6/bits/stl_list.h:503
#7  Option::~Option (this=<optimized out>, __in_chrg=<optimized out>) at ./src/common/options.h:13
#8  std::_Destroy<Option> (__pointer=<optimized out>) at /usr/include/c++/6/bits/stl_construct.h:93
#9  std::_Destroy_aux<false>::__destroy<Option*> (__last=<optimized out>, __first=0x55795aa44000)
    at /usr/include/c++/6/bits/stl_construct.h:103
#10 std::_Destroy<Option*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/6/bits/stl_construct.h:126
#11 std::_Destroy<Option*, Option> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/6/bits/stl_construct.h:151
#12 std::vector<Option, std::allocator<Option> >::~vector (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/c++/6/bits/stl_vector.h:426
#13 0x00007f54d3d63910 in __run_exit_handlers (status=0, listp=0x7f54d40c75d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
    run_dtors=run_dtors@entry=true) at exit.c:83
#14 0x00007f54d3d6396a in __GI_exit (status=<optimized out>) at exit.c:105
#15 0x00007f54d3d4e2b8 in __libc_start_main (main=0x55794efd62a0 <main(int, char const**)>, argc=10, argv=0x7ffe246a43c8,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe246a43b8) at ../csu/libc-start.c:325
#16 0x000055794efe117a in _start ()
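
The backtrace above shows the fatal signal arriving while `__run_exit_handlers` is destroying a `std::vector<Option>`, which fits a crash during destruction of globals at process exit. Below is a small sketch for checking other gdb backtraces for the same pattern. The helper names (`frame_functions`, `crashed_during_exit`, `destructor_frames`) are made up for this example, and the regular expression only targets the `bt` output format seen in this ticket.

```python
import re

# Matches the start of a gdb frame line, with or without an address, e.g.
#   "#13 0x00007f54d3d63910 in __run_exit_handlers (status=0, ..."
#   "#7  Option::~Option (this=<optimized out>, ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+?)\s*\(")


def frame_functions(backtrace):
    """Return the function name of each parsed frame, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:  # continuation lines and "<signal handler called>" are skipped
            names.append(match.group(2))
    return names


def crashed_during_exit(backtrace):
    """True if a fatal signal was handled while exit handlers were running."""
    names = frame_functions(backtrace)
    return "handle_fatal_signal" in names and "__run_exit_handlers" in names


def destructor_frames(backtrace):
    """Return the frames that are C++ destructors (names containing '~')."""
    return [name for name in frame_functions(backtrace) if "~" in name]


# Abridged from the Thread 1 backtrace above.
SAMPLE = """\
#2  handle_fatal_signal (signum=11) at ./src/global/signal_handler.cc:138
#3  <signal handler called>
#7  Option::~Option (this=<optimized out>, __in_chrg=<optimized out>) at ./src/common/options.h:13
#12 std::vector<Option, std::allocator<Option> >::~vector (this=<optimized out>, __in_chrg=<optimized out>)
#13 0x00007f54d3d63910 in __run_exit_handlers (status=0, listp=0x7f54d40c75d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
#14 0x00007f54d3d6396a in __GI_exit (status=<optimized out>) at exit.c:105
"""

if __name__ == "__main__":
    print(crashed_during_exit(SAMPLE))
    print(destructor_frames(SAMPLE))
```

The check is deliberately narrow: it says only that the signal was raised inside the exit handlers, not which library's copy of the globals was involved.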

Actions #8

Updated by Ernesto Puerta about 5 years ago

  • Parent task set to #38094
Actions #9

Updated by runsisi hust about 5 years ago

(gdb) bt
#0  0x00007f9f7d2764ab in raise () from /lib64/libpthread.so.0
#1  0x0000559a372ee816 in reraise_fatal (signum=11) at /usr/src/debug/ceph-12.2.9/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-12.2.9/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  _M_data (this=<optimized out>) at /usr/include/c++/4.8.2/bits/basic_string.h:293
#5  _M_rep (this=<optimized out>) at /usr/include/c++/4.8.2/bits/basic_string.h:301
#6  ~basic_string (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/basic_string.h:539
#7  ~_List_node (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_list.h:106
#8  destroy<std::_List_node<std::basic_string<char> > > (this=<optimized out>, __p=<optimized out>) at /usr/include/c++/4.8.2/ext/new_allocator.h:124
#9  std::_List_base<std::string, std::allocator<std::string> >::_M_clear (this=0x559a42a76110) at /usr/include/c++/4.8.2/bits/list.tcc:75
#10 0x0000559a3750aeb4 in ~_List_base (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_list.h:378
#11 ~list (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_list.h:438
#12 Option::~Option (this=0x559a42a76000, __in_chrg=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/common/options.h:13
#13 0x0000559a3750b097 in _Destroy<Option> (__pointer=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_construct.h:93
#14 __destroy<Option*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_construct.h:103
#15 _Destroy<Option*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_construct.h:126
#16 _Destroy<Option*, Option> (__last=<optimized out>, __first=0x559a42a76148) at /usr/include/c++/4.8.2/bits/stl_construct.h:151
#17 std::vector<Option, std::allocator<Option> >::~vector (this=0x559a42a76110, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_vector.h:415
#18 0x00007f9f7c4a6a69 in __run_exit_handlers () from /lib64/libc.so.6
#19 0x00007f9f7c4a6ab5 in exit () from /lib64/libc.so.6
#20 0x00007f9f7c48fc0c in __libc_start_main () from /lib64/libc.so.6
#21 0x0000559a3711a583 in _start ()

Still exists in 12.2.9 :(

Actions #10

Updated by Марк Коренберг over 4 years ago

Please close; this is outdated.

Actions #11

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New