Bug #38705
mgr: segv in module thread, PyArg_ParseTuple
Added by Sage Weil about 5 years ago. Updated about 5 years ago.
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
0> 2019-03-12 15:45:08.094 7f18522b4700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f18522b4700 thread_name:prometheus

 ceph version 14.1.0-589-g96939c1 (96939c10eb6b3296161d2009da58061072d2a704) nautilus (rc)
 1: (()+0x12890) [0x7f18691e1890]
 2: (()+0x1cfca2) [0x7f186982dca2]
 3: (()+0x1d2125) [0x7f1869830125]
 4: (PyArg_ParseTuple()+0x86) [0x7f18698305d6]
 5: (()+0x14e994) [0x55f689b12994]
 6: (PyEval_EvalFrameEx()+0x8010) [0x7f186970d1d0]
 7: (PyEval_EvalCodeEx()+0x7d8) [0x7f186983d278]
 8: (PyEval_EvalFrameEx()+0x5bf6) [0x7f186970adb6]
 9: (PyEval_EvalCodeEx()+0x7d8) [0x7f186983d278]
 10: (()+0x1645f9) [0x7f18697c25f9]
 11: (PyObject_Call()+0x43) [0x7f18696b2333]
 12: (()+0x1abd1c) [0x7f1869809d1c]
 13: (PyObject_Call()+0x43) [0x7f18696b2333]
 14: (PyObject_CallMethod()+0xc8) [0x7f18697d6c78]
 15: (PyModuleRunner::serve()+0x62) [0x55f689b957f2]
 16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1cf) [0x55f689b95e9f]
 17: (()+0x76db) [0x7f18691d66db]
 18: (clone()+0x3f) [0x7f18683b788f]
/a/sage-2019-03-12_15:01:18-rados-wip-sage3-testing-2019-03-12-0708-distro-basic-smithi/3713081
Updated by Sage Weil about 5 years ago
Lots of these failures. The module varies (I've seen dashboard and prometheus so far).
Updated by Sage Weil about 5 years ago
These appear to happen during standby. Also, I see an ignored monmap message:
-50> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr handle_mgr_map active in map: 0 active is 6600
-49> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] Starting modules in standby mode
-48> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'balancer' because it does not implement a standby mode
-47> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'crash' because it does not implement a standby mode
-46> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'devicehealth' because it does not implement a standby mode
-45> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'orchestrator_cli' because it does not implement a standby mode
-44> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'progress' because it does not implement a standby mode
-43> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] starting module prometheus
-42> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'restful' because it does not implement a standby mode
-41> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'selftest' because it does not implement a standby mode
-40> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'status' because it does not implement a standby mode
-39> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr[py] skipping module 'volumes' because it does not implement a standby mode
-38> 2019-03-12 15:45:08.094 7f185f87c700 20 mgr Gil Switched to new thread state 0x55f68fc0e000
-37> 2019-03-12 15:45:08.094 7f185f07b700 4 mgrc handle_mgr_map Got map version 123
-36> 2019-03-12 15:45:08.094 7f185f07b700 4 mgrc handle_mgr_map Active mgr is now [v2:172.21.15.201:6800/14793,v1:172.21.15.201:6801/14793]
-35> 2019-03-12 15:45:08.094 7f185f07b700 4 mgrc reconnect Starting new session with [v2:172.21.15.201:6800/14793,v1:172.21.15.201:6801/14793]
-34> 2019-03-12 15:45:08.094 7f185f07b700 1 --2- 172.21.15.201:0/14794 >> [v2:172.21.15.201:6800/14793,v1:172.21.15.201:6801/14793] conn(0x55f68fc46000 0x55f68fc4e000 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
-33> 2019-03-12 15:45:08.094 7f1863083700 1 -- 172.21.15.201:0/14794 >> [v2:172.21.15.201:6800/14793,v1:172.21.15.201:6801/14793] conn(0x55f68fc46000 msgr2=0x55f68fc4e000 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:172.21.15.201:6800/14793
-32> 2019-03-12 15:45:08.094 7f1863083700 1 --2- 172.21.15.201:0/14794 >> [v2:172.21.15.201:6800/14793,v1:172.21.15.201:6801/14793] conn(0x55f68fc46000 0x55f68fc4e000 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx=0)._fault waiting 0.200000
-31> 2019-03-12 15:45:08.094 7f185f87c700 1 mgr load Constructed class from module: prometheus
-30> 2019-03-12 15:45:08.094 7f185f87c700 20 mgr ~Gil Destroying new thread state 0x55f68fc0e000
-29> 2019-03-12 15:45:08.094 7f185f87c700 4 mgr operator() Starting thread for prometheus
-28> 2019-03-12 15:45:08.094 7f18522b4700 4 mgr entry Entering thread for prometheus
-27> 2019-03-12 15:45:08.094 7f18522b4700 20 mgr Gil Switched to new thread state 0x55f68fc0e0b0
-26> 2019-03-12 15:45:08.094 7f185f07b700 1 -- 172.21.15.201:0/14794 --> [v2:172.21.15.201:6800/14793,v1:172.21.15.201:6801/14793] -- mgropen(unknown.z) v3 -- 0x55f68fc56000 con 0x55f68fc46000
-25> 2019-03-12 15:45:08.094 7f185f07b700 1 client.0 ms_handle_refused on v2:172.21.15.201:6800/14793
-24> 2019-03-12 15:45:08.094 7f185f07b700 1 client.0 ms_handle_refused on v2:172.21.15.201:6800/14793
-23> 2019-03-12 15:45:08.094 7f185f07b700 1 -- 172.21.15.201:0/14794 <== mon.0 v2:172.21.15.17:3300/0 4 ==== mon_map magic: 0 v1 ==== 377+0+0 (crc 0 0 0) 0x55f68b678600 con 0x55f68c204900
-22> 2019-03-12 15:45:08.094 7f185f07b700 10 monclient: handle_monmap mon_map magic: 0 v1
-21> 2019-03-12 15:45:08.094 7f185f07b700 10 monclient: got monmap 1 from mon.a (according to old e1)
-20> 2019-03-12 15:45:08.094 7f185f07b700 10 monclient: dump: epoch 1 fsid a419c130-4869-44c6-a9a2-9aafcae98e38 last_changed 2019-03-12 15:38:31.242540 created 2019-03-12 15:38:31.242540 min_mon_release 14 (nautilus) 0: [v2:172.21.15.17:3300/0,v1:172.21.15.17:6789/0] mon.a 1: [v2:172.21.15.201:3300/0,v1:172.21.15.201:6789/0] mon.b 2: [v2:172.21.15.17:3301/0,v1:172.21.15.17:6790/0] mon.c
-19> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr ms_dispatch standby mon_map magic: 0 v1
-18> 2019-03-12 15:45:08.094 7f185f07b700 0 ms_deliver_dispatch: unhandled message 0x55f68b678600 mon_map magic: 0 v1 from mon.0 v2:172.21.15.17:3300/0
-17> 2019-03-12 15:45:08.094 7f185f07b700 1 -- 172.21.15.201:0/14794 <== mon.0 v2:172.21.15.17:3300/0 5 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (crc 0 0 0) 0x55f68b48af40 con 0x55f68c204900
-16> 2019-03-12 15:45:08.094 7f185f07b700 10 cephx client: 0x55f68b4c2b60 handle_response ret = 0
-15> 2019-03-12 15:45:08.094 7f185f07b700 10 cephx client: get_rotating_key
-14> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: dump_rotating:
-13> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: id 1 AQCM0odcw+vbLRAA1czv1D9JvAG55JVaKaLblg== expires 2019-03-12 16:38:52.769386
-12> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: id 2 AQCM0odc5vrbLRAAyiUW3Qfbcdtng/rlyzJqOA== expires 2019-03-12 17:38:52.769386
-11> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: id 3 AQCM0odc7wTcLRAAdP17TON2ZXs/fHP2KqVuEw== expires 2019-03-12 18:38:52.769386
-10> 2019-03-12 15:45:08.094 7f185f07b700 10 monclient: _finish_auth 0
-9> 2019-03-12 15:45:08.094 7f185f07b700 10 cephx: validate_tickets want 55 have 55 need 0
-8> 2019-03-12 15:45:08.094 7f185f07b700 20 cephx client: need_tickets: want=55 have=55 need=0
-7> 2019-03-12 15:45:08.094 7f185f07b700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-03-12 15:44:38.098447)
-6> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: dump_rotating:
-5> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: id 1 AQCM0odcw+vbLRAA1czv1D9JvAG55JVaKaLblg== expires 2019-03-12 16:38:52.769386
-4> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: id 2 AQCM0odc5vrbLRAAyiUW3Qfbcdtng/rlyzJqOA== expires 2019-03-12 17:38:52.769386
-3> 2019-03-12 15:45:08.094 7f185f07b700 10 auth: id 3 AQCM0odc7wTcLRAAdP17TON2ZXs/fHP2KqVuEw== expires 2019-03-12 18:38:52.769386
-2> 2019-03-12 15:45:08.094 7f185f07b700 1 -- 172.21.15.201:0/14794 <== mon.0 v2:172.21.15.17:3300/0 6 ==== osd_map(19..19 src has 1..19) v4 ==== 3829+0+0 (crc 0 0 0) 0x55f68c16af00 con 0x55f68c204900
-1> 2019-03-12 15:45:08.094 7f185f07b700 4 mgr ms_dispatch standby osd_map(19..19 src has 1..19) v4
0> 2019-03-12 15:45:08.094 7f18522b4700 -1 *** Caught signal (Segmentation fault) ** in thread 7f18522b4700 thread_name:prometheus
Another one:
-43> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr handle_mgr_map active in map: 0 active is 4861
-42> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] Starting modules in standby mode
-41> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'balancer' because it does not implement a standby mode
-40> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'crash' because it does not implement a standby mode
-39> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] starting module dashboard
-38> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'devicehealth' because it does not implement a standby mode
-37> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'orchestrator_cli' because it does not implement a standby mode
-36> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'progress' because it does not implement a standby mode
-35> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'restful' because it does not implement a standby mode
-34> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'status' because it does not implement a standby mode
-33> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr[py] skipping module 'volumes' because it does not implement a standby mode
-32> 2019-03-12 16:29:27.847 7fa2aead2700 20 mgr Gil Switched to new thread state 0x5b24000
-31> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgrc handle_mgr_map Got map version 21
-30> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgrc handle_mgr_map Active mgr is now
-29> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgrc reconnect No active mgr available yet
-28> 2019-03-12 16:29:27.847 7fa2ae2d1700 1 -- 172.21.15.3:0/16587 <== mon.0 v2:172.21.15.3:3300/0 4 ==== mon_map magic: 0 v1 ==== 377+0+0 (crc 0 0 0) 0x2ea4400 con 0x2e8ad80
-27> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 monclient: handle_monmap mon_map magic: 0 v1
-26> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 monclient: got monmap 1 from mon.a (according to old e1)
-25> 2019-03-12 16:29:27.847 7fa2aead2700 1 mgr load Constructed class from module: dashboard
-24> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 monclient: dump: epoch 1 fsid c7e84a6a-22ca-4bdc-bdaf-076e4bf9bbce last_changed 2019-03-12 16:27:13.323545 created 2019-03-12 16:27:13.323545 min_mon_release 14 (nautilus) 0: [v2:172.21.15.3:3300/0,v1:172.21.15.3:6789/0] mon.a 1: [v2:172.21.15.90:3300/0,v1:172.21.15.90:6789/0] mon.b 2: [v2:172.21.15.3:3301/0,v1:172.21.15.3:6790/0] mon.c
-23> 2019-03-12 16:29:27.847 7fa2aead2700 20 mgr ~Gil Destroying new thread state 0x5b24000
-22> 2019-03-12 16:29:27.847 7fa2aead2700 4 mgr operator() Starting thread for dashboard
-21> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr ms_dispatch standby mon_map magic: 0 v1
-20> 2019-03-12 16:29:27.847 7fa2ae2d1700 0 ms_deliver_dispatch: unhandled message 0x2ea4400 mon_map magic: 0 v1 from mon.0 v2:172.21.15.3:3300/0
-19> 2019-03-12 16:29:27.847 7fa2ae2d1700 1 -- 172.21.15.3:0/16587 <== mon.0 v2:172.21.15.3:3300/0 5 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (crc 0 0 0) 0x215a1c0 con 0x2e8ad80
-18> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 cephx client: 0x214cb60 handle_response ret = 0
-17> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 cephx client: get_rotating_key
-16> 2019-03-12 16:29:27.847 7fa2a279a700 4 mgr entry Entering thread for dashboard
-15> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: dump_rotating:
-14> 2019-03-12 16:29:27.847 7fa2a279a700 20 mgr Gil Switched to new thread state 0x5b240b0
-13> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: id 1 AQD23YdcLp7iDhAAGCNvwQxZcHbXFyhrX1lP/Q== expires 2019-03-12 17:27:34.249727
-12> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: id 2 AQD23Ydc4bviDhAA3m+5Go4DxPfHndQvL6dMeQ== expires 2019-03-12 18:27:34.249727
-11> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: id 3 AQD23YdcQtriDhAAKTUdF5Q+sOkln5vFycVIeQ== expires 2019-03-12 19:27:34.249727
-10> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 monclient: _finish_auth 0
-9> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 cephx: validate_tickets want 55 have 55 need 0
-8> 2019-03-12 16:29:27.847 7fa2ae2d1700 20 cephx client: need_tickets: want=55 have=55 need=0
-7> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-03-12 16:28:57.851350)
-6> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: dump_rotating:
-5> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: id 1 AQD23YdcLp7iDhAAGCNvwQxZcHbXFyhrX1lP/Q== expires 2019-03-12 17:27:34.249727
-4> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: id 2 AQD23Ydc4bviDhAA3m+5Go4DxPfHndQvL6dMeQ== expires 2019-03-12 18:27:34.249727
-3> 2019-03-12 16:29:27.847 7fa2ae2d1700 10 auth: id 3 AQD23YdcQtriDhAAKTUdF5Q+sOkln5vFycVIeQ== expires 2019-03-12 19:27:34.249727
-2> 2019-03-12 16:29:27.847 7fa2ae2d1700 1 -- 172.21.15.3:0/16587 <== mon.0 v2:172.21.15.3:3300/0 6 ==== osd_map(29..29 src has 1..29) v4 ==== 6037+0+0 (crc 0 0 0) 0x2df0780 con 0x2e8ad80
-1> 2019-03-12 16:29:27.847 7fa2ae2d1700 4 mgr ms_dispatch standby osd_map(29..29 src has 1..29) v4
0> 2019-03-12 16:29:27.847 7fa2a279a700 -1 *** Caught signal (Segmentation fault) ** in thread 7fa2a279a700 thread_name:dashboard
Updated by Sage Weil about 5 years ago
- Status changed from 12 to Fix Under Review
Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Resolved