Bug #41512

closed

"*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus in PyModuleRunner::log

Added by Yuri Weinstein over 4 years ago. Updated over 4 years ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/luminous-x, upgrade/mimic-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2019-08-26_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Jobs: many
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2019-08-26_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/4252328/teuthology.log

zgrep "^ ceph version" /a/teuthology-2019-08-26_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/4252328/remote/smithi050/log/ceph-mgr.x.log.gz -b20 -a30

2325623-2019-08-26 03:00:34.412 7fc1f57e9700  4 mgr.server handle_open from 0x557e84b72c00  client,x
2325716-2019-08-26 03:00:34.412 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] --> 172.21.15.50:0/1183373898 -- mgrconfigure(period=5, threshold=5) v3 -- 0x557e81617a40 con 0x557e84b72c00
2325924-2019-08-26 03:00:34.412 7fc1e7f4e700  1 -- 172.21.15.50:0/1873085746 --> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] -- mgrreport(unknown.x +16-0 packed 134) v7 -- 0x557e84d3f800 con 0x557e84d26800
2326134-2019-08-26 03:00:34.412 7fc1eff9e700  1 -- 172.21.15.50:0/1183373898 <== mgr.74108 v2:172.21.15.50:6800/16906 1 ==== mgrconfigure(period=5, threshold=5) v3 ==== 12+0+0 (crc 0 0 0) 0x557e81617a40 con 0x557e84d26000
2326348-2019-08-26 03:00:34.412 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] <== client.74144 172.21.15.50:0/1873085746 2 ==== mgrreport(client.x +16-0 packed 134) v7 ==== 1272+0+0 (crc 0 0 0) 0x557e84e67800 con 0x557e84b72800
2326597-2019-08-26 03:00:34.412 7fc1f57e9700  4 mgr.server handle_report from 0x557e84b72800 client,x
2326691-2019-08-26 03:00:34.412 7fc1f57e9700  4 mgr.server handle_report rejecting report from non-daemon client x
2326798-2019-08-26 03:00:34.412 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1873085746 conn(0x557e84b72800 msgr2=0x557e84e80000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2327021-2019-08-26 03:00:34.412 7fc1f57e9700  1 --2- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1873085746 conn(0x557e84b72800 0x557e84e80000 crc :-1 s=READY pgs=8 cs=0 l=1 rx=0 tx=0).stop
2327233-2019-08-26 03:00:34.412 7fc1eff9e700  1 -- 172.21.15.50:0/1183373898 --> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] -- mgrreport(unknown.x +16-0 packed 134) v7 -- 0x557e84cb5800 con 0x557e84d26000
2327443-2019-08-26 03:00:34.412 7fc2121f7700  1 -- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 msgr2=0x557e84cfdb80 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 52
2327696-2019-08-26 03:00:34.413 7fc2121f7700  1 -- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 msgr2=0x557e84cfdb80 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2327932-2019-08-26 03:00:34.413 7fc2121f7700  1 --2- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 0x557e84cfdb80 crc :-1 s=READY pgs=36 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2328240-2019-08-26 03:00:34.413 7fc2121f7700  1 --2- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 0x557e84cfdb80 crc :-1 s=READY pgs=36 cs=0 l=1 rx=0 tx=0).stop
2328453-2019-08-26 03:00:34.413 7fc1e7f4e700  1 -- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 msgr2=0x557e84cfdb80 unknown :-1 s=STATE_CLOSED l=1).mark_down
2328664-2019-08-26 03:00:34.413 7fc1e7f4e700  1 --2- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 0x557e84cfdb80 unknown :-1 s=CLOSED pgs=36 cs=0 l=1 rx=0 tx=0).stop
2328882-2019-08-26 03:00:34.413 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] <== client.74138 172.21.15.50:0/1183373898 2 ==== mgrreport(client.x +16-0 packed 134) v7 ==== 1272+0+0 (crc 0 0 0) 0x557e84cb5800 con 0x557e84b72c00
2329131-2019-08-26 03:00:34.413 7fc1f57e9700  4 mgr.server handle_report from 0x557e84b72c00 client,x
2329225-2019-08-26 03:00:34.413 7fc1f57e9700  4 mgr.server handle_report rejecting report from non-daemon client x
2329332-2019-08-26 03:00:34.413 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1183373898 conn(0x557e84b72c00 msgr2=0x557e84e80580 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2329555-2019-08-26 03:00:34.413 7fc1f57e9700  1 --2- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1183373898 conn(0x557e84b72c00 0x557e84e80580 crc :-1 s=READY pgs=9 cs=0 l=1 rx=0 tx=0).stop
2329767-2019-08-26 03:00:34.413 7fc2129f8700  1 -- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 msgr2=0x557e84cfd600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 53
2330020-2019-08-26 03:00:34.413 7fc2129f8700  1 -- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 msgr2=0x557e84cfd600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2330256-2019-08-26 03:00:34.413 7fc2129f8700  1 --2- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 0x557e84cfd600 crc :-1 s=READY pgs=37 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2330564-2019-08-26 03:00:34.413 7fc2129f8700  1 --2- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 0x557e84cfd600 crc :-1 s=READY pgs=37 cs=0 l=1 rx=0 tx=0).stop
2330777-2019-08-26 03:00:34.413 7fc1eff9e700  1 -- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 msgr2=0x557e84cfd600 unknown :-1 s=STATE_CLOSED l=1).mark_down
2330988-2019-08-26 03:00:34.413 7fc1eff9e700  1 --2- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 0x557e84cfd600 unknown :-1 s=CLOSED pgs=37 cs=0 l=1 rx=0 tx=0).stop
2331206-2019-08-26 03:00:34.426 7fc1f0fa0700 -1 *** Caught signal (Segmentation fault) **
2331288- in thread 7fc1f0fa0700 thread_name:mgr-fin
2331332-
2331333: ceph version 14.2.2-471-g49a6c9a (49a6c9aaf29cfb947ac384277ebed4d7897eaf01) nautilus (stable)
2331428- 1: (()+0xf5d0) [0x7fc216d3c5d0]
2331461- 2: (Mutex::lock(bool)+0x9) [0x7fc2194ace59]
2331506- 3: (PyModuleRunner::log(int, std::string const&)+0x100) [0x557e7f31eb90]
2331580- 4: (()+0x132826) [0x557e7f285826]
2331615- 5: (PyEval_EvalFrameEx()+0x6df0) [0x7fc218efccf0]
2331666- 6: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2331717- 7: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2331768- 8: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2331819- 9: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2331870- 10: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2331920- 11: (()+0x70a6d) [0x7fc218e88a6d]
2331955- 12: (PyObject_Call()+0x43) [0x7fc218e63a63]
2332000- 13: (PyEval_EvalFrameEx()+0x17fd) [0x7fc218ef76fd]
2332052- 14: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2332102- 15: (PyEval_EvalFrameEx()+0x663c) [0x7fc218efc53c]
2332154- 16: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2332204- 17: (()+0x70a6d) [0x7fc218e88a6d]
2332239- 18: (PyObject_Call()+0x43) [0x7fc218e63a63]
2332284- 19: (PyEval_EvalFrameEx()+0x17fd) [0x7fc218ef76fd]
2332336- 20: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2332388- 21: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2332440- 22: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2332490- 23: (()+0x70978) [0x7fc218e88978]
2332525- 24: (PyObject_Call()+0x43) [0x7fc218e63a63]
2332570- 25: (()+0x5aa55) [0x7fc218e72a55]
2332605- 26: (PyObject_Call()+0x43) [0x7fc218e63a63]
2332650- 27: (PyEval_CallObjectWithKeywords()+0x47) [0x7fc218ef58f7]
2332711- 28: (()+0x115822) [0x7fc218f2d822]
2332747- 29: (()+0x7dd5) [0x7fc216d34dd5]
2332781- 30: (clone()+0x6d) [0x7fc2159e502d]
--
2932433-   -27> 2019-08-26 03:00:34.412 7fc1f57e9700  4 mgr.server handle_report rejecting report from non-daemon client x
2932548-   -26> 2019-08-26 03:00:34.412 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1873085746 conn(0x557e84b72800 msgr2=0x557e84e80000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2932779-   -25> 2019-08-26 03:00:34.412 7fc1f57e9700  1 --2- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1873085746 conn(0x557e84b72800 0x557e84e80000 crc :-1 s=READY pgs=8 cs=0 l=1 rx=0 tx=0).stop
2932999-   -24> 2019-08-26 03:00:34.412 7fc1eff9e700  1 -- 172.21.15.50:0/1183373898 --> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] -- mgrreport(unknown.x +16-0 packed 134) v7 -- 0x557e84cb5800 con 0x557e84d26000
2933217-   -23> 2019-08-26 03:00:34.412 7fc2121f7700  1 -- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 msgr2=0x557e84cfdb80 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 52
2933478-   -22> 2019-08-26 03:00:34.413 7fc2121f7700  1 -- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 msgr2=0x557e84cfdb80 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2933722-   -21> 2019-08-26 03:00:34.413 7fc2121f7700  1 --2- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 0x557e84cfdb80 crc :-1 s=READY pgs=36 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2934038-   -20> 2019-08-26 03:00:34.413 7fc2121f7700  1 --2- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 0x557e84cfdb80 crc :-1 s=READY pgs=36 cs=0 l=1 rx=0 tx=0).stop
2934259-   -19> 2019-08-26 03:00:34.413 7fc1e7f4e700  4 mgrc ms_handle_reset ms_handle_reset con 0x557e84d26800
2934363-   -18> 2019-08-26 03:00:34.413 7fc1e7f4e700  4 mgrc reconnect Terminating session with v2:172.21.15.50:6800/16906
2934478-   -17> 2019-08-26 03:00:34.413 7fc1e7f4e700  1 -- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 msgr2=0x557e84cfdb80 unknown :-1 s=STATE_CLOSED l=1).mark_down
2934697-   -16> 2019-08-26 03:00:34.413 7fc1e7f4e700  1 --2- 172.21.15.50:0/1873085746 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26800 0x557e84cfdb80 unknown :-1 s=CLOSED pgs=36 cs=0 l=1 rx=0 tx=0).stop
2934923-   -15> 2019-08-26 03:00:34.413 7fc1e7f4e700  4 mgrc reconnect waiting to retry connect until 2019-08-26 03:00:35.412172
2935044-   -14> 2019-08-26 03:00:34.413 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] <== client.74138 172.21.15.50:0/1183373898 2 ==== mgrreport(client.x +16-0 packed 134) v7 ==== 1272+0+0 (crc 0 0 0) 0x557e84cb5800 con 0x557e84b72c00
2935301-   -13> 2019-08-26 03:00:34.413 7fc1f57e9700  4 mgr.server handle_report from 0x557e84b72c00 client,x
2935403-   -12> 2019-08-26 03:00:34.413 7fc1f57e9700  4 mgr.server handle_report rejecting report from non-daemon client x
2935518-   -11> 2019-08-26 03:00:34.413 7fc1f57e9700  1 -- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1183373898 conn(0x557e84b72c00 msgr2=0x557e84e80580 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2935749-   -10> 2019-08-26 03:00:34.413 7fc1f57e9700  1 --2- [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] >> 172.21.15.50:0/1183373898 conn(0x557e84b72c00 0x557e84e80580 crc :-1 s=READY pgs=9 cs=0 l=1 rx=0 tx=0).stop
2935969-    -9> 2019-08-26 03:00:34.413 7fc2129f8700  1 -- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 msgr2=0x557e84cfd600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 53
2936230-    -8> 2019-08-26 03:00:34.413 7fc2129f8700  1 -- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 msgr2=0x557e84cfd600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2936474-    -7> 2019-08-26 03:00:34.413 7fc2129f8700  1 --2- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 0x557e84cfd600 crc :-1 s=READY pgs=37 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2936790-    -6> 2019-08-26 03:00:34.413 7fc2129f8700  1 --2- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 0x557e84cfd600 crc :-1 s=READY pgs=37 cs=0 l=1 rx=0 tx=0).stop
2937011-    -5> 2019-08-26 03:00:34.413 7fc1eff9e700  4 mgrc ms_handle_reset ms_handle_reset con 0x557e84d26000
2937115-    -4> 2019-08-26 03:00:34.413 7fc1eff9e700  4 mgrc reconnect Terminating session with v2:172.21.15.50:6800/16906
2937230-    -3> 2019-08-26 03:00:34.413 7fc1eff9e700  1 -- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 msgr2=0x557e84cfd600 unknown :-1 s=STATE_CLOSED l=1).mark_down
2937449-    -2> 2019-08-26 03:00:34.413 7fc1eff9e700  1 --2- 172.21.15.50:0/1183373898 >> [v2:172.21.15.50:6800/16906,v1:172.21.15.50:6818/16906] conn(0x557e84d26000 0x557e84cfd600 unknown :-1 s=CLOSED pgs=37 cs=0 l=1 rx=0 tx=0).stop
2937675-    -1> 2019-08-26 03:00:34.413 7fc1eff9e700  4 mgrc reconnect waiting to retry connect until 2019-08-26 03:00:35.412143
2937796-     0> 2019-08-26 03:00:34.426 7fc1f0fa0700 -1 *** Caught signal (Segmentation fault) **
2937886- in thread 7fc1f0fa0700 thread_name:mgr-fin
2937930-
2937931: ceph version 14.2.2-471-g49a6c9a (49a6c9aaf29cfb947ac384277ebed4d7897eaf01) nautilus (stable)
2938026- 1: (()+0xf5d0) [0x7fc216d3c5d0]
2938059- 2: (Mutex::lock(bool)+0x9) [0x7fc2194ace59]
2938104- 3: (PyModuleRunner::log(int, std::string const&)+0x100) [0x557e7f31eb90]
2938178- 4: (()+0x132826) [0x557e7f285826]
2938213- 5: (PyEval_EvalFrameEx()+0x6df0) [0x7fc218efccf0]
2938264- 6: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2938315- 7: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2938366- 8: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2938417- 9: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2938468- 10: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2938518- 11: (()+0x70a6d) [0x7fc218e88a6d]
2938553- 12: (PyObject_Call()+0x43) [0x7fc218e63a63]
2938598- 13: (PyEval_EvalFrameEx()+0x17fd) [0x7fc218ef76fd]
2938650- 14: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2938700- 15: (PyEval_EvalFrameEx()+0x663c) [0x7fc218efc53c]
2938752- 16: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2938802- 17: (()+0x70a6d) [0x7fc218e88a6d]
2938837- 18: (PyObject_Call()+0x43) [0x7fc218e63a63]
2938882- 19: (PyEval_EvalFrameEx()+0x17fd) [0x7fc218ef76fd]
2938934- 20: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2938986- 21: (PyEval_EvalFrameEx()+0x67bd) [0x7fc218efc6bd]
2939038- 22: (PyEval_EvalCodeEx()+0x7ed) [0x7fc218eff03d]
2939088- 23: (()+0x70978) [0x7fc218e88978]
2939123- 24: (PyObject_Call()+0x43) [0x7fc218e63a63]
2939168- 25: (()+0x5aa55) [0x7fc218e72a55]
2939203- 26: (PyObject_Call()+0x43) [0x7fc218e63a63]
2939248- 27: (PyEval_CallObjectWithKeywords()+0x47) [0x7fc218ef58f7]
2939309- 28: (()+0x115822) [0x7fc218f2d822]
2939345- 29: (()+0x7dd5) [0x7fc216d34dd5]
2939379- 30: (clone()+0x6d) [0x7fc2159e502d]
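The backtrace above is dominated by unresolved frames and Python interpreter frames, so the frames that matter (Mutex::lock and PyModuleRunner::log) are easy to miss. The following is a minimal sketch for pulling out only the resolved, non-interpreter frames. The helper name `named_frames` and the choice of prefixes to skip are assumptions made for illustration, not part of any Ceph tooling.

```python
import re

# Matches frames such as
#   "2331506- 3: (PyModuleRunner::log(int, std::string const&)+0x100) [0x557e7f31eb90]"
# The leading byte offset written by grep -b is optional.
FRAME_RE = re.compile(
    r"^(?:\d+[-:])?\s*(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$"
)

# Hypothetical filter: symbols that belong to the CPython interpreter loop.
INTERPRETER_PREFIXES = ("PyEval_", "PyObject_")


def named_frames(lines):
    """Return (frame_number, symbol) for resolved, non-interpreter frames."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if not match:
            continue  # not a backtrace frame (log line, version banner, ...)
        index, symbol = int(match.group(1)), match.group(2)
        if symbol == "()":
            continue  # unresolved frame, only an offset is known
        if symbol.startswith(INTERPRETER_PREFIXES):
            continue  # interpreter frame, not specific to this crash
        frames.append((index, symbol))
    return frames
```

Applied to frames 1 through 5 of the trace above, this keeps only frame 2 (Mutex::lock(bool)) and frame 3 (PyModuleRunner::log(int, std::string const&)).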


Related issues: 1 (0 open, 1 closed)

Is duplicate of rbd - Bug #41029: [upgrade] mimic -> latest can result in 'rbd_support' failing to load | Resolved | Jason Dillaman | 07/31/2019

Actions #2

Updated by Sebastian Wagner over 4 years ago

  • Project changed from Ceph to mgr
  • Subject changed from "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus to "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus in PyModuleRunner::log
Actions #3

Updated by Josh Durgin over 4 years ago

Possibly related to the rbd_support module failing to initialize and then logging afterwards:

2019-08-26 03:00:29.435 7fc2121f7700  1 -- 172.21.15.50:0/1183373898 <== osd.3 v1:172.21.15.50:6809/11473 2 ==== osd_op_reply(2 rbd_namespace [call] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v8 ==== 157+0+0 (unknown 2844119871 0 0) 0x557e84b91180 con 0x557e841d0800
2019-08-26 03:00:29.435 7fc1f5fea700 -1 librbd::api::Namespace: list: error listing namespaces: (95) Operation not supported
2019-08-26 03:00:29.435 7fc1f5fea700 -1 mgr load Failed to construct class in 'rbd_support'
2019-08-26 03:00:29.435 7fc1f5fea700 -1 mgr load Traceback (most recent call last):
  File "/usr/share/ceph/mgr/rbd_support/module.py", line 1319, in __init__
    self.task = TaskHandler(self)
  File "/usr/share/ceph/mgr/rbd_support/module.py", line 610, in __init__
    self.init_task_queue()
  File "/usr/share/ceph/mgr/rbd_support/module.py", line 676, in init_task_queue
    for namespace in rbd.RBD().namespace_list(ioctx):
  File "rbd.pyx", line 2160, in rbd.RBD.namespace_list (/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2-471-g49a6c9a/rpm/el7/BUILD/ceph-14.2.2-471-g49a6c9a/build/src/pybind/rbd/pyrex/rbd.c:18093)
OSError: [errno 95] error listing namespaces

2019-08-26 03:00:29.436 7fc1f5fea700 20 mgr ~Gil Destroying new thread state 0x557e81dca2c0
2019-08-26 03:00:29.436 7fc1f5fea700 -1 mgr operator() Failed to run module in active mode ('rbd_support')
Actions #4

Updated by Kefu Chai over 4 years ago

on the mgr.x side:

2019-08-22 07:28:19.603 7fc112c70d40  0 ceph version 14.2.2-469-ge157074 (e157074510ad826a2f5417a61c74d2b13d97f3c2) nautilus (stable), process ceph-mgr, pid 18085
...
2019-08-22 07:28:20.299 7fc112c70d40  1 mgr[py] Loading python module 'rbd_support'
...
2019-08-22 07:28:21.583 7fc0f394d700  4 mgr[rbd_support] load_task_task: rbd, start_after=
...
2019-08-22 07:28:21.583 7fc0f394d700  1 -- 172.21.15.67:0/2635230790 --> v1:172.21.15.194:6800/12876 -- osd_op(unknown.0.0:2 1.4 1:377edd40:::rbd_namespace:head [call rbd.namespace_list] snapc 0=[] ondisk+read+known_if_redirected e19) v8 -- 0x592ca00 con 0x58cc400
...
2019-08-22 07:28:21.583 7fc103f5b700  1 -- 172.21.15.67:0/2635230790 <== osd.4 v1:172.21.15.194:6800/12876 1 ==== osd_op_reply(2 rbd_namespace [call] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v8 ==== 157+0+0 (unknown 2280702990 0 0) 0x5973340 con 0x58cc400

on osd.4:

2019-08-22 07:22:46.944 7fd116fa1e00  0 ceph version 13.2.6-363-g0508aa2 (0508aa240643da6bdb5ec3c9390ad3417eedd4ba) mimic (stable), process ceph-osd, pid 12555
...
2019-08-22 07:28:21.584 7fcae7b1b700 20 osd.4 pg_epoch: 19 pg[1.4( empty local-lis/les=12/13 n=0 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [4,5] r=0 lpr=12 crt=0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.64587.0:2 1.4 1:377edd40:::rbd_namespace:head [call rbd.namespace_list] snapc 0=[] ondisk+read+known_if_redirected e19) v8
2019-08-22 07:28:21.584 7fcae7b1b700  1 -- 172.21.15.194:6800/12876 --> 172.21.15.67:0/2635230790 -- osd_op_reply(2 rbd_namespace [call rbd.namespace_list] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v8 -- 0x3728340 con 0
...
2019-08-22 07:33:12.580 7f1c1561bf80  0 ceph version 14.2.2-469-ge157074 (e157074510ad826a2f5417a61c74d2b13d97f3c2) nautilus (stable), process ceph-osd, pid 15040
...
2019-08-22 07:33:12.692 7f1c1561bf80 10 register_cxx_method rbd.namespace_add flags 3 0x7f1bfa949600
2019-08-22 07:33:12.692 7f1c1561bf80 10 register_cxx_method rbd.namespace_remove flags 3 0x7f1bfa949850
2019-08-22 07:33:12.692 7f1c1561bf80 10 register_cxx_method rbd.namespace_list flags 1 0x7f1bfa94d360

on mon.a:

2019-08-22 07:22:54.840 7fd020beda80  0 ceph version 13.2.6-363-g0508aa2 (0508aa240643da6bdb5ec3c9390ad3417eedd4ba) mimic (stable), process ceph-mon, pid 12943
...
2019-08-22 07:22:59.876 7fd00bf0a700 10 mon.a@0(leader).mgr e0 create_initial initial modules balancer,crash,iostat,restful,status, 35 commands
...
2019-08-22 07:25:55.991 7fbeada423c0  0 ceph version 14.2.2-469-ge157074 (e157074510ad826a2f5417a61c74d2b13d97f3c2) nautilus (stable), process ceph-mon, pid 16688
...
2019-08-22 07:28:14.523 7fbe971c2700  4 mon.a@0(leader).mgr e38 always on modules changed, pending balancer,crash,devicehealth,orchestrator_cli,progress,rbd_support,status,volumes != wanted {14=balancer,crash,devicehealth,orchestrator_cli,progress,rbd_support,status,volumes}
  1. mimic does not support rbd.namespace. In mimic, the mgr does not have the rbd_support module, and the monitor did not enable it either.
  2. After the monitor is upgraded from mimic, its "always_on_modules" is updated to the nautilus set, in which rbd_support is listed as one of the always_on_modules.
  3. The mgr is updated with this info, so it starts the rbd_support module. In rbd_support.Module.__init__(), TaskHandler::init_task_queue() calls rbd.RBD().namespace_list(ioctx), but at that point osd.4 was still running mimic.
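The version mismatch in the steps above was read off the "ceph version" banner that each daemon logs at startup. As a minimal sketch, the banners can be parsed and grouped by release to show which daemons are still on the old release. The function names and the input mapping (daemon name to its most recent banner line) are assumptions made for illustration.

```python
import re

# Matches the startup banner, for example:
#   "ceph version 13.2.6-363-g0508aa2 (0508aa2...) mimic (stable), process ceph-osd, pid 12555"
BANNER_RE = re.compile(
    r"ceph version (?P<version>\S+) \((?P<sha>[0-9a-f]+)\) "
    r"(?P<release>\w+) \((?P<channel>\w+)\)"
)


def parse_banner(line):
    """Return the banner fields as a dict, or None if the line has no banner."""
    match = BANNER_RE.search(line)
    return match.groupdict() if match else None


def daemons_by_release(latest_banner_per_daemon):
    """Group daemon names by the release named in their latest banner."""
    grouped = {}
    for daemon, line in latest_banner_per_daemon.items():
        banner = parse_banner(line)
        if banner is None:
            continue  # no banner found for this daemon
        grouped.setdefault(banner["release"], []).append(daemon)
    return grouped
```

With the banners quoted in this comment, the mgr.x banner and the 07:22:46 osd.4 banner (the one in effect when the request was rejected at 07:28:21), the result has mgr.x under "nautilus" and osd.4 under "mimic", which is the window in which namespace_list fails.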
Actions #5

Updated by Kefu Chai over 4 years ago

Since we don't have/enforce a require_minimal_release option in MgrModule yet, I think we have the following options:

- do not update "always_on_modules[foo,..]" unless "osdmap.require_osd_release" is greater than "foo"
- have finer control over mgr module enabling, say, by adding an option "require_osd_release" and only starting the modules whose "require_osd_release" is not greater than the current "osdmap.require_osd_release"
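The second option could be sketched roughly as follows. This is only an illustration of the gating rule, not existing MgrModule code: the per-module "require_osd_release" value is the proposed option, while `RELEASE_ORDER`, `startable_modules`, and the dict-based input are hypothetical names chosen for the example.

```python
# Release names in chronological order (luminous = 12, mimic = 13, nautilus = 14).
RELEASE_ORDER = ("luminous", "mimic", "nautilus")


def release_rank(name):
    """Position of a release in RELEASE_ORDER; raises ValueError if unknown."""
    return RELEASE_ORDER.index(name)


def startable_modules(module_requirements, osdmap_require_osd_release):
    """Return the modules whose require_osd_release is not greater than the
    cluster's current osdmap.require_osd_release.

    module_requirements maps a module name to its required release, or to
    None for modules that do not depend on any particular OSD feature.
    """
    current = release_rank(osdmap_require_osd_release)
    return sorted(
        name
        for name, required in module_requirements.items()
        if required is None or release_rank(required) <= current
    )
```

For example, with `{"balancer": None, "status": None, "rbd_support": "nautilus"}` and a cluster still at "mimic", rbd_support is held back while the other two modules start. Once require_osd_release reaches "nautilus", all three are startable.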

the log message from mon is fixed by https://github.com/ceph/ceph/pull/29917

Actions #6

Updated by Josh Durgin over 4 years ago

Kefu Chai wrote:

Since we don't have/enforce a require_minimal_release option in MgrModule yet, I think we have the following options:

- do not update "always_on_modules[foo,..]" unless "osdmap.require_osd_release" is greater than "foo"
- have finer control over mgr module enabling, say, by adding an option "require_osd_release" and only starting the modules whose "require_osd_release" is not greater than the current "osdmap.require_osd_release"

The latter sounds good - there are plenty of modules that don't depend on particular osd features at all.

Actions #7

Updated by Kefu Chai over 4 years ago

  • Status changed from New to 12
  • Assignee set to Kefu Chai
Actions #8

Updated by Mykola Golub over 4 years ago

The rbd_support module has already been fixed in master to ignore the 'namespace operation is not supported' error from older OSDs [1]. It has not been backported to nautilus yet, though.

[1] https://tracker.ceph.com/issues/41029
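The general shape of such a workaround can be sketched as below. This is not the actual change from [1], which is not shown in this thread. It only illustrates treating errno 95 (Operation not supported) as "no namespaces" while letting every other error propagate. The helper name is hypothetical, the built-in OSError stands in for the exception raised by the rbd binding, and returning an empty list is an assumption about the desired fallback.

```python
import errno


def list_namespaces_tolerant(list_fn):
    """Call list_fn() and return its result as a list.

    If the call fails with EOPNOTSUPP, as happens when an older OSD does not
    implement the namespace class method, return an empty list instead of
    failing module initialization. Any other error is re-raised.
    """
    try:
        return list(list_fn())
    except OSError as e:
        if e.errno == errno.EOPNOTSUPP:
            return []
        raise
```

In the module this would wrap the call seen in the traceback, for example `list_namespaces_tolerant(lambda: rbd.RBD().namespace_list(ioctx))`, assuming the binding's exception exposes an `errno` attribute in the same way.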

Actions #9

Updated by Kefu Chai over 4 years ago

  • Status changed from 12 to Duplicate
  • Assignee changed from Kefu Chai to Mykola Golub
Actions #10

Updated by Kefu Chai over 4 years ago

  • Is duplicate of Bug #41029: [upgrade] mimic -> latest can result in 'rbd_support' failing to load added