https://tracker.ceph.com/
2020-02-22T00:58:35Z
Ceph
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159282
2020-02-22T00:58:35Z
Patrick Donnelly
pdonnell@redhat.com
<ul></ul><p>Same failure here for the other volumes plugin test: <a class="external" href="http://pulpito.ceph.com/yuriw-2020-02-18_16:08:51-fs-nautilus-distro-basic-smithi/4777850/">http://pulpito.ceph.com/yuriw-2020-02-18_16:08:51-fs-nautilus-distro-basic-smithi/4777850/</a></p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159296
2020-02-22T17:18:38Z
Patrick Donnelly
pdonnell@redhat.com
<ul></ul><p>Test run with --num 5 to see if we can get more information:</p>
<p><a class="external" href="http://pulpito.ceph.com/pdonnell-2020-02-22_17:17:16-fs-nautilus-distro-basic-smithi/">http://pulpito.ceph.com/pdonnell-2020-02-22_17:17:16-fs-nautilus-distro-basic-smithi/</a></p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159301
2020-02-22T21:24:12Z
Sage Weil
sage@newdream.net
<ul></ul><p>With debug_ms=20 we see, on the mgr:<br /><pre>
2020-02-22 14:44:50.810 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.810 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read continue len=32
2020-02-22 14:44:50.810 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main r=0
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main got new segment: len=41 align=8
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main got new segment: len=73 align=8
2020-02-22 14:44:50.810 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.812 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.812 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).read_event
2020-02-22 14:44:50.812 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.812 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.812 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.814 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.814 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).read_event
2020-02-22 14:44:50.814 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.814 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.814 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.816 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.816 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).read_event
2020-02-22 14:44:50.816 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.816 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.816 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
...
</pre><br />/a/sage-2020-02-22_13:50:50-fs:basic_functional-nautilus-distro-basic-smithi/4791175</p>
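The log lines above read like a counting throttle on the messenger policy. Each incoming message asks for one unit, "128/128" appears to be current/max, and when no unit is free the connection stays in THROTTLE_MESSAGE and retries on the next read event (the roughly 2 ms cadence in the timestamps). The following is a minimal Python model of that admit-or-wait decision, assuming a simple non-blocking get-or-fail counter. `CountingThrottle`, `try_get`, `put` and the " ok" suffix are hypothetical names chosen for illustration, not Ceph's `Throttle` API.

```python
class CountingThrottle:
    """Hypothetical model of a message-count throttle (not Ceph's Throttle class)."""

    def __init__(self, name: str, max_count: int) -> None:
        self.name = name
        self.max_count = max_count
        self.current = 0

    def try_get(self, count: int = 1) -> bool:
        # Non-blocking: take `count` units only if that stays within the limit.
        if self.current + count > self.max_count:
            return False
        self.current += count
        return True

    def put(self, count: int = 1) -> None:
        # Return units; in this model that happens when a message is released.
        self.current -= count
        assert self.current >= 0, "throttle released more than it handed out"


def throttle_message(throttle: CountingThrottle) -> str:
    # Mirrors the shape of the log: report the request, then admit or wait.
    status = f"wants 1 message from {throttle.name} {throttle.current}/{throttle.max_count}"
    if throttle.try_get(1):
        return status + " ok"
    return status + " failed, just wait."
```

With all 128 units held, every retry produces the same "128/128 failed, just wait." result. Nothing changes until some holder calls `put()`, which is why the connection spins rather than making progress.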
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159302
2020-02-22T21:27:12Z
Sage Weil
sage@newdream.net
<ul></ul><p>Lots of connections are busy-looping, also waiting on the same throttle:<br /><pre>
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6818/33792,v1:172.21.15.44:6819/33792] conn(0x55eea7152400 0x55eea714eb00 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/2377610774,v1:172.21.15.44:6835/2377610774] conn(0x55eea818bc00 0x55eea7be0580 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/2377610774,v1:172.21.15.44:6835/2377610774] conn(0x55eea818bc00 0x55eea7be0580 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6818/33792,v1:172.21.15.44:6819/33792] conn(0x55eea7152400 0x55eea714eb00 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/1766558866,v1:172.21.15.49:6833/1766558866] conn(0x55eea815c800 0x55eea8149080 crc :-1 s=THROTTLE_MESSAGE pgs=9 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/1766558866,v1:172.21.15.49:6833/1766558866] conn(0x55eea815c800 0x55eea8149080 crc :-1 s=THROTTLE_MESSAGE pgs=9 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/835642329,v1:172.21.15.49:6833/835642329] conn(0x55eea818b800 0x55eea6f19600 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/1238526435,v1:172.21.15.44:6835/1238526435] conn(0x55eea7de4000 0x55eea807d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/1238526435,v1:172.21.15.44:6835/1238526435] conn(0x55eea7de4000 0x55eea807d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/835642329,v1:172.21.15.49:6833/835642329] conn(0x55eea818b800 0x55eea6f19600 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32850 conn(0x55eea63eec00 0x55eea7c0cb00 secure :-1 s=THROTTLE_MESSAGE pgs=2 cs=0 l=1 rx=0x55eea6e0f800 tx=0x55eea6ee7900).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32850 conn(0x55eea63eec00 0x55eea7c0cb00 secure :-1 s=THROTTLE_MESSAGE pgs=2 cs=0 l=1 rx=0x55eea6e0f800 tx=0x55eea6ee7900).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6802/33790,v1:172.21.15.44:6803/33790] conn(0x55eea7152000 0x55eea714e580 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32977 conn(0x55eea34d8c00 0x55eea742d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32977 conn(0x55eea34d8c00 0x55eea742d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6802/33790,v1:172.21.15.44:6803/33790] conn(0x55eea7152000 0x55eea714e580 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32851 conn(0x55eea6354800 0x55eea6ce8000 crc :-1 s=THROTTLE_MESSAGE pgs=8 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32978 conn(0x55eea6dea800 0x55eea6cea680 secure :-1 s=THROTTLE_MESSAGE pgs=1 cs=0 l=1 rx=0x55eea6dd1500 tx=0x55eea6cc8d00).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32851 conn(0x55eea6354800 0x55eea6ce8000 crc :-1 s=THROTTLE_MESSAGE pgs=8 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32978 conn(0x55eea6dea800 0x55eea6cea680 secure :-1 s=THROTTLE_MESSAGE pgs=1 cs=0 l=1 rx=0x55eea6dd1500 tx=0x55eea6cc8d00).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6815/34301,v1:172.21.15.49:6817/34301] conn(0x55eea70bec00 0x55eea7098680 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6815/34301,v1:172.21.15.49:6817/34301] conn(0x55eea70bec00 0x55eea7098680 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
</pre></p>
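One way to confirm that many distinct connections are stalled on the same throttle, rather than one connection spinning, is to scan the mgr log for the "failed, just wait." lines and group them by the `conn(0x...)` pointer and peer address. This rough sketch assumes the debug_ms line format shown above. The regex and the `stalled_connections` helper are illustrative and not part of any Ceph tool.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the peer address and connection pointer on a throttle-wait line.
# Only the "failed, just wait." lines match ("policy throttler N/M" lines do not).
WAIT_RE = re.compile(
    r">> (?P<peer>\S+) conn\((?P<conn>0x[0-9a-f]+) .*throttle_message wants \d+ message"
    r" from policy throttle (?P<cur>\d+)/(?P<max>\d+) failed, just wait\."
)


def stalled_connections(lines: Iterable[str]) -> Counter:
    """Count throttle-wait lines per (conn pointer, peer address)."""
    counts: Counter = Counter()
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            counts[(m.group("conn"), m.group("peer"))] += 1
    return counts
```

Feeding it an open log file (for example `stalled_connections(open(path))`, with `path` being whatever mgr log you are inspecting) yields one key per waiting connection. A large number of keys all reporting 128/128 points at a shared, exhausted budget rather than a single slow peer.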
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159303
2020-02-22T21:32:28Z
Sage Weil
sage@newdream.net
<ul></ul><p>My guess is we are leaking a message ref somewhere...</p>
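The leaked-ref hypothesis can be illustrated with a toy model. Suppose the throttle unit taken for a message is returned only when the last reference to that message is dropped. Then any code path that keeps an extra reference without releasing it consumes one unit permanently. After 128 such leaks, every connection sharing the policy throttle is stuck, which would match the logs above. This is a hypothetical sketch: `Throttle`, `Message`, `get_ref`/`put_ref` and `dispatch` are illustrative names, not Ceph's classes.

```python
class Throttle:
    """Hypothetical counting throttle shared by all connections of one policy."""

    def __init__(self, max_count: int) -> None:
        self.max_count = max_count
        self.current = 0

    def try_get(self) -> bool:
        if self.current + 1 > self.max_count:
            return False
        self.current += 1
        return True

    def put(self) -> None:
        self.current -= 1


class Message:
    """Ref-counted message that returns its throttle unit when the last ref drops."""

    def __init__(self, throttle: Throttle) -> None:
        self.throttle = throttle
        self.refs = 1  # the dispatcher's own reference

    def get_ref(self) -> "Message":
        self.refs += 1
        return self

    def put_ref(self) -> None:
        self.refs -= 1
        if self.refs == 0:
            self.throttle.put()  # only now is the throttle unit released


def dispatch(throttle: Throttle, leak: bool) -> bool:
    """Admit one message; return False if the connection would have to wait."""
    if not throttle.try_get():
        return False
    msg = Message(throttle)
    held = msg.get_ref()   # e.g. a handler stashes a reference to the message
    if not leak:
        held.put_ref()     # correct path: the handler drops its reference
    msg.put_ref()          # the dispatcher drops its own reference
    return True
```

Any number of well-behaved dispatches leaves `current` at 0. After 128 leaky ones, no message is admitted on any connection even though nothing is actively being processed. In this model the leak, not load, is what holds the budget.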
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159325
2020-02-23T23:16:44Z
Patrick Donnelly
pdonnell@redhat.com
<ul></ul><p>Merging this PR seems to cause or expose the issue: <a class="external" href="https://github.com/ceph/ceph/pull/33122">https://github.com/ceph/ceph/pull/33122</a></p>
<p>Before merge:</p>
<p><a class="external" href="http://pulpito.ceph.com/pdonnell-2020-02-23_18:32:02-fs-nautilus-distro-basic-smithi/">http://pulpito.ceph.com/pdonnell-2020-02-23_18:32:02-fs-nautilus-distro-basic-smithi/</a></p>
<p>The failures there are unrelated to this tracker.</p>
<p>After merge:</p>
<p><a class="external" href="http://pulpito.ceph.com/pdonnell-2020-02-23_18:25:22-fs-nautilus-distro-basic-smithi/">http://pulpito.ceph.com/pdonnell-2020-02-23_18:25:22-fs-nautilus-distro-basic-smithi/</a></p>
<p>All failures are timeouts.</p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159326
2020-02-23T23:23:49Z
Patrick Donnelly
pdonnell@redhat.com
<ul></ul><p>Sage Weil wrote:</p>
<blockquote>
<p>My guess is we are leaking a message ref somewhere...</p>
</blockquote>
<p>I see at least one instance of that here: <a class="external" href="https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446">https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446</a></p>
<p>I will write up a fix for that, but this PR was merged <em>after</em> <a class="external" href="https://github.com/ceph/ceph/pull/33122">https://github.com/ceph/ceph/pull/33122</a>, so there must be another one somewhere. Still looking...</p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159327
2020-02-23T23:32:48Z
Patrick Donnelly
pdonnell@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li><li><strong>Assignee</strong> set to <i>Patrick Donnelly</i></li><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Immediate</i></li><li><strong>Target version</strong> set to <i>v14.2.8</i></li><li><strong>Pull request ID</strong> set to <i>33498</i></li></ul><p>Testing a fix for broken backport #31905. I think there must be another one of these.</p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159333
2020-02-24T02:26:30Z
Patrick Donnelly
pdonnell@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/43046">Backport #43046</a>: nautilus: mgr: "mds metadata" to setup new DaemonState races with fsmap</i> added</li></ul>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159340
2020-02-24T06:08:52Z
Venky Shankar
vshankar@redhat.com
<ul></ul><p>Patrick Donnelly wrote:</p>
<blockquote>
<p>Merging this PR seems to cause or expose the issue: <a class="external" href="https://github.com/ceph/ceph/pull/33122">https://github.com/ceph/ceph/pull/33122</a></p>
<p>Before merge:</p>
<p><a class="external" href="http://pulpito.ceph.com/pdonnell-2020-02-23_18:32:02-fs-nautilus-distro-basic-smithi/">http://pulpito.ceph.com/pdonnell-2020-02-23_18:32:02-fs-nautilus-distro-basic-smithi/</a></p>
<p>Failures unrelated to this tracker.</p>
<p>After merge:</p>
<p><a class="external" href="http://pulpito.ceph.com/pdonnell-2020-02-23_18:25:22-fs-nautilus-distro-basic-smithi/">http://pulpito.ceph.com/pdonnell-2020-02-23_18:25:22-fs-nautilus-distro-basic-smithi/</a></p>
</blockquote>
<p>One of the failures (4794420) resembles this issue: <a class="external" href="https://tracker.ceph.com/issues/44207">https://tracker.ceph.com/issues/44207</a></p>
<blockquote>
<p>all timeout failures.</p>
</blockquote>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159341
2020-02-24T06:10:36Z
Venky Shankar
vshankar@redhat.com
<ul></ul><p>Patrick Donnelly wrote:</p>
<blockquote>
<p>Sage Weil wrote:</p>
<blockquote>
<p>My guess is we are leaking a message ref somewhere...</p>
</blockquote>
<p>I see at least one instance of that here: <a class="external" href="https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446">https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446</a></p>
<p>I will write up a fix for that but this PR was merged <em>after</em> <a class="external" href="https://github.com/ceph/ceph/pull/33122">https://github.com/ceph/ceph/pull/33122</a> so there must be another one somewhere. Still looking...</p>
</blockquote>
<p>The other one might just be <a class="external" href="https://tracker.ceph.com/issues/44207">https://tracker.ceph.com/issues/44207</a>. We should include a backport of <a class="external" href="https://github.com/ceph/ceph/pull/33413">https://github.com/ceph/ceph/pull/33413</a>.</p>
<p>Patrick?</p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159441
2020-02-24T18:09:40Z
Patrick Donnelly
pdonnell@redhat.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul><p>Venky Shankar wrote:</p>
<blockquote>
<p>Patrick Donnelly wrote:</p>
<blockquote>
<p>Sage Weil wrote:</p>
<blockquote>
<p>My guess is we are leaking a message ref somewhere...</p>
</blockquote>
<p>I see at least one instance of that here: <a class="external" href="https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446">https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446</a></p>
<p>I will write up a fix for that but this PR was merged <em>after</em> <a class="external" href="https://github.com/ceph/ceph/pull/33122">https://github.com/ceph/ceph/pull/33122</a> so there must be another one somewhere. Still looking...</p>
</blockquote>
<p>The other one might just be <a class="external" href="https://tracker.ceph.com/issues/44207">https://tracker.ceph.com/issues/44207</a>. We should include backport of <a class="external" href="https://github.com/ceph/ceph/pull/33413">https://github.com/ceph/ceph/pull/33413</a></p>
<p>Patrick?</p>
</blockquote>
<p>Yes, I see the deadlock now. #33413 seems to help, but there are still deadlocks in your test run: /ceph/teuthology-archive/vshankar-2020-02-24_12:33:54-fs-wip-vshankar-testing-testing-basic-smithi/4798102</p>
<p>I will create a new issue for this.</p>
mgr - Bug #44245: nautilus: mgr: connection halt
https://tracker.ceph.com/issues/44245?journal_id=159443
2020-02-24T19:03:57Z
Patrick Donnelly
pdonnell@redhat.com
<ul></ul><p><a class="external" href="https://tracker.ceph.com/issues/44276">https://tracker.ceph.com/issues/44276</a></p>