Ceph : Issues
https://tracker.ceph.com/
2024-03-25T17:48:18Z
Ceph
Redmine
RADOS - Backport #65121 (New): reef: PG autoscaler tuning => catastrophic ceph cluster crash
https://tracker.ceph.com/issues/65121
2024-03-25T17:48:18Z
Backport Bot
RADOS - Backport #65120 (New): squid: PG autoscaler tuning => catastrophic ceph cluster crash
https://tracker.ceph.com/issues/65120
2024-03-25T17:48:11Z
Backport Bot
RADOS - Backport #65119 (New): quincy: PG autoscaler tuning => catastrophic ceph cluster crash
https://tracker.ceph.com/issues/65119
2024-03-25T17:48:04Z
Backport Bot
RADOS - Backport #64576 (New): quincy: Incorrect behavior on combined cmpext+write ops in the fac...
https://tracker.ceph.com/issues/64576
2024-02-26T18:46:22Z
Backport Bot
RADOS - Backport #64575 (New): reef: Incorrect behavior on combined cmpext+write ops in the face ...
https://tracker.ceph.com/issues/64575
2024-02-26T18:46:15Z
Backport Bot
RADOS - Bug #64437 (New): qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 ...
https://tracker.ceph.com/issues/64437
2024-02-14T19:19:09Z
Laura Flores
<p>/a/lflores-2024-02-13_16:18:32-rados-wip-yuri5-testing-2024-02-12-1152-distro-default-smithi/7558380<br /><pre><code class="text syntaxhl"><span class="CodeRay">2024-02-13T19:01:44.117 INFO:tasks.workunit.client.0.smithi129.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:788: TEST_repair_stats_ec: jq '.pg_map.osd_stats[] | select(.osd == 0 ).num_shards_repaired'
2024-02-13T19:01:44.117 INFO:tasks.workunit.client.0.smithi129.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:788: TEST_repair_stats_ec: ceph pg dump --format=json-pretty
2024-02-13T19:01:44.430 INFO:tasks.workunit.client.0.smithi129.stderr:dumped all
2024-02-13T19:01:44.448 INFO:tasks.workunit.client.0.smithi129.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:788: TEST_repair_stats_ec: COUNT=26
2024-02-13T19:01:44.449 INFO:tasks.workunit.client.0.smithi129.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:789: TEST_repair_stats_ec: test 26 = 13
2024-02-13T19:01:44.449 INFO:tasks.workunit.client.0.smithi129.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:789: TEST_repair_stats_ec: return 1
2024-02-13T19:01:44.449 INFO:tasks.workunit.client.0.smithi129.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:65: run: return 1
</span></code></pre></p>
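The failing check in osd-scrub-repair.sh pulls osd 0's num_shards_repaired out of `ceph pg dump --format=json-pretty` with jq and compares it against the expected repair count. A minimal Python sketch of the equivalent extraction (the sample JSON below is invented for illustration, reduced to the fields the test inspects):

```python
import json

# Invented stand-in for "ceph pg dump --format=json-pretty" output,
# trimmed to what the jq filter in the test reads.
pg_dump = json.loads("""
{
  "pg_map": {
    "osd_stats": [
      {"osd": 0, "num_shards_repaired": 26},
      {"osd": 1, "num_shards_repaired": 0}
    ]
  }
}
""")

# Equivalent of: jq '.pg_map.osd_stats[] | select(.osd == 0).num_shards_repaired'
count = next(s["num_shards_repaired"]
             for s in pg_dump["pg_map"]["osd_stats"]
             if s["osd"] == 0)

# In the failure above the dump reported 26 repaired shards where the
# test expected 13, so the shell assertion `test 26 = 13` returned 1.
print(count)  # 26
```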
RADOS - Backport #64158 (New): quincy: CommandFailedError (rados/test_python.sh): "RADOS object n...
https://tracker.ceph.com/issues/64158
2024-01-24T20:05:21Z
Backport Bot
Ceph - Bug #61400 (New): valgrind+ceph-mon: segmentation fault in rocksdb+tcmalloc
https://tracker.ceph.com/issues/61400
2023-05-24T14:28:51Z
Patrick Donnelly
pdonnell@redhat.com
<pre>
0> 2023-05-24T02:54:54.546+0000 708e7c0 -1 *** Caught signal (Segmentation fault) **
in thread 708e7c0 thread_name:ceph-mon
ceph version 18.0.0-4167-gfa0e62c4 (fa0e62c4a1d8e4a737d9cbe50224f70009b79b28) reef (dev)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x5827420]
2: (tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)+0x20) [0x55f50a0]
3: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x20) [0x55f5370]
4: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x80) [0x55f5430]
5: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long))+0x76) [0x55f8e46]
6: (tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x165) [0x5609015]
7: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x22b) [0x109b0dd]
8: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x65) [0x109a139]
9: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x760) [0xf56d80]
10: (MonitorDBStore::open(std::ostream&)+0xfd) [0xc5ca1d]
11: main()
12: __libc_start_main()
13: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
<p>From: /ceph/teuthology-archive/pdonnell-2023-05-23_18:20:18-fs-wip-pdonnell-testing-20230523.134409-distro-default-smithi/7284230/remote/smithi007/log/ceph-mon.a.log.gz</p>
<p>This happened shortly after ceph-mon started.</p>
<p>Here's the error from valgrind as well:</p>
<pre>
<error>
<unique>0x1</unique>
<tid>1</tid>
<threadname>ceph-mon</threadname>
<kind>InvalidRead</kind>
<what>Invalid read of size 8</what>
<stack>
<frame>
<ip>0x55F50A0</ip>
<obj>/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4.5.3</obj>
<fn>tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)</fn>
</frame>
<frame>
<ip>0x55F536F</ip>
<obj>/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4.5.3</obj>
<fn>tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)</fn>
</frame>
<frame>
<ip>0x55F542F</ip>
<obj>/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4.5.3</obj>
<fn>tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)</fn>
</frame>
<frame>
<ip>0x55F8E45</ip>
<obj>/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4.5.3</obj>
<fn>tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long))</fn>
</frame>
<frame>
<ip>0x5609014</ip>
<obj>/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4.5.3</obj>
<fn>tcmalloc::allocate_full_cpp_throw_oom(unsigned long)</fn>
</frame>
<frame>
<ip>0x109B0DC</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>rocksdb::DBImpl::Open(rocksdb::DBOptions const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; &gt; const&amp;, std::vector&lt;rocksdb::ColumnFamilyHandle*, std::allocator&lt;rocksdb::ColumnFamilyHandle*&gt; &gt;*, rocksdb::DB**, bool, bool)</fn>
</frame>
<frame>
<ip>0x109A138</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>rocksdb::DB::Open(rocksdb::DBOptions const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; &gt; const&amp;, std::vector&lt;rocksdb::ColumnFamilyHandle*, std::allocator&lt;rocksdb::ColumnFamilyHandle*&gt; &gt;*, rocksdb::DB**)</fn>
</frame>
<frame>
<ip>0xF56D7F</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>RocksDBStore::do_open(std::ostream&amp;, bool, bool, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;)</fn>
<dir>./obj-x86_64-linux-gnu/src/kv/./src/kv</dir>
<file>RocksDBStore.cc</file>
<line>1193</line>
</frame>
<frame>
<ip>0xC5CA1C</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>MonitorDBStore::open(std::ostream&amp;)</fn>
<dir>./obj-x86_64-linux-gnu/src/./src/mon</dir>
<file>MonitorDBStore.h</file>
<line>674</line>
</frame>
<frame>
<ip>0xC368BF</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>main</fn>
<dir>./obj-x86_64-linux-gnu/src/./src</dir>
<file>ceph_mon.cc</file>
<line>639</line>
</frame>
</stack>
<auxwhat>Address 0x20 is not stack'd, malloc'd or (recently) free'd</auxwhat>
</error>
</pre>
<p>From: /ceph/teuthology-archive/pdonnell-2023-05-23_18:20:18-fs-wip-pdonnell-testing-20230523.134409-distro-default-smithi/7284230/remote/smithi007/log/valgrind/mon.a.log.gz</p>
RADOS - Backport #59677 (New): quincy: osd:tick checking mon for new map
https://tracker.ceph.com/issues/59677
2023-05-08T17:10:15Z
Backport Bot
RADOS - Bug #57267 (New): Valgrind reports memory "Leak_IndirectlyLost" errors on ceph-mon in "Ke...
https://tracker.ceph.com/issues/57267
2022-08-23T19:00:45Z
Laura Flores
<p>/a/yuriw-2022-08-19_20:57:42-rados-wip-yuri6-testing-2022-08-19-0940-pacific-distro-default-smithi/6981517/remote/smithi130/log/valgrind/mon.a.log.gz</p>
<pre><code class="text syntaxhl"><span class="CodeRay"><error>
<unique>0xc0cc5e</unique>
<tid>1</tid>
<threadname>ceph-mon</threadname>
<kind>Leak_IndirectlyLost</kind>
<xwhat>
<text>312 bytes in 13 blocks are indirectly lost in loss record 14 of 40</text>
<leakedbytes>312</leakedbytes>
<leakedblocks>13</leakedblocks>
</xwhat>
<stack>
<frame>
<ip>0x4C38B6F</ip>
<obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
<fn>operator new[](unsigned long)</fn>
<dir>/builddir/build/BUILD/valgrind-3.19.0/coregrind/m_replacemalloc</dir>
<file>vg_replace_malloc.c</file>
<line>640</line>
</frame>
<frame>
<ip>0x54342C6</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>ceph::buffer::v15_2_0::ptr_node::cloner::operator()(ceph::buffer::v15_2_0::ptr_node const&amp;)</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>2228</line>
</frame>
<frame>
<ip>0x4EF972</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>clone_from</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/include</dir>
<file>buffer.h</file>
<line>593</line>
</frame>
<frame>
<ip>0x4EF972</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>operator=</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/include</dir>
<file>buffer.h</file>
<line>964</line>
</frame>
<frame>
<ip>0x4EF972</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>operator=</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/include</dir>
<file>buffer.h</file>
<line>961</line>
</frame>
<frame>
<ip>0x4EF972</ip>
<obj>/usr/bin/ceph-mon</obj>
<fn>KeyServerData::get_caps(ceph::common::CephContext*, EntityName const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, AuthCapsInfo&amp;) const</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/auth/cephx</dir>
<file>CephxKeyServer.cc</file>
<line>128</line>
</frame>
...
<frame>
<ip>0x53F9800</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>ProtocolV2::handle_read_frame_segment(std::unique_ptr&lt;ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>1190</line>
</frame>
<frame>
<ip>0x53E1F6B</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>47</line>
</frame>
<frame>
<ip>0x53E214E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>ProtocolV2::read_event()</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>479</line>
</frame>
<frame>
<ip>0x53AA18B</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>AsyncConnection::process()</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/msg/async</dir>
<file>AsyncConnection.cc</file>
<line>464</line>
</frame>
<frame>
<ip>0x5404E15</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/msg/async</dir>
<file>Event.cc</file>
<line>449</line>
</frame>
<frame>
<ip>0x540AA9B</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>operator()</fn>
<dir>/usr/src/debug/ceph-16.2.10-668.g67571cd8.el8.x86_64/src/msg/async</dir>
<file>Stack.cc</file>
<line>53</line>
</frame>
<frame>
<ip>0x540AA9B</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>std::_Function_handler&lt;void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
<dir>/usr/include/c++/8/bits</dir>
<file>std_function.h</file>
<line>297</line>
</frame>
<frame>
<ip>0x10529BA2</ip>
<obj>/usr/lib64/libstdc++.so.6.0.25</obj>
</frame>
<frame>
<ip>0xFB7D1C9</ip>
<obj>/usr/lib64/libpthread-2.28.so</obj>
<fn>start_thread</fn>
</frame>
<frame>
<ip>0x10DCFDD2</ip>
<obj>/usr/lib64/libc-2.28.so</obj>
<fn>clone</fn>
</frame>
</stack>
</error>
</span></code></pre>
<p>Another occurrence reported here: /a/yuriw-2022-06-15_18:29:33-rados-wip-yuri4-testing-2022-06-15-1000-pacific-distro-default-smithi/6881215</p>
RADOS - Bug #56896 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_c...
https://tracker.ceph.com/issues/56896
2022-07-28T02:22:31Z
Telemetry Bot
<p><a class="external" href="http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&amp;var-sig_v2=50bf2266e28cc1764b47775be5a14240d2473e019eebac05627b7efb4c77ec8d">http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&amp;var-sig_v2=50bf2266e28cc1764b47775be5a14240d2473e019eebac05627b7efb4c77ec8d</a></p>
<p>Assert condition: end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout<br />Assert function: int OSD::shutdown()</p>
<p>Sanitized backtrace:<br /><pre> pthread_kill()
raise()
OSD::shutdown()
SignalHandler::entry()
</pre><br />Crash dump sample:<br /><pre>{
"assert_condition": "end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout",
"assert_file": "osd/OSD.cc",
"assert_func": "int OSD::shutdown()",
"assert_line": 4323,
"assert_msg": "osd/OSD.cc: In function 'int OSD::shutdown()' thread 7fb8e1d6b640 time 2022-07-19T04:46:25.632566-0400\nosd/OSD.cc: 4323: FAILED ceph_assert(end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout)",
"assert_thread_name": "signal_handler",
"backtrace": [
"/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fb8e6474520]",
"pthread_kill()",
"raise()",
"abort()",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x182) [0x5624886557f3]",
"/usr/bin/ceph-osd(+0x58c955) [0x562488655955]",
"(OSD::shutdown()+0x1678) [0x562488799228]",
"(SignalHandler::entry()+0x62d) [0x562488ddeb1d]",
"/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7fb8e64c6b43]",
"/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7fb8e6558a00]"
],
"ceph_version": "17.2.0",
"crash_id": "2022-07-19T08:46:25.642368Z_707f1918-6002-4cf2-91c4-3eabf43e2ad6",
"entity_name": "osd.92814b044f442681826ea0cbcccfd81e98868fb3",
"os_id": "22.04",
"os_name": "Ubuntu 22.04 LTS",
"os_version": "22.04 LTS (Jammy Jellyfish)",
"os_version_id": "22.04",
"process_name": "ceph-osd",
"stack_sig": "2af5542c6650dea698ae7186f067d3beafbf3bd30a36d7d19c304f29ad9b5802",
"timestamp": "2022-07-19T08:46:25.642368Z",
"utsname_machine": "x86_64",
"utsname_release": "5.15.0-37-generic",
"utsname_sysname": "Linux",
"utsname_version": "#39-Ubuntu SMP Wed Jun 1 19:16:45 UTC 2022"
}</pre></p>
RADOS - Bug #55776 (New): octopus: map exx had wrong cluster addr
https://tracker.ceph.com/issues/55776
2022-05-26T22:56:31Z
Laura Flores
<p>Description: rados/objectstore/{backends/ceph_objectstore_tool supported-random-distro$/{ubuntu_18.04}}</p>
<p>/a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832682<br /><pre><code class="text syntaxhl"><span class="CodeRay">2022-05-14T09:55:45.794 INFO:tasks.ceph.osd.3.smithi146.stderr:2022-05-14T09:55:45.791+0000 7f9567743d80 -1 osd.3 49 log_to_monitors {default=true}
2022-05-14T09:55:46.341 INFO:tasks.ceph.osd.3.smithi146.stderr:2022-05-14T09:55:46.339+0000 7f955c3ab700 -1 osd.3 49 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2022-05-14T09:55:48.347 INFO:tasks.ceph.osd.3.smithi146.stderr:2022-05-14T09:55:48.343+0000 7f9553b9a700 -1 log_channel(cluster) log [ERR] : map e64 had wrong cluster addr ([v2:0.0.0.0:6826/135641,v1:0.0.0.0:6827/135641] != my [v2:172.21.15.146:6826/135641,v1:172.21.15.146:6827/135641])
</span></code></pre></p>
<p>Failed once more in a rerun: /a/yuriw-2022-05-14_14:25:15-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6833288; the test then passed.</p>
RADOS - Bug #51945 (New): qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
https://tracker.ceph.com/issues/51945
2021-07-28T18:23:06Z
Neha Ojha
nojha@redhat.com
<pre>
2021-07-27T21:11:15.731 INFO:tasks.workunit.client.0.smithi110.stderr:+ ceph auth del client.bar
2021-07-27T21:11:18.771 INFO:teuthology.orchestra.run.smithi077.stdout:{"election_epoch":190,"quorum":[0,1],"quorum_names":["a","b"],"quorum_leader_name":"a","quorum_age":0,"features":{"quorum_con":"4540138297136906239","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"]},"monmap":{"epoch":1,"fsid":"87a21b71-6325-4bd7-b729-3438c61d134e","modified":"2021-07-27T20:51:11.510526Z","created":"2021-07-27T20:51:11.510526Z","min_mon_release":16,"min_mon_release_name":"pacific","election_strategy":1,"disallowed_leaders: ":"","stretch_mode":false,"features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.77:3300","nonce":0},{"type":"v1","addr":"172.21.15.77:6789","nonce":0}]},"addr":"172.21.15.77:6789/0","public_addr":"172.21.15.77:6789/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.110:3300","nonce":0},{"type":"v1","addr":"172.21.15.110:6789","nonce":0}]},"addr":"172.21.15.110:6789/0","public_addr":"172.21.15.110:6789/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.77:3301","nonce":0},{"type":"v1","addr":"172.21.15.77:6790","nonce":0}]},"addr":"172.21.15.77:6790/0","public_addr":"172.21.15.77:6790/0","priority":0,"weight":0,"crush_location":"{}"}]}}
2021-07-27T21:11:19.158 INFO:tasks.workunit.client.0.smithi110.stderr:entity client.bar does not exist
2021-07-27T21:11:19.166 INFO:tasks.workunit.client.0.smithi110.stderr:+ expect 'ceph -k /tmp/cephtest-mon-caps-madness.bar.keyring --user bar quorum_status' 13
2021-07-27T21:11:19.166 INFO:tasks.workunit.client.0.smithi110.stderr:+ cmd='ceph -k /tmp/cephtest-mon-caps-madness.bar.keyring --user bar quorum_status'
2021-07-27T21:11:19.167 INFO:tasks.workunit.client.0.smithi110.stderr:+ expected_ret=13
2021-07-27T21:11:19.167 INFO:tasks.workunit.client.0.smithi110.stderr:+ echo ceph -k /tmp/cephtest-mon-caps-madness.bar.keyring --user bar quorum_status
2021-07-27T21:11:19.168 INFO:tasks.workunit.client.0.smithi110.stdout:ceph -k /tmp/cephtest-mon-caps-madness.bar.keyring --user bar quorum_status
2021-07-27T21:11:19.169 INFO:tasks.workunit.client.0.smithi110.stderr:+ eval ceph -k /tmp/cephtest-mon-caps-madness.bar.keyring --user bar quorum_status
2021-07-27T21:11:19.597 INFO:tasks.workunit.client.0.smithi110.stderr:+ ret=0
2021-07-27T21:11:19.597 INFO:tasks.workunit.client.0.smithi110.stderr:+ [[ 0 -ne 13 ]]
2021-07-27T21:11:19.597 INFO:tasks.workunit.client.0.smithi110.stderr:+ echo 'Error: Expected return 13, got 0'
2021-07-27T21:11:19.598 INFO:tasks.workunit.client.0.smithi110.stdout:Error: Expected return 13, got 0
2021-07-27T21:11:19.598 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-07-27T21:11:19.599 INFO:tasks.workunit.client.0.smithi110.stderr:+ [[ 1 -eq 1 ]]
2021-07-27T21:11:19.599 INFO:tasks.workunit.client.0.smithi110.stderr:+ exit 1
</pre>
<p>/a/yuriw-2021-07-27_17:19:39-rados-wip-yuri-testing-2021-07-27-0830-pacific-distro-basic-smithi/6297204</p>
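caps.sh drives these checks through an `expect` helper: run a command, compare its exit status against an expected value. Here the expected status 13 is EACCES, i.e. a client whose key was just deleted should be denied quorum_status, but the call succeeded. A minimal Python sketch of that pattern (the commands below are illustrative stand-ins, not the real ceph invocations):

```python
import errno
import subprocess

def expect(cmd, expected_ret):
    """Run cmd through the shell and verify its exit status,
    like the expect helper in qa/workunits/mon/caps.sh."""
    ret = subprocess.run(cmd, shell=True).returncode
    if ret != expected_ret:
        print(f"Error: Expected return {expected_ret}, got {ret}")
        return False
    return True

# 13 is EACCES: the deleted client.bar should have been refused.
assert errno.EACCES == 13

# Illustrative stand-ins; the real test runs `ceph ... quorum_status`.
assert expect("exit 13", 13)       # denied as expected
assert not expect("exit 0", 13)    # the bug: command succeeded anyway
```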
RADOS - Bug #51168 (New): ceph-osd state machine crash during peering process
https://tracker.ceph.com/issues/51168
2021-06-10T15:58:50Z
Yao Ning
zay11022@163.com
<pre>
[root@ceph-128 ~]# ceph crash info 2021-06-10_15:30:01.935970Z_f5d8ab5b-8613-408b-ac22-75425c5cf672
{
"os_version_id": "7",
"assert_condition": "abort",
"utsname_release": "3.10.0-1127.el7.x86_64",
"os_name": "CentOS Linux",
"entity_name": "osd.43",
"assert_file": "/builddir/build/BUILD/ceph-14.2.18/src/osd/PG.cc",
"timestamp": "2021-06-10 15:30:01.935970Z",
"process_name": "ceph-osd",
"utsname_machine": "x86_64",
"assert_line": 7274,
"utsname_sysname": "Linux",
"os_version": "7 (Core)",
"os_id": "centos",
"assert_thread_name": "tp_osd_tp",
"utsname_version": "#1 SMP Tue Mar 31 23:36:51 UTC 2020",
"backtrace": [
"(()+0xf630) [0x7f0101be1630]",
"(gsignal()+0x37) [0x7f01009d4387]",
"(abort()+0x148) [0x7f01009d5a78]",
"(ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x1a5) [0x55859f824b98]",
"(PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0xc3) [0x55859f9d5643]",
"(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::deep_construct(boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>* const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>&)+0x36) [0x55859fa2b4e6]",
"(boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1d6) [0x55859fa2be26]",
"(boost::statechart::simple_state<PG::RecoveryState::RepNotRecovering, PG::RecoveryState::ReplicaActive, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xd3) [0x55859fa2b483]",
"(PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x2dd) [0x55859f9efe0d]",
"(OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0x55859f92c2c4]",
"(PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x51) [0x55859fb94951]",
"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x914) [0x55859f920d84]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55859fedcfe6]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55859fedfb00]",
"(()+0x7ea5) [0x7f0101bd9ea5]",
"(clone()+0x6d) [0x7f0100a9c96d]"
],
"utsname_hostname": "ceph-138",
"assert_msg": "/builddir/build/BUILD/ceph-14.2.18/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7f00daa4b700 time 2021-06-10 23:30:01.927379\n/builddir/build/BUILD/ceph-14.2.18/src/osd/PG.cc: 7274: ceph_abort_msg(\"we got a bad state machine event\")\n",
"crash_id": "2021-06-10_15:30:01.935970Z_f5d8ab5b-8613-408b-ac22-75425c5cf672",
"assert_func": "PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)",
"ceph_version": "14.2.18"
}
</pre>
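The abort happens when a PG state (here RepNotRecovering, nested under ReplicaActive) receives an event it has no transition for; boost::statechart then constructs the Crashed state, whose constructor calls ceph_abort_msg("we got a bad state machine event"). A toy Python sketch of that pattern (state and event names are simplified illustrations, not the real transition table):

```python
class BadStateMachineEvent(Exception):
    """Stand-in for ceph_abort_msg("we got a bad state machine event")."""
    pass

class PGStateMachine:
    """Toy model: explicit transitions; any unhandled event lands in Crashed."""
    TRANSITIONS = {
        # (state, event) -> next state  (illustrative subset)
        ("RepNotRecovering", "RequestBackfill"): "RepWaitBackfillReserved",
        ("RepNotRecovering", "ActMap"):          "RepNotRecovering",
    }

    def __init__(self):
        self.state = "RepNotRecovering"

    def react(self, event):
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            # Mirrors PG::RecoveryState::Crashed's constructor aborting.
            self.state = "Crashed"
            raise BadStateMachineEvent(f"{event} in state {key[0]}")
        self.state = self.TRANSITIONS[key]

pg = PGStateMachine()
pg.react("ActMap")               # handled: stays in RepNotRecovering
try:
    pg.react("UnexpectedEvent")  # no transition: constructs Crashed
except BadStateMachineEvent:
    pass
print(pg.state)  # Crashed
```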
RADOS - Bug #49961 (New): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed
https://tracker.ceph.com/issues/49961
2021-03-24T19:25:12Z
Neha Ojha
nojha@redhat.com
<pre>
2021-03-24T19:11:12.800 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:108: TEST_recovery_scrub_1: for osd in $(seq 0 $(expr $OSDS - 1))
2021-03-24T19:11:12.800 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:110: TEST_recovery_scrub_1: grep -q 'not scheduling scrubs due to active recovery' td/osd-recovery-scrub/osd.3.log
2021-03-24T19:11:12.801 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:112: TEST_recovery_scrub_1: found=true
2021-03-24T19:11:12.802 INFO:tasks.workunit.client.0.smithi122.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:113: TEST_recovery_scrub_1: expr 2 + 1
2021-03-24T19:11:12.804 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:113: TEST_recovery_scrub_1: count=3
2021-03-24T19:11:12.804 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:116: TEST_recovery_scrub_1: '[' true = false ']'
2021-03-24T19:11:12.804 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:120: TEST_recovery_scrub_1: '[' 3 -eq 4 ']'
2021-03-24T19:11:12.805 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:120: TEST_recovery_scrub_1: return 1
2021-03-24T19:11:12.805 INFO:tasks.workunit.client.0.smithi122.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-recovery-scrub.sh:31: run: return 1
</pre>
<p>/a/nojha-2021-03-24_17:35:55-rados-wip-38044-distro-basic-smithi/5994449</p>
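The failing loop greps each OSD's log for "not scheduling scrubs due to active recovery" and requires all 4 OSDs to have logged it; here only 3 did, so the final `[ 3 -eq 4 ]` check failed. A minimal Python sketch of the counting logic (log contents are invented for illustration):

```python
NEEDLE = "not scheduling scrubs due to active recovery"

# Invented stand-ins for td/osd-recovery-scrub/osd.N.log contents.
logs = {
    "osd.0.log": f"... {NEEDLE} ...",
    "osd.1.log": f"... {NEEDLE} ...",
    "osd.2.log": f"... {NEEDLE} ...",
    "osd.3.log": "... recovery finished before any scrub attempt ...",
}

# Equivalent of the test's per-OSD `grep -q ... && count=$(expr $count + 1)`.
count = sum(1 for text in logs.values() if NEEDLE in text)

OSDS = 4
# The test's final check `[ $count -eq $OSDS ]`: 3 != 4, so it returned 1.
print(count, count == OSDS)  # 3 False
```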