RADOS: Activity
https://tracker.ceph.com/
2024-03-29T03:05:25Z
Ceph
Redmine
Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
https://tracker.ceph.com/issues/59196#change-258646
2024-03-29T03:05:25Z
Brad Hubbard
bhubbard@redhat.com
<p>Closing <a class="external" href="https://github.com/ceph/ceph/pull/55596">https://github.com/ceph/ceph/pull/55596</a> in favour of <a class="external" href="https://github.com/ceph/ceph/pull/56574">https://github.com/ceph/ceph/pull/56574</a></p>
Bug #65185: OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
https://tracker.ceph.com/issues/65185#change-258564
2024-03-28T11:14:02Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<p>The OSDMap CRC issue is clearly there, but I doubt it can explain the scrub error.<br />Let's ask Ronen to take a look.<br />In the meantime we can rerun with the squid backport of the CRC fix included (<a class="external" href="https://github.com/ceph/ceph/pull/56553">https://github.com/ceph/ceph/pull/56553</a>).</p>
Bug #65186: OSDs unreachable in upgrade test
https://tracker.ceph.com/issues/65186#change-258563
2024-03-28T11:03:42Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<pre>
2024-03-22T07:19:18.414630+0000 mon.a (mon.0) 14 : cluster 4 Health check failed: 8 osds(s) are not reachable (OSD_UNREACHABLE)
</pre>
<p>Hmm, unreachable OSDs. This in turn could potentially be explained by overload caused by the exchange of full maps.<br />We can rerun with the squid backport of the CRC fix included (<a class="external" href="https://github.com/ceph/ceph/pull/56553">https://github.com/ceph/ceph/pull/56553</a>).</p>
Backport #65198: squid: Failed to encode map X with expected CRC
https://tracker.ceph.com/issues/65198#change-258562
2024-03-28T11:02:15Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/56553">https://github.com/ceph/ceph/pull/56553</a></p>
Backport #65198 (In Progress): squid: Failed to encode map X with expected CRC
https://tracker.ceph.com/issues/65198#change-258561
2024-03-28T10:31:39Z
Radoslaw Zarzynski
rzarzyns@redhat.com
Backport #65198 (In Progress): squid: Failed to encode map X with expected CRC
https://tracker.ceph.com/issues/65198
2024-03-28T10:29:12Z
Backport Bot
Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
https://tracker.ceph.com/issues/64824#change-258513
2024-03-28T07:03:47Z
yite gu
<pre>
-31> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 _scrub last_key (kv,35617) scrubbed_keys 100 has_next 1
-30> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 15 mon.d@0(leader) e10 scrub_reset_timeout reset timeout event
-29> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 _ms_dispatch existing session 0x55888a354240 for mon.0
-28> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 entity_name global_id 0 (none) caps allow *
-27> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 5 mon.d@0(leader).paxos(paxos active c 40569131..40569761) is_readable = 1 - now=2024-03-28T06:42:38.171076+0000 lease_expire=2024-03-28T06:42:40.652792+0000 has v0 lc 40569761
-26> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader).log v19867627 preprocess_query log(1 entries from seq 21188 at 2024-03-28T06:42:38.169947+0000) v1 from mon.0 v2:10.6.153.24:3300/0
-25> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader).log v19867627 preprocess_log log(1 entries from seq 21188 at 2024-03-28T06:42:38.169947+0000) v1 from mon.0
-24> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 is_capable service=log command= write addr - on cap allow *
-23> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 allow so far , doing grant allow *
-22> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 allow all
-21> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader).log v19867627 prepare_update log(1 entries from seq 21188 at 2024-03-28T06:42:38.169947+0000) v1 from mon.0 v2:10.6.153.24:3300/0
-20> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader).log v19867627 prepare_log log(1 entries from seq 21188 at 2024-03-28T06:42:38.169947+0000) v1 from mon.0
-19> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader).log v19867627 logging 2024-03-28T06:42:38.169947+0000 mon.d (mon.0) 21188 : cluster [DBG] scrub ok on 0,1,2: ScrubResult(keys {auth=100} crc {auth=275392284})
-18> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 _ms_dispatch existing session 0x55888a354480 for mon.2
-17> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 entity_name global_id 0 (none) caps allow *
-16> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 is_capable service=mon command= read addr v2:10.6.153.26:3300/0 on cap allow *
-15> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 allow so far , doing grant allow *
-14> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 allow all
-13> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader) e10 handle_scrub mon_scrub(result v 40569761 ScrubResult(keys {auth=26,config=2,health=12,kv=60} crc {auth=4123610191,config=876675216,health=2092533985,kv=2385793469}) num_keys 100 key (kv,35617)) v2
-12> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 15 mon.d@0(leader) e10 scrub_reset_timeout reset timeout event
-11> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 _ms_dispatch existing session 0x55888a11f8c0 for mon.1
-10> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 mon.d@0(leader) e10 entity_name global_id 0 (none) caps allow *
-9> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 is_capable service=mon command= read addr v2:10.6.153.23:3300/0 on cap allow *
-8> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 allow so far , doing grant allow *
-7> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 20 allow all
-6> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader) e10 handle_scrub mon_scrub(result v 40569761 ScrubResult(keys {auth=26,config=2,health=12,kv=60} crc {auth=4123610191,config=876675216,health=2092533985,kv=2385793469}) num_keys 100 key (kv,35617)) v2
-5> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 15 mon.d@0(leader) e10 scrub_reset_timeout reset timeout event
-4> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader) e10 scrub_check_results
-3> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2: ScrubResult(keys {auth=26,config=2,health=12,kv=60} crc {auth=4123610191,config=876675216,health=2092533985,kv=2385793469})
-2> 2024-03-28T06:42:38.170+0000 7f2c51bc8700 10 mon.d@0(leader) e10 _scrub start (kv,35617) num_keys 100
-1> 2024-03-28T06:42:38.172+0000 7f2c51bc8700 -1 /root/rpmbuild/BUILD/ceph-16.2.14-2/src/mon/Monitor.cc: In function 'bool Monitor::_scrub(ScrubResult*, std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, int*)' thread 7f2c51bc8700 time 2024-03-28T06:42:38.172021+0000
/root/rpmbuild/BUILD/ceph-16.2.14-2/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
ceph version 16.2.14-2 (8ee3a81f5d70c1cd5b0b8c5cfade8580a0025906) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f2c5f726fcc]
2: /usr/lib64/ceph/libceph-common.so.2(+0x27a1e6) [0x7f2c5f7271e6]
3: (Monitor::_scrub(ScrubResult*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >*, int*)+0x14bf) [0x5588862375bf]
4: (Monitor::scrub()+0x3b8) [0x55888623b198]
5: (Monitor::handle_scrub(boost::intrusive_ptr<MonOpRequest>)+0x470) [0x55888623b9c0]
6: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xdc0) [0x55888625c090]
7: (Monitor::_ms_dispatch(Message*)+0x670) [0x55888625d080]
8: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55888628c2bc]
9: (DispatchQueue::entry()+0x126a) [0x7f2c5f96eb7a]
10: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f2c5fa231c1]
11: /lib64/libpthread.so.0(+0x81ca) [0x7f2c5d4531ca]
12: clone()
</pre>
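<p>The <code>scrub ok on 0,1,2: ScrubResult(keys {...} crc {...})</code> lines above show the leader comparing per-service key counts and CRCs across the quorum in 100-key chunks. The following is a minimal, illustrative sketch of that comparison idea only; the class and function names are simplified stand-ins, not Ceph's actual C++ implementation:</p>

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class ScrubResult:
    """Per-service key counts and running CRCs, loosely mirroring the
    ScrubResult(keys {...} crc {...}) lines in the log (a stand-in)."""
    keys: dict = field(default_factory=dict)  # service -> key count
    crc: dict = field(default_factory=dict)   # service -> crc32 so far

    def add(self, service: str, key: bytes, value: bytes) -> None:
        # Fold each key/value pair into a running CRC per service.
        self.keys[service] = self.keys.get(service, 0) + 1
        self.crc[service] = zlib.crc32(key + value, self.crc.get(service, 0))

def scrub_check_results(leader: ScrubResult, peers: dict) -> list:
    """Compare each peer's result against the leader's; any divergence
    in counts or CRCs would be flagged as a scrub mismatch."""
    return [rank for rank, res in peers.items()
            if res.keys != leader.keys or res.crc != leader.crc]
```

<p>Feeding identical key/value streams to two results yields no mismatches; a single differing value flips that service's CRC. (Note the assert in this crash is on a store read error, not a CRC mismatch; the sketch only illustrates the comparison step visible in the log.)</p>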
Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
https://tracker.ceph.com/issues/64972#change-258503
2024-03-27T22:35:35Z
Patrick Donnelly
pdonnell@redhat.com
<p>Laura Flores wrote:</p>
<blockquote>
<p>Strange, the syntax in the text snippet works in a vstart cluster:<br />[...]</p>
</blockquote>
<p>The issue, I believe, is that the QA suite code is using the new tell syntax on an older octopus cluster.</p>
Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
https://tracker.ceph.com/issues/64972#change-258500
2024-03-27T22:09:09Z
Laura Flores
<p>Strange, the syntax in the text snippet works in a vstart cluster:<br /><pre><code class="text syntaxhl"><span class="CodeRay">$ ./bin/ceph tell 4.74 deep-scrub
{
"deep": true,
"must": true,
"stamp": "0.000000"
}
</span></code></pre></p>
<p>So far I've verified that this works on main and squid, so it's possible it doesn't work on an older version.</p>
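<p>If that hypothesis holds, the QA suite would need to pick the invocation based on the target cluster's release: the older form is <code>ceph pg deep-scrub &lt;pgid&gt;</code>, while <code>ceph tell &lt;pgid&gt; deep-scrub</code> is newer. A hedged sketch of such a helper; the "pacific" cutoff and the helper name are assumptions for illustration, not the actual QA suite code (the tracker only tells us the tell form fails on octopus):</p>

```python
# Hypothetical helper: build the deep-scrub argv for one PG based on
# the release the target cluster runs. The "pacific" cutoff is an
# illustrative assumption, not taken from the tracker.
RELEASE_ORDER = ["nautilus", "octopus", "pacific", "quincy", "reef", "squid"]

def deep_scrub_cmd(release: str, pgid: str) -> list:
    """Return the ceph CLI argv for deep-scrubbing a single PG."""
    if RELEASE_ORDER.index(release) >= RELEASE_ORDER.index("pacific"):
        # newer syntax: ceph tell <pgid> deep-scrub
        return ["ceph", "tell", pgid, "deep-scrub"]
    # older syntax understood by octopus and earlier
    return ["ceph", "pg", "deep-scrub", pgid]
```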
Bug #65186: OSDs unreachable in upgrade test
https://tracker.ceph.com/issues/65186#change-258498
2024-03-27T20:33:52Z
Laura Flores
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7615991</p>
Bug #65185: OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
https://tracker.ceph.com/issues/65185#change-258495
2024-03-27T20:31:38Z
Laura Flores
<p>Laura Flores wrote:</p>
<blockquote>
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee-95cd-87774f69a715/ceph.log.gz<br />[...]</p>
<p>More in this run: <a class="external" href="https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/">https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/</a></p>
</blockquote>
<p>Note the "failed to encode expected crc" log messages right before this. Could it be related?<br /><pre><code class="text syntaxhl"><span class="CodeRay">2024-03-22T09:19:46.882532+0000 osd.4 (osd.4) 3037 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.883679+0000 osd.5 (osd.5) 1588 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.884492+0000 osd.4 (osd.4) 3038 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.884932+0000 osd.7 (osd.7) 929 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.885034+0000 osd.5 (osd.5) 1589 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.941945+0000 osd.6 (osd.6) 310 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:47.452680+0000 mgr.y (mgr.25274) 3825 : cluster 0 pgmap v7002: 129 pgs: 1 active+undersized+remapped, 1 active+clean+inconsistent, 127 active+clean; 308 MiB data, 2.3 GiB used, 713 GiB / 715 GiB avail
2024-03-22T09:19:46.882532+0000 osd.4 (osd.4) 3037 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.883679+0000 osd.5 (osd.5) 1588 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.884492+0000 osd.4 (osd.4) 3038 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:47.887150+0000 mon.a (mon.0) 7841 : cluster 0 osdmap e3583: 8 total, 8 up, 8 in
2024-03-22T09:19:46.884932+0000 osd.7 (osd.7) 929 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.885034+0000 osd.5 (osd.5) 1589 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.941945+0000 osd.6 (osd.6) 310 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:47.452680+0000 mgr.y (mgr.25274) 3825 : cluster 0 pgmap v7002: 129 pgs: 1 active+undersized+remapped, 1 active+clean+inconsistent, 127 active+clean; 308 MiB data, 2.3 GiB used, 713 GiB / 715 GiB avail
2024-03-22T09:19:47.887150+0000 mon.a (mon.0) 7841 : cluster 0 osdmap e3583: 8 total, 8 up, 8 in
2024-03-22T09:19:48.262121+0000 mon.a (mon.0) 7843 : cluster 4 Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2024-03-22T09:19:48.262163+0000 mon.a (mon.0) 7844 : cluster 4 Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2024-03-22T09:19:48.262121+0000 mon.a (mon.0) 7843 : cluster 4 Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2024-03-22T09:19:48.262163+0000 mon.a (mon.0) 7844 : cluster 4 Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
</span></code></pre></p>
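<p>The repeated "failed to encode map eN with expected crc" lines arise when an OSD applies an incremental, re-encodes the resulting map locally, and gets a CRC different from the one the monitor advertised; in a mixed-version upgrade the two sides can encode the map differently. The OSD then falls back to fetching the full map, which is why the CRC fix is suspected here. A toy illustration of the check; the JSON serialization is invented for illustration, Ceph's real OSDMap encoding is its own binary format:</p>

```python
import json
import zlib

def encode_map(osdmap: dict, include_new_field: bool) -> bytes:
    """Toy stand-in for OSDMap encoding: a newer daemon that emits an
    extra field produces different bytes, hence a different CRC."""
    m = dict(osdmap)
    if not include_new_field:
        m.pop("new_field", None)
    return json.dumps(m, sort_keys=True).encode()

def apply_and_check(osdmap: dict, expected_crc: int, new_daemon: bool) -> bool:
    """Re-encode locally and compare against the CRC the mon advertised.
    On mismatch the real OSD logs the 'failed to encode map ... with
    expected crc' message and requests the full map instead."""
    local = encode_map(osdmap, include_new_field=new_daemon)
    return zlib.crc32(local) == expected_crc
```

<p>Under this toy model, an old-style encoder reproduces the expected CRC while a daemon emitting the extra field does not, forcing the full-map fetch.</p>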
Bug #65186: OSDs unreachable in upgrade test
https://tracker.ceph.com/issues/65186#change-258494
2024-03-27T20:29:17Z
Laura Flores
<p>Possibly a duplicate of the related tracker (CRC encoding issues).</p>
Bug #65186 (New): OSDs unreachable in upgrade test
https://tracker.ceph.com/issues/65186
2024-03-27T20:28:19Z
Laura Flores
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616011/remote/smithi087/log/a8e8c570-e819-11ee-95cd-87774f69a715<br /><pre><code class="text syntaxhl"><span class="CodeRay">2024-03-22T07:19:18.215315+0000 mon.a (mon.0) 10 : cluster 0 Standby manager daemon x restarted
2024-03-22T07:19:18.215450+0000 mon.a (mon.0) 11 : cluster 0 Standby manager daemon x started
2024-03-22T07:19:18.215315+0000 mon.a (mon.0) 10 : cluster 0 Standby manager daemon x restarted
2024-03-22T07:19:18.215450+0000 mon.a (mon.0) 11 : cluster 0 Standby manager daemon x started
2024-03-22T07:19:18.277027+0000 mon.a (mon.0) 12 : cluster 0 mgrmap e33: y(active, since 63s), standbys: x
2024-03-22T07:19:18.414028+0000 mon.a (mon.0) 13 : cluster 1 Active manager daemon y restarted
2024-03-22T07:19:18.414630+0000 mon.a (mon.0) 14 : cluster 4 Health check failed: 8 osds(s) are not reachable (OSD_UNREACHABLE)
2024-03-22T07:19:18.414953+0000 mon.a (mon.0) 15 : cluster 1 Activating manager daemon y
2024-03-22T07:19:18.427127+0000 mon.a (mon.0) 16 : cluster 0 osdmap e81: 8 total, 8 up, 8 in
2024-03-22T07:19:18.277027+0000 mon.a (mon.0) 12 : cluster 0 mgrmap e33: y(active, since 63s), standbys: x
2024-03-22T07:19:18.427673+0000 mon.a (mon.0) 17 : cluster 0 mgrmap e34: y(active, starting, since 0.0129348s), standbys: x
2024-03-22T07:19:18.414028+0000 mon.a (mon.0) 13 : cluster 1 Active manager daemon y restarted
2024-03-22T07:19:18.433869+0000 osd.4 (osd.4) 3 : cluster 3 failed to encode map e81 with expected crc
2024-03-22T07:19:18.435418+0000 osd.2 (osd.2) 3 : cluster 3 failed to encode map e81 with expected crc
2024-03-22T07:19:18.414630+0000 mon.a (mon.0) 14 : cluster 4 Health check failed: 8 osds(s) are not reachable (OSD_UNREACHABLE)
2024-03-22T07:19:18.443967+0000 osd.4 (osd.4) 4 : cluster 3 failed to encode map e81 with expected crc
</span></code></pre></p>
<p>Likely connected to <a class="external" href="https://tracker.ceph.com/issues/63389">https://tracker.ceph.com/issues/63389</a>.</p>
Bug #65185 (New): OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
https://tracker.ceph.com/issues/65185
2024-03-27T20:21:29Z
Laura Flores
<p>/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee-95cd-87774f69a715/ceph.log.gz<br /><pre><code class="text syntaxhl"><span class="CodeRay">2024-03-22T09:20:00.000187+0000 mon.a (mon.0) 7863 : cluster 4 [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
2024-03-22T09:20:00.000194+0000 mon.a (mon.0) 7864 : cluster 4 [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
2024-03-22T09:19:59.897409+0000 mon.a (mon.0) 7860 : cluster 0 osdmap e3595: 8 total, 8 up, 8 in
2024-03-22T09:20:00.000202+0000 mon.a (mon.0) 7865 : cluster 4 pg 103.14 is active+clean+inconsistent, acting [5,1,2]
2024-03-22T09:20:00.000151+0000 mon.a (mon.0) 7861 : cluster 4 Health detail: HEALTH_ERR noscrub flag(s) set; 1 scrub errors; Possible data damage: 1 pg inconsistent
</span></code></pre></p>
<p>More in this run: <a class="external" href="https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/">https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/</a></p>
Bug #65183 (Fix Under Review): Overriding an EC pool needs the "--yes-i-really-mean-it" flag in a...
https://tracker.ceph.com/issues/65183#change-258473
2024-03-27T16:43:57Z
Radoslaw Zarzynski
rzarzyns@redhat.com