Ceph : Issues
https://tracker.ceph.com/
2024-03-26T01:42:35Z
Ceph
Redmine
crimson - Bug #65130 (Fix Under Review): crimson: crimson-rados did not detect reintroduction of ...
https://tracker.ceph.com/issues/65130
2024-03-26T01:42:35Z
Samuel Just
sjust@redhat.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/56376">https://github.com/ceph/ceph/pull/56376</a> would have reintroduced <a class="external" href="https://tracker.ceph.com/issues/61875">https://tracker.ceph.com/issues/61875</a> as it puts the snap mapper keys back into the pg meta object. Oddly, a teuthology run on that branch which seems to have included tests with both snapshots and osd restarts did not show crashes associated with this regression and at least one case that seems like it should have exercised the relevant code passed. A quick glance over PGLog.cc::FuturizedShardStoreReader doesn't show any changes, so it should have crashed in the final else branch of FuturizedShardStoreLogReader::process_entry at e.decode_with_checksum.</p>
Tasks:
- Confirm that the crimson-rados suite actually combines snapshots with OSD restarts
- Work out why the existing suite didn't fail the above PR
- Amend the tests to cover the gap
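For illustration only, here is a minimal, self-contained sketch of the failure mode described above. All names, key prefixes, and the checksum scheme are invented stand-ins rather than the actual crimson code: the point is simply that a reader of the pg meta object's omap which handles known key prefixes explicitly and treats everything else as a log entry will choke on stray snap-mapper keys in its final else branch.
<pre>
// Hypothetical illustration (not Ceph code): unexpected keys on the pg meta
// object fall through to the "must be a log entry" branch, where a
// checksummed decode rejects them.
#include <cstdint>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

struct entry_t {
  // Toy stand-in for decode_with_checksum: the last byte must equal the
  // byte-sum of everything before it.
  static entry_t decode_with_checksum(const std::string& v) {
    if (v.empty())
      throw std::runtime_error("empty value");
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < v.size(); ++i)
      sum = static_cast<uint8_t>(sum + static_cast<uint8_t>(v[i]));
    if (sum != static_cast<uint8_t>(v.back()))
      throw std::runtime_error("checksum mismatch");
    return {};
  }
};

void process_entry(const std::string& key, const std::string& value) {
  if (key.rfind("_info", 0) == 0) {
    // pg info key: handled elsewhere
  } else if (key.rfind("_biginfo", 0) == 0) {
    // biginfo key: handled elsewhere
  } else {
    // final else branch: assumed to be a pg log entry
    entry_t::decode_with_checksum(value);  // a snap-mapper key lands here
  }
}

int main() {
  // Illustrative omap content: one expected key plus a snap-mapper-style
  // key that does not belong on the meta object.
  std::map<std::string, std::string> meta_omap = {
    {"_info", "..."},
    {"SNAPMAPPER_0000000000000002", "not a log entry"},
  };
  for (const auto& [k, v] : meta_omap) {
    try {
      process_entry(k, v);
    } catch (const std::exception& e) {
      std::cerr << "would have crashed on key " << k << ": " << e.what() << "\n";
    }
  }
}
</pre>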
CephFS - Bug #65039 (Triaged): mds: standby-replay segmentation fault in md_log_replay
https://tracker.ceph.com/issues/65039
2024-03-21T14:19:46Z
Patrick Donnelly
pdonnell@redhat.com
<pre>
2024-03-21T03:15:55.310 INFO:journalctl@ceph.mds.h.smithi060.stdout:Mar 21 03:15:55 smithi060 ceph-87dd0fc6-e72e-11ee-95c9-87774f69a715-mds-h[71557]: *** Caught signal (Segmentation fault) **
2024-03-21T03:15:55.310 INFO:journalctl@ceph.mds.h.smithi060.stdout:Mar 21 03:15:55 smithi060 ceph-87dd0fc6-e72e-11ee-95c9-87774f69a715-mds-h[71557]: in thread 7f7135d7c700 thread_name:md_log_replay
</pre>
From: /teuthology/pdonnell-2024-03-21_02:37:43-fs:workload-main-distro-default-smithi/7614435/teuthology.log
I logged into the machine and collected a gdb stack trace (attached). Initially I was looking for a deadlock, not a segmentation fault. The signal handler for SIGSEGV deadlocked (predictably) because it was using malloc:
<pre>
Thread 26 (Thread 0x7f7135d7c700 (LWP 72204)):
#0 0x00007f7148e163d0 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1 0x00007f7148e162d3 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2 0x00007f7148e05a55 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#3 0x00007f7148e093e3 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) () from /lib64/libtcmalloc.so.4
#4 0x00007f71484409b3 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*> () from /usr/lib64/ceph/libceph-common.so.2
#5 0x00007f7148440aa9 in ceph::ClibBackTrace::demangle[abi:cxx11](char const*) () from /usr/lib64/ceph/libceph-common.so.2
#6 0x00007f7148441025 in ceph::ClibBackTrace::print(std::ostream&) const () from /usr/lib64/ceph/libceph-common.so.2
#7 0x000055c9ae7266dd in handle_oneshot_fatal_signal (signum=11) at /usr/src/debug/ceph-19.0.0-2244.gcab8141b.el8.x86_64/src/global/signal_handler.cc:331
#8 <signal handler called>
#9 0x00007f7148e05603 in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /lib64/libtcmalloc.so.4
#10 0x00007f7148e058ae in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#11 0x00007f7148e05971 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#12 0x00007f7148e093e3 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) () from /lib64/libtcmalloc.so.4
#13 0x000055c9ae311e17 in EMetaBlob::fullbit::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&) () at /usr/src/debug/ceph-19.0.0-2244.gcab8141b.el8.x86_64/src/include/compact_map.h:27
#14 0x000055c9ae31429d in EMetaBlob::dirlump::_decode_bits (this=0x55c9b25c9770) at /usr/src/debug/ceph-19.0.0-2244.gcab8141b.el8.x86_64/src/mds/events/EMetaBlob.h:609
#15 0x000055c9ae31c397 in EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*) () at /usr/src/debug/ceph-19.0.0-2244.gcab8141b.el8.x86_64/src/mds/events/EMetaBlob.h:296
#16 0x000055c9ae322551 in EUpdate::replay(MDSRank*) () at /usr/src/debug/ceph-19.0.0-2244.gcab8141b.el8.x86_64/src/mds/journal.cc:2252
#17 0x000055c9ae64dd97 in MDLog::_replay_thread (this=0x55c9b18e6000) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/unique_ptr.h:421
#18 0x000055c9ae6543b1 in MDLog::ReplayThread::entry (this=<optimized out>) at /usr/src/debug/ceph-19.0.0-2244.gcab8141b.el8.x86_64/src/mds/MDLog.h:181
#19 0x00007f71471331ca in start_thread () from /lib64/libpthread.so.0
#20 0x00007f71456308d3 in clone () from /lib64/libc.so.6
</pre>
Unfortunately, I didn't get a chance to dig into frame #13 to see why it segfaulted.
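The deadlock itself is the textbook async-signal-safety problem: the fatal-signal handler ends up allocating (here via backtrace demangling into a std::string) while the faulting thread may already hold a tcmalloc central-free-list lock, so the handler spins on the very lock it interrupted. A generic, self-contained C++ illustration of the hazardous pattern follows; it is not Ceph's handler, and whether it actually deadlocks depends on where the fault lands, so treat it as a sketch of the mechanism only.
<pre>
// Generic illustration: allocating inside a SIGSEGV handler is not
// async-signal-safe; if the fault happened inside the allocator, the
// handler can self-deadlock on the allocator's internal lock.
#include <signal.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

extern "C" void unsafe_segv_handler(int signum) {
  // NOT async-signal-safe: building a std::string calls operator new /
  // malloc, which may need a lock the interrupted thread already holds
  // (e.g. tcmalloc's CentralFreeList spinlock, as in the trace above).
  std::string msg = "caught signal " + std::to_string(signum);
  std::fprintf(stderr, "%s\n", msg.c_str());
  std::_Exit(1);
}

int main() {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_handler = unsafe_segv_handler;
  sigaction(SIGSEGV, &sa, nullptr);

  // Force a fault. If this fault had happened mid-allocation (as in the
  // MDS replay thread, inside tcmalloc::CentralFreeList), the handler's
  // own allocation would spin on the held lock instead of finishing.
  volatile int* p = nullptr;
  *p = 42;
  return 0;
}
</pre>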
CephFS - Bug #65020 (Triaged): qa: Scrub error on inode 0x1000000356c (/volumes/qa/sv_0/2f8f6bb4-...
https://tracker.ceph.com/issues/65020
2024-03-21T00:37:10Z
Patrick Donnelly
pdonnell@redhat.com
<p><a class="external" href="https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612910/">https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612910/</a></p>
<p>and many others.</p>
<p>More fallout after <a class="external" href="https://github.com/ceph/ceph/pull/55455">https://github.com/ceph/ceph/pull/55455</a> was merged.</p>
crimson - Bug #64975 (In Progress): crimson: Health check failed: 9 scrub errors (OSD_SCRUB_ERR...
https://tracker.ceph.com/issues/64975
2024-03-18T21:45:04Z
Samuel Just
sjust@redhat.com
<pre>
ERROR 2024-03-15 10:04:01,561 [shard 1:main] osd - pg_epoch 198 pg[2.2( empty local-lis/les=11/12 n=0 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=0 lpr=11 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep PGScrubber::emit_chunk_result: Scrub errors found. range: start: MIN, end: MAX, result: chunk_result_t(num_scrub_errors: 1, num_deep_scrub_errors: 0, snapset_errors: [[]], object_errors: [[inconsistent_obj_t(error: , object: //snapmapper, version: 0, shards: {osd_shard_t(osd: 0, shard: -1): shard_info_t(error: INFO_MISSING, size: 0, omap_digest_present: true, omap_digest: 4294967295, data_digest_present: true, data_digest: 4294967295, selected_io: false, primary: true), osd_shard_t(osd: 1, shard: -1): shard_info_t(error: INFO_MISSING, size: 0, omap_digest_present: true, omap_digest: 4294967295, data_digest_present: true, data_digest: 4294967295, selected_io: false, primary: false)}, union_shards: INFO_MISSING)]])
</pre>

The bug is that scrub isn't skipping the snap mapper object.
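As a purely hypothetical sketch of the intended behaviour (invented names, not the actual crimson scrubber code), the fix amounts to excluding PG-internal bookkeeping objects such as the snap-mapper object when assembling the objects a scrub chunk compares, so their missing per-object info is never reported as INFO_MISSING:
<pre>
// Hypothetical sketch: drop internal bookkeeping objects (e.g. the
// snap-mapper object) from the set of objects a scrub chunk compares.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct hobject_sketch {
  std::string name;
  bool is_internal() const {
    // The snap-mapper object is PG-internal metadata, not user data.
    return name == "snapmapper";
  }
};

std::vector<hobject_sketch>
objects_to_scrub(std::vector<hobject_sketch> listed) {
  listed.erase(
      std::remove_if(listed.begin(), listed.end(),
                     [](const hobject_sketch& o) { return o.is_internal(); }),
      listed.end());
  return listed;
}

int main() {
  auto chunk = objects_to_scrub({{"snapmapper"}, {"rbd_data.1234"}, {"obj0"}});
  for (const auto& o : chunk)
    std::cout << o.name << "\n";  // the snap mapper object is no longer scrubbed
}
</pre>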
rgw - Bug #64841 (New): java_s3tests: testObjectCreateBadExpectMismatch failure
https://tracker.ceph.com/issues/64841
2024-03-11T17:06:12Z
Casey Bodley
cbodley@redhat.com
ex. http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:50:40-rgw-wip-cbodley2-testing-distro-default-smithi/7589576/teuthology.log
<pre>
2024-03-10T16:40:31.654 INFO:teuthology.orchestra.run.smithi060.stdout:suite > Object tests > ObjectTest.testObjectCreateBadExpectMismatch STARTED
2024-03-10T16:40:32.455 INFO:teuthology.orchestra.run.smithi060.stdout:
2024-03-10T16:40:32.455 INFO:teuthology.orchestra.run.smithi060.stdout:suite > Object tests > ObjectTest.testObjectCreateBadExpectMismatch FAILED
2024-03-10T16:40:32.455 INFO:teuthology.orchestra.run.smithi060.stdout: com.amazonaws.services.s3.model.AmazonS3Exception at ObjectTest.java:525
</pre>

Scanning the rgw log http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:50:40-rgw-wip-cbodley2-testing-distro-default-smithi/7589576/remote/smithi060/log/rgw.ceph.client.0.log.gz I see this sequence of requests:
<pre>
2024-03-10T16:40:31.653+0000 CreateBucket test-c2dd4bb1-7d2b-431d-a7b0-200c96d8349313
2024-03-10T16:40:32.445+0000 status ok
2024-03-10T16:40:33.409+0000 ListObjectVersions
2024-03-10T16:40:34.465+0000 status ok
2024-03-10T16:40:34.469+0000 a4d1c640 1 failed to read header: bad method
2024-03-10T16:40:34.885+0000 ListObjectVersions
2024-03-10T16:40:35.645+0000 status ok
2024-03-10T16:40:35.649+0000 DeleteBucket
</pre>

That "bad method" error corresponds to the test's PutObject request, which is supposed to pass `Expect: 200`. The "bad method" error comes from beast's HTTP parser when it tries to read the next request from a connection, so I assume we either read too many bytes from the previous request, or too few.

There are several other occurrences of this "bad method" error, but none led to test failures. Scanning the rgw logs of successful java_s3tests runs, I don't see any "bad method" errors.
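One way to poke at the framing question outside the java suite is to hand-craft the same kind of request: send a PUT carrying the bogus `Expect: 200` header, then a second request on the same connection, and see whether the server parses that second request cleanly. The following is only a rough sketch with raw POSIX sockets; the endpoint, port, bucket, and object name are placeholders, and authentication is deliberately omitted (a real reproduction would need SigV4 signing or an anonymous-writable bucket).
<pre>
// Rough sketch: PUT with "Expect: 200" (an invalid expectation), followed by
// another request on the same connection. If the server consumed the wrong
// number of body bytes, parsing the follow-up fails ("bad method" in beast).
// Endpoint, bucket, and object name are placeholders; no authentication.
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <iostream>
#include <string>

int main() {
  addrinfo hints{}, *res = nullptr;
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo("127.0.0.1", "8000", &hints, &res) != 0) return 1;

  int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) return 1;
  freeaddrinfo(res);

  const std::string body = "bar";
  const std::string put =
      std::string("PUT /test-bucket/foo HTTP/1.1\r\n"
                  "Host: 127.0.0.1:8000\r\n"
                  "Expect: 200\r\n"                 // bogus expectation under test
                  "Content-Length: ") + std::to_string(body.size()) + "\r\n"
      "\r\n" + body;
  // Follow-up request on the same connection, mirroring the suite's
  // ListObjectVersions call that hit "bad method" in the rgw log.
  const std::string follow_up =
      "GET /test-bucket?versions HTTP/1.1\r\n"
      "Host: 127.0.0.1:8000\r\n"
      "\r\n";

  send(fd, put.data(), put.size(), 0);
  send(fd, follow_up.data(), follow_up.size(), 0);
  shutdown(fd, SHUT_WR);  // let the server see EOF and close when done

  char buf[4096];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0)
    std::cout.write(buf, n);
  close(fd);
  return 0;
}
</pre>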
rgw - Bug #64835 (New): valgrind invalid read related to D3nDataCache::d3n_libaio_create_write_re...
https://tracker.ceph.com/issues/64835
2024-03-11T15:43:14Z
Casey Bodley
cbodley@redhat.com
The d3n datacache job is failing with a valgrind issue.

example job: http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:50:40-rgw-wip-cbodley2-testing-distro-default-smithi/7589588/teuthology.log

valgrind log: http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:50:40-rgw-wip-cbodley2-testing-distro-default-smithi/7589588/remote/smithi082/log/valgrind/ceph.client.0.log.gz
<pre>
<error>
  <unique>0x795ca</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame><ip>0x66B9848</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>free_res</fn></frame>
    <frame><ip>0x66B98C1</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>__libc_freeres</fn></frame>
    <frame><ip>0x483F1B2</ip><obj>/usr/libexec/valgrind/vgpreload_core-amd64-linux.so</obj><fn>_vgnU_freeres</fn></frame>
    <frame><ip>0x6543551</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>__run_exit_handlers</fn><dir>./stdlib/./stdlib</dir><file>exit.c</file><line>136</line></frame>
    <frame><ip>0x654360F</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>exit</fn><dir>./stdlib/./stdlib</dir><file>exit.c</file><line>143</line></frame>
    <frame><ip>0x6527D96</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>(below main)</fn><dir>./csu/../sysdeps/nptl</dir><file>libc_start_call_main.h</file><line>74</line></frame>
  </stack>
  <auxwhat>Address 0x11fda240 is 16 bytes inside a block of size 64 free'd</auxwhat>
  <stack>
    <frame><ip>0x484B27F</ip><obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj><fn>free</fn></frame>
    <frame><ip>0x66B9853</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>free_res</fn></frame>
    <frame><ip>0x66B98C1</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>__libc_freeres</fn></frame>
    <frame><ip>0x483F1B2</ip><obj>/usr/libexec/valgrind/vgpreload_core-amd64-linux.so</obj><fn>_vgnU_freeres</fn></frame>
    <frame><ip>0x6543551</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>__run_exit_handlers</fn><dir>./stdlib/./stdlib</dir><file>exit.c</file><line>136</line></frame>
    <frame><ip>0x654360F</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>exit</fn><dir>./stdlib/./stdlib</dir><file>exit.c</file><line>143</line></frame>
    <frame><ip>0x6527D96</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>(below main)</fn><dir>./csu/../sysdeps/nptl</dir><file>libc_start_call_main.h</file><line>74</line></frame>
  </stack>
  <auxwhat>Block was alloc'd at</auxwhat>
  <stack>
    <frame><ip>0x48487A9</ip><obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj><fn>malloc</fn></frame>
    <frame><ip>0x659CCA7</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>get_elem</fn><dir>./rt/./rt</dir><file>aio_misc.c</file><line>140</line></frame>
    <frame><ip>0x659CCA7</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>__aio_enqueue_request</fn><dir>./rt/./rt</dir><file>aio_misc.c</file><line>352</line></frame>
    <frame><ip>0x659D8B1</ip><obj>/usr/lib/x86_64-linux-gnu/libc.so.6</obj><fn>aio_write@@GLIBC_2.34</fn><dir>./rt/./rt</dir><file>aio_write.c</file><line>35</line></frame>
    <frame><ip>0x972674</ip><obj>/usr/bin/radosgw</obj><fn>D3nDataCache::d3n_libaio_create_write_request(ceph::buffer::v15_2_0::list&, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)</fn></frame>
    <frame><ip>0x980773</ip><obj>/usr/bin/radosgw</obj><fn>D3nDataCache::put(ceph::buffer::v15_2_0::list&, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)</fn></frame>
    <frame><ip>0x9C81DE</ip><obj>/usr/bin/radosgw</obj><fn>get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)</fn></frame>
    <frame><ip>0x9D8B3A</ip><obj>/usr/bin/radosgw</obj><fn>RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)</fn></frame>
    <frame><ip>0x7FD4B5</ip><obj>/usr/bin/radosgw</obj><fn>RGWGetObj::execute(optional_yield)</fn></frame>
    <frame><ip>0x6B50EB</ip><obj>/usr/bin/radosgw</obj><fn>rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)</fn></frame>
    <frame><ip>0x6B911D</ip><obj>/usr/bin/radosgw</obj><fn>process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)</fn></frame>
    <frame><ip>0x119BAB6</ip><obj>/usr/bin/radosgw</obj></frame>
    <frame><ip>0x620A53</ip><obj>/usr/bin/radosgw</obj></frame>
    <frame><ip>0x123DB86</ip><obj>/usr/bin/radosgw</obj><fn>make_fcontext</fn></frame>
  </stack>
</error>
</pre>
rgw - Bug #64832 (New): valgrind UninitCondition in RGWSelectObj_ObjStore_S3::run_s3select_on_csv
https://tracker.ceph.com/issues/64832
2024-03-11T15:19:43Z
Casey Bodley
cbodley@redhat.com
from rgw/verify job: http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:49:11-rgw-wip-cbodley-testing-distro-default-smithi/7589454/teuthology.log
valgrind log: http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:49:11-rgw-wip-cbodley-testing-distro-default-smithi/7589454/remote/smithi154/log/valgrind/ceph.client.0.log.gz
<pre>
<error>
  <unique>0xacb11</unique>
  <tid>298</tid>
  <threadname>io_context_pool</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame><ip>0x8F4C10</ip><obj>/usr/bin/radosgw</obj><fn>RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)</fn></frame>
    <frame><ip>0x8F76E1</ip><obj>/usr/bin/radosgw</obj><fn>RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)</fn></frame>
    <frame><ip>0x7B3BD6</ip><obj>/usr/bin/radosgw</obj><fn>RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)</fn></frame>
    <frame><ip>0x9D47E7</ip><obj>/usr/bin/radosgw</obj><fn>get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)</fn></frame>
    <frame><ip>0x9E552A</ip><obj>/usr/bin/radosgw</obj><fn>RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)</fn></frame>
    <frame><ip>0x7FD455</ip><obj>/usr/bin/radosgw</obj><fn>RGWGetObj::execute(optional_yield)</fn></frame>
    <frame><ip>0x8F9E20</ip><obj>/usr/bin/radosgw</obj><fn>RGWSelectObj_ObjStore_S3::execute(optional_yield)</fn></frame>
    <frame><ip>0x6BC4CB</ip><obj>/usr/bin/radosgw</obj><fn>rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)</fn></frame>
    <frame><ip>0x6C054D</ip><obj>/usr/bin/radosgw</obj><fn>process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)</fn></frame>
    <frame><ip>0x5E60F0</ip><obj>/usr/bin/radosgw</obj></frame>
    <frame><ip>0x62757F</ip><obj>/usr/bin/radosgw</obj></frame>
    <frame><ip>0x12591A6</ip><obj>/usr/bin/radosgw</obj><fn>make_fcontext</fn></frame>
  </stack>
</error>
</pre>
rgw - Bug #64571 (New): lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571
2024-02-26T15:58:44Z
Casey Bodley
cbodley@redhat.com
regression from https://github.com/ceph/ceph/pull/52114, whose test results included these failures: https://pulpito.ceph.com/yuvalif-2024-01-30_08:46:48-rgw-wip-end2end-tracing-distro-default-smithi/

example rgw/lifecycle job from http://qa-proxy.ceph.com/teuthology/yuriw-2024-02-23_19:55:04-rgw-main-distro-default-smithi/7572634/teuthology.log
<pre>
2024-02-24T03:30:13.667 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_transition_set_invalid_date PASSED [ 76%]
2024-02-24T03:30:23.431 INFO:tasks.rgw.client.0.smithi171.stderr:daemon-helper: command crashed with signal 11
2024-02-24T03:30:28.964 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2024-02-24T03:30:34.468 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~6s
2024-02-24T03:30:39.972 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~11s
2024-02-24T03:30:45.475 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~17s
2024-02-24T03:30:50.979 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~22s
2024-02-24T03:30:56.483 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~28s
2024-02-24T03:31:01.986 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~33s
2024-02-24T03:31:05.290 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_transition FAILED [ 76%]
2024-02-24T03:31:05.291 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_transition ERROR [ 76%]
2024-02-24T03:31:07.490 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~39s
2024-02-24T03:31:12.994 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~44s
2024-02-24T03:31:18.497 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~50s
2024-02-24T03:31:20.174 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_transition_single_rule_multi_trans FAILED [ 76%]
2024-02-24T03:31:20.174 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_transition_single_rule_multi_trans ERROR [ 76%]
2024-02-24T03:31:24.001 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~55s
2024-02-24T03:31:29.506 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~61s
2024-02-24T03:31:35.009 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~66s
2024-02-24T03:31:36.263 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_set_noncurrent_transition FAILED [ 76%]
2024-02-24T03:31:36.263 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_set_noncurrent_transition ERROR [ 76%]
2024-02-24T03:31:40.513 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~72s
2024-02-24T03:31:46.017 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~77s
2024-02-24T03:31:48.736 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_noncur_transition FAILED [ 77%]
2024-02-24T03:31:48.736 INFO:teuthology.orchestra.run.smithi171.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_noncur_transition ERROR [ 77%]
</pre>

example rgw/cloud-transition job from http://qa-proxy.ceph.com/teuthology/yuriw-2024-02-23_19:55:04-rgw-main-distro-default-smithi/7572619/teuthology.log
<pre>
2024-02-24T03:12:02.182 INFO:tasks.rgw.client.0.smithi106.stderr:daemon-helper: command crashed with signal 11
2024-02-24T03:12:05.128 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2024-02-24T03:12:10.732 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~6s
2024-02-24T03:12:16.337 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~11s
2024-02-24T03:12:21.947 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~17s
2024-02-24T03:12:27.553 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~22s
2024-02-24T03:12:33.157 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~28s
2024-02-24T03:12:38.761 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~34s
2024-02-24T03:12:44.365 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~39s
2024-02-24T03:12:49.968 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~45s
2024-02-24T03:12:55.572 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~50s
2024-02-24T03:13:01.177 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~56s
2024-02-24T03:13:06.781 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~62s
2024-02-24T03:13:12.386 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~67s
2024-02-24T03:13:17.990 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~73s
2024-02-24T03:13:23.594 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~78s
2024-02-24T03:13:29.198 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~84s
2024-02-24T03:13:34.803 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~90s
2024-02-24T03:13:40.407 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~95s
2024-02-24T03:13:42.823 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_cloud_transition FAILED [ 25%]
2024-02-24T03:13:42.823 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_cloud_transition ERROR [ 25%]
2024-02-24T03:13:46.011 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~101s
2024-02-24T03:13:51.615 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~106s
2024-02-24T03:13:57.219 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~112s
2024-02-24T03:13:59.823 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_cloud_multiple_transition FAILED [ 50%]
2024-02-24T03:13:59.824 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_cloud_multiple_transition ERROR [ 50%]
2024-02-24T03:14:02.823 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~118s
2024-02-24T03:14:08.427 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~123s
2024-02-24T03:14:14.031 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~129s
2024-02-24T03:14:19.635 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~135s
2024-02-24T03:14:21.200 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_noncur_cloud_transition FAILED [ 75%]
2024-02-24T03:14:21.200 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_noncur_cloud_transition ERROR [ 75%]
2024-02-24T03:14:25.240 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~140s
2024-02-24T03:14:30.844 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~146s
2024-02-24T03:14:35.192 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_cloud_transition_large_obj FAILED [100%]
2024-02-24T03:14:35.193 INFO:teuthology.orchestra.run.smithi106.stdout:s3tests_boto3/functional/test_s3.py::test_lifecycle_cloud_transition_large_obj ERROR [100%]
2024-02-24T03:14:35.193 INFO:teuthology.orchestra.run.smithi106.stdout:
</pre>
CephFS - Bug #64563 (Triaged): mds: enhance laggy clients detections due to laggy OSDs
https://tracker.ceph.com/issues/64563
2024-02-26T11:52:01Z
Dhairya Parmar
Right now the code happily assumes that if there is any laggy OSD and a client went laggy, it must be due to the OSD; it is not able to differentiate between "this client isn't responding" and "this client is slow to release caps". That means that if the client went off the grid AND we have any laggy OSD, the MDS will not evict that client and instead marks it as "laggy due to laggy OSDs", which is completely wrong. There are five places where clients are added to the `laggy_clients` set in `Server.cc`; these need to be re-evaluated to make sure we do consider client eviction in cases like [0], where the last cap renew span exceeds the session autoclose duration (300 seconds by default), which is long enough to conclude that we have lost the client.

Beyond that, we need sane/practical thresholds for when an OSD is considered laggy enough to make a client (or anything else) laggy. The current implementation is too naive: it just checks whether either laggy param (osd_xinfo_t.laggy_interval or osd_xinfo_t.laggy_probability) is non-zero, which makes the MDS refrain from evicting clients even when the slight OSD lagginess may not be serious at all. In other words, the helper OSDMap::any_osd_laggy needs to be smarter and more fine-grained; a rough sketch of the idea follows the reference below.

[0] https://github.com/ceph/ceph/blob/main/src/mds/Server.cc#L1184-L1190
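For illustration, here is a minimal standalone sketch (with a pared-down stand-in for osd_xinfo_t and invented thresholds, not the actual OSDMap code) contrasting the current "any non-zero field" test with a threshold-based check that only treats significant lagginess as an excuse for a laggy client:
<pre>
// Sketch of a threshold-based "is any OSD laggy enough to excuse a laggy
// client" check. Field names mirror osd_xinfo_t (laggy_probability,
// laggy_interval); the thresholds are invented placeholders that would in
// practice come from configuration.
#include <chrono>
#include <iostream>
#include <vector>

struct osd_xinfo_sketch {
  double laggy_probability = 0.0;          // decaying estimate in [0, 1]
  std::chrono::seconds laggy_interval{0};  // decaying average laggy duration
};

// Roughly the current behaviour: any non-zero value marks the OSD as laggy.
bool any_osd_laggy_naive(const std::vector<osd_xinfo_sketch>& osds) {
  for (const auto& x : osds)
    if (x.laggy_probability > 0 || x.laggy_interval.count() > 0)
      return true;
  return false;
}

// Proposed direction: require the lagginess to be significant before it can
// suppress client eviction.
bool any_osd_laggy_significant(
    const std::vector<osd_xinfo_sketch>& osds,
    double min_probability = 0.5,                                   // invented default
    std::chrono::seconds min_interval = std::chrono::seconds(30)) { // invented default
  for (const auto& x : osds)
    if (x.laggy_probability >= min_probability &&
        x.laggy_interval >= min_interval)
      return true;
  return false;
}

int main() {
  std::vector<osd_xinfo_sketch> osds = {{0.01, std::chrono::seconds(1)}};
  std::cout << "naive: " << any_osd_laggy_naive(osds) << "\n";              // 1
  std::cout << "significant: " << any_osd_laggy_significant(osds) << "\n";  // 0
}
</pre>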
CephFS - Bug #64502 (New): pacific/quincy/v18.2.0: client: ceph-fuse fails to unmount after upgra...
https://tracker.ceph.com/issues/64502
2024-02-20T02:15:08Z
Patrick Donnelly
pdonnell@redhat.com
Every ceph-fuse mount for quincy fails to unmount for reef->main:

https://pulpito.ceph.com/pdonnell-2024-02-19_18:28:45-fs:upgrade:mds_upgrade_sequence-wip-batrick-testing-20240215.160715-distro-default-smithi/
<pre>
2024-02-19T19:17:36.535 INFO:tasks.cephfs.fuse_mount:Running fusermount -u on ubuntu@smithi060.front.sepia.ceph.com...
2024-02-19T19:17:36.535 INFO:teuthology.orchestra.run:Running command with timeout 300
2024-02-19T19:17:36.535 DEBUG:teuthology.orchestra.run.smithi060:> sudo fusermount -u /home/ubuntu/cephtest/mnt.0
2024-02-19T19:17:36.562 INFO:teuthology.orchestra.run:waiting for 300
</pre>
From: /teuthology/pdonnell-2024-02-19_18:28:45-fs:upgrade:mds_upgrade_sequence-wip-batrick-testing-20240215.160715-distro-default-smithi/7566635/teuthology.log
<pre>
2024-02-19T19:17:36.799+0000 7f9fa7fff640 20 client.14548 tick
2024-02-19T19:17:36.799+0000 7f9fa7fff640 20 client.14548 collect_and_send_metrics
2024-02-19T19:17:36.799+0000 7f9fa7fff640 20 client.14548 collect_and_send_global_metrics
2024-02-19T19:17:36.799+0000 7f9fa7fff640 1 -- 192.168.0.1:0/854663557 --> [v2:172.21.15.60:6826/3594652577,v1:172.21.15.60:6827/3594652577] -- client_metrics [client_metric_type: READ_LATENCY latency: 5.996942, avg_latency: 0.000330, sq_sum: 86627012816408144, count=17901][client_metric_type: WRITE_LATENCY latency: 23.710221, avg_latency: 0.000407, sq_sum: 1890169673992666112, count=56281][client_metric_type: METADATA_LATENCY latency: 238.430933, avg_latency: 0.005247, sq_sum: 13600282437617256448, count=45341][client_metric_type: CAP_INFO cap_hits: 831286 cap_misses: 14792 num_caps: 0][client_metric_type: DENTRY_LEASE dlease_hits: 67 dlease_misses: 154700 num_dentries: 0][client_metric_type: OPENED_FILES opened_files: 0 total_inodes: 1][client_metric_type: PINNED_ICAPS pinned_icaps: 1 total_inodes: 1][client_metric_type: OPENED_INODES opened_inodes: 0 total_inodes: 1][client_metric_type: READ_IO_SIZES total_ops: 22272 total_size: 3731108728][client_metric_type: WRITE_IO_SIZES total_ops: 56281 total_size: 4270138133] v1 -- 0x7f9fa000b9e0 con 0x5637e76f0e80
2024-02-19T19:17:36.799+0000 7f9fa7fff640 20 client.14548 trim_cache size 1 max 16384
2024-02-19T19:17:36.799+0000 7f9fa7fff640 20 client.14548 upkeep thread waiting interval 1.000000000s
...
2024-02-19T20:23:30.865+0000 7f9fc8e36480 2 client.14548 unmounting
</pre>
From: /teuthology/pdonnell-2024-02-19_18:28:45-fs:upgrade:mds_upgrade_sequence-wip-batrick-testing-20240215.160715-distro-default-smithi/7566635/remote/smithi060/log/ceph-client.0.log.gz

During teardown of the cluster the unmount eventually proceeds, but it's not clear what was blocking it. I think something was holding the RWRef, preventing unmount from proceeding.
Linux kernel client - Bug #64471 (New): kernel: upgrades from quincy/v18.2.[01]/reef to main|squi...
https://tracker.ceph.com/issues/64471
2024-02-16T15:14:45Z
Patrick Donnelly
pdonnell@redhat.com
<pre>
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:{
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "target_image": "quay.ceph.io/ceph-ci/ceph:f78a58c0ffd401d1493058a1022c35f011d65275",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "in_progress": true,
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "which": "Upgrading all daemon types on all hosts",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "services_complete": [],
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "progress": "",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "message": "Doing first pull of quay.ceph.io/ceph-ci/ceph:f78a58c0ffd401d1493058a1022c35f011d65275 image",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout: "is_paused": false
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:}
...
2024-02-16T02:34:52.075 INFO:tasks.workunit.client.0.smithi032.stderr:+ pushd fsstress
2024-02-16T02:34:52.076 INFO:tasks.workunit.client.0.smithi032.stdout:~/cephtest/mnt.0/client.0/tmp/fsstress ~/cephtest/mnt.0/client.0/tmp
2024-02-16T02:34:52.076 INFO:tasks.workunit.client.0.smithi032.stderr:+ wget -q -O ltp-full.tgz http://download.ceph.com/qa/ltp-full-20091231.tgz
2024-02-16T02:35:10.364 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:09 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:09.988+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back 2024-02-16T02:34:45.611069+0000 front 2024-02-16T02:34:45.611050+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:10.364 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:09 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:09.988+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6812 osd.4 since back 2024-02-16T02:34:45.611191+0000 front 2024-02-16T02:34:45.611160+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:10.364 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:09 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:09.988+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6820 osd.5 since back 2024-02-16T02:34:45.611126+0000 front 2024-02-16T02:34:45.611214+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:11.316 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:10 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:10.982+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back 2024-02-16T02:34:45.611069+0000 front 2024-02-16T02:34:45.611050+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:11.316 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:10 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:10.982+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6812 osd.4 since back 2024-02-16T02:34:45.611191+0000 front 2024-02-16T02:34:45.611160+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:11.317 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:10 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:10.982+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6820 osd.5 since back 2024-02-16T02:34:45.611126+0000 front 2024-02-16T02:34:45.611214+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
</pre>
From: /teuthology/pdonnell-2024-02-16_01:25:08-fs:upgrade:mds_upgrade_sequence-wip-batrick-testing-20240215.160715-distro-default-smithi/7561891/teuthology.log

and many others in that run. It's quite reproducible. I don't think it happens with quincy -> main.

This might be related to https://tracker.ceph.com/issues/55258.
rgw - Bug #64292 (New): rgw-multisite: test_bucket_acl() test fails
https://tracker.ceph.com/issues/64292
2024-02-01T17:14:59Z
Shilpa MJ
smanjara@redhat.com
<p><a class="external" href="http://qa-proxy.ceph.com/teuthology/smanjara-2024-01-30_01:56:33-rgw:multisite-wip-shilpa-test-wo-sync-policy-distro-default-smithi/7537496/teuthology.log">http://qa-proxy.ceph.com/teuthology/smanjara-2024-01-30_01:56:33-rgw:multisite-wip-shilpa-test-wo-sync-policy-distro-default-smithi/7537496/teuthology.log</a></p>
<p>2024-01-30T03:37:23.924 INFO:tasks.rgw_multisite_tests:======================================================================<br />2024-01-30T03:37:23.924 INFO:tasks.rgw_multisite_tests:FAIL: rgw_multi.tests.test_bucket_acl<br />2024-01-30T03:37:23.924 INFO:tasks.rgw_multisite_tests:----------------------------------------------------------------------<br />2024-01-30T03:37:23.924 INFO:tasks.rgw_multisite_tests:Traceback (most recent call last):<br />2024-01-30T03:37:23.925 INFO:tasks.rgw_multisite_tests: File "/home/teuthworker/src/git.ceph.com_teuthology_6128cc3ecb49b7d62475e3595041c19b5326ca7c/virtualenv/lib/python3.8/site-packages/nose/case.py", line 198, in runTest<br />2024-01-30T03:37:23.925 INFO:tasks.rgw_multisite_tests: self.test(*self.arg)<br />2024-01-30T03:37:23.925 INFO:tasks.rgw_multisite_tests: File "/home/teuthworker/src/git.ceph.com_ceph-c_92f1cafb9b06561a486443297adf37cef7b7a5f2/qa/../src/test/rgw/rgw_multi/tests.py", line 855, in test_bucket_acl<br />2024-01-30T03:37:23.925 INFO:tasks.rgw_multisite_tests: assert(len(bucket.get_acl().acl.grants) == 2) # new grant on AllUsers<br />2024-01-30T03:37:23.925 INFO:tasks.rgw_multisite_tests:AssertionError:</p>
Ceph - Bug #64279 (New): "Error ENOTSUP: Warning: due to ceph-mgr restart" in octopus-x/pacific s...
https://tracker.ceph.com/issues/64279
2024-01-31T22:53:19Z
Yuri Weinstein
yweinste@redhat.com
This is for 16.2.15

Run: https://pulpito.ceph.com/yuriw-2024-01-31_16:13:03-upgrade:octopus-x-pacific-release-distro-default-smithi/
Jobs: 7540432 7540434 7540435
Logs: https://pulpito.ceph.com/yuriw-2024-01-31_16:13:03-upgrade:octopus-x-pacific-release-distro-default-smithi/7540432/
<pre>
2024-01-31T17:36:17.722 INFO:teuthology.orchestra.run.smithi159.stderr:Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
2024-01-31T17:36:17.722 INFO:teuthology.orchestra.run.smithi159.stderr:Module 'orchestrator' is not enabled/loaded (required by command 'orch ps'): use `ceph mgr module enable orchestrator` to enable it
</pre>
<pre>
2024-01-31T17:36:17.933 INFO:journalctl@ceph.mon.a.smithi159.stdout:Jan 31 17:36:17 smithi159 ceph-157b0ed4-c05e-11ee-95b3-87774f69a715-mon-a[70364]: cluster 2024-01-31T17:36:17.656902+0000 mon.a (mon.0) 62 : cluster [INF] Manager daemon y is now available
2024-01-31T17:36:18.034 DEBUG:teuthology.orchestra.run:got remote process result: 95
2024-01-31T17:36:18.035 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/task/parallel.py", line 56, in task
p.spawn(_run_spawned, ctx, confg, taskname)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/parallel.py", line 84, in __exit__
for result in self:
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/parallel.py", line 98, in __next__
resurrect_traceback(result)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/parallel.py", line 30, in resurrect_traceback
raise exc.exc_info[1]
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/parallel.py", line 23, in capture_traceback
return func(*args, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/task/parallel.py", line 64, in _run_spawned
mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/task/sequential.py", line 47, in task
mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=confg)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/github.com_ceph_ceph-c_88fb4c6adb4bae6b1cdf34fca7eae2dacb06cc7b/qa/tasks/cephadm.py", line 1058, in shell
_shell(ctx, cluster_name, remote,
File "/home/teuthworker/src/github.com_ceph_ceph-c_88fb4c6adb4bae6b1cdf34fca7eae2dacb06cc7b/qa/tasks/cephadm.py", line 34, in _shell
return remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/orchestra/remote.py", line 523, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_d9fdb2209e15b39d9f061fd85399f352ce0f0894/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi159 with status 95: "sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:octopus shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 157b0ed4-c05e-11ee-95b3-87774f69a715 -e sha1=88fb4c6adb4bae6b1cdf34fca7eae2dacb06cc7b -- bash -c 'ceph orch ps'"
2024-01-31T17:36:18.263 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=05855bfb45164e1fa69621bf999495c7
</pre>
Linux kernel client - Bug #64172 (Fix Under Review): Test failure: test_multiple_path_r (tasks.ce...
https://tracker.ceph.com/issues/64172
2024-01-25T05:55:58Z
Venky Shankar
vshankar@redhat.com
<p>/a/vshankar-2024-01-22_07:03:31-fs-wip-vshankar-testing-20240119.075157-1-testing-default-smithi/7525717</p>
The test setup involves a "read" cap on a file system path (a directory), remounting that directory as the file system root, and reading the created files (a libcephfs sketch of this setup follows the MDS log excerpt below).

MDS logs: ./remote/smithi157/log/ceph-mds.c.log.gz
<pre>
2024-01-22T08:27:55.205+0000 7f81a7600640 1 -- [v2:172.21.15.157:6835/1231338113,v1:172.21.15.157:6837/1231338113] <== client.17628 v1:192.168.0.1:0/312855551 9 ==== client_request(client.17628:5 lookupino #0x1 2024-01-22T08:27:55.205939+0000 caller_uid=0, caller_gid=0{0,}) v6 ==== 176+0+0 (unknown 772642831 0 0) 0x55ab1d52bb00 con 0x55ab1d52e400
2024-01-22T08:27:55.205+0000 7f81a7600640 4 mds.0.server handle_client_request client_request(client.17628:5 lookupino #0x1 2024-01-22T08:27:55.205939+0000 caller_uid=0, caller_gid=0{0,}) v6
2024-01-22T08:27:55.205+0000 7f81a7600640 20 mds.0.356 get_session have 0x55ab1d202f00 client.17628 v1:192.168.0.1:0/312855551 state open
2024-01-22T08:27:55.205+0000 7f81a7600640 15 mds.0.server oldest_client_tid=5
2024-01-22T08:27:55.205+0000 7f81a7600640 7 mds.0.cache request_start request(client.17628:5 nref=2 cr=0x55ab1d52bb00)
2024-01-22T08:27:55.205+0000 7f81a7600640 7 mds.0.server dispatch_client_request client_request(client.17628:5 lookupino #0x1 2024-01-22T08:27:55.205939+0000 caller_uid=0, caller_gid=0{0,}) v6
2024-01-22T08:27:55.205+0000 7f81a7600640 20 Session check_access path
2024-01-22T08:27:55.205+0000 7f81a7600640 10 MDSAuthCap is_capable inode(path / owner 0:0 mode 041777) by caller 0:0 mask 0 new 0:0 cap: MDSAuthCaps[allow r fsname=cephfs path="/dir1/dir12", allow r fsname=cephfs path="/dir2/dir22"]
2024-01-22T08:27:55.205+0000 7f81a7600640 7 mds.0.server reply_client_request -13 ((13) Permission denied) client_request(client.17628:5 lookupino #0x1 2024-01-22T08:27:55.205939+0000 caller_uid=0, caller_gid=0{0,}) v6
2024-01-22T08:27:55.205+0000 7f81a7600640 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2024-01-22T08:27:55.205+0000 7f81a7600640 20 mds.0.server lat 0.000095
2024-01-22T08:27:55.205+0000 7f81a7600640 10 mds.0.356 send_message_client client.17628 v1:192.168.0.1:0/312855551 client_reply(???:5 = -13 (13) Permission denied) v1
2024-01-22T08:27:55.205+0000 7f81a7600640 1 -- [v2:172.21.15.157:6835/1231338113,v1:172.21.15.157:6837/1231338113] --> v1:192.168.0.1:0/312855551 -- client_reply(???:5 = -13 (13) Permission denied) v1 -- 0x55ab1d5b4700 con 0x55ab1d52e400
</pre>
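For reference, a rough client-side sketch of that setup using libcephfs (the failing test uses the kernel client via mount, so this is only an analogue; the client id, keyring path, directory, and file name are placeholders chosen to match the caps shown in the MDS log, e.g. allow r fsname=cephfs path="/dir1/dir12"):
<pre>
// Rough sketch: mount a path-restricted client with the restricted directory
// as its root, then stat a file inside it. Client id, keyring path, and the
// directory/file names are placeholders.
#include <cephfs/libcephfs.h>
#include <cstdio>

int main() {
  struct ceph_mount_info *cmount = nullptr;
  if (ceph_create(&cmount, "restricted") != 0)   // client.restricted
    return 1;
  ceph_conf_read_file(cmount, nullptr);          // default ceph.conf search path
  ceph_conf_set(cmount, "keyring", "/etc/ceph/ceph.client.restricted.keyring");

  // Mount "/dir1/dir12" as this client's filesystem root; everything below
  // is addressed relative to it, which is what the read-only path cap allows.
  if (ceph_mount(cmount, "/dir1/dir12") != 0) {
    std::fprintf(stderr, "mount failed\n");
    ceph_release(cmount);
    return 1;
  }

  struct ceph_statx stx;
  if (ceph_statx(cmount, "file1", &stx, CEPH_STATX_SIZE, 0) == 0)
    std::printf("file1 size: %llu\n", (unsigned long long)stx.stx_size);

  ceph_unmount(cmount);
  ceph_release(cmount);
  return 0;
}
</pre>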
Ceph - Bug #63617 (New): ceph-common: CommonSafeTimer<std::mutex>::timer_thread(): python3.12 kil...
https://tracker.ceph.com/issues/63617
2023-11-23T18:40:45Z
Kaleb KEITHLEY
<p><a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=2251165">https://bugzilla.redhat.com/show_bug.cgi?id=2251165</a></p>
<p>Description of problem:</p>
<p>Version-Release number of selected component:<br />ceph-common-2:18.2.1-1.fc39</p>
<p>Additional info:<br />reporter: libreport-2.17.11<br />cmdline: /usr/bin/python3.12 /usr/bin/ceph -s<br />backtrace_rating: 4<br />runlevel: N 5<br />executable: /usr/bin/python3.12<br />journald_cursor: s=9f8a7a66b4194fdcbd75dcd3edf4da87;i=173e8c976;b=a08b8db920744522980a5387af245706;m=2743cc1c;t=60accf74a277f;x=cef1ac3a8dc81a9d<br />comment: <br />cgroup: 0::/user.slice/user-1000.slice/user/app.slice/app-org.kde.konsole-44b42a69b68946748c9899bd38ac8c6d.scope<br />kernel: 6.6.2-200.fc39.x86_64<br />uid: 1000<br />rootdir: /<br />crash_function: CommonSafeTimer<std::mutex>::timer_thread<br />type: CCpp<br />package: ceph-common-2:18.2.1-1.fc39<br />reason: python3.12 killed by SIGSEGV</p>
<p>Truncated backtrace:<br />Thread no. 1 (3 frames)<br /> #0 CommonSafeTimer<std::mutex>::timer_thread at /usr/src/debug/ceph-18.2.1-1.fc39.x86_64/src/common/Timer.cc:103<br /> <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: gpf in tcp_sendpage (Closed)" href="https://tracker.ceph.com/issues/1">#1</a> CommonSafeTimerThread<std::mutex>::entry at /usr/src/debug/ceph-18.2.1-1.fc39.x86_64/src/common/Timer.cc:33<br /> <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: leaked dentry ref on umount (Closed)" href="https://tracker.ceph.com/issues/3">#3</a> clone3 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78</p>