Ceph : Soumya Koduri
https://tracker.ceph.com/
2024-03-26T18:37:34Z
Ceph
rgw - Bug #65160 (Fix Under Review): rgw/lc: A few buckets stuck in UNINITIAL state
https://tracker.ceph.com/issues/65160
2024-03-26T18:37:34Z
Soumya Koduri
<p>We are observing that the lifecycle policy for certain buckets remains stuck in the UNINITIAL state when the LC transition policy is applied to multiple buckets.</p>
<p>There are a total of 428 buckets in the LC list, out of which 5 have been stuck in the UNINITIAL state for a few days. All belong to the same LC shard, lc.4.</p>
<p>-------------- snippet of the buckets stuck ----------</p>
<pre><code>"status": "COMPLETE" <br /> },
{<br /> "bucket": ":lucileg.356-bucky-4188-12:1e614482-6c1c-49cb-81ef-8fb70c1ff4e8.208655.6",<br /> "shard": "lc.4",<br /> "started": "Thu, 01 Jan 1970 00:00:00 GMT",<br /> "status": "UNINITIAL" <br /> },
{<br /> "bucket": ":lucileg.356-bucky-4188-31:1e614482-6c1c-49cb-81ef-8fb70c1ff4e8.218596.16",<br /> "shard": "lc.4",<br /> "started": "Thu, 01 Jan 1970 00:00:00 GMT",<br /> "status": "UNINITIAL" <br /> },
{<br /> "bucket": ":lucileg.356-bucky-4188-42:1e614482-6c1c-49cb-81ef-8fb70c1ff4e8.208655.24",<br /> "shard": "lc.4",<br /> "started": "Thu, 01 Jan 1970 00:00:00 GMT",<br /> "status": "UNINITIAL" <br /> },
{<br /> "bucket": ":lucileg.356-bucky-4188-63:1e614482-6c1c-49cb-81ef-8fb70c1ff4e8.218596.32",<br /> "shard": "lc.4",<br /> "started": "Thu, 01 Jan 1970 00:00:00 GMT",<br /> "status": "UNINITIAL" <br /> },
{<br /> "bucket": ":lucileg.356-bucky-4188-95:1e614482-6c1c-49cb-81ef-8fb70c1ff4e8.218596.47",<br /> "shard": "lc.4",<br /> "started": "Thu, 01 Jan 1970 00:00:00 GMT",<br /> "status": "UNINITIAL" <br /> },
{<br /> "bucket": ":maryx.267-bucky-4525-37:1e614482-6c1c-49cb-81ef-8fb70c1ff4e8.133978.25",<br /> "shard": "lc.4",<br /> "started": "Wed, 21 Feb 2024 11:08:55 GMT",<br /> "status": "COMPLETE" <br /> }<br />How reproducible:</code></pre>
<p>Seen intermittently when LC is applied simultaneously on multiple buckets (100 at a time).</p>
<p>Reproducibility: 2/3 times.</p>
<p>Steps to Reproduce:</p>
<p>1. Create around 100 buckets and upload 10 objects per bucket.<br />2. Apply an LC rule on all the buckets to transition the objects to a 'cold' storage class.<br />3. Repeat the test (steps 1 and 2) around 2-3 times.</p>
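<p>For reference, below is a minimal boto3 sketch of steps 1 and 2. The endpoint, credentials, bucket-name prefix and the "cold" storage-class name are placeholders and not taken from the original report; the storage class is assumed to already be defined in the zone placement of the test cluster.</p>
<pre><code>import boto3

# Placeholders: adjust the endpoint, credentials and storage-class name
# to match the test cluster (RGW accepts custom storage-class names).
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

lc_rule = {
    "Rules": [{
        "ID": "transition-to-cold",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},                      # apply to all objects
        "Transitions": [{"Days": 1, "StorageClass": "cold"}],
    }]
}

for i in range(100):                                   # step 1: ~100 buckets
    bucket = f"bucky-test-{i}"
    s3.create_bucket(Bucket=bucket)
    for j in range(10):                                # 10 objects per bucket
        s3.put_object(Bucket=bucket, Key=f"obj-{j}", Body=b"x" * 1024)
    # step 2: apply the LC transition rule on every bucket
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lc_rule)
</code></pre>
<p>Repeating this 2-3 times, as in step 3, recreates the reported scenario of many buckets entering the LC list at once.</p>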
<p>Actual results:</p>
<p>We observed that 5 buckets remained stuck in the UNINITIAL state for a few days.</p>
<p>Expected results:</p>
<p>The LC process should not remain stuck in the UNINITIAL state.</p>
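<p>For reference, the snippet above is in the format produced by <code>radosgw-admin lc list</code>; a small helper along these lines (a sketch, assuming <code>radosgw-admin</code> is available on an RGW node) can be used to spot entries that remain UNINITIAL:</p>
<pre><code>import json
import subprocess

# Parse the JSON array printed by `radosgw-admin lc list`; each entry carries
# "bucket", "shard", "started" and "status" fields as in the snippet above.
out = subprocess.check_output(["radosgw-admin", "lc", "list"])
entries = json.loads(out)

stuck = [e for e in entries if e.get("status") == "UNINITIAL"]
for e in stuck:
    print(e["shard"], e["bucket"], e["started"])
print(f"{len(stuck)} of {len(entries)} entries are UNINITIAL")
</code></pre>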
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257914
2024-03-24T16:41:30Z
Soumya Koduri
<p>Casey Bodley wrote:</p>
<blockquote>
<p>caught another crash that looks more like the others we've seen:<br />[...]</p>
<p>this time it's destroying <code>bucket_info_cache_entry::attrs</code></p>
</blockquote>
<p>Most of these crashes seem to be related to the <strong>bucket</strong> object.</p>
<p>@cbodley,</p>
<p>Could you please clarify what bucket_version will be in the else case below (<a class="external" href="https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_sal_rados.cc#L546">https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_sal_rados.cc#L546</a>)?</p>
<pre><code>int RadosBucket::load_bucket(const DoutPrefixProvider* dpp, optional_yield y)
{
  int ret;

  RGWSI_MetaBackend_CtxParams bectx_params = RGWSI_MetaBackend_CtxParams_SObj();
  RGWObjVersionTracker ep_ot;
  ..
  ..
  } else {
    ret = store->ctl()->bucket->read_bucket_instance_info(info.bucket, &info, y, dpp,
                                  RGWBucketCtl::BucketInstance::GetParams()
                                  .set_mtime(&mtime)
                                  .set_attrs(&attrs)
                                  .set_bectx_params(bectx_params));
  }
  if (ret != 0) {
    return ret;
  }

  bucket_version = ep_ot.read_version;  // &lt;&lt;&lt;&lt; Here ep_ot.read_version is uninitialized in the else{} case, right?
}</code></pre>
<p>The new code added in the LC path, publish_reserve()-&gt;load_bucket(..), enters the else{} code path since info.bucket.bucket_id is not empty. Could it be that bucket_version gets updated with an incorrect read_version here, which may result in a use-after-free of the bucket object in the cache at a later point?</p>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257913
2024-03-24T16:30:48Z
Soumya Koduri
<p>Soumya Koduri wrote:</p>
<blockquote>
<p>Casey Bodley wrote:</p>
<blockquote>
<p>spotted a different lifecycle crash in the rgw/dbstore suite, which is interesting because no expiration/transition is involved:<br />[...]<br />from <a class="external" href="https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-20_23:26:29-rgw-wip-cbodley-testing-distro-default-smithi/7614326/teuthology.log">https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-20_23:26:29-rgw-wip-cbodley-testing-distro-default-smithi/7614326/teuthology.log</a></p>
</blockquote>
<p>There are some lifecycle tests (which include expiration) not marked as 'fails_on_dbstore'. I will skip those tests too and check if the issue occurs even with just applying lifecycle policy rules.</p>
</blockquote>
<p>The LC crashes seen when running the dbstore suite are due to a bug in dbstore (<a class="external" href="https://github.com/soumyakoduri/ceph/commit/c20c2d2499eeb182ebffb1675950e42719735b5f">https://github.com/soumyakoduri/ceph/commit/c20c2d2499eeb182ebffb1675950e42719735b5f</a>) and are unrelated to the original issue. Will post the fix for review.</p>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257719
2024-03-22T06:05:25Z
Soumya Koduri
<p>Casey Bodley wrote:</p>
<blockquote>
<p>spotted a different lifecycle crash in the rgw/dbstore suite, which is interesting because no expiration/transition is involved:<br />[...]<br />from <a class="external" href="https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-20_23:26:29-rgw-wip-cbodley-testing-distro-default-smithi/7614326/teuthology.log">https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-20_23:26:29-rgw-wip-cbodley-testing-distro-default-smithi/7614326/teuthology.log</a></p>
</blockquote>
<p>There are some lifecycle tests (which include expiration) not marked as 'fails_on_dbstore'. I will skip those tests too and check if the issue occurs even with just applying lifecycle policy rules.</p>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257443
2024-03-19T18:37:40Z
Soumya Koduri
<p>With Valgrind enabled and lc_debug_interval set to <code>1</code>, the crash is not observed; however, in one of the runs a Valgrind error "Leak_PossiblyLost" is reported, possibly during cleanup -</p>
<p><a class="external" href="http://qa-proxy.ceph.com/teuthology/soumyakoduri-2024-03-19_16:14:48-rgw:cloud-transition-wip-skoduri-lc-distro-default-smithi/7610946/teuthology.log">http://qa-proxy.ceph.com/teuthology/soumyakoduri-2024-03-19_16:14:48-rgw:cloud-transition-wip-skoduri-lc-distro-default-smithi/7610946/teuthology.log</a></p>
<p>Attaching the Valgrind log file of that RGW service.</p>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257442
2024-03-19T18:31:13Z
Soumya Koduri
<p>Two more crashes with a similar backtrace are seen -<br /><a class="external" href="http://qa-proxy.ceph.com/teuthology/soumyakoduri-2024-03-19_02:37:28-rgw:cloud-transition-wip-skoduri-lc-distro-default-smithi/7610307/teuthology.log">http://qa-proxy.ceph.com/teuthology/soumyakoduri-2024-03-19_02:37:28-rgw:cloud-transition-wip-skoduri-lc-distro-default-smithi/7610307/teuthology.log</a></p>
<pre><code>2024-03-19T03:09:59.640 INFO:tasks.rgw.client.0.smithi148.stdout:*** Caught signal (Segmentation fault) **
2024-03-19T03:09:59.640 INFO:tasks.rgw.client.0.smithi148.stdout: in thread 7fc7f2910640 thread_name:wp_thrd: 0, 0
2024-03-19T03:09:59.645 INFO:tasks.rgw.client.0.smithi148.stdout: ceph version 19.0.0-2098-ge0537c1b (e0537c1b936b5049f35101185806ff470c74a520) squid (dev)
2024-03-19T03:09:59.645 INFO:tasks.rgw.client.0.smithi148.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x7fc918654db0]
2024-03-19T03:09:59.645 INFO:tasks.rgw.client.0.smithi148.stdout: 2: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)+0xeb) [0x7fc91b82977b]
2024-03-19T03:09:59.645 INFO:tasks.rgw.client.0.smithi148.stdout: 3: (tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int)+0x30) [0x7fc91b829b20]
2024-03-19T03:09:59.645 INFO:tasks.rgw.client.0.smithi148.stdout: 4: radosgw(+0x488928) [0x555c4cb78928]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 5: (RGWSI_Bucket_SObj::read_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::Context, 4>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWBucketInfo*, std::chrono::time_point<ceph::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, optional_yield, DoutPrefixProvider const*, rgw_cache_entry_info*, boost::optional<obj_version>)+0x22a) [0x555c4d0369ca]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 6: radosgw(+0x9f8069) [0x555c4d0e8069]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 7: (std::_Function_handler<int (RGWSI_MetaBackend_Handler::Op*), RGWBucketInstanceMetadataHandler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (ptr_wrapper<RGWSI_MetaBackend::Context, 4>&)>)::{lambda(RGWSI_MetaBackend_Handler::Op*)#1}>::_M_invoke(std::_Any_data const&, RGWSI_MetaBackend_Handler::Op*&&)+0x34) [0x555c4d0e5594]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 8: radosgw(+0x953ed0) [0x555c4d043ed0]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 9: (RGWSI_MetaBackend_SObj::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend::Context*)>)+0x5b) [0x555c4d043f6b]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 10: (RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend_Handler::Op*)>)+0x77) [0x555c4d0430f7]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 11: (RGWBucketCtl::read_bucket_instance_info(rgw_bucket const&, RGWBucketInfo*, optional_yield, DoutPrefixProvider const*, RGWBucketCtl::BucketInstance::GetParams const&)+0x11b) [0x555c4d0e89db]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 12: (rgw::sal::RadosBucket::load_bucket(DoutPrefixProvider const*, optional_yield)+0x211) [0x555c4cf3aee1]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 13: (rgw::notify::publish_reserve(DoutPrefixProvider const*, rgw::SiteConfig const&, rgw::notify::EventType, rgw::notify::reservation_t&, RGWObjTags const*)+0xb38) [0x555c4ceb4958]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 14: radosgw(+0x99e9f0) [0x555c4d08e9f0]
2024-03-19T03:09:59.646 INFO:tasks.rgw.client.0.smithi148.stdout: 15: radosgw(+0x99f7ad) [0x555c4d08f7ad]
2024-03-19T03:09:59.647 INFO:tasks.rgw.client.0.smithi148.stdout: 16: (LCOpRule::process(rgw_bucket_dir_entry&, DoutPrefixProvider const*, WorkQ*)+0x423) [0x555c4d091183]
2024-03-19T03:09:59.647 INFO:tasks.rgw.client.0.smithi148.stdout: 17: radosgw(+0x9a1b96) [0x555c4d091b96]
2024-03-19T03:09:59.647 INFO:tasks.rgw.client.0.smithi148.stdout: 18: radosgw(+0x99fac1) [0x555c4d08fac1]
2024-03-19T03:09:59.647 INFO:tasks.rgw.client.0.smithi148.stdout: 19: /lib64/libc.so.6(+0x9f802) [0x7fc91869f802]
2024-03-19T03:09:59.647 INFO:tasks.rgw.client.0.smithi148.stdout: 20: /lib64/libc.so.6(+0x3f450) [0x7fc91863f450]
2024-03-19T03:09:59.836 INFO:tasks.rgw.client.0.smithi148.stderr:daemon-helper: command crashed with signal 11
2024-03-19T03:10:04.263 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2024-03-19T03:10:09.867 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~6s
2024-03-19T03:10:15.472 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~11s
2024-03-19T03:10:21.076 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~17s
2024-03-19T03:10:24.602 INFO:tasks.rgw.client.1.smithi148.stdout:*** Caught signal (Segmentation fault) **
2024-03-19T03:10:24.602 INFO:tasks.rgw.client.1.smithi148.stdout: in thread 7fc10624a640 thread_name:wp_thrd: 0, 2
2024-03-19T03:10:24.603 INFO:tasks.rgw.client.1.smithi148.stdout: ceph version 19.0.0-2098-ge0537c1b (e0537c1b936b5049f35101185806ff470c74a520) squid (dev)
2024-03-19T03:10:24.603 INFO:tasks.rgw.client.1.smithi148.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x7fc22d054db0]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 2: (tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)+0x3c) [0x7fc2302249bc]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 3: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x20) [0x7fc23022aef0]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 4: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x80) [0x7fc23022afb0]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 5: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long))+0x76) [0x7fc23022b126]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 6: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xaf) [0x5578dccdbcef]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 7: (rgw_get_rados_ref(DoutPrefixProvider const*, librados::v14_2_0::Rados*, rgw_raw_obj, rgw_rados_ref*)+0x3d) [0x5578dd1a43bd]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 8: (RGWRados::get_obj_head_ref(DoutPrefixProvider const*, rgw_placement_rule const&, rgw_obj const&, rgw_rados_ref*)+0x23a) [0x5578dd0f605a]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 9: (RGWRados::Object::Delete::delete_obj(optional_yield, DoutPrefixProvider const*, bool)+0x519) [0x5578dd11aed9]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 10: (rgw::sal::RadosObject::RadosDeleteOp::delete_obj(DoutPrefixProvider const*, optional_yield, unsigned int)+0x15a) [0x5578dd16155a]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 11: radosgw(+0x99cbf2) [0x5578dd29ebf2]
2024-03-19T03:10:24.604 INFO:tasks.rgw.client.1.smithi148.stdout: 12: radosgw(+0x99d149) [0x5578dd29f149]
2024-03-19T03:10:24.605 INFO:tasks.rgw.client.1.smithi148.stdout: 13: (LCOpRule::process(rgw_bucket_dir_entry&, DoutPrefixProvider const*, WorkQ*)+0x423) [0x5578dd2a3183]
2024-03-19T03:10:24.605 INFO:tasks.rgw.client.1.smithi148.stdout: 14: radosgw(+0x9a1b96) [0x5578dd2a3b96]
2024-03-19T03:10:24.605 INFO:tasks.rgw.client.1.smithi148.stdout: 15: radosgw(+0x99fac1) [0x5578dd2a1ac1]
2024-03-19T03:10:24.605 INFO:tasks.rgw.client.1.smithi148.stdout: 16: /lib64/libc.so.6(+0x9f802) [0x7fc22d09f802]
2024-03-19T03:10:24.605 INFO:tasks.rgw.client.1.smithi148.stdout: 17: /lib64/libc.so.6(+0x3f450) [0x7fc22d03f450]</code></pre>
sepia - Support #64967 (In Progress): Sepia Lab Access Request
https://tracker.ceph.com/issues/64967
2024-03-18T15:58:15Z
Soumya Koduri
<p>1) Do you just need VPN access or will you also be running teuthology jobs?</p>
<p>I need VPN access and will also be running teuthology jobs.</p>
<p>2) Desired Username: <br />soumyakoduri</p>
<p>3) Alternate e-mail address(es) we can reach you at: <br /><a class="email" href="mailto:skoduri@redhat.com">skoduri@redhat.com</a></p>
<p>4) If you don't already have an established history of code contributions to Ceph, is there an existing community or core developer you've worked with who has reviewed your work and can vouch for your access request?</p>
<p style="padding-left:2em;">If you answered "No" to # 4, please answer the following (paste directly below the question to keep indentation):</p>
<p style="padding-left:2em;">4a) Paste a link to a Blueprint or planning doc of yours that was reviewed at a Ceph Developer Monthly.</p>
<p style="padding-left:2em;">4b) Paste a link to an accepted pull request for a major patch or feature.<br /><a class="external" href="https://github.com/ceph/ceph/pull/35100">https://github.com/ceph/ceph/pull/35100</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/31454">https://github.com/ceph/ceph/pull/31454</a></p>
<p style="padding-left:2em;">4c) If applicable, include a link to the current project (planning doc, dev branch, or pull request) that you are looking to test.</p>
<p>5) Paste your SSH public key(s) between the <code>pre</code> tags<br /><pre>
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCuGyEOD+7nmCuGh6N+nqBnhZEWqGDjjjYRiYv5lPKVLg+p2hTlgxCAXCqo4ylI7B+9R8F4Bx4T93SzYmuoirPT12anrkvrPQ56tRnB/86Fu6AGfKBhv6ttkuO//ET9+sot7WvMXdxFN+tytT0C+SGQP4JDESiQ7OuHnXsNraFnVo3mWNP8NI56Dlof+c0zKToK1iB9+ph7hgHgEJg803F3c0TEeBiv9tTeDrfFl9R3GQcVhayf5D7qRAPEN3ftLm9Bt2TJeNp/grOAqGjLlc+8Xg/2CxgBy/SAPbWaUUDhjuBfrVdEPqGwTFitfDwE9DsF8n3GN8oZr1WuHRR8amJSHUJ0r9QTNs/Pa4Y7ufPFZEGFffAT+/mzuAzIOcj7MSVk4/7jhXPtPstWRrQFBIW45YFngPddAYdrCV2LIBrQX6y4+9fozGuLFcn2vbF2FrA4cA/hzGaljcyucg7/PSiW0ChX48+pd1XCR9d07zfWkhfRk1t0KU1dDRnLVnurVRU= skoduri@li-1542d04c-27dd-11b2-a85c-e8459f08bc94.ibm.com
</pre></p>
<p>6) Paste your hashed VPN credentials between the <code>pre</code> tags (Format: <code>user@hostname 22CharacterSalt 65CharacterHashedPassword</code>)<br /><pre>
skoduri@localhost nFLji2lSZ4l4lD+ElGRU5Q 6fb271a147b3f0194a8f10dd7379bfb1e4bfad7ee45f400b0035e0b6a898cae7
</pre></p>
sepia - Support #40389: Sepia Lab Access Request
https://tracker.ceph.com/issues/40389#change-257304
2024-03-18T15:50:15Z
Soumya Koduri
<p>Hi David,</p>
<p>Below are the credentials for my new laptop. Kindly grant access. Thanks!</p>
<p>ssh public key -</p>
<p>ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCuGyEOD+7nmCuGh6N+nqBnhZEWqGDjjjYRiYv5lPKVLg+p2hTlgxCAXCqo4ylI7B+9R8F4Bx4T93SzYmuoirPT12anrkvrPQ56tRnB/86Fu6AGfKBhv6ttkuO//ET9+sot7WvMXdxFN+tytT0C+SGQP4JDESiQ7OuHnXsNraFnVo3mWNP8NI56Dlof+c0zKToK1iB9+ph7hgHgEJg803F3c0TEeBiv9tTeDrfFl9R3GQcVhayf5D7qRAPEN3ftLm9Bt2TJeNp/grOAqGjLlc+8Xg/2CxgBy/SAPbWaUUDhjuBfrVdEPqGwTFitfDwE9DsF8n3GN8oZr1WuHRR8amJSHUJ0r9QTNs/Pa4Y7ufPFZEGFffAT+/mzuAzIOcj7MSVk4/7jhXPtPstWRrQFBIW45YFngPddAYdrCV2LIBrQX6y4+9fozGuLFcn2vbF2FrA4cA/hzGaljcyucg7/PSiW0ChX48+pd1XCR9d07zfWkhfRk1t0KU1dDRnLVnurVRU= <a class="email" href="mailto:skoduri@li-1542d04c-27dd-11b2-a85c-e8459f08bc94.ibm.com">skoduri@li-1542d04c-27dd-11b2-a85c-e8459f08bc94.ibm.com</a></p>
<p>VPN creds:</p>
<p>skoduri@localhost nFLji2lSZ4l4lD+ElGRU5Q 6fb271a147b3f0194a8f10dd7379bfb1e4bfad7ee45f400b0035e0b6a898cae7</p>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257176
2024-03-16T06:54:06Z
Soumya Koduri
<pre><code>#0  __pthread_kill_implementation (
    threadid=<optimized out>, signo=signo@entry=11,
    no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff58aff33 in __pthread_kill_internal (
    signo=11, threadid=<optimized out>)
    at pthread_kill.c:78
#2  0x00007ffff585fab6 in __GI_raise (sig=11)
    at ../sysdeps/posix/raise.c:26
#3  0x0000555556d7bcc0 in reraise_fatal (
    signum=signum@entry=11)
    at /root/ceph/src/global/signal_handler.cc:88
#4  0x0000555556d7d351 in handle_oneshot_fatal_signal (
    signum=11)
    at /root/ceph/src/global/signal_handler.cc:367
#5  <signal handler called>
#6  RGWHTTPClient::set_send_length (len=4036, this=0x0)
    at /root/ceph/src/rgw/rgw_http_client.h:128
#7  RGWLCCloudStreamPut::send_ready (
    this=this@entry=0x55555837b260, dpp=0x5555583e0460,
    rest_obj=...)
    at /root/ceph/src/rgw/driver/rados/rgw_lc_tier.cc:660
#8  0x0000555556befef5 in cloud_tier_transfer_object (
    dpp=<optimized out>, readf=0x555558437b80,
    writef=0x55555837b260)
    at /root/ceph/src/rgw/driver/rados/rgw_lc_tier.cc:728
#9  0x0000555556bf04e7 in cloud_tier_plain_transfer (
    tier_ctx=...)
    at /usr/include/c++/12/bits/shared_ptr_base.h:1665
#10 0x0000555556bf5a9b in rgw_cloud_tier_transfer_object
    (tier_ctx=...,
    cloud_targets=std::set with 1 element = {...})
    at /root/ceph/src/rgw/driver/rados/rgw_lc_tier.cc:1298
#11 0x00005555569eed13 in rgw::sal::RadosObject::transition_to_cloud (this=0x55555fa84000,
    bucket=0x55555d5db100, tier=0x55555d7ca340, o=...,
    cloud_targets=std::set with 1 element = {...},
    cct=0x5555583d6000, update_object=true,
    dpp=<optimized out>, y=...)
    at /root/ceph/src/rgw/driver/rados/rgw_sal_rados.cc:1766
#12 0x0000555556b1f61c in LCOpAction_Transition::transition_obj_to_cloud (this=this@entry=0x55555fa0ffc0, oc=...)
    at /root/ceph/src/rgw/rgw_lc.cc:1396
#13 0x0000555556b1fed1 in LCOpAction_Transition::process
    (this=0x55555fa0ffc0, oc=...)
    at /root/ceph/src/rgw/rgw_lc.cc:1453
#14 0x0000555556b2087c in LCOpAction_CurrentTransition::process (this=<optimized out>, oc=...)
    at /root/ceph/src/rgw/rgw_lc.cc:1503
#15 0x0000555556b0bf7f in LCOpRule::process (
    this=this@entry=0x7ffecf9d26a8, o=...,
    dpp=<optimized out>, wq=wq@entry=0x55555839f510)
    at /root/ceph/src/rgw/rgw_lc.cc:1618
#16 0x0000555556b0c547 in operator() (
    __closure=0x55555839f5f0, wk=0x55555d1792c0,
    wq=0x55555839f510, wi=...)
    at /root/ceph/src/rgw/rgw_lc.cc:1704
#17 0x0000555556b0c77b in std::__invoke_impl<void, RGWLC::bucket_lc_process(std::string&, LCWorker*, time_t, bool)::<lambda(RGWLC::LCWorker*, WorkQ*, WorkItem&)>&, RGWLC::LCWorker*, WorkQ*, boost::variant<void*, std::tuple<LCOpRule, rgw_bucket_dir_entry>, std::tuple<lc_op, rgw_bucket_dir_entry>, rgw_bucket_dir_entry>&> (__f=...)
    at /usr/include/c++/12/bits/invoke.h:61
#18 std::__invoke_r<void, RGWLC::bucket_lc_process(std::string&, LCWorker*, time_t, bool)::<lambda(RGWLC::LCWorker*, WorkQ*, WorkItem&)>&, RGWLC::LCWorker*, WorkQ*, boost::variant<void*, std::tuple<LCOpRule, rgw_bucket_dir_entry>, std::tuple<lc_op, rgw_bucket_dir_entry>, rgw_bucket_dir_entry>&> (__fn=...)
    at /usr/include/c++/12/bits/invoke.h:111
#19 std::_Function_handler<void(RGWLC::LCWorker*, WorkQ*, boost::variant<void*, std::tuple<LCOpRule, rgw_bucket_dir_entry>, std::tuple<lc_op, rgw_bucket_dir_entry>, rgw_bucket_dir_entry>&), RGWLC::bucket_lc_process(std::string&, LCWorker*, time_t, bool)::<lambda(RGWLC::LCWorker*, WorkQ*, WorkItem&)> >::_M_invoke(const std::_Any_data &, RGWLC::LCWorker *&&, WorkQ *&&, boost::variant<void*, std::tuple<LCOpRule, rgw_bucket_dir_entry>, std::tuple<lc_op, rgw_bucket_dir_entry>, rgw_bucket_dir_entry> &) (
    __functor=..., __args#0=<optimized out>,
    __args#1=<optimized out>, __args#2=...)
    at /usr/include/c++/12/bits/std_function.h:290
#20 0x0000555556b170d9 in std::function<void (RGWLC::LCWorker*, WorkQ*, boost::variant<void*, std::tuple<LCOpRule, rgw_bucket_dir_entry>, std::tuple<lc_op, rgw_bucket_dir_entry>, rgw_bucket_dir_entry>&)>::operator()(RGWLC::LCWorker*, WorkQ*, boost::variant<void*, std::tuple<LCOpRule, rgw_bucket_dir_entry>, std::tuple<lc_op, rgw_bucket_dir_entry>, rgw_bucket_dir_entry>&) const (
    this=this@entry=0x55555839f5f0,
    __args#0=<optimized out>, __args#1=<optimized out>,
    __args#1@entry=0x55555839f510, __args#2=...)
    at /usr/include/c++/12/bits/std_function.h:591
#21 0x0000555556b1b815 in WorkQ::entry (
    this=0x55555839f510)
    at /root/ceph/src/rgw/rgw_lc.cc:778
#22 0x00007ffff7033c0f in Thread::entry_wrapper (
    this=0x55555839f510)
    at /root/ceph/src/common/Thread.cc:87
#23 0x00007ffff7033c27 in Thread::_entry_func (
    arg=<optimized out>)
    at /root/ceph/src/common/Thread.cc:74
#24 0x00007ffff58ae19d in start_thread (
    arg=<optimized out>) at pthread_create.c:442
#25 0x00007ffff592fc60 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81</code></pre>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257175
2024-03-16T06:53:43Z
Soumya Koduri
<p>I am not able to reproduce this issue locally. Pasting the backtraces of the two crashes reported so far -</p>
<pre><code>2024-03-13T13:33:20.678 INFO:tasks.rgw.client.0.smithi106.stdout:*** Caught signal (Segmentation fault) **
2024-03-13T13:33:20.679 INFO:tasks.rgw.client.0.smithi106.stdout: in thread 7fb15d0b9640 thread_name:wp_thrd: 1, 1
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: ceph version 19.0.0-2011-g06f9ec2c (06f9ec2cb3c57d3b59cb10ddd4c321a82bb853f4) squid (dev)
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fb285ae5520]
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 2: tc_new()
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 3: radosgw(+0x10d45c3) [0x55806f3b75c3]
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 4: radosgw(+0x4a64d4) [0x55806e7894d4]
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 5: radosgw(+0x6a6213) [0x55806e989213]
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 6: (RGWSI_Bucket_SObj::read_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::Context, 4>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWBucketInfo*, std::chrono::time_point<ceph::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, optional_yield, DoutPrefixProvider const*, rgw_cache_entry_info*, boost::optional<obj_version>)+0x13e) [0x55806ece616e]
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 7: radosgw(+0xaa5ba9) [0x55806ed88ba9]
2024-03-13T13:33:20.690 INFO:tasks.rgw.client.0.smithi106.stdout: 8: (std::_Function_handler<int (RGWSI_MetaBackend_Handler::Op*), RGWBucketInstanceMetadataHandler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (ptr_wrapper<RGWSI_MetaBackend::Context, 4>&)>)::{lambda(RGWSI_MetaBackend_Handler::Op*)#1}>::_M_invoke(std::_Any_data const&, RGWSI_MetaBackend_Handler::Op*&&)+0x34) [0x55806ed84134]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 9: radosgw(+0xa110a0) [0x55806ecf40a0]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 10: (RGWSI_MetaBackend_SObj::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend::Context*)>)+0x5b) [0x55806ecf413b]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 11: (RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend_Handler::Op*)>)+0x77) [0x55806ecf32a7]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 12: (RGWBucketCtl::read_bucket_instance_info(rgw_bucket const&, RGWBucketInfo*, optional_yield, DoutPrefixProvider const*, RGWBucketCtl::BucketInstance::GetParams const&)+0x11b) [0x55806ed86f5b]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 13: (RGWRados::get_bucket_instance_info(rgw_bucket const&, RGWBucketInfo&, std::chrono::time_point<ceph::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, optional_yield, DoutPrefixProvider const*)+0x55) [0x55806ebc5915]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 14: (RGWRados::BucketShard::init(rgw_bucket const&, rgw_obj const&, RGWBucketInfo*, DoutPrefixProvider const*, optional_yield)+0x8a) [0x55806eba034a]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 15: (RGWRados::Bucket::UpdateIndex::guard_reshard(DoutPrefixProvider const*, rgw_obj const&, RGWRados::BucketShard**, std::function<int (RGWRados::BucketShard*)>, optional_yield)+0x368) [0x55806ebad4e8]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 16: (RGWRados::Bucket::UpdateIndex::prepare(DoutPrefixProvider const*, RGWModifyOp, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, optional_yield, bool)+0x113) [0x55806ebadad3]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 17: (RGWRados::Object::Delete::delete_obj(optional_yield, DoutPrefixProvider const*, bool)+0x6fc) [0x55806ebc000c]
2024-03-13T13:33:20.691 INFO:tasks.rgw.client.0.smithi106.stdout: 18: (rgw::sal::RadosObject::RadosDeleteOp::delete_obj(DoutPrefixProvider const*, optional_yield, unsigned int)+0x15a) [0x55806ebfbd9a]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 19: radosgw(+0xa53e42) [0x55806ed36e42]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 20: radosgw(+0xa54339) [0x55806ed37339]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 21: (LCOpRule::process(rgw_bucket_dir_entry&, DoutPrefixProvider const*, WorkQ*)+0x433) [0x55806ed3aef3]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 22: radosgw(+0xa588a9) [0x55806ed3b8a9]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 23: radosgw(+0xa56926) [0x55806ed39926]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 24: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7fb285b37b43]
2024-03-13T13:33:20.692 INFO:tasks.rgw.client.0.smithi106.stdout: 25: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7fb285bc9a00]</code></pre>
rgw - Bug #64571: lifecycle transition crashes since merge end-to-end tracing
https://tracker.ceph.com/issues/64571#change-257133
2024-03-15T17:41:26Z
Soumya Koduri
<p>Casey Bodley wrote:</p>
<blockquote>
<p>hey Shreyansh and Soumya, any updates here? this one is urgent for the squid release</p>
</blockquote>
<p>Hi Casey, I have started looking into it. I couldn't reproduce this issue locally with simple LC/cloud-transition tests. Shreyansh mentioned that the crashes were frequently observed when running s3tests. I will run those now and update.</p>