Bug #54363
closed
segfault when Resharding occurs during LC
% Done:
0%
Source:
Tags:
lifecycle
Backport:
octopus pacific quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Description
On master, a segfault occurs when dynamic resharding is triggered during LC processing:
repro:
- write 10,000,000 objects in a single bucket
- in terminal #1: apply an LC policy that deletes all the objects, then trigger LC processing
  s3cmd setlifecycle ./lc-expiration.xml s3://b001b000000000000
  sudo ./bin/radosgw-admin lc process --debug_rgw=20 &> lc_process.log
- in terminal #2: queue a reshard of the bucket, then process the reshard queue
  sudo ./bin/radosgw-admin reshard add --bucket=b001b000000000000 --num-shards=199
  sudo ./bin/radosgw-admin reshard process
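The report does not include the contents of lc-expiration.xml. As a rough sketch only, a policy that expires every object in the bucket might look like the following. The rule ID, the empty prefix, and the one-day expiration are assumptions for illustration, not the reporter's actual file.

```shell
# Hypothetical lc-expiration.xml: the original file is not part of the report.
# One enabled rule with an empty prefix, so it matches every object in the bucket.
cat > lc-expiration.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>expire-all</ID>
    <Filter>
      <Prefix></Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Expiration>
      <Days>1</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
EOF
```

The resulting file would be applied with the s3cmd setlifecycle command from the steps above. On a development cluster, the rgw_lc_debug_interval option is commonly used so that a rule expressed in days takes effect within seconds; whether the reporter set it is not stated.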
A segfault occurs in radosgw-admin while executing the lc process, and in radosgw (if it is also performing LC in parallel):
less ./out/radosgw.8000.log
...
-10> 2022-02-21T21:28:49.597+0200 7fffba0e7700 20 lifecycle: get_obj_state: rctx=0x7fffba0e6128 obj=b001b000000000000:o001o000001254189 state=0x5f119e8 s->prefetch_data=0
-9> 2022-02-21T21:28:49.598+0200 7fffba0e7700 10 lifecycle: If-UnModified-Since: 2022-02-21T19:28:43.698805+0200 Last-Modified: 0.000000
-8> 2022-02-21T21:28:49.598+0200 7fffba0e7700 1 lifecycle: ERROR: publishing notification failed, with error: -2
-7> 2022-02-21T21:28:49.598+0200 7fffba0e7700 0 lifecycle: ERROR: remove_expired_obj :b001b000000000000[4da573f1-5c09-4e56-8196-088a61e2573f.4196.4]):o001o000001254189 (2) No such fi>
-6> 2022-02-21T21:28:49.598+0200 7fffba0e7700 0 lifecycle: ERROR: remove_expired_obj :b001b000000000000[4da573f1-5c09-4e56-8196-088a61e2573f.4196.4]):o001o000001254189 (2) No such fi>
-5> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: ERROR: orule.process() returned ret=-2thread:wp_thrd: 3, 9
-4> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: operator()(): key=o001o000001253282wp_thrd: 3, 9
-3> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: check(): key=o001o000001253282: is_expired=1 wp_thrd: 3, 9
-2> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: get_obj_state: rctx=0x7fffba0e6128 obj=b001b000000000000:o001o000001253282 state=0x5f119e8 s->prefetch_data=0
-1> 2022-02-21T21:28:49.598+0200 7fffba0e7700 10 lifecycle: If-UnModified-Since: 2022-02-21T19:28:43.319021+0200 Last-Modified: 0.000000
0> 2022-02-21T21:28:49.598+0200 7fffb98e6700 -1 *** Caught signal (Aborted) **
in thread 7fffb98e6700 thread_name:lifecycle_thr_3
ceph version 17.0.0-10766-g88eb23e585d (88eb23e585da3fda0e43c96161eccf1b45ac3bf0) quincy (dev)
1: /lib64/libpthread.so.0(+0x14a90) [0x7ffff5a21a90]
2: gsignal()
3: abort()
4: /lib64/libstdc++.so.6(+0x9e941) [0x7ffff5863941]
5: /lib64/libstdc++.so.6(+0xaa32c) [0x7ffff586f32c]
6: /lib64/libstdc++.so.6(+0xaa397) [0x7ffff586f397]
7: /lib64/libstdc++.so.6(+0xaa649) [0x7ffff586f649]
8: (std::__throw_logic_error(char const*)+0x41) [0x7ffff58662a0]
9: (LCOpRule::update()+0x314) [0x7ffff76a5914]
10: (RGWLC::bucket_lc_process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, RGWLC::LCWorker*, long, bool)+0x127f) [0x7ffff76a777f]
11: (RGWLC::process(int, int, RGWLC::LCWorker*, bool)+0x126e) [0x7ffff76ac0ee]
12: (RGWLC::process(RGWLC::LCWorker*, std::unique_ptr<rgw::sal::Bucket, std::default_delete<rgw::sal::Bucket> > const&, bool)+0x516) [0x7ffff76a3096]
13: (RGWLC::LCWorker::entry()+0x1f3) [0x7ffff76a22b3]
14: (Thread::entry_wrapper()+0xaa) [0x7ffff60fba6a]
15: /lib64/libpthread.so.0(+0x9432) [0x7ffff5a16432]
16: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.