Bug #54363

closed

segfault when Resharding occurs during LC

Added by Mark Kogan about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: lifecycle
Backport: octopus pacific quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

on master a segfault occurs when dynamic resharding is triggered during lc processing:

repro:

- write 10,000,000 objects in a single bucket

- in terminal #1: apply LC policy to delete all the objects and instruct to process it

s3cmd setlifecycle ./lc-expiration.xml s3://b001b000000000000
sudo ./bin/radosgw-admin lc process --debug_rgw=20 &> lc_process.log

- in terminal #2: add resharding and instruct to process it

sudo ./bin/radosgw-admin reshard add --bucket=b001b000000000000 --num-shards=199
sudo ./bin/radosgw-admin reshard process
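
The two-terminal steps above can be sketched as a single script. This is only an illustration and not part of the original report: the `BUCKET`, `SHARDS`, `LC_LOG` and `DRY_RUN` variables and the `run`/`repro` helpers are hypothetical names, and only the `s3cmd` and `radosgw-admin` invocations are taken from the steps above. It assumes the bucket is already populated and that LC processing runs long enough for the reshard to overlap with it. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: run LC processing and resharding concurrently from one shell.
# Hypothetical helper names and variables; the commands themselves come
# from the repro steps in this report.
set -u

BUCKET="${BUCKET:-b001b000000000000}"
SHARDS="${SHARDS:-199}"
LC_LOG="${LC_LOG:-lc_process.log}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

repro() {
  local lc_pid

  # "terminal #1": apply the LC policy, then process it in the background.
  run s3cmd setlifecycle ./lc-expiration.xml "s3://${BUCKET}"
  run sudo ./bin/radosgw-admin lc process --debug_rgw=20 > "$LC_LOG" 2>&1 &
  lc_pid=$!

  # "terminal #2": queue the reshard and process it while LC is still running.
  run sudo ./bin/radosgw-admin reshard add --bucket="${BUCKET}" --num-shards="${SHARDS}"
  run sudo ./bin/radosgw-admin reshard process

  # Wait for the LC run to finish (or crash) before returning.
  wait "$lc_pid"
}

repro
```

If the issue reproduces, the backtrace should appear in the file named by `LC_LOG` for `radosgw-admin`, or in the radosgw log when the gateway is processing LC at the same time.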

a segfault occurs in radosgw-admin executing the lc process, and in radosgw (if it is also performing the lc in parallel):

less ./out/radosgw.8000.log
...
   -10> 2022-02-21T21:28:49.597+0200 7fffba0e7700 20 lifecycle: get_obj_state: rctx=0x7fffba0e6128 obj=b001b000000000000:o001o000001254189 state=0x5f119e8 s->prefetch_data=0
    -9> 2022-02-21T21:28:49.598+0200 7fffba0e7700 10 lifecycle: If-UnModified-Since: 2022-02-21T19:28:43.698805+0200 Last-Modified: 0.000000
    -8> 2022-02-21T21:28:49.598+0200 7fffba0e7700  1 lifecycle: ERROR: publishing notification failed, with error: -2
    -7> 2022-02-21T21:28:49.598+0200 7fffba0e7700  0 lifecycle: ERROR: remove_expired_obj :b001b000000000000[4da573f1-5c09-4e56-8196-088a61e2573f.4196.4]):o001o000001254189 (2) No such fi>
    -6> 2022-02-21T21:28:49.598+0200 7fffba0e7700  0 lifecycle: ERROR: remove_expired_obj :b001b000000000000[4da573f1-5c09-4e56-8196-088a61e2573f.4196.4]):o001o000001254189 (2) No such fi>
    -5> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: ERROR: orule.process() returned ret=-2thread:wp_thrd: 3, 9
    -4> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: operator()(): key=o001o000001253282wp_thrd: 3, 9
    -3> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: check(): key=o001o000001253282: is_expired=1 wp_thrd: 3, 9
    -2> 2022-02-21T21:28:49.598+0200 7fffba0e7700 20 lifecycle: get_obj_state: rctx=0x7fffba0e6128 obj=b001b000000000000:o001o000001253282 state=0x5f119e8 s->prefetch_data=0
    -1> 2022-02-21T21:28:49.598+0200 7fffba0e7700 10 lifecycle: If-UnModified-Since: 2022-02-21T19:28:43.319021+0200 Last-Modified: 0.000000
     0> 2022-02-21T21:28:49.598+0200 7fffb98e6700 -1 *** Caught signal (Aborted) **
 in thread 7fffb98e6700 thread_name:lifecycle_thr_3

 ceph version 17.0.0-10766-g88eb23e585d (88eb23e585da3fda0e43c96161eccf1b45ac3bf0) quincy (dev)
 1: /lib64/libpthread.so.0(+0x14a90) [0x7ffff5a21a90]
 2: gsignal()
 3: abort()
 4: /lib64/libstdc++.so.6(+0x9e941) [0x7ffff5863941]
 5: /lib64/libstdc++.so.6(+0xaa32c) [0x7ffff586f32c]
 6: /lib64/libstdc++.so.6(+0xaa397) [0x7ffff586f397]
 7: /lib64/libstdc++.so.6(+0xaa649) [0x7ffff586f649]
 8: (std::__throw_logic_error(char const*)+0x41) [0x7ffff58662a0]
 9: (LCOpRule::update()+0x314) [0x7ffff76a5914]
 10: (RGWLC::bucket_lc_process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, RGWLC::LCWorker*, long, bool)+0x127f) [0x7ffff76a777f]
 11: (RGWLC::process(int, int, RGWLC::LCWorker*, bool)+0x126e) [0x7ffff76ac0ee]
 12: (RGWLC::process(RGWLC::LCWorker*, std::unique_ptr<rgw::sal::Bucket, std::default_delete<rgw::sal::Bucket> > const&, bool)+0x516) [0x7ffff76a3096]
 13: (RGWLC::LCWorker::entry()+0x1f3) [0x7ffff76a22b3]
 14: (Thread::entry_wrapper()+0xaa) [0x7ffff60fba6a]
 15: /lib64/libpthread.so.0(+0x9432) [0x7ffff5a16432]
 16: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues 3 (0 open, 3 closed)

Copied to rgw - Backport #54967: quincy: segfault when Resharding occurs during LC (Resolved)
Copied to rgw - Backport #54968: pacific: segfault when Resharding occurs during LC (Resolved, Mark Kogan)
Copied to rgw - Backport #54969: octopus: segfault when Resharding occurs during LC (Resolved)