Bug #43427
Closed: bucket index reshard fails
Description
radosgw-admin gets a SIGABRT when doing a manual reshard using 15.0.0-8515-ga7a2987 on CentOS 8.
# radosgw-admin bucket limit check
shows that the bucket 'repbucket' has 147214 objects, a fill_status of over 100.000000%, and num_shards of 0.
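To see why a bucket with num_shards 0 and 147214 objects reports a fill_status over 100%, here is a rough sketch of the arithmetic. It assumes a per-shard limit of 100000 objects (what I believe is the default rgw_max_objs_per_shard) and treats num_shards 0 (an unsharded index) as a single shard; fill_percent is a hypothetical helper, not the radosgw-admin code.

```python
# Hypothetical sketch of the fill_status arithmetic behind "bucket limit check".
# Assumptions: max_objs_per_shard defaults to 100000 (believed default of
# rgw_max_objs_per_shard), and num_shards == 0 means one unsharded index object.

def fill_percent(num_objects: int, num_shards: int,
                 max_objs_per_shard: int = 100_000) -> float:
    """Return objects per index shard as a percentage of the per-shard limit."""
    shards = max(num_shards, 1)  # an unsharded index behaves like one shard
    objs_per_shard = num_objects / shards
    return 100.0 * objs_per_shard / max_objs_per_shard


before = fill_percent(147214, 0)   # 'repbucket' as reported
after = fill_percent(147214, 32)   # the requested --num-shards 32
print(f"before reshard: {before:.6f}%  after reshard: {after:.6f}%")
```

Under these assumptions the single index object sits at roughly 147% of the limit, while 32 shards would bring each shard to under 5%.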
rgw dynamic resharding = false is set in ceph.conf. S3 works well; I am able to read and write objects in the bucket.
# radosgw-admin reshard --bucket repbucket --num-shards 32
gets a SIGABRT and leaves the following stack trace in the log file. It also leaves a stale lock behind, so a subsequent reshard request fails with RGWReshardLock::lock failed to acquire lock on repbucket:... ret=-16
1: (()+0x12d80) [0x7ff5a3e79d80]
2: (gsignal()+0x10f) [0x7ff5a1bee93f]
3: (abort()+0x127) [0x7ff5a1bd8c95]
4: (()+0x27f4b8) [0x5600f0ef74b8]
5: (RGWSI_Bucket_Sobj::store_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::....
This happens after 24 calls to get_auth_request (the count is not always 24), followed by:
-1 ** Caught signal (Aborted) *
in thread 0x6f341ed23000 thread_name: radosgw-admin
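The ret=-16 in the RGWReshardLock::lock failure is a negated errno value. On Linux, errno 16 is EBUSY, which is consistent with the lock still being held after the aborted radosgw-admin run. A small sketch decoding it with Python's standard errno and os modules, assuming a Linux host (errno numbering can differ on other platforms):

```python
import errno
import os

# Ceph reports failures as negated errno values, e.g. "ret=-16".
ret = -16
code = -ret                        # back to a positive errno number
name = errno.errorcode[code]       # symbolic name; 'EBUSY' on Linux
message = os.strerror(code)        # platform-specific human-readable text
print(f"ret={ret}: {name} ({message})")
```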
If I restart the radosgw service with rgw dynamic resharding enabled, the bucket attempts to reshard automatically and fails, then retries, logging an error in the rgw log about being unable to acquire the lock. Until the lock disappears, the bucket is read-only. The reshard appears to be attempted repeatedly, so the read-only problem recurs for as long as I leave dynamic resharding enabled.
Unfortunately, I am unable to provide more explicit details, but I can answer questions. We have 3 mons, 2 RGWs (same issue on both), and 240+ OSDs.
Updated by Kefu Chai over 4 years ago
- Is duplicate of Bug #43414: crash in RGWSI_Bucket_SObj::store_bucket_instance_info() added
Updated by Kefu Chai over 4 years ago
- Project changed from rgw-testing to rgw
- Status changed from New to Duplicate