Bug #43427

bucket index reshard fails

Added by Chris Durham 6 months ago. Updated 6 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (dev)
Tags: rgw bucket reshard
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

radosgw-admin gets a SIGABRT when doing a manual reshard, using 15.0.0-8515-ga7a2987 on CentOS 8.

# radosgw-admin bucket limit check

shows that the bucket 'repbucket' has 147214 objects, a fill_status over 100.000000%, and num_shards of 0.
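For context, the arithmetic behind these numbers can be sketched roughly as follows. This is only an illustration, not the radosgw-admin implementation: it assumes a per-shard limit of 100000 objects (the usual default of rgw_max_objs_per_shard, which is not stated in this report) and treats num_shards 0 as a single unsharded index. The function names are hypothetical.

```python
import math

# Assumption: per-shard object limit (rgw_max_objs_per_shard default).
# The report does not state the value configured on this cluster.
MAX_OBJS_PER_SHARD = 100_000


def fill_percent(num_objects, num_shards, max_per_shard=MAX_OBJS_PER_SHARD):
    """Approximate index fill level as a percentage of the per-shard limit."""
    # num_shards == 0 is treated here as one unsharded index object.
    shards = max(num_shards, 1)
    return 100.0 * num_objects / (shards * max_per_shard)


def min_shards(num_objects, max_per_shard=MAX_OBJS_PER_SHARD):
    """Smallest shard count that keeps every shard at or under the limit."""
    return math.ceil(num_objects / max_per_shard)


if __name__ == "__main__":
    print(fill_percent(147214, 0))   # unsharded bucket from the report
    print(fill_percent(147214, 32))  # after the requested reshard to 32
    print(min_shards(147214))
```

Under that assumption the unsharded bucket is well over the limit, and 32 shards would leave each shard only a few percent full.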

rgw dynamic resharding = false is set in ceph.conf. S3 access works well: I can read and write objects in the bucket.

# radosgw-admin reshard --bucket repbucket --num-shards 32

gets a SIGABRT and leaves the following stack trace in the log file. It also leaves a lock lying around, causing a subsequent reshard request to fail with: RGWReshardLock::lock failed to acquire lock on repbucket:... ret=-16

1: (()+0x12d80) [0x7ff5a3e79d80]
2: (gsignal()+0x10f) [0x7ff5a1bee93f]
3: (abort()+0x127) [0x7ff5a1bd8c95]
4: (()+0x27f4b8) [0x5600f0ef74b8]
5: (RGWSI_Bucket_Sobj::store_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::....
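As a side note on the ret=-16 in the lock error above: Ceph tools generally report failures as negated errno values, so the code can be decoded with a few lines of Python. This is a small sketch for reading the message, with a hypothetical helper name; it does not touch the cluster.

```python
import errno
import os


def decode_ret(ret):
    """Map a negated errno return code (e.g. -16) to its symbolic name."""
    return errno.errorcode.get(abs(ret), "UNKNOWN")


if __name__ == "__main__":
    ret = -16  # value from the "failed to acquire lock" message
    print(decode_ret(ret), "-", os.strerror(abs(ret)))
```

On Linux, 16 is EBUSY, which is consistent with the reshard lock still being held by the aborted run.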

This happens after about 24 calls to get_auth_request (not always exactly 24), followed by:
-1 ** Caught signal (Aborted) *
in thread 0x6f341ed23000 thread_name: radosgw-admin

If I restart the radosgw service with rgw dynamic resharding set to true, the bucket attempts to reshard automatically and fails, then retries with an error in the rgw log about being unable to acquire the lock. Until the lock disappears, the bucket is read-only. The reshard appears to be attempted multiple times, which makes the read-only problem cyclic if I leave dynamic resharding set.

Unfortunately, I am unable to provide much more explicit detail, but I can answer questions. We have 3 mons, 2 rgws (same issue on both), and 240+ OSDs.


Related issues

Duplicates rgw - Bug #43414: crash in RGWSI_Bucket_SObj::store_bucket_instance_info() Resolved

History

#1 Updated by Kefu Chai 6 months ago

  • Duplicates Bug #43414: crash in RGWSI_Bucket_SObj::store_bucket_instance_info() added

#2 Updated by Kefu Chai 6 months ago

  • Project changed from rgw-testing to rgw
  • Status changed from New to Duplicate
