Bug #43427: bucket index reshard fails - rgw - Ceph



Bug #43427

closed

bucket index reshard fails

Added by Chris Durham over 4 years ago. Updated over 4 years ago.

Status:

Duplicate

Priority:

Normal

Assignee:

Target version:

Ceph - v15.0.0

% Done:

Source:

Community (dev)

Tags:

rgw bucket reshard

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v15.0.0

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

radosgw-admin gets a SIGABRT when doing a manual reshard, using 15.0.0-8515-ga7a2987 on CentOS 8.

# radosgw-admin bucket limit check

shows that the bucket 'repbucket' has 147214 objects, a fill_status over 100.000000%, and num_shards 0.
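To put these numbers in perspective, here is a rough Python sketch of per-shard index load. It assumes the per-shard limit is the default of rgw_max_objs_per_shard (100000 objects) and treats num_shards 0 as a single index shard; it does not reproduce the exact fill_status string that radosgw-admin prints. The helpers shard_fill_percent and min_shards are hypothetical names used only for illustration.

```python
import math

# Assumption: 100000 mirrors the default of rgw_max_objs_per_shard;
# check the value actually configured on your cluster.
MAX_OBJS_PER_SHARD = 100_000


def shard_fill_percent(num_objects: int, num_shards: int,
                       max_objs_per_shard: int = MAX_OBJS_PER_SHARD) -> float:
    """Average objects per shard as a percentage of the per-shard limit.

    An unsharded bucket (num_shards == 0) keeps its whole index in a
    single object, so it is treated here as one shard.
    """
    shards = max(num_shards, 1)
    return num_objects / shards / max_objs_per_shard * 100


def min_shards(num_objects: int,
               max_objs_per_shard: int = MAX_OBJS_PER_SHARD) -> int:
    """Smallest shard count that keeps the average shard at or under the limit."""
    return max(1, math.ceil(num_objects / max_objs_per_shard))


if __name__ == "__main__":
    objects = 147214  # object count reported for 'repbucket'
    print(f"{shard_fill_percent(objects, 0):.3f}% with num_shards 0")
    print(f"{shard_fill_percent(objects, 32):.3f}% with 32 shards")
    print("minimum shards:", min_shards(objects))
```

Under these assumptions the unsharded bucket sits at about 147% of the per-shard limit, two shards would be the bare minimum, and the requested 32 shards would leave each shard at roughly 4.6% of the limit.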

rgw dynamic resharding = false is set in ceph.conf. S3 access works well; I am able to read and write objects in the bucket.

# radosgw-admin reshard --bucket repbucket --num-shards 32

gets a SIGABRT and leaves the following stack trace in the log file. It also leaves a lock behind, causing a subsequent reshard request to fail with RGWReshardLock::lock failed to acquire lock on repbucket:... ret=-16

1: (()+0x12d80) [0x7ff5a3e79d80]
2: (gsignal()+0x10f) [0x7ff5a1bee93f]
3: (abort()+0x127) [0x7ff5a1bd8c95]
4: (()+0x27f4b8) [0x5600f0ef74b8]
5: (RGWSI_Bucket_Sobj::store_bucket_instance_info(ptr_wrapper<RGWSI_MetaBackend::....

This happens after 24 calls to get_auth_request (not always exactly 24), followed by:
-1 *** Caught signal (Aborted) **
in thread 0x6f341ed23000 thread_name: radosgw-admin
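Frames 2 and 3 of the trace (gsignal and abort) show the process aborting itself with SIGABRT. When radosgw-admin is driven from a wrapper script, a minimal Python sketch like the one below can tell a signal death apart from a normal error exit. It is POSIX-only, uses a child Python process calling os.abort() as a stand-in for the crashing command, and run_and_report is a hypothetical helper name.

```python
import signal
import subprocess
import sys


def run_and_report(cmd: list[str]) -> str:
    """Run a command and describe how it ended.

    On POSIX, subprocess reports death by signal as a negative return
    code, e.g. -6 for SIGABRT.
    """
    proc = subprocess.run(cmd, capture_output=True)
    if proc.returncode < 0:
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    # Stand-in for the crashing command: a child Python process that calls
    # os.abort(), which raises SIGABRT in the same way abort() does.
    print(run_and_report([sys.executable, "-c", "import os; os.abort()"]))
```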

If I restart the radosgw service with rgw dynamic resharding set to true, the bucket attempts to reshard automatically and fails, then retries, logging an error in the rgw log about being unable to acquire the lock. Until the lock disappears, the bucket is read-only. The reshard appears to be attempted repeatedly, so the read-only problem recurs cyclically for as long as dynamic resharding stays enabled.
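The lock behaviour described above (a crashed reshard leaves its lock behind, later attempts fail with ret=-16, and things recover only once the lock disappears) is consistent with a lock that carries an expiry time, i.e. a lease. The toy Python model below illustrates that pattern. It is not RGW's reshard lock or cls_lock implementation: the LeaseLock class and the 360-second lease length are hypothetical. On Linux, -16 is -EBUSY ("Device or resource busy").

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseLock:
    """Toy exclusive lock with an expiry time (a lease).

    Illustrative only; not RGW's reshard lock or cls_lock code.
    Times are plain numbers supplied by the caller so the example is
    deterministic.
    """
    duration: float
    owner: Optional[str] = None
    expires_at: float = 0.0

    def lock(self, who: str, now: float) -> int:
        # A live lease held by someone else blocks the caller with -EBUSY.
        if self.owner is not None and self.owner != who and now < self.expires_at:
            return -errno.EBUSY
        # Free, expired, or already ours: take (or renew) the lease.
        self.owner = who
        self.expires_at = now + self.duration
        return 0

    def unlock(self, who: str) -> None:
        # Only the current holder may release the lock early.
        if self.owner == who:
            self.owner = None
            self.expires_at = 0.0


if __name__ == "__main__":
    lk = LeaseLock(duration=360.0)  # hypothetical lease length in seconds
    print(lk.lock("reshard-1", now=0.0))    # 0: acquired
    # reshard-1 aborts here and never calls unlock(), leaving a stale lock.
    print(lk.lock("reshard-2", now=10.0))   # -16 on Linux (EBUSY)
    print(lk.lock("reshard-2", now=400.0))  # 0: the stale lease has expired
```

In this model, every crash of the lock holder produces one blocked window of up to `duration` seconds, which is one way a repeatedly failing automatic reshard could make the bucket read-only in cycles.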

Unfortunately, I am unable to provide many more details, but I can answer questions. We have 3 mons, 2 RGWs (same issue on both), and 240+ OSDs.

Related issues: 1 (0 open, 1 closed)