Bug #43199
recursive lock of RGWCoroutinesManager::lock (43)
% Done: 0%
Source:
Tags: multisite elasticsearch
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I backported this PR (https://github.com/ceph/ceph/pull/29637) to luminous and found that RGW segfaults as follows:
-2> 2019-12-09 09:43:52.213009 7f76635f9e00 20 _will_lock RGWCoroutinesManager::lock (43)
-1> 2019-12-09 09:43:52.213010 7f76635f9e00 0 recursive lock of RGWCoroutinesManager::lock (43)
ceph version 12.2.8-387-g14bf2e5 (14bf2e55a49eef6ddcc926f8a20844f8b0ecbdcf) luminous (stable)
1: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x167) [0x7f76630ce417]
2: (RGWCoroutinesManager::run(RGWCoroutine*)+0x98) [0x7f76630cf548]
3: (RGWElasticDataSyncModule::init(RGWDataSyncEnv*, unsigned long)+0x341) [0x7f76631e99c1]
4: (RGWDataSyncCR::operate()+0x1c3) [0x7f76630a0bb3]
5: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7f76630cbb8e]
6: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x4cb) [0x7f76630ce77b]
7: (RGWCoroutinesManager::run(RGWCoroutine*)+0x98) [0x7f76630cf548]
8: (RGWRemoteDataLog::run_sync(int)+0xe3) [0x7f766307c3f3]
9: (main()+0x1f7db) [0x7f7662f9547b]
10: (__libc_start_main()+0xf5) [0x7f765680c505]
11: (()+0x1751b0) [0x7f7662fa11b0]

previously locked at
ceph version 12.2.8-387-g14bf2e5 (14bf2e55a49eef6ddcc926f8a20844f8b0ecbdcf) luminous (stable)
1: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x4ab) [0x7f76630ce75b]
2: (RGWCoroutinesManager::run(RGWCoroutine*)+0x98) [0x7f76630cf548]
3: (RGWRemoteDataLog::run_sync(int)+0xe3) [0x7f766307c3f3]
4: (main()+0x1f7db) [0x7f7662f9547b]
5: (__libc_start_main()+0xf5) [0x7f765680c505]
6: (()+0x1751b0) [0x7f7662fa11b0]
It seems that running an RGWCoroutinesManager inside another RGWCoroutinesManager is invalid. The RGW ES sync module works as expected if I disable lockdep detection, as follows:
void init(RGWDataSyncEnv *sync_env, uint64_t instance_id) override {
  conf->init_instance(sync_env->store->svc()->zone->get_realm(), instance_id);
  conf->init_instance(sync_env->store->svc()->zone->get_realm(), instance_id);
  // try to get elastic search version
  lockdep_unregister_ceph_context(sync_env->store->ctx());
  .....
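To illustrate why the nested run() in the backtrace is flagged, here is a minimal, self-contained toy model. It is not Ceph's lockdep or RGW code: ToyLockdep, ToyManager and the lock name "ToyManager::lock" are hypothetical names made up for this sketch. It assumes, as the shared id (43) in the log suggests, that the checker keys locks by name, so a second manager started from inside a running one appears to re-acquire a lock that is already held.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>

// Toy, single-threaded model of a name-keyed lock checker.
// Hypothetical names throughout; this is NOT Ceph's lockdep implementation.
class ToyLockdep {
 public:
  // Locks registered under the same name share one id.
  int register_name(const std::string& name) {
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    int id = static_cast<int>(ids_.size());
    ids_.emplace(name, id);
    return id;
  }

  // Returns false if a lock with this id is already held ("recursive lock").
  bool will_lock(int id) { return held_.insert(id).second; }

  // Marks the lock with this id as released.
  void unlocked(int id) {
    assert(held_.count(id) == 1);
    held_.erase(id);
  }

 private:
  std::map<std::string, int> ids_;
  std::set<int> held_;
};

// Toy stand-in for a manager whose run() holds its lock while the body runs.
class ToyManager {
 public:
  explicit ToyManager(ToyLockdep& ld)
      : ld_(ld), id_(ld.register_name("ToyManager::lock")) {}

  // Runs `body` with the lock "held"; returns false if the checker
  // reported a recursive lock, otherwise returns the body's result.
  bool run(const std::function<bool()>& body) {
    if (!ld_.will_lock(id_)) return false;
    bool ok = body();
    ld_.unlocked(id_);
    return ok;
  }

 private:
  ToyLockdep& ld_;
  int id_;
};

// A second manager started from inside a running one is flagged,
// even though it is a distinct object, because the lock name is shared.
bool nested_run_is_flagged() {
  ToyLockdep ld;
  ToyManager outer(ld);
  bool inner_ok = outer.run([&] {
    ToyManager inner(ld);
    return inner.run([] { return true; });
  });
  return !inner_ok;
}

// Two managers run one after the other never overlap, so nothing is flagged.
bool sequential_runs_are_fine() {
  ToyLockdep ld;
  ToyManager a(ld);
  ToyManager b(ld);
  return a.run([] { return true; }) && b.run([] { return true; });
}
```

In this model, nested_run_is_flagged() returns true and sequential_runs_are_fine() returns true. That matches the observation above that the module works once the lockdep check is disabled, although the sketch says nothing about whether the nested run is otherwise safe in the real code.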
History
#1 Updated by Chang Liu over 4 years ago
Could we reuse the data sync instance's RGWCoroutinesManager here?
#2 Updated by Casey Bodley over 4 years ago
- Assignee set to Casey Bodley
#3 Updated by Casey Bodley over 4 years ago
- Status changed from New to In Progress
I added a comment about this to https://github.com/ceph/ceph/pull/29637/files#r357250244. We should address that on master first, then include the change with this backport.
#4 Updated by Chang Liu over 4 years ago
- Pull request ID set to 32269
#5 Updated by Casey Bodley about 4 years ago
- Status changed from In Progress to Fix Under Review
- Tags set to multisite elasticsearch
#6 Updated by Daniel Gryniewicz about 4 years ago
- Status changed from Fix Under Review to Resolved