Bug #20251

rgw: meta sync thread crash at RGWMetaSyncShardCR

Added by fang yuxiang 6 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 06/12/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel kraken
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Release: master
Needs Doc: No

Description

radosgw crashes in the meta sync thread, as shown below:

ceph version 10.2.3-124-g82a9117 (82a9117651e68c5b843b364353081da1b284475f)
1: (()+0x64b73a) [0x7f3139f0573a]
2: (()+0xf100) [0x7f3139241100]
3: (RGWCoroutinesStack::wakeup()+0xa) [0x7f3139c0a2aa]
4: (RGWMetaSyncShardCR::incremental_sync()+0xc96) [0x7f3139cbb586]
5: (RGWMetaSyncShardCR::operate()+0x44) [0x7f3139cbd084]
6: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7f3139c097be]
7: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x3f3) [0x7f3139c0c1d3]
8: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7f3139c0cf30]
9: (RGWRemoteMetaLog::run_sync()+0xf62) [0x7f3139cacd92]
10: (RGWMetaSyncProcessorThread::process()+0xd) [0x7f3139d9eead]
11: (RGWRadosThread::Worker::entry()+0x125) [0x7f3139d3bcf5]
12: (()+0x7dc5) [0x7f3139239dc5]
13: (clone()+0x6d) [0x7f313863d28d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph version 10.2.3-124-g82a9117 (82a9117651e68c5b843b364353081da1b284475f)
1: (()+0x64b73a) [0x7fa07fd7e73a]
2: (()+0xf100) [0x7fa07f0ba100]
3: (gsignal()+0x37) [0x7fa07e3f55f7]
4: (abort()+0x148) [0x7fa07e3f6ce8]
5: (()+0x75317) [0x7fa07e435317]
6: (()+0x7d023) [0x7fa07e43d023]
7: (RefCountedObject::put()+0xf8) [0x7fa0896d1898]
8: (RGWMetaSyncShardCR::collect_children()+0xc5) [0x7fa07fb2b255]
9: (RGWMetaSyncShardCR::incremental_sync()+0x628) [0x7fa07fb33f18]
10: (RGWMetaSyncShardCR::operate()+0x44) [0x7fa07fb36084]
11: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7fa07fa827be]
12: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x3f3) [0x7fa07fa851d3]
13: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7fa07fa85f30]
14: (RGWRemoteMetaLog::run_sync()+0xf62) [0x7fa07fb25d92]
15: (RGWMetaSyncProcessorThread::process()+0xd) [0x7fa07fc17ead]
16: (RGWRadosThread::Worker::entry()+0x125) [0x7fa07fbb4cf5]
17: (()+0x7dc5) [0x7fa07f0b2dc5]
18: (clone()+0x6d) [0x7fa07e4b628d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues

Copied to rgw - Backport #20346: jewel: rgw: meta sync thread crash at RGWMetaSyncShardCR Resolved
Copied to rgw - Backport #20347: kraken: rgw: meta sync thread crash at RGWMetaSyncShardCR Resolved

History

#1 Updated by fang yuxiang 6 months ago

I found that this issue is caused by the lease_stack reference count, and I will open a PR for it.
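For illustration only, here is a minimal sketch of the kind of reference-count problem described above. It is not the actual fix and does not use the Ceph classes: `Stack`, `StackRef`, and `wake_after_manager_release()` are hypothetical names made up for this example. The assumption is that a caller keeps a pointer to a reference-counted stack without holding its own reference, so the object can be freed by its other owner before `wakeup()` is called. That would be consistent with the crashes in `RGWCoroutinesStack::wakeup()` and `RefCountedObject::put()` in the traces. Taking an owning reference avoids that:

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted coroutine stack
// (not the real RGWCoroutinesStack / RefCountedObject).
class Stack {
  int nref = 1;        // the creator starts with one reference
  bool woken = false;
public:
  static int live;     // number of live Stack objects, for the demo only
  Stack() { ++live; }
  ~Stack() { --live; }
  void get() { ++nref; }
  void put() {
    assert(nref > 0);  // a put() on an already-released object is a bug
    if (--nref == 0) delete this;
  }
  void wakeup() { woken = true; }
  bool is_woken() const { return woken; }
};
int Stack::live = 0;

// Owning handle: takes a reference on construction, drops it on destruction.
class StackRef {
  Stack* s;
public:
  explicit StackRef(Stack* p) : s(p) { if (s) s->get(); }
  StackRef(const StackRef&) = delete;
  StackRef& operator=(const StackRef&) = delete;
  ~StackRef() { if (s) s->put(); }
  Stack* operator->() const { return s; }
};

// Returns true if the stack was still alive when wakeup() was called.
bool wake_after_manager_release() {
  Stack* spawned = new Stack();  // the "manager" holds the initial reference
  StackRef lease(spawned);       // the caller takes its own reference
  spawned->put();                // manager is done and drops its reference
  bool alive = (Stack::live == 1);
  lease->wakeup();               // safe: our reference keeps the object alive
  return alive && lease->is_woken();
}                                // lease is destroyed here; the last put() frees the stack
```

With only a raw `Stack*` and no `get()`, the `put()` made by the manager would free the object, and the later `wakeup()` would touch freed memory.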

#2 Updated by fang yuxiang 6 months ago

Our build combines new rgw code with old rados cluster code, so it reports version 10.2.3.

However, this issue should also exist in master.

PR:
https://github.com/ceph/ceph/pull/15660

#3 Updated by Andrey Tyurin 6 months ago

We have hit a pair of these crashes as well:

2017-06-13 17:43:41.807843 7fa990abf700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa990abf700 thread_name:radosgw

 ceph version 10.2.4 (9411351cc8ce9ee03fbd46225102fe3d28ddf611)
 1: (()+0x56ab9a) [0x7faaca07fb9a]
 2: (()+0xf130) [0x7faac947d130]
 3: (RGWCoroutinesStack::wakeup()+0xe) [0x7faac9de576e]
 4: (RGWMetaSyncShardCR::incremental_sync()+0xbf1) [0x7faac9e88dd1]
 5: (RGWMetaSyncShardCR::operate()+0x44) [0x7faac9e8a0d4]
 6: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7faac9de4cfe]
 7: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x3f1) [0x7faac9de7771]
 8: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7faac9de8330]
 9: (RGWRemoteMetaLog::run_sync()+0xde8) [0x7faac9e7a748]
 10: (RGWMetaSyncProcessorThread::process()+0xd) [0x7faac9f579bd]
 11: (RGWRadosThread::Worker::entry()+0x133) [0x7faac9efa083]
 12: (()+0x7df5) [0x7faac9475df5]
 13: (clone()+0x6d) [0x7faac8a811ad]

ceph version 10.2.4 (9411351cc8ce9ee03fbd46225102fe3d28ddf611)
 1: (()+0x56ab9a) [0x7f889b0f7b9a]
 2: (()+0xf130) [0x7f889a4f5130]
 3: (Mutex::Lock(bool)+0x4) [0x7f889b273c54]
 4: (RGWCompletionManager::wakeup(void*)+0x18) [0x7f889ae5d728]
 5: (RGWMetaSyncShardCR::incremental_sync()+0xbf1) [0x7f889af00dd1]
 6: (RGWMetaSyncShardCR::operate()+0x44) [0x7f889af020d4]
 7: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7f889ae5ccfe]
 8: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x3f1) [0x7f889ae5f771]
 9: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7f889ae60330]
 10: (RGWRemoteMetaLog::run_sync()+0xde8) [0x7f889aef2748]
 11: (RGWMetaSyncProcessorThread::process()+0xd) [0x7f889afcf9bd]
 12: (RGWRadosThread::Worker::entry()+0x133) [0x7f889af72083]
 13: (()+0x7df5) [0x7f889a4eddf5]
 14: (clone()+0x6d) [0x7f8899af91ad]

#4 Updated by Casey Bodley 6 months ago

  • Status changed from New to Need Review
  • Assignee set to Casey Bodley

#5 Updated by Casey Bodley 6 months ago

  • Backport set to jewel kraken

#6 Updated by Casey Bodley 6 months ago

  • Status changed from Need Review to Pending Backport
  • Priority changed from Urgent to High

#7 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #20346: jewel: rgw: meta sync thread crash at RGWMetaSyncShardCR added

#8 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #20347: kraken: rgw: meta sync thread crash at RGWMetaSyncShardCR added

#9 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved
