Bug #17435

open

Crash in ceph-fuse in ObjectCacher::trim while adding an OSD

Added by John Spray over 7 years ago. Updated almost 7 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Split out from http://tracker.ceph.com/issues/17270, which initially reported a crash during writes; this one appears to be an unrelated crash during reads.

First guess would be that this is another ObjectCacher bug exposed by the unusual timing cases we see during PG migration.

 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7fbac4e14d35]
 2: (ObjectCacher::trim()+0x36f) [0x7fbac4be2f53]
 3: (ObjectCacher::_readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*, bool)+0x2e86) [0x7fbac4be673a]
 4: (ObjectCacher::C_RetryRead::finish(int)+0x76) [0x7fbac4bf050c]
 5: (Context::complete(int)+0x27) [0x7fbac4b41419]
 6: (void finish_contexts<Context>(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x2ef) [0x7fbac4bf2634]
 7: (ObjectCacher::bh_read_finish(long, sobject_t, unsigned long, long, unsigned long, ceph::buffer::list&, int, bool)+0x194c) [0x7fbac4be0500]
 8: (ObjectCacher::C_ReadFinish::finish(int)+0x90) [0x7fbac4bf005a]
 9: (Context::complete(int)+0x27) [0x7fbac4b41419]
 10: (C_Lock::finish(int)+0x55) [0x7fbac4b4157d]
 11: (Context::complete(int)+0x27) [0x7fbac4b41419]
 12: (Finisher::finisher_thread_entry()+0x396) [0x7fbac4c91f82]
 13: (Finisher::FinisherThread::entry()+0x1c) [0x7fbac4b4e460]
 14: (Thread::entry_wrapper()+0xc1) [0x7fbac4dfdc35]
 15: (Thread::_entry_func(void*)+0x18) [0x7fbac4dfdb6a]
 16: (()+0x7dc5) [0x7fbac2f05dc5]
 17: (clone()+0x6d) [0x7fbac1debced]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#1

Updated by Jay Lee almost 7 years ago

The ceph-fuse client crashes while the application on top of it is reading; the version is 10.2.3.
bh_lru_rest contains a BH in the wrong state: its ref is 0 but its state is not clean.

ceph version 10.2.3 ()
1: (()+0x3cf41a) [0x2b153a8bf41a]
2: (()+0xf100) [0x2b1544c1e100]
3: (gsignal()+0x37) [0x2b15463eb5f7]
4: (abort()+0x148) [0x2b15463ecce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x275) [0x2b153a9e0f85]
6: (ObjectCacher::trim(ObjectCacher::ObjectSet*)+0x804) [0x2b153a857d64]
7: (ObjectCacher::_readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*, bool)+0x14b8) [0x2b153a86c748]
8: (ObjectCacher::C_RetryRead::finish(int)+0x24) [0x2b153a83d2b4]
9: (Context::complete(int)+0x9) [0x2b153a7eba09]
10: (void finish_contexts<Context>(cephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xb6) [0x2b153a83f236]
11: (ObjectCacher::bh_read_finish(long, ObjectCacher::Object*, sobject_t, unsigned long, long, unsigned long, ceph::buffer::list&, int, bool)+0x55f) [0x2b153a861abf]
12: (ObjectCacher::C_ReadFinish::finish(int)+0x9a) [0x2b153a83dc5a]
13: (Context::complete(int)+0x9) [0x2b153a7eba09]
14: (C_Lock::finish(int)+0x29) [0x2b153a7ec4f9]
15: (Context::complete(int)+0x9) [0x2b153a7eba09]
16: (Finisher::finisher_thread_entry()+0x22e) [0x2b153a8e393e]
17: (()+0x7dc5) [0x2b1544c16dc5]
18: (clone()+0x6d) [0x2b15464ac21d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
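
To illustrate the condition described in the comment above (a BH sitting in bh_lru_rest with ref 0 but a non-clean state), here is a minimal toy model in C++. It is only a sketch under assumptions: `BhState`, `BufferHead`, `TrimResult`, `is_trimmable` and `trim` are hypothetical stand-ins written for this example and do not reproduce Ceph's ObjectCacher code. Inferring from the backtraces, the sketch assumes that trimming expects every unpinned entry it evicts to be in a state that can be dropped without losing data, and it reports the offending entry instead of asserting.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Toy model only: hypothetical stand-ins, not Ceph's actual classes.
enum class BhState { Clean, Zero, Error, Dirty, Rx, Tx, Missing };

struct BufferHead {
    std::string name;  // label used only to identify entries in this example
    BhState state;
    int ref;           // pin count; 0 means nothing holds the buffer
};

// Assumption: only these states can be evicted without losing data
// or abandoning an in-flight read or write.
inline bool is_trimmable(const BufferHead& bh) {
    return bh.state == BhState::Clean || bh.state == BhState::Zero ||
           bh.state == BhState::Error;
}

struct TrimResult {
    std::size_t evicted = 0;               // entries dropped from the LRU
    const BufferHead* offender = nullptr;  // first entry violating the invariant
};

// Evict from the cold end (back) of the list until at most max_entries remain.
// Stops early at a pinned entry (a simplification of real LRU behaviour) or at
// an unpinned entry in a non-trimmable state, which is the case the comment
// describes. A real cache would assert there; this sketch returns the entry.
inline TrimResult trim(std::list<BufferHead>& lru_rest, std::size_t max_entries) {
    TrimResult result;
    while (lru_rest.size() > max_entries) {
        BufferHead& bh = lru_rest.back();
        if (bh.ref != 0) {
            break;  // pinned: not eligible for eviction in this toy model
        }
        if (!is_trimmable(bh)) {
            result.offender = &bh;  // ref == 0 but state is not clean
            break;
        }
        lru_rest.pop_back();
        ++result.evicted;
    }
    return result;
}
```

Calling `trim` on a list holding a clean entry, then an `Rx` entry with ref 0, then another clean entry evicts the cold clean entry and then stops with `offender` pointing at the `Rx` entry. How such an entry ends up on the list in the real client (for example during PG migration) is not something this sketch tries to explain.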