Bug #40388

Mimic: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist

Added by Lazuardi Nasution 2 months ago. Updated about 2 months ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 06/16/2019
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite: rados
Component(RADOS): OSD
Pull request ID:

Description

Bug #19185 still happens on Mimic (v13.2.6). I had to remove the entire cache pool to get the affected OSDs back to normal. It also happens on Jewel (v10.2.7) and Luminous (v12.2.8).


Related issues

Related to Ceph - Bug #19185: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist (Status: Resolved, 03/03/2017)

History

#1 Updated by Lazuardi Nasution 2 months ago

An excerpt from the log:

2019-06-12 03:34:32.495 7fe36229d700 -1 *** Caught signal (Aborted) **
 in thread 7fe36229d700 thread_name:tp_osd_tp

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5e0) [0x7fe3844105e0]
 2: (gsignal()+0x37) [0x7fe3834391f7]
 3: (abort()+0x148) [0x7fe38343a8e8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x7fe38788c468]
 5: (()+0x26e4f7) [0x7fe38788c4f7]
 6: (PrimaryLogPG::hit_set_trim(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >&, unsigned int)+0x930) [0x56285a1f6820]
 7: (PrimaryLogPG::hit_set_persist()+0xa0c) [0x56285a1fae9c]
 8: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2989) [0x56285a2112d9]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xc99) [0x56285a215fd9]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b7) [0x56285a06d767]
 11: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x56285a2e8de2]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x592) [0x56285a08d772]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3d3) [0x7fe3878920a3]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe387892c90]
 15: (()+0x7e25) [0x7fe384408e25]
 16: (clone()+0x6d) [0x7fe3834fc34d]

#2 Updated by Brad Hubbard 2 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
  • ceph-qa-suite rados added
  • Component(RADOS) OSD added

Can you upload a coredump as well as details of your OS and a log with debug_osd=20 set please? If the files are large you can use ceph-post-file and report the ID here.
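For readers following along, the debug level requested above could be set with a fragment like the one below. This is only a sketch: it assumes the setting goes into the [osd] section of ceph.conf on the affected OSD host and that the OSD is restarted afterwards. Whether the reporter's deployment manages its configuration this way is an assumption.

```ini
# ceph.conf on the affected OSD host (illustrative; restart the OSD to apply)
[osd]
# Verbose OSD logging, as requested for this bug
debug osd = 20
```

Level 20 produces very large logs, so it is usually reverted once the crash has been captured.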

#3 Updated by Brad Hubbard 2 months ago

  • Related to Bug #19185: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist added

#4 Updated by Lazuardi Nasution 2 months ago

I'm afraid I can no longer reproduce this problem or collect debug logs and a core dump, since I removed the entire cache pool as a workaround. As far as I remember, the problem first appeared after I adjusted hit_set_count and hit_set_period.

#5 Updated by Neha Ojha 2 months ago

  • Status changed from New to Can't reproduce

#6 Updated by Lazuardi Nasution about 2 months ago

Are there any logs I can add to help with this case?
