Bug #41348

closed

osd: need clear PG_STATE_CLEAN when repair object

Added by Zengran Zhang over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-08-20 09:32:38.216104 7f6504f12700 10 osd.0 pg_epoch: 133696 pg[3.36( v 133696'863063 (133630'861475,133696'863063] local-lis/les=133695/133696 n=106 ec=69/69 lis/c 133695/133695 les/c/f 133696/133696/0 133695/133695/133421) [0,9,11,4,7] r=0 lpr=133695 crt=133696'863063 mlcod 133696'863061 active+clean+snaptrim snaptrimq=[3bf4~7,3bfc~2d]] do_osd_op read 28672~8192
2019-08-20 09:32:38.216842 7f6504f12700 10 osd.0 pg_epoch: 133696 pg[3.36( v 133696'863063 (133630'861475,133696'863063] local-lis/les=133695/133696 n=106 ec=69/69 lis/c 133695/133695 les/c/f 133696/133696/0 133695/133695/133421) [0,9,11,4,7] r=0 lpr=133695 crt=133696'863063 mlcod 133696'863061 active+clean+snaptrim snaptrimq=[3bf4~7,3bfc~2d]] rep_repair_primary_object 3:6d4d0f80:::1000000feec.00000000:head peers osd.{0,4,7,9,11}
2019-08-20 09:32:38.216925 7f6504f12700 1 log_channel(cluster) log [ERR] : 3.36 missing primary copy of 3:6d4d0f80:::1000000feec.00000000:head, will try copies on 4,7,9,11
2019-08-20 09:32:38.216940 7f6504f12700 10 osd.0 pg_epoch: 133696 pg[3.36( v 133696'863063 (133630'861475,133696'863063] local-lis/les=133695/133696 n=106 ec=69/69 lis/c 133695/133695 les/c/f 133696/133696/0 133695/133695/133421) [0,9,11,4,7] r=0 lpr=133695 crt=133696'863063 mlcod 133696'863061 active+clean+snaptrim m=1 snaptrimq=[3bf4~7,3bfc~2d]] read got -11 / 8192 bytes from obj 3:6d4d0f80:::1000000feec.00000000:head. try again.
2019-08-20 09:32:38.216996 7f6506715700 10 osd.0 pg_epoch: 133696 pg[3.36( v 133696'863063 (133630'861475,133696'863063] local-lis/les=133695/133696 n=106 ec=69/69 lis/c 133695/133695 les/c/f 133696/133696/0 133695/133695/133421) [0,9,11,4,7] r=0 lpr=133695 crt=133696'863063 mlcod 133696'863061 active+clean+snaptrim m=1 snaptrimq=[3bf4~7,3bfc~2d]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork: trimming snap 3bf4
2019-08-20 09:32:38.218817 7f6506715700 10 osd.0 pg_epoch: 133696 pg[3.36( v 133696'863063 (133630'861475,133696'863063] local-lis/les=133695/133696 n=106 ec=69/69 lis/c 133695/133695 les/c/f 133696/133696/0 133695/133695/133421) [0,9,11,4,7] r=0 lpr=133695 crt=133696'863063 mlcod 133696'863061 active+clean+snaptrim m=1 snaptrimq=[3bf4~7,3bfc~2d]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork react trimming 3:6d4d0f80:::1000000feec.00000000:3bf4
/root/rpmbuild/BUILD/ceph-12.2.7-1326-gdb735a3/src/osd/PrimaryLogPG.cc: 10090: FAILED assert(attrs || !pg_log.get_missing().is_missing(soid) || (pg_log.get_log().objects.count(soid) && pg_log.get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT))

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x56286c50f830]
2: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > > const*)+0x9b0) [0x56286c0d9460]
3: (PrimaryLogPG::trim_object(bool, hobject_t const&, std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x19c) [0x56286c0f2efc]
4: (PrimaryLogPG::AwaitAsyncWork::react(PrimaryLogPG::DoSnapWork const&)+0x8ea) [0x56286c0f5a7a]
5: (boost::statechart::simple_state<PrimaryLogPG::AwaitAsyncWork, PrimaryLogPG::Trimming, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::
a, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const
)+0xb0) [0x56286c161910]
6: (PrimaryLogPG::snap_trimmer(unsigned int)+0x1fe) [0x56286c0b023e]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x19b4) [0x56286bf7e864]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x56286c515279]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56286c5171f0]
10: (()+0x7e25) [0x7f6528feae25]
11: (clone()+0x6d) [0x7f65280de34d]
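
The log lines above show the PG still reporting active+clean+snaptrim together with m=1 right after the failed read, shortly before the snap trimmer hits the assert. As a rough aid for spotting that combination in other OSD debug logs, here is a minimal Python sketch. It assumes that m=<n> in the PG summary is the number of missing objects and that the state string starts with "active". The helper name clean_with_missing and the regular expression are illustrative only and are not part of Ceph.

```python
import re

# Hypothetical helper, not part of Ceph: extracts the PG state string and the
# optional "m=<n>" field (assumed to be the missing-object count) from one
# OSD debug log line.
PG_STATE_RE = re.compile(
    r"\b(?P<state>active[+\w]*)"   # e.g. active+clean+snaptrim
    r"(?: m=(?P<missing>\d+))?"    # only printed while objects are missing
)


def clean_with_missing(line):
    """Return (state, missing) if the line shows a clean PG that also
    reports missing objects, otherwise None."""
    match = PG_STATE_RE.search(line)
    if match is None:
        return None
    state = match.group("state")
    missing = int(match.group("missing") or 0)
    if "clean" in state.split("+") and missing > 0:
        return state, missing
    return None


# Shortened excerpts of two log lines from this report (prefixes elided).
before_repair = "... mlcod 133696'863061 active+clean+snaptrim snaptrimq=[3bf4~7,3bfc~2d]] do_osd_op read 28672~8192"
after_repair = "... mlcod 133696'863061 active+clean+snaptrim m=1 snaptrimq=[3bf4~7,3bfc~2d]] read got -11 / 8192 bytes"

print(clean_with_missing(before_repair))
print(clean_with_missing(after_repair))
```

With these two excerpts, only the second line is flagged, which matches the window in which the snap trimmer ran against the object that was being repaired.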


Related issues 3 (0 open, 3 closed)

Copied to Ceph - Backport #41442: mimic: osd: need clear PG_STATE_CLEAN when repair object (Resolved, Nathan Cutler)
Copied to Ceph - Backport #41443: nautilus: osd: need clear PG_STATE_CLEAN when repair object (Resolved, Nathan Cutler)
Copied to Ceph - Backport #41733: luminous: osd: need clear PG_STATE_CLEAN when repair object (Resolved, Nathan Cutler)
#1

Updated by David Zafman over 4 years ago

  • Status changed from New to In Progress
  • Pull request ID set to 29756
#2

Updated by Kefu Chai over 4 years ago

  • Status changed from In Progress to Pending Backport
  • Backport set to mimic, nautilus
#3

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41442: mimic: osd: need clear PG_STATE_CLEAN when repair object added
#4

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41443: nautilus: osd: need clear PG_STATE_CLEAN when repair object added
#5

Updated by David Zafman over 4 years ago

This should probably be backported to Luminous.

#6

Updated by Neha Ojha over 4 years ago

  • Backport changed from mimic, nautilus to luminous,mimic,nautilus
#7

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41733: luminous: osd: need clear PG_STATE_CLEAN when repair object added
#8

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
