Feature #13837

Make snap_trimming robust to inconsistent snapshots

Added by Paul Emmerich almost 3 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
11/19/2015
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Hi,

we upgraded to 9.2 and we are still encountering a crash that looks a lot like issue #12665.
It looks like the patch didn't make it into the 9.2 release, so we installed version 9.2.0-859-gb2f2e6c (b2f2e6c84e89e3b5f02b5a97e8659d9338f9f772) from http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/master/dists/trusty/main/, which should contain the fix mentioned in #12665.

However, one of our OSDs is still crashing constantly:

 ceph version 9.2.0-859-gb2f2e6c (b2f2e6c84e89e3b5f02b5a97e8659d9338f9f772)
 1: (()+0x80beba) [0x7f19284efeba]
 2: (()+0x10340) [0x7f1926bc8340]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x45d) [0x7f19281676fd]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x424) [0x7f192819f934]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x7f19281ce894]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x12b) [0x7f19281baccb]
 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x84) [0x7f19281bae94]
 8: (ReplicatedPG::snap_trimmer(unsigned int)+0x454) [0x7f192813e004]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x893) [0x7f192804b133]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f19285d293f]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f19285d4840]
 12: (()+0x8182) [0x7f1926bc0182]
 13: (clone()+0x6d) [0x7f1924f0747d]

A full log is available here: https://dl.dropboxusercontent.com/u/24773939/ceph-master.log.gz

I can't reproduce the crash with debug 20/20, as the OSD doesn't seem to do anything at all at that debug level: https://dl.dropboxusercontent.com/u/24773939/ceph-master-debug20.log.gz

There are about 1k to 5k IOPS on the cluster, yet the log doesn't show any activity and the bug doesn't trigger. netstat also shows no client connections. This is reproducible, I tried to restart it multiple times with debug 20/20 or 15/15 and it just doesn't do anything.

(Logs as links because redmine gives a "Request Entity too large" error)

Infernalis also breaks the work-around with disk threads = 0 mentioned in the other bug report, so one of our OSDs is now essentially dead. Taking it out doesn't help either, as in our experience the bug then just migrates to other OSDs...

Any ideas?

Paul

History

#1 Updated by Paul Emmerich almost 3 years ago

Looks like debug 10/10 works. I've uploaded a full log here, it contains several crashes: https://www.dropbox.com/s/yer4sj9chldrv43/ceph-osd.73-debug-10.log.gz?dl=0

The important part of one of the crashes seems to be this:
(Filtered to only contain the logs of thread 7fb56442b700 which segfaulted while trying to trim a snap)

2015-11-20 11:27:03.260104 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] snap_trimmer entry
2015-11-20 11:27:03.260116 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] snap_trimmer posting
2015-11-20 11:27:03.260121 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] SnapTrimmer state<NotTrimming>: NotTrimming react
2015-11-20 11:27:03.260125 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] SnapTrimmer state<NotTrimming>: NotTrimming: trimming 8
2015-11-20 11:27:03.260131 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects react
2015-11-20 11:27:03.260137 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects: trimming snap 8
2015-11-20 11:27:03.260263 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] SnapTrimmer state<Trimming/TrimmingObjects>: TrimmingObjects react trimming 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/8
2015-11-20 11:27:03.260276 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] get_object_context: obc NOT found in cache: 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/8
2015-11-20 11:27:03.260356 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] populate_obc_watchers 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/8
2015-11-20 11:27:03.260363 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] get_object_context: creating obc from disk: 0x7fb59e4ceb00
2015-11-20 11:27:03.260367 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] get_object_context: 0x7fb59e4ceb00 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/8 rwstate(none n=0 w=0) oi: 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/8(38790'708618 client.39610810.0:350304 [8] dirty|omap_digest s 4194304 uv 600425 od ffffffff) ssc: 0x7fb590dc70c0 snapset: 0=[]:[]
2015-11-20 11:27:03.260381 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] get_object_context: obc NOT found in cache: 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/snapdir
2015-11-20 11:27:03.260401 7fb56442b700 10 osd.73 pg_epoch: 132639 pg[34.96( v 132625'1688938 (132395'1685357,132625'1688938] local-les=132639 n=2136 ec=27285 les/c/f 132639/132639/0 132638/132638/132638) [73,59,30] r=0 lpr=132638 crt=132625'1688938 lcod 0'0 mlcod 0'0 active+clean NIBBLEWISE snaptrimq=[8~1,17~1]] get_object_context: no obc for soid 34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe/snapdir and !can_create
2015-11-20 11:27:03.262557 7fb56442b700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fb56442b700

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (()+0x7d02ca) [0x7fb58c8982ca]
 2: (()+0x10340) [0x7fb58afac340]
 3: (ReplicatedPG::trim_object(hobject_t const&)+0x44b) [0x7fb58c5b326b]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x424) [0x7fb58c5e9534]
 5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x7fb58c613e64]
 6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x12b) [0x7fb58c60156b]
 7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x84) [0x7fb58c601734]
 8: (ReplicatedPG::snap_trimmer(unsigned int)+0x4bc) [0x7fb58c58a29c]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8ae) [0x7fb58c418a6e]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x875) [0x7fb58c9757f5]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fb58c9776f0]
 12: (()+0x8182) [0x7fb58afa4182]
 13: (clone()+0x6d) [0x7fb5892eb47d]

The objects on which it is crashing do not exist; the image they belonged to was deleted months ago. rados -p <pool> ls | grep 25c69b72ae8944a also doesn't find anything.
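The log excerpt above shows the failure mode: get_object_context finds no object context for the clone's head/snapdir ("no obc ... and !can_create"), and trim_object then dereferences the missing context. A minimal Python sketch of that failure mode and of the defensive check this feature request effectively asks for — the names and return codes here are mine for illustration, not Ceph's actual C++ code:

```python
# Hypothetical sketch (not Ceph's actual code): why a clone whose
# head/snapdir object is gone crashes the snap trimmer, and what
# "robust to inconsistent snapshots" would mean instead.

def get_object_context(soid, can_create=False):
    """Simulates the log line "no obc for soid ... and !can_create":
    the head/snapdir object was deleted long ago, so the lookup fails."""
    return None

def trim_object_robust(clone_soid):
    """Instead of dereferencing a missing head context (the segfault in
    trim_object above), skip the clone and return an error so the pg
    can be flagged for repair."""
    head_obc = get_object_context(clone_soid + "/snapdir")
    if head_obc is None:
        return -2  # e.g. -ENOENT: flag the pg instead of crashing
    return 0

print(trim_object_robust("34/cbce0096/rbd_data.25c69b72ae8944a.0000000000000ebe"))
```

With the missing-head check, the trimmer would report an error (here -2) rather than segfault — which is roughly what the snaptrim_error pg state added at the end of this thread does.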

#2 Updated by Paul Emmerich almost 3 years ago

I also tried to set osd pg max concurrent snap trims = 0 as a work-around to prevent trimming. However, it then crashes with a new error: osd/ReplicatedPG.cc: 10773: FAILED assert(r >= 0)
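For reference, the setting above would go into ceph.conf roughly like this (a sketch of the attempted work-around; the exact behavior of the value 0 may vary by release):

```ini
[osd]
# attempted work-around: prevent the trimmer from processing any
# objects per pg (0 = no concurrent snap trims)
osd pg max concurrent snap trims = 0
```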

Log filtered to the crashing thread:

2015-11-20 15:48:21.174127 7f664b728700 10 osd.73 132695 dequeue_op 0x7f667aba3a00 prio 127 cost 0 latency 0.000075 pg_scan(get_digest 39.252 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head-MIN e 132695/132695) v2 pg pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE]
2015-11-20 15:48:21.174155 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE] handle_message: 0x7f667aba3a00
2015-11-20 15:48:21.174165 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE] do_scan pg_scan(get_digest 39.252 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head-MIN e 132695/132695) v2
2015-11-20 15:48:21.174181 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE] scan_range from 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head
2015-11-20 15:48:21.174415 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE]  got 3 items, next MAX
2015-11-20 15:48:21.176796 7f664b728700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::scan_range(int, int, PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f664b728700 time 2015-11-20 15:48:21.174486
  -136> 2015-11-20 15:48:21.174127 7f664b728700 10 osd.73 132695 dequeue_op 0x7f667aba3a00 prio 127 cost 0 latency 0.000075 pg_scan(get_digest 39.252 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head-MIN e 132695/132695) v2 pg pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE]
  -134> 2015-11-20 15:48:21.174148 7f664b728700  5 -- op tracker -- seq: 652, time: 2015-11-20 15:48:21.174148, event: reached_pg, op: pg_scan(get_digest 39.252 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head-MIN e 132695/132695)
  -133> 2015-11-20 15:48:21.174155 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE] handle_message: 0x7f667aba3a00
  -132> 2015-11-20 15:48:21.174165 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE] do_scan pg_scan(get_digest 39.252 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head-MIN e 132695/132695) v2
  -129> 2015-11-20 15:48:21.174177 7f664b728700  5 -- op tracker -- seq: 652, time: 2015-11-20 15:48:21.174176, event: started, op: pg_scan(get_digest 39.252 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head-MIN e 132695/132695)
  -123> 2015-11-20 15:48:21.174181 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE] scan_range from 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head
  -101> 2015-11-20 15:48:21.174415 7f664b728700 10 osd.73 pg_epoch: 132695 pg[39.252( v 132681'98982 (130515'95523,132681'98982] lb 39/2dc03252/rbd_data.34a2dc92ae8944a.000000000000188c/head (NIBBLEWISE) local-les=132627 n=27 ec=123816 les/c/f 132695/132681/0 132694/132694/132652) [73,46,130]/[46,130,11] r=-1 lpr=132694 pi=131207-132693/32 luod=0'0 crt=132681'98982 lcod 0'0 active+remapped NIBBLEWISE]  got 3 items, next MAX
     0> 2015-11-20 15:48:21.176796 7f664b728700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::scan_range(int, int, PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f664b728700 time 2015-11-20 15:48:21.174486
2015-11-20 15:48:21.235944 7f664b728700 -1 *** Caught signal (Aborted) **
 in thread 7f664b728700
     0> 2015-11-20 15:48:21.235944 7f664b728700 -1 *** Caught signal (Aborted) **
 in thread 7f664b728700

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (()+0x7d02ca) [0x7f66769b72ca]
 2: (()+0x10340) [0x7f66750cb340]
 3: (gsignal()+0x39) [0x7f6673346cc9]
 4: (abort()+0x148) [0x7f667334a0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f6673c51535]
 6: (()+0x5e6d6) [0x7f6673c4f6d6]
 7: (()+0x5e703) [0x7f6673c4f703]
 8: (()+0x5e922) [0x7f6673c4f922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x7f6676aa4118]
 10: (ReplicatedPG::scan_range(int, int, PG::BackfillInterval*, ThreadPool::TPHandle&)+0xad2) [0x7f66766be1d2]
 11: (ReplicatedPG::do_scan(std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x7b4) [0x7f66766be9b4]
 12: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x3bf) [0x7f667669f9ef]
 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f66765128bd]
 14: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f6676512add]
 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8c4) [0x7f6676537a84]
 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x875) [0x7f6676a947f5]
 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f6676a966f0]
 18: (()+0x8182) [0x7f66750c3182]
 19: (clone()+0x6d) [0x7f667340a47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#3 Updated by Paul Emmerich almost 3 years ago

Applying https://github.com/ceph/ceph/compare/v9.2.0...dzafman:wip-snap-scrub on the v9.2.0 tag (with a small and obvious fix for the objectstore-tool part of the patch to make it compile with v9.2.0) fixes the issue.

The crash in my previous post seems to be unrelated as it happens in a completely different PG. I will open another issue for this.

#4 Updated by Paul Emmerich almost 3 years ago

Nevermind, it just took longer until it crashed.

Issue #12665 is still present in both master and 9.2 with the "dzafman:wip-snap-scrub" branch applied.

#5 Updated by Paul Emmerich almost 3 years ago

Related crash that happens when trying to repair one of the affected PGs:

(Filtered to only contain the actions of the crashing thread)

    -9> 2015-11-27 14:26:42.783133 7f11a03c8700 10 osd.73 133307 do_recovery can start 1 (0/2 rops)
    -8> 2015-11-27 14:26:42.783138 7f11a03c8700 10 osd.73 133307 do_recovery starting 1 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]]
    -7> 2015-11-27 14:26:42.783156 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] recover_primary recovering 0 in pg
    -6> 2015-11-27 14:26:42.783170 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] recover_primary missing(9)
    -5> 2015-11-27 14:26:42.783209 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] recover_primary 34/c0f24096/rbd_data.25c69b72ae8944a.0000000000000bbf/8 38790'708528 (missing)
    -4> 2015-11-27 14:26:42.783225 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] get_object_context: obc NOT found in cache: 34/c0f24096/rbd_data.25c69b72ae8944a.0000000000000bbf/head
    -3> 2015-11-27 14:26:42.783285 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] get_object_context: no obc for soid 34/c0f24096/rbd_data.25c69b72ae8944a.0000000000000bbf/head and !can_create
    -2> 2015-11-27 14:26:42.783299 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] get_object_context: obc NOT found in cache: 34/c0f24096/rbd_data.25c69b72ae8944a.0000000000000bbf/snapdir
    -1> 2015-11-27 14:26:42.783341 7f11a03c8700 10 osd.73 pg_epoch: 133307 pg[34.96( v 133307'1794179 (133225'1791174,133307'1794179] local-les=133307 n=2125 ec=27285 les/c/f 133307/133307/0 133306/133306/133306) [73,59,30] r=0 lpr=133306 crt=133307'1794177 lcod 133307'1794178 mlcod 133307'1794178 active+recovering+inconsistent NIBBLEWISE m=9 snaptrimq=[8~1,b~1,17~1,1b~1]] get_object_context: no obc for soid 34/c0f24096/rbd_data.25c69b72ae8944a.0000000000000bbf/snapdir and !can_create
     0> 2015-11-27 14:26:42.785276 7f11a03c8700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_missing(const hobject_t&, eversion_t, int, PGBackend::RecoveryHandle*)' thread 7f11a03c8700 time 2015-11-27 14:26:42.783351
osd/ReplicatedPG.cc: 8939: FAILED assert(head_obc)

 ceph version 2ad5c9bb63fbfb17b873e3628e9d8ca7bb63da21 (2ad5c9bb63fbfb17b873e3628e9d8ca7bb63da21)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f11cca6210b]
 2: (ReplicatedPG::recover_missing(hobject_t const&, eversion_t, int, PGBackend::RecoveryHandle*)+0x834) [0x7f11cc694544]
 3: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x11c3) [0x7f11cc697373]
 4: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0x53b) [0x7f11cc6b333b]
 5: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x31e) [0x7f11cc4ecb5e]
 6: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f11cc533307]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7f11cca53b86]
 8: (ThreadPool::WorkThread::entry()+0x10) [0x7f11cca54a50]
 9: (()+0x8182) [0x7f11cb080182]
 10: (clone()+0x6d) [0x7f11c93c747d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version: v9.2.0 tag with PR 5783 applied to it.

To summarize: PR 5783/master fixes the crash when scrubbing, but snap trimming or trying to repair the errors still crashes :(

#6 Updated by Sage Weil almost 3 years ago

  • Assignee set to David Zafman

#7 Updated by Samuel Just about 2 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from crash while trimming snapshots to Make snap_trimming robust to inconsistent snapshots

#8 Updated by Christian Theune almost 2 years ago

Argh. We're being bitten by this, too. We had a huge XFS consistency issue when updating from Linux 4.1 to 4.4 (possibly induced by a previous, lingering on-disk vs. in-memory inconsistency). While resolving that, we had to delete some unfound objects using mark_unfound_lost.

Looking at the various problems around trims with inconsistencies, we're now in a rough spot: we need to trim snapshots because we regularly take backups, but we have to disable trimming completely on the affected OSDs, and whenever rebalancing happens we have to repeat this on the OSDs that the inconsistent PGs move to.

After a dozen or so hours, I have yet to find a way to get rid of the dangerous trimq entries. I was not able to:

- recreate a synthetic head object to convince the trimmer to move on
- delete the to-be-trimmed object manually
- remove the dangerous entries from the trimq
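The trimq entries mentioned here are the snaptrimq=[...] intervals visible in the logs above (e.g. snaptrimq=[8~1,b~1,17~1,1b~1]). Each entry appears to be a hex start~length pair; a small helper of my own (not part of Ceph) to expand them into the individual snap ids the trimmer will visit:

```python
# Decode a snaptrimq interval set as printed in the OSD logs.
# Assumption (mine): entries are hex snap ids in start~length form.

def expand_snaptrimq(q):
    ids = []
    for entry in q.strip("[]").split(","):
        start, length = (int(x, 16) for x in entry.split("~"))
        ids.extend(range(start, start + length))
    return ids

print(expand_snaptrimq("[8~1,b~1,17~1,1b~1]"))  # → [8, 11, 23, 27]
```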

#9 Updated by Christian Theune almost 2 years ago

As for what I would love to see: this is currently on Hammer 0.94.7, and the newer releases do not seem to contain anything that would alleviate this.

I did have copies of the objects we marked unfound, but I just saw that the copy was moved to an ext4 filesystem, which apparently made it lose its xattrs.

I wonder whether I can recreate the xattr for the snapset, actually.

What also didn't work:

- using the undocumented 'clear-snapsets' as that also needs a

#10 Updated by Christian Theune almost 2 years ago

I think a workaround we can consider is to move all images to a fresh pool and then just delete the pool blindly, that should allow us to get rid of the inconsistencies. We'll start trying that tomorrow.

#11 Updated by David Zafman over 1 year ago

  • Priority changed from Urgent to High

#12 Updated by David Zafman over 1 year ago

  • Status changed from New to In Progress

#13 Updated by David Zafman over 1 year ago

Added a new pg state, snaptrim_error. It is set when snap trimming can't proceed until the pg has been repaired.

$ bin/ceph -s
  cluster:
    id:     d6415748-f325-4e5c-ae3d-b5bb1f33cbb5
    health: HEALTH_WARN
            1 pgs snaptrim_error

  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    osd: 6 osds: 6 up, 6 in

  data:
    pools:   2 pools, 9 pgs
    objects: 147 objects, 168 kB
    usage:   354 GB used, 245 GB / 599 GB avail
    pgs:     8 active+clean
             1 active+clean+snaptrim_error

#14 Updated by David Zafman over 1 year ago

  • Status changed from In Progress to Resolved
