Bug #1688 (Closed): Benjamin: pg stuck in scrub

Added by Samuel Just over 12 years ago. Updated about 12 years ago.

Status: Closed
Priority: Normal
Assignee: Samuel Just
Category: OSD
Target version: -
% Done: 0%

Description

It looks like the bug is related to last_update_applied not catching up to last_update on the primary. There is no further scrub-related output, and both replica scrub maps have been received.

2011-11-03 09:40:33.212299 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] scrub start
2011-11-03 09:40:33.212326 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] update_stats 97'205
2011-11-03 09:40:33.212344 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] scrub requesting scrubmap from osd.185
2011-11-03 09:40:33.212383 7f261e10d700 -- [2607:f298:4:3243::6233]:6814/3611 --> osd.185 [2607:f298:4:3243::6337]:6806/27263 -- replica scrub(pg: 8.2,from:0'0epoch:109) v1 -- ?+0 0x2231c40
2011-11-03 09:40:33.212442 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] scrub requesting scrubmap from osd.235
2011-11-03 09:40:33.212457 7f261e10d700 -- [2607:f298:4:3243::6233]:6814/3611 --> osd.235 [2607:f298:4:3243::6345]:6810/26105 -- replica scrub(pg: 8.2,from:0'0epoch:109) v1 -- ?+0 0x258ee00
2011-11-03 09:40:33.212481 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] build_scrub_map
2011-11-03 09:40:33.212609 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] scan_list scanning 1 objects
2011-11-03 09:40:33.212657 7f261e10d700 filestore(/mnt/osd.81) stat 8.2_head/notify/head = 0 (size 0)
2011-11-03 09:40:33.212668 7f261e10d700 filestore(/mnt/osd.81) getattrs 8.2_head/notify/head
2011-11-03 09:40:33.212707 7f261e10d700 filestore(/mnt/osd.81) getattrs 8.2_head/notify/head getting ''
2011-11-03 09:40:33.212779 7f261e10d700 filestore(/mnt/osd.81) getattrs 8.2_head/notify/head getting 'snapset'
2011-11-03 09:40:33.212810 7f261e10d700 filestore(/mnt/osd.81) getattrs 8.2_head/notify/head = 0
2011-11-03 09:40:33.212829 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] PG relocked, finalizing
2011-11-03 09:40:33.212845 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] _scan_list scanning 0 objects
2011-11-03 09:40:33.212852 7f261e10d700 filestore(/mnt/osd.81) collection_getattrs /mnt/osd.81/current/8.2_head
2011-11-03 09:40:33.212870 7f261e10d700 filestore(/mnt/osd.81) getattrs /mnt/osd.81/current/8.2_head getting 'ondisklog'
2011-11-03 09:40:33.212884 7f261e10d700 filestore(/mnt/osd.81) getattrs /mnt/osd.81/current/8.2_head getting 'info'
2011-11-03 09:40:33.212905 7f261e10d700 filestore(/mnt/osd.81) collection_getattrs /mnt/osd.81/current/8.2_head = 0
2011-11-03 09:40:33.212913 7f261e10d700 filestore(/mnt/osd.81) read meta/pglog_8.2/0 0~0
2011-11-03 09:40:33.212977 7f261e10d700 filestore(/mnt/osd.81) FileStore::read meta/pglog_8.2/0 0~564/564
2011-11-03 09:40:33.212987 7f261e10d700 filestore(/mnt/osd.81) collection_getattrs /mnt/osd.81/current/8.2_head
2011-11-03 09:40:33.213004 7f261e10d700 filestore(/mnt/osd.81) getattrs /mnt/osd.81/current/8.2_head getting 'ondisklog'
2011-11-03 09:40:33.213017 7f261e10d700 filestore(/mnt/osd.81) getattrs /mnt/osd.81/current/8.2_head getting 'info'
2011-11-03 09:40:33.213036 7f261e10d700 filestore(/mnt/osd.81) collection_getattrs /mnt/osd.81/current/8.2_head = 0
2011-11-03 09:40:33.213044 7f261e10d700 filestore(/mnt/osd.81) read meta/pglog_8.2/0 0~0
2011-11-03 09:40:33.213077 7f261e10d700 filestore(/mnt/osd.81) FileStore::read meta/pglog_8.2/0 0~564/564
2011-11-03 09:40:33.213098 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] done. pg log is 1128 bytes
2011-11-03 09:40:33.213116 7f261e10d700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] wait for cleanup
2011-11-03 09:40:33.213734 7f2622216700 -- [2607:f298:4:3243::6233]:6814/3611 <== osd.235 [2607:f298:4:3243::6345]:6810/26105 5648 ==== osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 ==== 498+0+1805 (2895784529 0 1805060905) 0x21a0b00 con 0x1c7a500
2011-11-03 09:40:33.213761 7f2622216700 osd.81 109 _dispatch 0x21a0b00 osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2011-11-03 09:40:33.213772 7f2622216700 osd.81 109 handle_sub_op osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 epoch 109
2011-11-03 09:40:33.213778 7f2622216700 osd.81 109 require_same_or_newer_map 109 (i am 109) 0x21a0b00
2011-11-03 09:40:33.213787 7f2622216700 osd.81 109 _share_map_incoming osd.235 [2607:f298:4:3243::6345]:6810/26105 109
2011-11-03 09:40:33.213808 7f2622216700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] enqueue_op 0x21a0b00 osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2011-11-03 09:40:33.213848 7f261f910700 osd.81 109 dequeue_op osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 pg pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing], 0 more pending
2011-11-03 09:40:33.213866 7f261f910700 osd.81 109 _share_map_outgoing osd.185 [2607:f298:4:3243::6337]:6806/27263 already has epoch 109
2011-11-03 09:40:33.213876 7f261f910700 osd.81 109 _share_map_outgoing osd.235 [2607:f298:4:3243::6345]:6810/26105 already has epoch 109
2011-11-03 09:40:33.213893 7f261f910700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] do_sub_op osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2011-11-03 09:40:33.213910 7f261f910700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] sub_op_scrub_map
2011-11-03 09:40:33.213925 7f261f910700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] got osd.235 scrub map
2011-11-03 09:40:33.213962 7f261f910700 osd.81 109 dequeue_op 0x21a0b00 finish
2011-11-03 09:40:33.214367 7f2622216700 -- [2607:f298:4:3243::6233]:6814/3611 <== osd.185 [2607:f298:4:3243::6337]:6806/27263 5656 ==== osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 ==== 498+0+1805 (2895784529 0 1805060905) 0x270a080 con 0x1e113c0
2011-11-03 09:40:33.214380 7f2622216700 osd.81 109 _dispatch 0x270a080 osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2011-11-03 09:40:33.214389 7f2622216700 osd.81 109 handle_sub_op osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 epoch 109
2011-11-03 09:40:33.214396 7f2622216700 osd.81 109 require_same_or_newer_map 109 (i am 109) 0x270a080
2011-11-03 09:40:33.214405 7f2622216700 osd.81 109 _share_map_incoming osd.185 [2607:f298:4:3243::6337]:6806/27263 109
2011-11-03 09:40:33.214424 7f2622216700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] enqueue_op 0x270a080 osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2011-11-03 09:40:33.214454 7f261f10f700 osd.81 109 dequeue_op osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 pg pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing], 0 more pending
2011-11-03 09:40:33.214466 7f261f10f700 osd.81 109 _share_map_outgoing osd.185 [2607:f298:4:3243::6337]:6806/27263 already has epoch 109
2011-11-03 09:40:33.214474 7f261f10f700 osd.81 109 _share_map_outgoing osd.235 [2607:f298:4:3243::6345]:6810/26105 already has epoch 109
2011-11-03 09:40:33.214505 7f261f10f700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] do_sub_op osd_sub_op(unknown.0.0:0 8.2 /0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2011-11-03 09:40:33.214521 7f261f10f700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] sub_op_scrub_map
2011-11-03 09:40:33.214536 7f261f10f700 osd.81 109 pg[8.2( v 109'110 (58'106,109'110] n=1 ec=45 les/c 108/109 99/99/97) [81,185,235] r=0 luod=109'111 lcod 109'109 mlcod 109'109 !hml active+clean+scrubbing] got osd.185 scrub map

Actions #1

Updated by Samuel Just over 12 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Sage Weil over 12 years ago

  • Category set to OSD

Is this addressed by the pg lock vs. transaction submit ordering changes?

Actions #3

Updated by Sage Weil over 12 years ago

  • Status changed from New to Won't Fix

Old code.

Actions #4

Updated by Samuel Just over 12 years ago

  • Status changed from Won't Fix to 12

Still happening; I'm looking into an instance on benjamin now.

Actions #5

Updated by Sage Weil about 12 years ago

Is this old/fixed? I haven't seen it in a while.

Actions #6

Updated by Sage Weil about 12 years ago

  • Status changed from 12 to Closed