Bug #491

closed

osd: pg incorrectly going active

Added by Sage Weil over 13 years ago. Updated about 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%


Description

This wiped out some data on ceph-playground:

10.10.14_12:41:43.669886 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=4407 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] got 1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) log(0'0,4571'1] missing(0)
10.10.14_12:41:43.669907 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] my log = log(4187'58229,4187'58234]
4187'58230 (4187'58229) m 200.00003af5/head by mds0.70:32451 10.10.14_01:19:01.605147 indexed
4187'58231 (4187'58230) m 200.00003af5/head by mds0.70:32452 10.10.14_01:19:01.605829 indexed
4187'58232 (4187'58231) m 200.00003af5/head by mds0.70:32456 10.10.14_01:19:01.606433 indexed
4187'58233 (4187'58232) m 200.00003af5/head by mds0.70:32457 10.10.14_01:19:01.606576 indexed
4187'58234 (4187'58233) m 200.00003af5/head by mds0.70:32463 10.10.14_01:19:01.942384 indexed

10.10.14_12:41:43.669969 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] osd1 log = log(0'0,4571'1]
4571'1 (0'0) m 10000700580.00000000/head by mds0.70:39005 10.10.14_07:12:41.491875

10.10.14_12:41:43.669993 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log log(0'0,4571'1] from osd1 into log(4187'58229,4187'58234]
10.10.14_12:41:43.670009 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log extending tail to 0'0
10.10.14_12:41:43.670022 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log extending head to 4571'1
10.10.14_12:41:43.670046 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray]   ? 4571'1 (0'0) m 10000700580.00000000/head by mds0.70:39005 10.10.14_07:12:41.491875
10.10.14_12:41:43.670065 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log 4571'1 (0'0) m 10000700580.00000000/head by mds0.70:39005 10.10.14_07:12:41.491875
10.10.14_12:41:43.670088 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_log divergent 4187'58234 (4187'58233) m 200.00003af5/head by mds0.70:32463 10.10.14_01:19:01.942384
10.10.14_12:41:43.670109 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58233]) lcod 0'0 stray m=1] merge_log divergent 4187'58233 (4187'58232) m 200.00003af5/head by mds0.70:32457 10.10.14_01:19:01.606576
10.10.14_12:41:43.670238 7faf37fff910 filestore(/mnt/osd) getattr /mnt/osd/current/0.133_head/100000da1c4.00000000_head '_' = 133
10.10.14_12:41:43.670264 7faf37fff910 osd0 5185 pg[0.133( v 4253'16471 (4241'16469,4253'16471] n=14846 ec=2 les=5181 5153/5177/5177) [0,3]/[0] r=0 lcod 0'0 mlcod 0'0 active+degraded] build_backlog_map  1107'4492 (0'0) b 100000da1c4.00000000/head by client13752.1:745451 10.09.07_20:05:50.342044
10.10.14_12:41:43.670294 7faf37fff910 filestore(/mnt/osd) getattr /mnt/osd/current/0.133_head/100000da1ec.00000000_head '_'
10.10.14_12:41:43.670364 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58232]) lcod 0'0 stray m=1] merge_log divergent 4187'58232 (4187'58231) m 200.00003af5/head by mds0.70:32456 10.10.14_01:19:01.606433
10.10.14_12:41:43.670389 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58231]) lcod 0'0 stray m=1] merge_log divergent 4187'58231 (4187'58230) m 200.00003af5/head by mds0.70:32452 10.10.14_01:19:01.605829
10.10.14_12:41:43.670411 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58230]) lcod 0'0 stray m=1] merge_log divergent 4187'58230 (4187'58229) m 200.00003af5/head by mds0.70:32451 10.10.14_01:19:01.605147
10.10.14_12:41:43.670445 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry  had 4187'58230 (4187'58229) m 200.00003af5/head by mds0.70:32451 10.10.14_01:19:01.605147 new dne : deleting
10.10.14_12:41:43.670471 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry  had 4187'58231 (4187'58230) m 200.00003af5/head by mds0.70:32452 10.10.14_01:19:01.605829 new dne : deleting
10.10.14_12:41:43.670493 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry  had 4187'58232 (4187'58231) m 200.00003af5/head by mds0.70:32456 10.10.14_01:19:01.606433 new dne : deleting
10.10.14_12:41:43.670515 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry  had 4187'58233 (4187'58232) m 200.00003af5/head by mds0.70:32457 10.10.14_01:19:01.606576 new dne : deleting
10.10.14_12:41:43.670537 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry  had 4187'58234 (4187'58233) m 200.00003af5/head by mds0.70:32463 10.10.14_01:19:01.942384 new dne : deleting


Notice that osd2 (unfortunately, no logs! :( ) made a write to an empty pg in epoch 4571, which generated the skewed version number 4571'1. The merge_log code then treated the 4187'58230 through 4187'58234 entries as divergent and deleted those object versions.
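
To make the failure mode easier to follow, here is a minimal Python sketch of how the log above appears to classify entries as divergent. It is a simplified model inferred from the log output, not the actual Ceph merge_log code: `Entry`, `parse_eversion`, and `find_divergent` are hypothetical names, and the rule "anything newer than the last version both logs share is divergent" is an assumption.

```python
from typing import List, NamedTuple, Tuple

EVersion = Tuple[int, int]  # (epoch, seq); tuples compare epoch first, then seq


class Entry(NamedTuple):
    version: EVersion
    oid: str


def parse_eversion(text: str) -> EVersion:
    """Turn a log token such as "4187'58234" into (4187, 58234)."""
    epoch, seq = text.split("'")
    return int(epoch), int(seq)


def find_divergent(local: List[Entry], auth: List[Entry], auth_tail: EVersion) -> List[Entry]:
    """Return local entries newer than the last point both logs share (hypothetical model)."""
    local_head = max(e.version for e in local)
    # Authoritative versions that are not newer than our own head.
    shared = [e.version for e in auth if e.version <= local_head]
    # If the logs share nothing, fall back to the authoritative log's tail.
    lower_bound = max(shared, default=auth_tail)
    return [e for e in local if e.version > lower_bound]


# Entries taken from the log excerpt above.
local = [Entry(parse_eversion(f"4187'{seq}"), "200.00003af5/head") for seq in range(58230, 58235)]
auth = [Entry(parse_eversion("4571'1"), "10000700580.00000000/head")]

divergent = find_divergent(local, auth, auth_tail=(0, 0))
# Objects touched only by divergent entries are the ones that get removed.
to_delete = {e.oid for e in divergent} - {e.oid for e in auth}
```

With these inputs the two logs share no entries, so the lower bound falls back to 0'0, all five local entries come out as divergent, and 200.00003af5/head is the only object marked for deletion. That is consistent with the "new dne : deleting" lines in the log.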

Was running ceph version 0.21.3 (e5882981b55f3c74d6b8b22a2bf5fbec81b775e6).


Related issues 2 (0 open, 2 closed)

Related to Ceph - Feature #453: osd: return error (instead of blocking) on lost objects (Resolved, Colin McCabe, 10/19/2010, 10/19/2010)
Related to Ceph - Feature #526: osd: unfound objects rework (Resolved, Colin McCabe, 10/29/2010)

#1

Updated by Sage Weil about 13 years ago

  • Status changed from New to Can't reproduce