Bug #491
osd: pg incorrectly going active (closed)
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Description
This wiped out some data on ceph-playground:
10.10.14_12:41:43.669886 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=4407 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] got 1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) log(0'0,4571'1] missing(0)
10.10.14_12:41:43.669907 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] my log = log(4187'58229,4187'58234]
4187'58230 (4187'58229) m 200.00003af5/head by mds0.70:32451 10.10.14_01:19:01.605147 indexed
4187'58231 (4187'58230) m 200.00003af5/head by mds0.70:32452 10.10.14_01:19:01.605829 indexed
4187'58232 (4187'58231) m 200.00003af5/head by mds0.70:32456 10.10.14_01:19:01.606433 indexed
4187'58233 (4187'58232) m 200.00003af5/head by mds0.70:32457 10.10.14_01:19:01.606576 indexed
4187'58234 (4187'58233) m 200.00003af5/head by mds0.70:32463 10.10.14_01:19:01.942384 indexed
10.10.14_12:41:43.669969 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] osd1 log = log(0'0,4571'1]
4571'1 (0'0) m 10000700580.00000000/head by mds0.70:39005 10.10.14_07:12:41.491875
10.10.14_12:41:43.669993 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log log(0'0,4571'1] from osd1 into log(4187'58229,4187'58234]
10.10.14_12:41:43.670009 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (4187'58229,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log extending tail to 0'0
10.10.14_12:41:43.670022 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log extending head to 4571'1
10.10.14_12:41:43.670046 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] ? 4571'1 (0'0) m 10000700580.00000000/head by mds0.70:39005 10.10.14_07:12:41.491875
10.10.14_12:41:43.670065 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray] merge_log 4571'1 (0'0) m 10000700580.00000000/head by mds0.70:39005 10.10.14_07:12:41.491875
10.10.14_12:41:43.670088 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_log divergent 4187'58234 (4187'58233) m 200.00003af5/head by mds0.70:32463 10.10.14_01:19:01.942384
10.10.14_12:41:43.670109 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58233]) lcod 0'0 stray m=1] merge_log divergent 4187'58233 (4187'58232) m 200.00003af5/head by mds0.70:32457 10.10.14_01:19:01.606576
10.10.14_12:41:43.670238 7faf37fff910 filestore(/mnt/osd) getattr /mnt/osd/current/0.133_head/100000da1c4.00000000_head '_' = 133
10.10.14_12:41:43.670264 7faf37fff910 osd0 5185 pg[0.133( v 4253'16471 (4241'16469,4253'16471] n=14846 ec=2 les=5181 5153/5177/5177) [0,3]/[0] r=0 lcod 0'0 mlcod 0'0 active+degraded] build_backlog_map 1107'4492 (0'0) b 100000da1c4.00000000/head by client13752.1:745451 10.09.07_20:05:50.342044
10.10.14_12:41:43.670294 7faf37fff910 filestore(/mnt/osd) getattr /mnt/osd/current/0.133_head/100000da1ec.00000000_head '_'
10.10.14_12:41:43.670364 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58232]) lcod 0'0 stray m=1] merge_log divergent 4187'58232 (4187'58231) m 200.00003af5/head by mds0.70:32456 10.10.14_01:19:01.606433
10.10.14_12:41:43.670389 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58231]) lcod 0'0 stray m=1] merge_log divergent 4187'58231 (4187'58230) m 200.00003af5/head by mds0.70:32452 10.10.14_01:19:01.605829
10.10.14_12:41:43.670411 7faf3f92c910 osd0 5185 pg[1.c1( v 4187'58234 (0'0,4187'58234] n=2663 ec=2 les=5185 5152/5152/5149) [1,0] r=1 (log bound mismatch, actual=[4187'58230,4187'58230]) lcod 0'0 stray m=1] merge_log divergent 4187'58230 (4187'58229) m 200.00003af5/head by mds0.70:32451 10.10.14_01:19:01.605147
10.10.14_12:41:43.670445 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry had 4187'58230 (4187'58229) m 200.00003af5/head by mds0.70:32451 10.10.14_01:19:01.605147 new dne : deleting
10.10.14_12:41:43.670471 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry had 4187'58231 (4187'58230) m 200.00003af5/head by mds0.70:32452 10.10.14_01:19:01.605829 new dne : deleting
10.10.14_12:41:43.670493 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry had 4187'58232 (4187'58231) m 200.00003af5/head by mds0.70:32456 10.10.14_01:19:01.606433 new dne : deleting
10.10.14_12:41:43.670515 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry had 4187'58233 (4187'58232) m 200.00003af5/head by mds0.70:32457 10.10.14_01:19:01.606576 new dne : deleting
10.10.14_12:41:43.670537 7faf3f92c910 osd0 5185 pg[1.c1( v 4571'1 lc 4187'58234 (0'0,4571'1] n=1 ec=2 les=5185 5152/5152/5149) [1,0] r=1 lcod 0'0 stray m=1] merge_old_entry had 4187'58234 (4187'58233) m 200.00003af5/head by mds0.70:32463 10.10.14_01:19:01.942384 new dne : deleting
Notice that osd2 (unfortunately, no logs! :( ) made a write to an empty pg in epoch 4571, generating a skewed version number. Then the merge_log code deleted the divergent object versions.
Was running ceph version 0.21.3 (e5882981b55f3c74d6b8b22a2bf5fbec81b775e6).
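To illustrate why a single write in epoch 4571 can make five older entries look divergent, here is a minimal Python sketch. It is a simplified model, not Ceph's merge_log code: it assumes versions written as epoch'version compare epoch first, and that a local entry is treated as divergent when it falls inside the bounds of the incoming log but is absent from it. The names EVersion, parse_eversion and find_divergent are hypothetical, chosen for this example only.

```python
from typing import List, NamedTuple


class EVersion(NamedTuple):
    """An (epoch, version) pair, written epoch'version in the log output."""
    epoch: int
    version: int


def parse_eversion(text: str) -> EVersion:
    # "4187'58234" -> EVersion(epoch=4187, version=58234)
    epoch, version = text.split("'")
    return EVersion(int(epoch), int(version))


def find_divergent(local: List[EVersion], auth_tail: EVersion,
                   auth_head: EVersion, auth: List[EVersion]) -> List[EVersion]:
    # Assumption: a local entry is divergent if it lies in (auth_tail, auth_head]
    # but the incoming (authoritative) log does not contain it.
    known = set(auth)
    return [v for v in local if auth_tail < v <= auth_head and v not in known]


# osd0's log entries from the description: 4187'58230 through 4187'58234.
local_log = [parse_eversion(f"4187'{n}") for n in range(58230, 58235)]
# The log received from osd1: log(0'0,4571'1] with the single entry 4571'1.
auth_log = [parse_eversion("4571'1")]

divergent = find_divergent(local_log, parse_eversion("0'0"),
                           parse_eversion("4571'1"), auth_log)
print([f"{v.epoch}'{v.version}" for v in divergent])
```

Under these assumptions all five local entries are flagged, because tuple comparison puts any version in epoch 4571 above every version in epoch 4187, however large the per-epoch counter.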
Updated by Sage Weil about 13 years ago
- Status changed from New to Can't reproduce