Bug #599

closed

recover_master_log, doesn't

Added by Colin McCabe over 13 years ago. Updated over 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

This is another peering bug. We found it on wido's cluster. Basically, peering never completes.

I just examined PG 0.0, one of the eternally peering PGs.

root@logger:~# ceph pg dump -o - | head
2010-11-22 21:16:39.315764 mon <- [pg,dump]
2010-11-22 21:16:39.342433 mon1 -> 'ok' (0)
version 90443
last_osdmap_epoch 6673
last_pg_scan 1693
full_ratio 0.95
nearfull_ratio 0.8
pg_stat objects mip unf degr kb bytes log disklog state v reported up acting last_scrub
0.0 420 0 0 0 444416 454918662 297 297 crashed+peering 1486'422 6549'4376 [0,9,3] [0,9,3] 1486'422 2010-11-15 20:37:17.925299
0.1 432 0 0 0 349946 358160785 198 198 crashed+peering 1486'432 6608'1391 [5,11,9] [5,11,9] 1486'432 2010-11-15 20:42:38.744814
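Picking the stuck PGs out of a long dump by eye is tedious. The following is a minimal sketch, not part of the original report, that filters dump rows by state. It assumes the column order given in the pg_stat header line above and that the up/acting sets are printed without spaces, as they are here. The function names are hypothetical.

```python
# Column names copied from the pg_stat header line of the dump above.
# The trailing last_scrub fields (version, date, time) are ignored.
COLUMNS = ("pg_stat objects mip unf degr kb bytes log disklog "
           "state v reported up acting").split()


def parse_pg_row(line):
    """Map the first 14 whitespace-separated fields of a row to COLUMNS."""
    fields = line.split()
    if len(fields) < len(COLUMNS):
        return None
    return dict(zip(COLUMNS, fields))


def peering_pgs(lines):
    """Return (pg id, state, acting set) for rows whose state includes 'peering'."""
    stuck = []
    for line in lines:
        row = parse_pg_row(line)
        if row is None:
            continue
        pgid = row["pg_stat"]
        # Skip the header and status lines: real rows start with an id like "0.0".
        if not pgid[0].isdigit() or "." not in pgid:
            continue
        if "peering" in row["state"].split("+"):
            stuck.append((pgid, row["state"], row["acting"]))
    return stuck
```

Fed the two rows shown above, this would report PGs 0.0 and 0.1 together with their acting sets, which tells you which OSD logs to look at next.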

On OSD0, the last function to be called from do_peer is recover_master_log.

2010-11-22 20:22:28.107043 7f965f12d710 osd0 6633 pg[0.0( v 1486'422 (1486'419,1486'422] n=420 ec=2 les=6491 6626/6626/6549) [0,9,3] r=0 lcod 0'0 mlcod 0'0 !hml crashed+peering] peer up [0,9,3],   acting [0,9,3]
2010-11-22 20:22:28.107056 7f965f12d710 osd0 6633 pg[0.0( v 1486'422 (1486'419,1486'422] n=420 ec=2 les=6491 6626/6626/6549) [0,9,3] r=0 lcod 0'0 mlcod 0'0 !hml crashed+peering] peer prior_set is 3,9
2010-11-22 20:22:28.107068 7f965f12d710 osd0 6633 pg[0.0( v 1486'422 (1486'419,1486'422] n=420 ec=2 les=6491 6626/6626/6549) [0,9,3] r=0 lcod 0'0 mlcod 0'0 !hml crashed+peering] recover_master_log
2010-11-22 20:22:28.107080 7f965f12d710 osd0 6633 pg[0.0( v 1486'422 (1486'419,1486'422] n=420 ec=2 les=6491 6626/6626/6549) [0,9,3] r=0 lcod 0'0 mlcod 0'0 !hml crashed+peering] waiting for osd3
2010-11-22 20:22:28.107092 7f965f12d710 osd0 6633 pg[0.0( v 1486'422 (1486'419,1486'422] n=420 ec=2 les=6491 6626/6626/6549) [0,9,3] r=0 lcod 0'0 mlcod 0'0 !hml crashed+peering] waiting for osd9
2010-11-22 20:22:28.107105 7f965f12d710 osd0 6633 pg[0.0( v 1486'422 (1486'419,1486'422] n=420 ec=2 les=6491 6626/6626/6549) [0,9,3] r=0 lcod 0'0 mlcod 0'0 !hml crashed+peering] update_stats 6549'4342
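To see at a glance which peers each PG is blocked on, the "waiting for osdN" lines can be collected per PG. This is a rough sketch, not part of the original report, that assumes the debug-line layout quoted above (a "pg[<id>( ... ]" prefix followed by the message). The function name is hypothetical. Note that it records every peer a PG was ever seen waiting for and does not know whether that wait later ended.

```python
import re

# Matches lines such as:
#   ... osd0 6633 pg[0.0( v 1486'422 ... crashed+peering] waiting for osd3
WAITING = re.compile(r"pg\[(\d+\.[0-9a-f]+)\(.*\] waiting for (osd\d+)\s*$")


def waiting_peers(lines):
    """Return {pg id: set of peers that the PG logged it was waiting for}."""
    waiting = {}
    for line in lines:
        match = WAITING.search(line)
        if match:
            pgid, peer = match.groups()
            waiting.setdefault(pgid, set()).add(peer)
    return waiting
```

For the excerpt above, the result for PG 0.0 would contain osd3 and osd9, matching the prior_set line.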

After that, osd0 receives pings and osd_maps from osd3 and osd9, but never the PGnot and other messages it needs to make progress on peering. Filtering the osd0 log for anything else from those two OSDs returns nothing:

$ grep '<==' /var/log/ceph/osd.0.log | grep osd3 | grep -v osd_map | grep -v osd_ping 
$
$ grep '<==' /var/log/ceph/osd.0.log | grep osd9 | grep -v osd_map | grep -v osd_ping
$

By comparison, osd1 sent us a lot of stuff.

root@node01:~# grep '<==' /var/log/ceph/osd.0.log | grep osd1 | grep -v osd_map | grep -v osd_ping | head -n 10
2010-11-22 20:19:21.961809 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 1 ==== PGnot v1 ==== 115508+0+0 (569735235 0 0) 0x979d700
2010-11-22 20:19:21.997131 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 2 ==== PGq v1 ==== 4757+0+0 (2153010741 0 0) 0xadd21c0
2010-11-22 20:19:36.383863 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 3 ==== PGq v1 ==== 888+0+0 (3717460735 0 0) 0x9ac2380
2010-11-22 20:19:36.385796 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 4 ==== PGnot v1 ==== 22184+0+0 (843986286 0 0) 0xb322540
2010-11-22 20:19:36.509454 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 5 ==== PGnot v1 ==== 101340+0+0 (35662047 0 0) 0xb322380
2010-11-22 20:19:36.821688 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 6 ==== PGq v1 ==== 450+0+0 (3299570323 0 0) 0xb3221c0
2010-11-22 20:19:49.564773 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 7 ==== PGq v1 ==== 888+0+0 (2410340724 0 0) 0x96358c0
2010-11-22 20:19:49.746765 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 8 ==== PGnot v1 ==== 101340+0+0 (3564056599 0 0) 0xa91be00
2010-11-22 20:19:54.402375 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 9 ==== PGnot v1 ==== 124440+0+0 (2647411595 0 0) 0xc070a80
2010-11-22 20:19:54.436352 7f965f12d710 -- [2001:16f8:10:2::c3c3:8f6b]:6803/31146 <== osd1 [2001:16f8:10:2::c3c3:a24f]:6803/25291 10 ==== PGq v1 ==== 1764+0+0 (449635568 0 0) 0xd6a4e00
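
The per-peer grep pipelines above can be folded into a single pass that counts received messages by peer and type. This is a minimal sketch, not part of the original report, that assumes every received message is logged in the "<== osdN addr seq ==== TYPE ..." layout shown above. The osd_ping and osd_map lines are not quoted in this report, so the assumption that they follow the same layout is unverified. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the messenger "received" lines quoted above, e.g.
#   ... <== osd1 [addr]:6803/25291 1 ==== PGnot v1 ==== 115508+0+0 ...
RECV = re.compile(r"<== (osd\d+) \S+ \d+ ==== (\S+)")


def tally_received(lines):
    """Count received messages keyed by (peer, message type)."""
    counts = Counter()
    for line in lines:
        match = RECV.search(line)
        if match:
            peer, msg = match.groups()
            # Drop any "(...)" payload glued to the type name (an assumption
            # about how some message types might be printed).
            counts[(peer, msg.split("(", 1)[0])] += 1
    return counts
```

A peer that only ever appears under osd_ping and osd_map keys, and never under PGnot or PGq, would be a candidate for the behaviour described here.
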
Actions #1

Updated by Colin McCabe over 13 years ago

Also, I have verified that osd3 and osd9 did NOT crash. They're still running, and they did receive the messages from osd0.

Some messages that osd3 got from osd0:

2010-11-22 20:18:14.111849 7f39747b5710 -- [2001:16f8:10:2::c3c3:2e3a]:6800/7226 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 19 ==== PGnot v1 ==== 40356+0+0 (2525361914 0 0) 0xb5a3e00
2010-11-22 20:18:14.114411 7f39747b5710 -- [2001:16f8:10:2::c3c3:2e3a]:6800/7226 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 20 ==== PGq v1 ==== 1034+0+0 (2241523147 0 0) 0xb5a3a80
2010-11-22 20:18:14.267079 7f39747b5710 -- [2001:16f8:10:2::c3c3:2e3a]:6800/7226 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 21 ==== PGnot v1 ==== 40356+0+0 (611242803 0 0) 0xab5e000
2010-11-22 20:19:11.888429 7f8beb520710 -- [::]:6801/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 1 ==== PGnot v1 ==== 77932+0+0 (3127085297 0 0) 0x9450a80
2010-11-22 20:19:21.083770 7f8beb520710 -- [::]:6801/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 2 ==== PGq v1 ==== 7312+0+0 (168926164 0 0) 0x9450e00
2010-11-22 20:20:27.727473 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 1 ==== PGnot v1 ==== 81320+0+0 (1962983453 0 0) 0xf27fc40
2010-11-22 20:20:34.285880 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 2 ==== PGq v1 ==== 7531+0+0 (4146886754 0 0) 0xf27f8c0
2010-11-22 20:20:34.295002 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 3 ==== PGnot v1 ==== 81320+0+0 (3652672661 0 0) 0xf27f000
2010-11-22 20:20:34.304692 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 4 ==== PGq v1 ==== 1034+0+0 (1435119767 0 0) 0xac2e380
2010-11-22 20:20:34.628956 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 5 ==== PGnot v1 ==== 81320+0+0 (388352774 0 0) 0xac2e1c0
2010-11-22 20:20:36.733526 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 6 ==== PGq v1 ==== 1034+0+0 (1256282998 0 0) 0xac2e000
2010-11-22 20:20:37.483897 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 7 ==== PGnot v1 ==== 77932+0+0 (4037181739 0 0) 0xc8d3700
2010-11-22 20:20:38.204298 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 8 ==== PGq v1 ==== 1253+0+0 (3581570754 0 0) 0xc8d3c40
2010-11-22 20:20:38.235228 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 9 ==== PGnot v1 ==== 77932+0+0 (1585171465 0 0) 0x1423000
2010-11-22 20:20:38.451787 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 10 ==== PGq v1 ==== 1034+0+0 (525024447 0 0) 0xd6a3e00
2010-11-22 20:20:38.453663 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 11 ==== PGnot v1 ==== 77932+0+0 (571880610 0 0) 0xd6a38c0
2010-11-22 20:20:38.531262 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 12 ==== PGq v1 ==== 1034+0+0 (1056342172 0 0) 0xd6a3540
2010-11-22 20:20:38.532591 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 13 ==== PGnot v1 ==== 77932+0+0 (3670918644 0 0) 0x14231c0
2010-11-22 20:20:38.536986 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 14 ==== PGq v1 ==== 1034+0+0 (2106456282 0 0) 0x1423e00
2010-11-22 20:21:02.794440 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 16 ==== PGnot v1 ==== 81320+0+0 (3836752355 0 0) 0xac2ee00
2010-11-22 20:21:09.947090 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 17 ==== PGq v1 ==== 1472+0+0 (800805693 0 0) 0xfb5ba80
2010-11-22 20:21:11.604370 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 19 ==== PGnot v1 ==== 81320+0+0 (3001810855 0 0) 0xfb5bc40
2010-11-22 20:21:15.123799 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 20 ==== PGq v1 ==== 1034+0+0 (1130049304 0 0) 0x1423c40
2010-11-22 20:21:24.591352 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 22 ==== PGnot v1 ==== 81320+0+0 (2088331316 0 0) 0xa457540
2010-11-22 20:21:24.635701 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 23 ==== PGq v1 ==== 1034+0+0 (1546667257 0 0) 0xa457380
2010-11-22 20:21:24.773761 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 25 ==== PGnot v1 ==== 88096+0+0 (3812911271 0 0) 0x9dee380
2010-11-22 20:21:24.858066 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 26 ==== PGq v1 ==== 1910+0+0 (2415948325 0 0) 0x9deea80
2010-11-22 20:21:26.485898 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 28 ==== PGnot v1 ==== 88096+0+0 (1030804688 0 0) 0x1438700
2010-11-22 20:21:28.705036 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 29 ==== PGq v1 ==== 1034+0+0 (3670455413 0 0) 0x9c2a700
2010-11-22 20:21:28.956676 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 31 ==== PGnot v1 ==== 88096+0+0 (3471199970 0 0) 0xac2ec40
2010-11-22 20:21:29.039638 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 32 ==== PGq v1 ==== 1034+0+0 (2262214610 0 0) 0x9e0b8c0
2010-11-22 20:21:29.237313 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 34 ==== PGnot v1 ==== 88096+0+0 (2230704847 0 0) 0x9e0b700
2010-11-22 20:21:29.346235 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 35 ==== PGq v1 ==== 1034+0+0 (2579352627 0 0) 0x9dee540
2010-11-22 20:21:29.358915 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 37 ==== PGnot v1 ==== 88096+0+0 (1522624184 0 0) 0x9dee000
2010-11-22 20:21:29.359629 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 38 ==== PGq v1 ==== 1034+0+0 (3087036432 0 0) 0x9deec40
2010-11-22 20:21:29.490679 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 40 ==== PGnot v1 ==== 88096+0+0 (282260117 0 0) 0x9450000
2010-11-22 20:21:29.491446 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 41 ==== PGq v1 ==== 1034+0+0 (2808814577 0 0) 0x9450700
2010-11-22 20:21:29.596578 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 43 ==== PGnot v1 ==== 84708+0+0 (2988394254 0 0) 0x94501c0
2010-11-22 20:21:29.597807 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 44 ==== PGq v1 ==== 1253+0+0 (738459087 0 0) 0xb63c700
2010-11-22 20:21:29.650756 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 46 ==== PGnot v1 ==== 84708+0+0 (2518517577 0 0) 0xa42b540
2010-11-22 20:21:29.653784 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 47 ==== PGq v1 ==== 1034+0+0 (3960599646 0 0) 0xa209700
2010-11-22 20:21:29.655188 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 49 ==== PGnot v1 ==== 84708+0+0 (4196059008 0 0) 0xa209000
2010-11-22 20:21:29.767711 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 50 ==== PGq v1 ==== 1034+0+0 (3450620029 0 0) 0xa209380
2010-11-22 20:21:29.778391 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 52 ==== PGnot v1 ==== 84708+0+0 (3726116807 0 0) 0xa209e00
2010-11-22 20:21:29.903612 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 53 ==== PGq v1 ==== 1034+0+0 (3536266140 0 0) 0xf987000
2010-11-22 20:21:30.069159 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 55 ==== PGnot v1 ==== 84708+0+0 (1779549852 0 0) 0xf9878c0
2010-11-22 20:21:30.100783 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 56 ==== PGq v1 ==== 1034+0+0 (2952914937 0 0) 0xf987c40
2010-11-22 20:21:36.662288 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 58 ==== PGnot v1 ==== 77932+0+0 (3981915097 0 0) 0x9e0b380
2010-11-22 20:21:36.678727 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 59 ==== PGq v1 ==== 1910+0+0 (3936879213 0 0) 0xcf601c0
2010-11-22 20:21:38.587076 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 61 ==== PGnot v1 ==== 77932+0+0 (1700880175 0 0) 0xcf60000
2010-11-22 20:21:38.591007 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 62 ==== PGq v1 ==== 1034+0+0 (136429751 0 0) 0x1009bc40
2010-11-22 20:21:58.443078 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 64 ==== PGnot v1 ==== 84708+0+0 (4237639730 0 0) 0x1009ba80
2010-11-22 20:21:59.197706 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 65 ==== PGq v1 ==== 1910+0+0 (2184682327 0 0) 0x1009b1c0
2010-11-22 20:21:59.462476 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 67 ==== PGnot v1 ==== 84708+0+0 (2425487611 0 0) 0x1009b000
2010-11-22 20:21:59.568952 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 68 ==== PGq v1 ==== 1034+0+0 (922163061 0 0) 0x1009b380
2010-11-22 20:22:01.859184 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 70 ==== PGnot v1 ==== 84708+0+0 (3029385404 0 0) 0xcf61380
2010-11-22 20:22:02.221391 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 71 ==== PGq v1 ==== 1034+0+0 (698121364 0 0) 0x9967540
2010-11-22 20:22:02.304591 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 73 ==== PGnot v1 ==== 84708+0+0 (1218216297 0 0) 0x9967000
2010-11-22 20:22:02.336410 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 74 ==== PGq v1 ==== 1034+0+0 (1972137779 0 0) 0xf27fc40
2010-11-22 20:22:02.502141 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 76 ==== PGnot v1 ==== 84708+0+0 (1822310702 0 0) 0xfb5b1c0
2010-11-22 20:22:03.545107 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 77 ==== PGq v1 ==== 1034+0+0 (1793524946 0 0) 0xa7ace00
2010-11-22 20:22:03.808792 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 79 ==== PGnot v1 ==== 84708+0+0 (10027495 0 0) 0xd045e00
2010-11-22 20:22:03.881424 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 80 ==== PGq v1 ==== 1034+0+0 (1264236785 0 0) 0xd045000
2010-11-22 20:22:19.241677 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 82 ==== PGnot v1 ==== 94256+0+0 (436076725 0 0) 0xd045700
2010-11-22 20:22:21.450929 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 83 ==== PGq v1 ==== 2129+0+0 (2025701445 0 0) 0xd045c40
2010-11-22 20:22:25.171025 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 85 ==== PGnot v1 ==== 84708+0+0 (413364937 0 0) 0xd045380
2010-11-22 20:22:26.121965 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 86 ==== PGq v1 ==== 1983+0+0 (1191598700 0 0) 0xd0451c0
2010-11-22 20:22:26.379587 7f8beb520710 -- [2001:16f8:10:2::c3c3:2e3a]:6803/26653 <== osd0 [2001:16f8:10:2::c3c3:8f6b]:6803/31146 88 ==== PGnot v1 ==== 84708+0+0 (3836687132 0 0) 0xa393380

Actions #2

Updated by Colin McCabe over 13 years ago

  • Status changed from New to Resolved

There were two problems here:

1) We were restarting the osds before the monitors, which in this case prevented a fix to the messenger code from taking effect.

2) Some issues in the recovery code that led to "endless peering" were fixed recently.

This should be resolved.
