### OSD0: OSD::handle_pg_create for PG 0.5 on osd0 ### Here, we can see that PG 0.5 is originally created with OSD0 as the primary and OSD1 as a replica. 2010-11-18 15:03:05.123870 7f7d89ea6710 osd0 2 mkpg 0.5 e1 2010-11-18 15:03:05.123885 7f7d89ea6710 osd0 2 project_pg_history 0.5 from 1 to 2, start ec=0 les=0 0/0/0 2010-11-18 15:03:05.123905 7f7d89ea6710 osd0 2 project_pg_history 0.5 changed in 2 from [] -> [0] 2010-11-18 15:03:05.123912 7f7d89ea6710 osd0 2 project_pg_history 0.5 changed in 2 from [] -> [0] 2010-11-18 15:03:05.123919 7f7d89ea6710 osd0 2 project_pg_history 0.5 primary changed in 2 2010-11-18 15:03:05.123925 7f7d89ea6710 osd0 2 project_pg_history end ec=0 les=0 2/2/2 2010-11-18 15:03:05.123935 7f7d89ea6710 osd0 2 calc_priors_during 0.5 [1,2) 2010-11-18 15:03:05.123953 7f7d89ea6710 osd0 2 0.5 in epoch 1 was [] 2010-11-18 15:03:05.123960 7f7d89ea6710 osd0 2 calc_priors_during 0.5 [1,2) = 2010-11-18 15:03:05.123967 7f7d89ea6710 osd0 2 mkpg 0.5 e1 h ec=0 les=0 2/2/2 : querying priors 2010-11-18 15:03:05.123976 7f7d89ea6710 osd0 2 can_create_pg 0.5 - can create now 2010-11-18 15:03:05.123986 7f7d89ea6710 osd0 2 _create_lock_new_pg pgid 0.5 -> [0] 2010-11-18 15:03:05.123993 7f7d89ea6710 osd0 2 _open_lock_pg 0.5 2010-11-18 15:03:05.123999 7f7d89ea6710 osd0 2 _get_pool 0 5 -> 6 2010-11-18 15:03:05.124052 7f7d89ea6710 filestore(dev/osd0) collection_stat dev/osd0/current/0.5_head = -2 2010-11-18 15:03:05.124082 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_info info 309 2010-11-18 15:03:05.124099 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_info bigbl 8 2010-11-18 15:03:05.124115 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_log 2010-11-18 15:03:05.124132 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_log to 0~0 2010-11-18 15:03:05.124143 7f7d89ea6710 osd0 2 _create_lock_new_pg pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] 2010-11-18 15:03:05.124154 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] peer up [0], acting [0] 2010-11-18 15:03:05.124164 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] clear_prior 2010-11-18 15:03:05.124175 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] build_prior have started since joining this pg 2010-11-18 15:03:05.124186 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] build_prior = down = ... 2010-11-18 15:03:05.124197 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] peer prior_set is 2010-11-18 15:03:05.124207 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] num_missing = 0, num_unfound = 0 2010-11-18 15:03:05.124218 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] up_thru 0 < same_since 2, must notify monitor 2010-11-18 15:03:05.124228 7f7d89ea6710 osd0 2 queue_want_up_thru want 2 <= queued 2, currently 0 2010-11-18 15:03:05.124238 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] update_stats 2'1 ### osd1: OSD::project_pg_history for PG 0.5 2010-11-18 15:04:59.732991 7fe83b066710 osd1 6 project_pg_history 0.5 from 3 to 6, start ec=2 les=2 3/3/2 2010-11-18 15:04:59.733018 7fe83b066710 osd1 6 project_pg_history end ec=2 les=2 3/3/2 2010-11-18 15:04:59.733026 7fe83b066710 osd1 6 _create_lock_pg 0.5 2010-11-18 15:04:59.733032 7fe83b066710 osd1 6 _open_lock_pg 0.5 2010-11-18 15:04:59.733038 7fe83b066710 osd1 6 _get_pool 0 5 -> 6 2010-11-18 15:04:59.733098 7fe83b066710 filestore(dev/osd1) collection_stat dev/osd1/current/0.5_head = -2 2010-11-18 15:04:59.733125 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_info info 309 2010-11-18 15:04:59.733143 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_info bigbl 8 2010-11-18 15:04:59.733157 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_log 2010-11-18 15:04:59.733179 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_log to 0~0 2010-11-18 15:04:59.733208 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_info info 309 2010-11-18 15:04:59.733226 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_info bigbl 8 2010-11-18 15:04:59.733241 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_log 2010-11-18 15:04:59.733258 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_log to 0~0 2010-11-18 15:04:59.733287 7fe83b066710 filestore(dev/osd1) queue_transactions (writeahead) 35 0x1716000 2010-11-18 15:04:59.733296 7fe83b066710 journal journal_transactions 35 2010-11-18 15:04:59.733312 7fe83b066710 journal submit_entry seq 35 len 1216 (0x16f2aa0) 2010-11-18 15:04:59.733332 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] dne (before), but i am role 1 2010-11-18 15:04:59.733353 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] sending info ### Hmm. We are sending info from osd1 to osd0 here. This happens inside OSD::handle_pg_query ### OSD1: OSD::do_notifies ### The comment for this function says: ### Send an MOSDPGNotify to a primary, with a list of PGs that I have content for, and they are primary for. ### So osd1 is notifying osd0 about the PGs here! osd0 must be the primary for 0.5. 2010-11-18 15:04:59.739228 7fe83b066710 osd1 6 do_notify osd0 on 14 PGs 2010-11-18 15:04:59.739239 7fe83b066710 -- 10.3.14.10:6802/29157 --> osd0 10.3.14.10:6800/29113 -- PGnot v1 -- ?+0 0x17b1540 ### OSD0 is marked as 'out' + ./ceph osd out 0 2010-11-18 15:05:00.809554 mon <- [osd,out,0] 2010-11-18 15:05:01.606338 mon0 -> 'marked out osd0' (0) ### Entering epoch 14, the final epoch. ### Epoch 14 stats: created 2010-11-18 15:02:44.930075 modifed 2010-11-18 15:05:28.319869 max_osd 3 osd0 up outup_from 2 up_thru 3 down_at 0 last_clean_interval 0-0 10.3.14.10:6800/29113 10.3.14.10:6801/29113 osd1 up in weight 1up_from 10 up_thru 11 down_at 8 last_clean_interval 3-9 10.3.14.10:6802/29157 10.3.14.10:6803/29157 osd2 up in weight 1up_from 4 up_thru 11 down_at 0 last_clean_interval 0-0 10.3.14.10:6804/29550 10.3.14.10:6805/29550 ### OSD2 sends "some info" to osd1 in response to its MOSDPGQuery ### This info does *not* include PG 0.5, because OSD2 doesn't know about that PG. 2010-11-18 15:05:01.931574 7fd48dc51710 osd2 6 handle_pg_query from osd1 epoch 6 ... 2010-11-18 15:05:01.931950 7fd48dc51710 osd2 6 pg[0.7( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info 2010-11-18 15:05:01.932323 7fd48dc51710 osd2 6 pg[1.6( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info 2010-11-18 15:05:01.933848 7fd48dc51710 osd2 6 pg[2.5( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info 2010-11-18 15:05:01.934229 7fd48dc51710 osd2 6 pg[3.4( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info 2010-11-18 15:05:01.934860 7fd48dc51710 osd2 6 pg[0.0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.935237 7fd48dc51710 osd2 6 pg[0.1( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.935613 7fd48dc51710 osd2 6 pg[0.a( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.935965 7fd48dc51710 osd2 6 pg[0.b( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.936346 7fd48dc51710 osd2 6 pg[0.0p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.937602 7fd48dc51710 osd2 6 pg[0.1p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.937973 7fd48dc51710 osd2 6 pg[1.0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.938332 7fd48dc51710 osd2 6 pg[1.9( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.938687 7fd48dc51710 osd2 6 pg[1.a( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.939052 7fd48dc51710 osd2 6 pg[1.b( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.939417 7fd48dc51710 osd2 6 pg[1.0p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.939768 7fd48dc51710 osd2 6 pg[2.8( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.940120 7fd48dc51710 osd2 6 pg[2.9( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.940471 7fd48dc51710 osd2 6 pg[2.a( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.940831 7fd48dc51710 osd2 6 pg[3.7( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.941188 7fd48dc51710 osd2 6 pg[3.8( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.941542 7fd48dc51710 osd2 6 pg[3.9( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info 2010-11-18 15:05:01.941916 7fd48dc51710 osd2 6 pg[3.1p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info ### OSD2: OSD::do_mon_report ### OSD2: OSD::send_pg_stats ### Things to notice here: ### 1) There is no record of PG 0.5 here. ### We don't know about it since, as we've seen, OSD 1 sent the PG information to OSD0 instead. 2010-11-18 15:05:15.361342 7fd49245a710 osd2 9 send_pg_stats 2010-11-18 15:05:15.361378 7fd49245a710 osd2 9 send_pg_stats - 42 pgs updated 2010-11-18 15:05:15.361397 7fd49245a710 osd2 9 sending 3.1p2 4'4 2010-11-18 15:05:15.361412 7fd49245a710 osd2 9 sending 2.0p2 4'4 2010-11-18 15:05:15.361426 7fd49245a710 osd2 9 sending 3.0p2 4'4 2010-11-18 15:05:15.361440 7fd49245a710 osd2 9 sending 2.1p2 4'4 2010-11-18 15:05:15.361457 7fd49245a710 osd2 9 sending 1.1p2 4'4 2010-11-18 15:05:15.361471 7fd49245a710 osd2 9 sending 0.0p2 4'4 2010-11-18 15:05:15.361485 7fd49245a710 osd2 9 sending 1.0p2 4'4 2010-11-18 15:05:15.361499 7fd49245a710 osd2 9 sending 0.1p2 4'4 2010-11-18 15:05:15.361512 7fd49245a710 osd2 9 sending 1.8 4'4 2010-11-18 15:05:15.361529 7fd49245a710 osd2 9 sending 0.9 4'4 2010-11-18 15:05:15.361542 7fd49245a710 osd2 9 sending 0.8 4'4 2010-11-18 15:05:15.361556 7fd49245a710 osd2 9 sending 3.5 4'4 2010-11-18 15:05:15.361571 7fd49245a710 osd2 9 sending 2.4 4'4 2010-11-18 15:05:15.361585 7fd49245a710 osd2 9 sending 1.7 4'4 2010-11-18 15:05:15.361598 7fd49245a710 osd2 9 sending 0.6 4'4 2010-11-18 15:05:15.361612 7fd49245a710 osd2 9 sending 3.6 4'4 2010-11-18 15:05:15.361626 7fd49245a710 osd2 9 sending 2.7 4'4 2010-11-18 15:05:15.361639 7fd49245a710 osd2 9 sending 2.6 4'4 2010-11-18 15:05:15.361653 7fd49245a710 osd2 9 sending 1.5 4'4 2010-11-18 15:05:15.361666 7fd49245a710 osd2 9 sending 3.3 4'4 2010-11-18 15:05:15.361681 7fd49245a710 osd2 9 sending 0.0p0 8'1 2010-11-18 15:05:15.361696 7fd49245a710 osd2 9 sending 1.0p0 8'1 2010-11-18 15:05:15.361710 7fd49245a710 osd2 9 sending 0.1p0 8'1 2010-11-18 15:05:15.361724 7fd49245a710 osd2 9 sending 3.1p0 8'1 2010-11-18 15:05:15.361738 7fd49245a710 osd2 9 sending 3.8 8'1 2010-11-18 15:05:15.361752 7fd49245a710 osd2 9 sending 2.9 8'1 2010-11-18 15:05:15.361766 7fd49245a710 osd2 9 sending 1.a 8'1 2010-11-18 15:05:15.361782 7fd49245a710 osd2 9 sending 0.b 8'1 2010-11-18 15:05:15.361796 7fd49245a710 osd2 9 sending 3.9 8'1 2010-11-18 15:05:15.361811 7fd49245a710 osd2 9 sending 2.8 8'1 2010-11-18 15:05:15.361824 7fd49245a710 osd2 9 sending 1.b 8'1 2010-11-18 15:05:15.361839 7fd49245a710 osd2 9 sending 0.a 8'1 2010-11-18 15:05:15.361852 7fd49245a710 osd2 9 sending 2.a 8'1 2010-11-18 15:05:15.361866 7fd49245a710 osd2 9 sending 1.9 8'1 2010-11-18 15:05:15.361881 7fd49245a710 osd2 9 sending 3.4 8'1 2010-11-18 15:05:15.361895 7fd49245a710 osd2 9 sending 2.5 8'1 2010-11-18 15:05:15.361909 7fd49245a710 osd2 9 sending 1.6 8'1 2010-11-18 15:05:15.361923 7fd49245a710 osd2 9 sending 0.7 8'1 2010-11-18 15:05:15.361937 7fd49245a710 osd2 9 sending 3.7 8'1 2010-11-18 15:05:15.361953 7fd49245a710 osd2 9 sending 1.0 8'1 2010-11-18 15:05:15.361967 7fd49245a710 osd2 9 sending 0.1 8'1 2010-11-18 15:05:15.361991 7fd49245a710 osd2 9 sending 0.0 8'1 2010-11-18 15:05:15.362001 7fd49245a710 monclient: _send_mon_message to mon.b at mon1 10.3.14.10:6790/0 2010-11-18 15:05:15.362010 7fd49245a710 -- 10.3.14.10:6804/29550 --> mon1 10.3.14.10:6790/0 -- pg_stats(42 pgs v 9) v1 -- ?+0 0x2ea1000 2010-11-18 15:05:15.362213 7fd48dc51710 osd2 9 do_waiters -- start 2010-11-18 15:05:15.362238 7fd48dc51710 osd2 9 _dispatch 0x29f0b40 osd_op(mds0.1:5 1.00000000.inode [writefull 0~425] 1.e14 RETRY) v1 2010-11-18 15:05:15.362260 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x29f0b40 2010-11-18 15:05:15.362274 7fd48dc51710 osd2 9 _share_map_incoming mds0 10.3.14.10:6806/30589 8 2010-11-18 15:05:15.362298 7fd48dc51710 osd2 9 hit non-existent pg 1.4, waiting 2010-11-18 15:05:15.362544 7fd48dc51710 osd2 9 hit non-existent pg 1.4, waiting 2010-11-18 15:05:15.362553 7fd48dc51710 osd2 9 _dispatch 0x29f06c0 osd_op(mds0.1:6 600.00000000 [tmapput 0~205,setxattr path (11),setxattr parent (26)] 1.41b0 RETRY) v1 2010-11-18 15:05:15.362572 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x29f06c0 2010-11-18 15:05:15.362583 7fd48dc51710 osd2 9 _share_map_incoming mds0 10.3.14.10:6806/30589 8 2010-11-18 15:05:15.362594 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0 2010-11-18 15:05:15.362604 7fd48dc51710 osd2 9 pg[1.0( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet) 2010-11-18 15:05:15.362619 7fd48dc51710 osd2 9 _dispatch 0x2a146c0 osd_op(mds0.1:7 100.00000000 [tmapput 0~903,setxattr path (5)] 1.5ab3 RETRY) v1 2010-11-18 15:05:15.362634 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2a146c0 2010-11-18 15:05:15.362644 7fd48dc51710 osd2 9 _share_map_incoming mds0 10.3.14.10:6806/30589 8 2010-11-18 15:05:15.362653 7fd48dc51710 osd2 9 hit non-existent pg 1.3, waiting 2010-11-18 15:05:15.362662 7fd48dc51710 osd2 9 _dispatch 0x29f4900 osd_op(mds1.1:3 601.00000000 [tmapput 0~205,setxattr path (11),setxattr parent (26)] 1.290 RETRY) v1 2010-11-18 15:05:15.362678 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x29f4900 2010-11-18 15:05:15.362689 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8 2010-11-18 15:05:15.362700 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0 2010-11-18 15:05:15.362718 7fd48dc51710 osd2 9 pg[1.0( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet) 2010-11-18 15:05:15.362733 7fd48dc51710 osd2 9 _dispatch 0x2a14d80 osd_op(mds1.1:6 mds1_inotable [writefull 0~29] 1.75db RETRY) v1 2010-11-18 15:05:15.362745 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2a14d80 2010-11-18 15:05:15.362755 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8 2010-11-18 15:05:15.362767 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0 2010-11-18 15:05:15.362775 7fd48dc51710 osd2 9 pg[1.b( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet) 2010-11-18 15:05:15.362789 7fd48dc51710 osd2 9 _dispatch 0x2d246c0 osd_op(mds1.1:4 101.00000000 [tmapput 0~903,setxattr path (5)] 1.4883 RETRY) v1 2010-11-18 15:05:15.362803 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2d246c0 2010-11-18 15:05:15.362813 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8 2010-11-18 15:05:15.362822 7fd48dc51710 osd2 9 hit non-existent pg 1.3, waiting 2010-11-18 15:05:15.362831 7fd48dc51710 osd2 9 _dispatch 0x2d24480 osd_op(mds1.1:2 201.00000001 [write 0~107] 1.a2db RETRY) v1 2010-11-18 15:05:15.362843 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2d24480 2010-11-18 15:05:15.362853 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8 2010-11-18 15:05:15.362864 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0 2010-11-18 15:05:15.362872 7fd48dc51710 osd2 9 pg[1.b( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet) 2010-11-18 15:05:15.362887 7fd48dc51710 osd2 9 _dispatch 0x2d24240 osd_op(mds1.1:7 mds1_sessionmap [writefull 0~17] 1.fe43 RETRY) v1 2010-11-18 15:05:15.362898 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2d24240 2010-11-18 15:05:15.362909 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8 2010-11-18 15:05:15.362918 7fd48dc51710 osd2 9 hit non-existent pg 1.3, waiting 2010-11-18 15:05:15.362926 7fd48dc51710 osd2 9 _dispatch 0x2d24d80 osd_op(client4113.0:1 obj01 [writefull 0~1000000] 0.92bd) v1 2010-11-18 15:05:15.362938 7fd48dc51710 osd2 9 require_same_or_newer_map 9 (i am 9) 0x2d24d80 2010-11-18 15:05:15.362949 7fd48dc51710 osd2 9 _share_map_incoming client4113 10.3.14.10:0/1691 9 2010-11-18 15:05:15.362959 7fd48dc51710 osd2 9 hit non-existent pg 0.5, waiting 2010-11-18 15:05:15.362968 7fd48dc51710 osd2 9 do_waiters -- finish ### OSD1 : do_peer for PG 0.5 2010-11-18 15:05:17.751121 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] got osd0 info 0.5( empty n=0 ec=2 les=2 10/10/10) 2010-11-18 15:05:17.751147 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] osd0 has stray content: 0.5( empty n=0 ec=2 les=2 10/10/10) 2010-11-18 15:05:17.751161 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] peer up [1,2], acting [1,2] 2010-11-18 15:05:17.751172 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] peer prior_set is 0,2 2010-11-18 15:05:17.751183 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] recover_master_log 2010-11-18 15:05:17.751193 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] have info from osd0: 0.5( empty n=0 ec=2 les=2 10/10/10) ### OSD1: This is from PG::recover_master_log. ### OSD1: We're waiting for a response from osd2. This response NEVER COMES. ### Ok. time to find out if OSD2 ever received this query. 2010-11-18 15:05:17.751206 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] waiting for osd2 2010-11-18 15:05:17.751219 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] update_stats 10'2 2010-11-18 15:05:17.751244 7fe83b066710 filestore(dev/osd1) queue_transactions (writeahead) 165 0x13f41c0 2010-11-18 15:05:17.751252 7fe83b066710 journal journal_transactions 165 2010-11-18 15:05:17.751261 7fe83b066710 journal submit_entry seq 165 len 33 (0x1361960) ### OSD2: radostool blocks waiting for PG 0.5 ### OSD2: Note: this is the *first* reference to PG 0.5 *ever* in the log for osd2 2010-11-18 15:05:33.161056 7fd48dc51710 osd2 14 _dispatch 0x2d24d80 osd_op(client4113.0:1 obj01 [writefull 0~1000000] 0.92bd) v1 2010-11-18 15:05:33.161069 7fd48dc51710 osd2 14 require_same_or_newer_map 9 (i am 14) 0x2d24d80 2010-11-18 15:05:33.161079 7fd48dc51710 osd2 14 _share_map_incoming client4113 10.3.14.10:0/1691 9 2010-11-18 15:05:33.161088 7fd48dc51710 osd2 14 client4113 has old map 9 < 14 2010-11-18 15:05:33.161094 7fd48dc51710 osd2 14 send_incremental_map 9 - > 14 to client4113 10.3.14.10:0/1691 2010-11-18 15:05:33.161131 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.14_0 0~116 = 116 2010-11-18 15:05:33.161152 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.13_0 0~116 = 116 2010-11-18 15:05:33.161170 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.12_0 0~116 = 116 2010-11-18 15:05:33.161187 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.11_0 0~116 = 116 2010-11-18 15:05:33.161214 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.10_0 0~424 = 424 2010-11-18 15:05:33.161224 7fd48dc51710 -- 10.3.14.10:6804/29550 --> client4113 10.3.14.10:0/1691 -- osd_map(10,14) v1 -- ?+0 0x2e94000 ### OSD2: This is in OSD::handle_op ### OSD2: But handle_op can hardly be expected to do anything else. The real bug must be earlier on. Why can't we ever get out of peering? 2010-11-18 15:05:33.161260 7fd48dc51710 osd2 14 hit non-existent pg 0.5, waiting ### Final PG map cmccabe@flab:~/src/ceph2/src$ ./ceph pg map 0.5 2010-11-18 21:19:46.532101 mon <- [pg,map,0.5] 2010-11-18 21:19:46.532658 mon1 -> 'osdmap e14 pg 0.5 -> up [1,2] acting [1,2]' (0)