Project

General

Profile

Bug #592 » notes.txt

Colin McCabe, 11/18/2010 09:50 PM

 
### OSD0: OSD::handle_pg_create for PG 0.5 on osd0
### Here, we can see that PG 0.5 is originally created with OSD0 as the primary and OSD1 as a replica.
2010-11-18 15:03:05.123870 7f7d89ea6710 osd0 2 mkpg 0.5 e1
2010-11-18 15:03:05.123885 7f7d89ea6710 osd0 2 project_pg_history 0.5 from 1 to 2, start ec=0 les=0 0/0/0
2010-11-18 15:03:05.123905 7f7d89ea6710 osd0 2 project_pg_history 0.5 changed in 2 from [] -> [0]
2010-11-18 15:03:05.123912 7f7d89ea6710 osd0 2 project_pg_history 0.5 changed in 2 from [] -> [0]
2010-11-18 15:03:05.123919 7f7d89ea6710 osd0 2 project_pg_history 0.5 primary changed in 2
2010-11-18 15:03:05.123925 7f7d89ea6710 osd0 2 project_pg_history end ec=0 les=0 2/2/2
2010-11-18 15:03:05.123935 7f7d89ea6710 osd0 2 calc_priors_during 0.5 [1,2)
2010-11-18 15:03:05.123953 7f7d89ea6710 osd0 2 0.5 in epoch 1 was []
2010-11-18 15:03:05.123960 7f7d89ea6710 osd0 2 calc_priors_during 0.5 [1,2) =
2010-11-18 15:03:05.123967 7f7d89ea6710 osd0 2 mkpg 0.5 e1 h ec=0 les=0 2/2/2 : querying priors
2010-11-18 15:03:05.123976 7f7d89ea6710 osd0 2 can_create_pg 0.5 - can create now
2010-11-18 15:03:05.123986 7f7d89ea6710 osd0 2 _create_lock_new_pg pgid 0.5 -> [0]
2010-11-18 15:03:05.123993 7f7d89ea6710 osd0 2 _open_lock_pg 0.5
2010-11-18 15:03:05.123999 7f7d89ea6710 osd0 2 _get_pool 0 5 -> 6
2010-11-18 15:03:05.124052 7f7d89ea6710 filestore(dev/osd0) collection_stat dev/osd0/current/0.5_head = -2
2010-11-18 15:03:05.124082 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_info info 309
2010-11-18 15:03:05.124099 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_info bigbl 8
2010-11-18 15:03:05.124115 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_log
2010-11-18 15:03:05.124132 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] write_log to 0~0
2010-11-18 15:03:05.124143 7f7d89ea6710 osd0 2 _create_lock_new_pg pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive]
2010-11-18 15:03:05.124154 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 inactive] peer up [0], acting [0]
2010-11-18 15:03:05.124164 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] clear_prior
2010-11-18 15:03:05.124175 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] build_prior have started since joining this pg
2010-11-18 15:03:05.124186 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] build_prior = down = ...
2010-11-18 15:03:05.124197 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] peer prior_set is
2010-11-18 15:03:05.124207 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] num_missing = 0, num_unfound = 0
2010-11-18 15:03:05.124218 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] up_thru 0 < same_since 2, must notify monitor
2010-11-18 15:03:05.124228 7f7d89ea6710 osd0 2 queue_want_up_thru want 2 <= queued 2, currently 0
2010-11-18 15:03:05.124238 7f7d89ea6710 osd0 2 pg[0.5( empty n=0 ec=2 les=2 2/2/2) [0] r=0 mlcod 0'0 peering] update_stats 2'1

### osd1: OSD::project_pg_history for PG 0.5
2010-11-18 15:04:59.732991 7fe83b066710 osd1 6 project_pg_history 0.5 from 3 to 6, start ec=2 les=2 3/3/2
2010-11-18 15:04:59.733018 7fe83b066710 osd1 6 project_pg_history end ec=2 les=2 3/3/2
2010-11-18 15:04:59.733026 7fe83b066710 osd1 6 _create_lock_pg 0.5
2010-11-18 15:04:59.733032 7fe83b066710 osd1 6 _open_lock_pg 0.5
2010-11-18 15:04:59.733038 7fe83b066710 osd1 6 _get_pool 0 5 -> 6
2010-11-18 15:04:59.733098 7fe83b066710 filestore(dev/osd1) collection_stat dev/osd1/current/0.5_head = -2
2010-11-18 15:04:59.733125 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_info info 309
2010-11-18 15:04:59.733143 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_info bigbl 8
2010-11-18 15:04:59.733157 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_log
2010-11-18 15:04:59.733179 7fe83b066710 osd1 6 pg[0.5( DNE empty n=0 ec=0 les=0 0/0/0) [] r=0 mlcod 0'0 inactive] write_log to 0~0
2010-11-18 15:04:59.733208 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_info info 309
2010-11-18 15:04:59.733226 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_info bigbl 8
2010-11-18 15:04:59.733241 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_log
2010-11-18 15:04:59.733258 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] write_log to 0~0
2010-11-18 15:04:59.733287 7fe83b066710 filestore(dev/osd1) queue_transactions (writeahead) 35 0x1716000
2010-11-18 15:04:59.733296 7fe83b066710 journal journal_transactions 35
2010-11-18 15:04:59.733312 7fe83b066710 journal submit_entry seq 35 len 1216 (0x16f2aa0)
2010-11-18 15:04:59.733332 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] dne (before), but i am role 1
2010-11-18 15:04:59.733353 7fe83b066710 osd1 6 pg[0.5( empty n=0 ec=2 les=2 3/3/2) [0,1] r=1 inactive] sending info
### Hmm. We are sending info from osd1 to osd0 here. This happens inside OSD::handle_pg_query

### OSD1: OSD::do_notifies
### The comment for this function says:
### Send an MOSDPGNotify to a primary, with a list of PGs that I have content for, and they are primary for.
### So osd1 is notifying osd0 about the PGs here! osd0 must be the primary for 0.5.
2010-11-18 15:04:59.739228 7fe83b066710 osd1 6 do_notify osd0 on 14 PGs
2010-11-18 15:04:59.739239 7fe83b066710 -- 10.3.14.10:6802/29157 --> osd0 10.3.14.10:6800/29113 -- PGnot v1 -- ?+0 0x17b1540

### OSD0 is marked as 'out'
+ ./ceph osd out 0
2010-11-18 15:05:00.809554 mon <- [osd,out,0]
2010-11-18 15:05:01.606338 mon0 -> 'marked out osd0' (0)

### Entering epoch 14, the final epoch.
### Epoch 14 stats:
created 2010-11-18 15:02:44.930075
modifed 2010-11-18 15:05:28.319869
max_osd 3
osd0 up outup_from 2 up_thru 3 down_at 0 last_clean_interval 0-0 10.3.14.10:6800/29113 10.3.14.10:6801/29113
osd1 up in weight 1up_from 10 up_thru 11 down_at 8 last_clean_interval 3-9 10.3.14.10:6802/29157 10.3.14.10:6803/29157
osd2 up in weight 1up_from 4 up_thru 11 down_at 0 last_clean_interval 0-0 10.3.14.10:6804/29550 10.3.14.10:6805/29550

### OSD2 sends "some info" to osd1 in response to its MOSDPGQuery
### This info does *not* include PG 0.5, because OSD2 doesn't know about that PG.
2010-11-18 15:05:01.931574 7fd48dc51710 osd2 6 handle_pg_query from osd1 epoch 6
...
2010-11-18 15:05:01.931950 7fd48dc51710 osd2 6 pg[0.7( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info
2010-11-18 15:05:01.932323 7fd48dc51710 osd2 6 pg[1.6( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info
2010-11-18 15:05:01.933848 7fd48dc51710 osd2 6 pg[2.5( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info
2010-11-18 15:05:01.934229 7fd48dc51710 osd2 6 pg[3.4( empty n=0 ec=2 les=2 4/4/3) [1,2] r=1 inactive] sending info
2010-11-18 15:05:01.934860 7fd48dc51710 osd2 6 pg[0.0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.935237 7fd48dc51710 osd2 6 pg[0.1( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.935613 7fd48dc51710 osd2 6 pg[0.a( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.935965 7fd48dc51710 osd2 6 pg[0.b( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.936346 7fd48dc51710 osd2 6 pg[0.0p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.937602 7fd48dc51710 osd2 6 pg[0.1p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.937973 7fd48dc51710 osd2 6 pg[1.0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.938332 7fd48dc51710 osd2 6 pg[1.9( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.938687 7fd48dc51710 osd2 6 pg[1.a( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.939052 7fd48dc51710 osd2 6 pg[1.b( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.939417 7fd48dc51710 osd2 6 pg[1.0p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.939768 7fd48dc51710 osd2 6 pg[2.8( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.940120 7fd48dc51710 osd2 6 pg[2.9( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.940471 7fd48dc51710 osd2 6 pg[2.a( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.940831 7fd48dc51710 osd2 6 pg[3.7( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.941188 7fd48dc51710 osd2 6 pg[3.8( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.941542 7fd48dc51710 osd2 6 pg[3.9( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info
2010-11-18 15:05:01.941916 7fd48dc51710 osd2 6 pg[3.1p0( empty n=0 ec=2 les=2 4/4/2) [0,2] r=1 inactive] sending info

### OSD2: OSD::do_mon_report
### OSD2: OSD::send_pg_stats
### Things to notice here:
### 1) There is no record of PG 0.5 here.
### We don't know about it since, as we've seen, OSD 1 sent the PG information to OSD0 instead.
2010-11-18 15:05:15.361342 7fd49245a710 osd2 9 send_pg_stats
2010-11-18 15:05:15.361378 7fd49245a710 osd2 9 send_pg_stats - 42 pgs updated
2010-11-18 15:05:15.361397 7fd49245a710 osd2 9 sending 3.1p2 4'4
2010-11-18 15:05:15.361412 7fd49245a710 osd2 9 sending 2.0p2 4'4
2010-11-18 15:05:15.361426 7fd49245a710 osd2 9 sending 3.0p2 4'4
2010-11-18 15:05:15.361440 7fd49245a710 osd2 9 sending 2.1p2 4'4
2010-11-18 15:05:15.361457 7fd49245a710 osd2 9 sending 1.1p2 4'4
2010-11-18 15:05:15.361471 7fd49245a710 osd2 9 sending 0.0p2 4'4
2010-11-18 15:05:15.361485 7fd49245a710 osd2 9 sending 1.0p2 4'4
2010-11-18 15:05:15.361499 7fd49245a710 osd2 9 sending 0.1p2 4'4
2010-11-18 15:05:15.361512 7fd49245a710 osd2 9 sending 1.8 4'4
2010-11-18 15:05:15.361529 7fd49245a710 osd2 9 sending 0.9 4'4
2010-11-18 15:05:15.361542 7fd49245a710 osd2 9 sending 0.8 4'4
2010-11-18 15:05:15.361556 7fd49245a710 osd2 9 sending 3.5 4'4
2010-11-18 15:05:15.361571 7fd49245a710 osd2 9 sending 2.4 4'4
2010-11-18 15:05:15.361585 7fd49245a710 osd2 9 sending 1.7 4'4
2010-11-18 15:05:15.361598 7fd49245a710 osd2 9 sending 0.6 4'4
2010-11-18 15:05:15.361612 7fd49245a710 osd2 9 sending 3.6 4'4
2010-11-18 15:05:15.361626 7fd49245a710 osd2 9 sending 2.7 4'4
2010-11-18 15:05:15.361639 7fd49245a710 osd2 9 sending 2.6 4'4
2010-11-18 15:05:15.361653 7fd49245a710 osd2 9 sending 1.5 4'4
2010-11-18 15:05:15.361666 7fd49245a710 osd2 9 sending 3.3 4'4
2010-11-18 15:05:15.361681 7fd49245a710 osd2 9 sending 0.0p0 8'1
2010-11-18 15:05:15.361696 7fd49245a710 osd2 9 sending 1.0p0 8'1
2010-11-18 15:05:15.361710 7fd49245a710 osd2 9 sending 0.1p0 8'1
2010-11-18 15:05:15.361724 7fd49245a710 osd2 9 sending 3.1p0 8'1
2010-11-18 15:05:15.361738 7fd49245a710 osd2 9 sending 3.8 8'1
2010-11-18 15:05:15.361752 7fd49245a710 osd2 9 sending 2.9 8'1
2010-11-18 15:05:15.361766 7fd49245a710 osd2 9 sending 1.a 8'1
2010-11-18 15:05:15.361782 7fd49245a710 osd2 9 sending 0.b 8'1
2010-11-18 15:05:15.361796 7fd49245a710 osd2 9 sending 3.9 8'1
2010-11-18 15:05:15.361811 7fd49245a710 osd2 9 sending 2.8 8'1
2010-11-18 15:05:15.361824 7fd49245a710 osd2 9 sending 1.b 8'1
2010-11-18 15:05:15.361839 7fd49245a710 osd2 9 sending 0.a 8'1
2010-11-18 15:05:15.361852 7fd49245a710 osd2 9 sending 2.a 8'1
2010-11-18 15:05:15.361866 7fd49245a710 osd2 9 sending 1.9 8'1
2010-11-18 15:05:15.361881 7fd49245a710 osd2 9 sending 3.4 8'1
2010-11-18 15:05:15.361895 7fd49245a710 osd2 9 sending 2.5 8'1
2010-11-18 15:05:15.361909 7fd49245a710 osd2 9 sending 1.6 8'1
2010-11-18 15:05:15.361923 7fd49245a710 osd2 9 sending 0.7 8'1
2010-11-18 15:05:15.361937 7fd49245a710 osd2 9 sending 3.7 8'1
2010-11-18 15:05:15.361953 7fd49245a710 osd2 9 sending 1.0 8'1
2010-11-18 15:05:15.361967 7fd49245a710 osd2 9 sending 0.1 8'1
2010-11-18 15:05:15.361991 7fd49245a710 osd2 9 sending 0.0 8'1
2010-11-18 15:05:15.362001 7fd49245a710 monclient: _send_mon_message to mon.b at mon1 10.3.14.10:6790/0
2010-11-18 15:05:15.362010 7fd49245a710 -- 10.3.14.10:6804/29550 --> mon1 10.3.14.10:6790/0 -- pg_stats(42 pgs v 9) v1 -- ?+0 0x2ea1000
2010-11-18 15:05:15.362213 7fd48dc51710 osd2 9 do_waiters -- start
2010-11-18 15:05:15.362238 7fd48dc51710 osd2 9 _dispatch 0x29f0b40 osd_op(mds0.1:5 1.00000000.inode [writefull 0~425] 1.e14 RETRY) v1
2010-11-18 15:05:15.362260 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x29f0b40
2010-11-18 15:05:15.362274 7fd48dc51710 osd2 9 _share_map_incoming mds0 10.3.14.10:6806/30589 8
2010-11-18 15:05:15.362298 7fd48dc51710 osd2 9 hit non-existent pg 1.4, waiting
2010-11-18 15:05:15.362544 7fd48dc51710 osd2 9 hit non-existent pg 1.4, waiting
2010-11-18 15:05:15.362553 7fd48dc51710 osd2 9 _dispatch 0x29f06c0 osd_op(mds0.1:6 600.00000000 [tmapput 0~205,setxattr path (11),setxattr parent (26)] 1.41b0 RETRY) v1
2010-11-18 15:05:15.362572 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x29f06c0
2010-11-18 15:05:15.362583 7fd48dc51710 osd2 9 _share_map_incoming mds0 10.3.14.10:6806/30589 8
2010-11-18 15:05:15.362594 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0
2010-11-18 15:05:15.362604 7fd48dc51710 osd2 9 pg[1.0( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet)
2010-11-18 15:05:15.362619 7fd48dc51710 osd2 9 _dispatch 0x2a146c0 osd_op(mds0.1:7 100.00000000 [tmapput 0~903,setxattr path (5)] 1.5ab3 RETRY) v1
2010-11-18 15:05:15.362634 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2a146c0
2010-11-18 15:05:15.362644 7fd48dc51710 osd2 9 _share_map_incoming mds0 10.3.14.10:6806/30589 8
2010-11-18 15:05:15.362653 7fd48dc51710 osd2 9 hit non-existent pg 1.3, waiting
2010-11-18 15:05:15.362662 7fd48dc51710 osd2 9 _dispatch 0x29f4900 osd_op(mds1.1:3 601.00000000 [tmapput 0~205,setxattr path (11),setxattr parent (26)] 1.290 RETRY) v1
2010-11-18 15:05:15.362678 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x29f4900
2010-11-18 15:05:15.362689 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8
2010-11-18 15:05:15.362700 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0
2010-11-18 15:05:15.362718 7fd48dc51710 osd2 9 pg[1.0( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet)
2010-11-18 15:05:15.362733 7fd48dc51710 osd2 9 _dispatch 0x2a14d80 osd_op(mds1.1:6 mds1_inotable [writefull 0~29] 1.75db RETRY) v1
2010-11-18 15:05:15.362745 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2a14d80
2010-11-18 15:05:15.362755 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8
2010-11-18 15:05:15.362767 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0
2010-11-18 15:05:15.362775 7fd48dc51710 osd2 9 pg[1.b( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet)
2010-11-18 15:05:15.362789 7fd48dc51710 osd2 9 _dispatch 0x2d246c0 osd_op(mds1.1:4 101.00000000 [tmapput 0~903,setxattr path (5)] 1.4883 RETRY) v1
2010-11-18 15:05:15.362803 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2d246c0
2010-11-18 15:05:15.362813 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8
2010-11-18 15:05:15.362822 7fd48dc51710 osd2 9 hit non-existent pg 1.3, waiting
2010-11-18 15:05:15.362831 7fd48dc51710 osd2 9 _dispatch 0x2d24480 osd_op(mds1.1:2 201.00000001 [write 0~107] 1.a2db RETRY) v1
2010-11-18 15:05:15.362843 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2d24480
2010-11-18 15:05:15.362853 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8
2010-11-18 15:05:15.362864 7fd48dc51710 osd2 9 request for pool=1 (metadata) owner=0 perm=7 may_read=0 may_write=1 may_exec=0 require_exec_caps=0
2010-11-18 15:05:15.362872 7fd48dc51710 osd2 9 pg[1.b( empty n=0 ec=2 les=2 8/8/8) [2] r=0 mlcod 0'0 !hml degraded+peering] not active (yet)
2010-11-18 15:05:15.362887 7fd48dc51710 osd2 9 _dispatch 0x2d24240 osd_op(mds1.1:7 mds1_sessionmap [writefull 0~17] 1.fe43 RETRY) v1
2010-11-18 15:05:15.362898 7fd48dc51710 osd2 9 require_same_or_newer_map 8 (i am 9) 0x2d24240
2010-11-18 15:05:15.362909 7fd48dc51710 osd2 9 _share_map_incoming mds1 10.3.14.10:6808/31526 8
2010-11-18 15:05:15.362918 7fd48dc51710 osd2 9 hit non-existent pg 1.3, waiting
2010-11-18 15:05:15.362926 7fd48dc51710 osd2 9 _dispatch 0x2d24d80 osd_op(client4113.0:1 obj01 [writefull 0~1000000] 0.92bd) v1
2010-11-18 15:05:15.362938 7fd48dc51710 osd2 9 require_same_or_newer_map 9 (i am 9) 0x2d24d80
2010-11-18 15:05:15.362949 7fd48dc51710 osd2 9 _share_map_incoming client4113 10.3.14.10:0/1691 9
2010-11-18 15:05:15.362959 7fd48dc51710 osd2 9 hit non-existent pg 0.5, waiting
2010-11-18 15:05:15.362968 7fd48dc51710 osd2 9 do_waiters -- finish

### OSD1 : do_peer for PG 0.5
2010-11-18 15:05:17.751121 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] got osd0 info 0.5( empty n=0 ec=2 les=2 10/10/10)
2010-11-18 15:05:17.751147 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] osd0 has stray content: 0.5( empty n=0 ec=2 les=2 10/10/10)
2010-11-18 15:05:17.751161 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] peer up [1,2], acting [1,2]
2010-11-18 15:05:17.751172 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] peer prior_set is 0,2
2010-11-18 15:05:17.751183 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] recover_master_log
2010-11-18 15:05:17.751193 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] have info from osd0: 0.5( empty n=0 ec=2 les=2 10/10/10)
### OSD1: This is from PG::recover_master_log.
### OSD1: We're waiting for a response from osd2. This response NEVER COMES.
### Ok. time to find out if OSD2 ever received this query.
2010-11-18 15:05:17.751206 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] waiting for osd2
2010-11-18 15:05:17.751219 7fe83b066710 osd1 10 pg[0.5( empty n=0 ec=2 les=2 10/10/10) [1,2] r=0 mlcod 0'0 !hml peering] update_stats 10'2
2010-11-18 15:05:17.751244 7fe83b066710 filestore(dev/osd1) queue_transactions (writeahead) 165 0x13f41c0
2010-11-18 15:05:17.751252 7fe83b066710 journal journal_transactions 165
2010-11-18 15:05:17.751261 7fe83b066710 journal submit_entry seq 165 len 33 (0x1361960)

### OSD2: radostool blocks waiting for PG 0.5
### OSD2: Note: this is the *first* reference to PG 0.5 *ever* in the log for osd2
2010-11-18 15:05:33.161056 7fd48dc51710 osd2 14 _dispatch 0x2d24d80 osd_op(client4113.0:1 obj01 [writefull 0~1000000] 0.92bd) v1
2010-11-18 15:05:33.161069 7fd48dc51710 osd2 14 require_same_or_newer_map 9 (i am 14) 0x2d24d80
2010-11-18 15:05:33.161079 7fd48dc51710 osd2 14 _share_map_incoming client4113 10.3.14.10:0/1691 9
2010-11-18 15:05:33.161088 7fd48dc51710 osd2 14 client4113 has old map 9 < 14
2010-11-18 15:05:33.161094 7fd48dc51710 osd2 14 send_incremental_map 9 - > 14 to client4113 10.3.14.10:0/1691
2010-11-18 15:05:33.161131 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.14_0 0~116 = 116
2010-11-18 15:05:33.161152 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.13_0 0~116 = 116
2010-11-18 15:05:33.161170 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.12_0 0~116 = 116
2010-11-18 15:05:33.161187 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.11_0 0~116 = 116
2010-11-18 15:05:33.161214 7fd48dc51710 filestore(dev/osd2) read dev/osd2/current/meta/inc_osdmap.10_0 0~424 = 424
2010-11-18 15:05:33.161224 7fd48dc51710 -- 10.3.14.10:6804/29550 --> client4113 10.3.14.10:0/1691 -- osd_map(10,14) v1 -- ?+0 0x2e94000
### OSD2: This is in OSD::handle_op
### OSD2: But handle_op can hardly be expected to do anything else. The real bug must be earlier on. Why can't we ever get out of peering?
2010-11-18 15:05:33.161260 7fd48dc51710 osd2 14 hit non-existent pg 0.5, waiting

### Final PG map
cmccabe@flab:~/src/ceph2/src$ ./ceph pg map 0.5
2010-11-18 21:19:46.532101 mon <- [pg,map,0.5]
2010-11-18 21:19:46.532658 mon1 -> 'osdmap e14 pg 0.5 -> up [1,2] acting [1,2]' (0)
(4-4/4)