Bug #2094
closed
osd: pgs remapped to down+out osd
% Done: 0%
Source: Development
Description
This is why the dump_stuck test fails on master. When one osd is marked out, the pg is remapped incorrectly:
2012-02-22 18:42:55.809068 mon <- [osd,dump]
2012-02-22 18:42:55.809762 mon.0 -> 'dumped osdmap epoch 234' (0)
epoch 234
fsid ed6c6aab-41ac-42e3-b2af-fda425a5a835
created 2012-02-22 18:25:37.311048
modifed 2012-02-22 18:42:54.751454
flags
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 2 lpgp_num 2 last_change 1 owner 0
max_osd 2
osd.0 down out weight 0 up_from 232 up_thru 122 down_at 234 last_clean_interval [2,231) 10.3.14.175:6800/24051 10.3.14.175:6802/24051 10.3.14.175:6807/24051 exists
osd.1 up in weight 1 up_from 196 up_thru 232 down_at 195 last_clean_interval [3,195) 10.3.14.174:6800/11841 10.3.14.174:6801/11841 10.3.14.174:6802/11841 exists,up
pg_temp 0.0 [1,0]
pg_temp 0.1 [1,0]
pg_temp 0.2 [1,0]
pg_temp 0.3 [1,0]
pg_temp 0.4 [1,0]
pg_temp 0.5 [1,0]
pg_temp 0.6 [1,0]
pg_temp 0.7 [1,0]
pg_temp 0.0p0 [1,0]
pg_temp 0.1p0 [1,0]
pg_temp 0.0p1 [1,0]
pg_temp 0.1p1 [1,0]
pg_temp 1.0 [1,0]
pg_temp 1.1 [1,0]
pg_temp 1.2 [1,0]
pg_temp 1.3 [1,0]
pg_temp 1.4 [1,0]
pg_temp 1.5 [1,0]
pg_temp 1.6 [1,0]
pg_temp 1.7 [1,0]
pg_temp 1.0p0 [1,0]
pg_temp 1.1p0 [1,0]
pg_temp 1.0p1 [1,0]
pg_temp 1.1p1 [1,0]
pg_temp 2.0 [1,0]
pg_temp 2.1 [1,0]
pg_temp 2.2 [1,0]
pg_temp 2.3 [1,0]
pg_temp 2.4 [1,0]
pg_temp 2.5 [1,0]
pg_temp 2.6 [1,0]
pg_temp 2.7 [1,0]
pg_temp 2.0p0 [1,0]
pg_temp 2.1p0 [1,0]
pg_temp 2.0p1 [1,0]
pg_temp 2.1p1 [1,0]
2012-02-22 18:56:23.154864 mon <- [pg,dump]
2012-02-22 18:56:23.155765 mon.0 -> 'dumped all in format plain' (0)
version 443
last_osdmap_epoch 364
last_pg_scan 1
full_ratio 0.95
nearfull_ratio 0.85
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp
1.0p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193199 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:53.289635
1.1p0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193130 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:46:52.289457
0.1p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193165 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:29.286840
0.0p0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193092 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:46:25.286337
1.1p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193344 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:56.290013
1.0p0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193267 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:46:51.289321
0.0p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193304 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:28.286759
0.1p0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193233 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:46:26.286477
2.1p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193415 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:55.528783
2.0p0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193380 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:47:39.999936
2.0p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193485 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:50.222973
2.1p0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193450 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:47:49.387450
2.5 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193599 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:24.293020
1.6 1 0 0 0 205 116 116 active+clean+remapped 2012-02-22 18:55:35.193561 9'1 360'429 [1] [1,0] 9'1 2012-02-22 18:46:49.289771
0.7 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193521 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:24.286268
2.4 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193730 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:21.292609
1.7 5 0 0 0 1395 585 585 active+clean+remapped 2012-02-22 18:55:35.193691 9'5 360'433 [1] [1,0] 9'5 2012-02-22 18:46:50.290696
0.6 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:56:10.882835 0'0 360'430 [1] [1,0] 0'0 2012-02-22 18:56:10.882807
2.7 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193869 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:35.347991
1.4 5 0 0 0 1724 816 816 active+clean+remapped 2012-02-22 18:55:35.193831 9'7 360'432 [1] [1,0] 9'7 2012-02-22 18:46:37.289327
0.5 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193769 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:46:20.285813
2.6 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194000 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:29.293549
1.5 3 0 0 0 451 349 349 active+clean+remapped 2012-02-22 18:55:35.193960 9'3 360'431 [1] [1,0] 9'3 2012-02-22 18:46:46.290116
0.4 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:56:09.883158 0'0 360'430 [1] [1,0] 0'0 2012-02-22 18:56:09.883131
2.1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194120 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:47:14.291870
1.2 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194079 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:46:33.287326
0.3 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:56:16.883499 0'0 360'429 [1] [1,0] 0'0 2012-02-22 18:56:16.883472
2.0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194269 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:02.290673
1.3 3 0 0 0 4135 352 352 active+clean+remapped 2012-02-22 18:55:35.194221 9'3 360'431 [1] [1,0] 9'3 2012-02-22 18:46:36.288397
0.2 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:56:01.881836 0'0 360'430 [1] [1,0] 0'0 2012-02-22 18:56:01.881809
2.3 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194425 0'0 360'427 [1] [1,0] 0'0 2012-02-22 18:47:17.292216
1.0 4 0 0 0 820 464 464 active+clean+remapped 2012-02-22 18:55:35.194374 4'4 360'429 [1] [1,0] 4'4 2012-02-22 18:46:31.288174
0.1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:56:14.883395 0'0 360'432 [1] [1,0] 0'0 2012-02-22 18:56:14.883369
2.2 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194576 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:47:15.291877
1.1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.194524 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:32.287193
0.0 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:56:07.883842 0'0 360'432 [1] [1,0] 0'0 2012-02-22 18:56:07.883813
pool 0 0 0 0 0 0 0 0
pool 1 21 0 0 0 8730 2682 2682
pool 2 0 0 0 0 0 0 0
sum 21 0 0 0 8730 2682 2682
osdstat kbused kbavail kb hb in hb out
0 32560448 38491776 74798272 [] [1]
1 29821900 41230324 74798272 [0] []
sum 62382348 79722100 149596544
I think this was uncovered by commit 8e6f9ca8ac70b6aafabcce4d96e467bfb46e6505, which dropped the up != acting check before going clean.
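For anyone reproducing this, a rough Python sketch of how the condition could be spotted from saved dump text: it collects the osds listed as "down out" in the osd dump and reports every pg whose acting set still contains one of them. The function names are hypothetical, the field positions are assumed from the plain-format output quoted above, and the osd sample lines are abbreviated copies of the real ones. This is not part of the dump_stuck test.

```python
# Sketch only: the function names are hypothetical and the field positions
# are assumed from the plain-format dumps quoted in this report.

def down_out_osds(osd_dump):
    """Return the ids of OSDs listed as 'down out' in `osd dump` text."""
    ids = set()
    for line in osd_dump.splitlines():
        tok = line.split()
        if len(tok) >= 3 and tok[0].startswith("osd.") and tok[1:3] == ["down", "out"]:
            ids.add(int(tok[0][len("osd."):]))
    return ids


def parse_osd_list(text):
    """Turn '[1,0]' into [1, 0] and '[]' into []."""
    inner = text.strip("[]")
    return [int(x) for x in inner.split(",")] if inner else []


def pgs_acting_on(pg_dump, osd_ids):
    """Map pgid -> acting set for PGs whose acting set contains any of osd_ids."""
    hits = {}
    for line in pg_dump.splitlines():
        tok = line.split()
        # A PG row splits into 18 fields here; 'up' is index 13, 'acting' index 14.
        if len(tok) != 18 or not tok[14].startswith("["):
            continue
        acting = parse_osd_list(tok[14])
        if osd_ids & set(acting):
            hits[tok[0]] = acting
    return hits


# Abbreviated osd lines and one full pg row taken from the dumps above.
OSD_SAMPLE = """\
osd.0 down out weight 0 up_from 232 up_thru 122 down_at 234
osd.1 up in weight 1 up_from 196 up_thru 232 down_at 195
"""
PG_SAMPLE = (
    "1.6 1 0 0 0 205 116 116 active+clean+remapped 2012-02-22 18:55:35.193561 "
    "9'1 360'429 [1] [1,0] 9'1 2012-02-22 18:46:49.289771\n"
)

for pgid, acting in pgs_acting_on(PG_SAMPLE, down_out_osds(OSD_SAMPLE)).items():
    print(pgid, "acting", acting)
```

Run against the full dumps above, a check like this would be expected to flag every pg, since each one reports up [1] but acting [1,0] while osd.0 is down and out.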