Bug #2094

closed

osd: pgs remapped to down+out osd

Added by Josh Durgin about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
% Done: 0%
Source: Development

Description

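This is why the dump_stuck test fails on master. When one osd is marked down and out, its pgs are remapped incorrectly: the pg_temp entries and acting sets still list the down+out osd.0, even though the up set is only [1]: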

2012-02-22 18:42:55.809068 mon <- [osd,dump]
2012-02-22 18:42:55.809762 mon.0 -> 'dumped osdmap epoch 234' (0)
epoch 234
fsid ed6c6aab-41ac-42e3-b2af-fda425a5a835
created 2012-02-22 18:25:37.311048
modifed 2012-02-22 18:42:54.751454
flags 

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 2 lpgp_num 2 last_change 1 owner 0

max_osd 2
osd.0 down out weight 0 up_from 232 up_thru 122 down_at 234 last_clean_interval [2,231) 10.3.14.175:6800/24051 10.3.14.175:6802/24051 10.3.14.175:6807/24051 exists
osd.1 up   in  weight 1 up_from 196 up_thru 232 down_at 195 last_clean_interval [3,195) 10.3.14.174:6800/11841 10.3.14.174:6801/11841 10.3.14.174:6802/11841 exists,up

pg_temp 0.0 [1,0]
pg_temp 0.1 [1,0]
pg_temp 0.2 [1,0]
pg_temp 0.3 [1,0]
pg_temp 0.4 [1,0]
pg_temp 0.5 [1,0]
pg_temp 0.6 [1,0]
pg_temp 0.7 [1,0]
pg_temp 0.0p0 [1,0]
pg_temp 0.1p0 [1,0]
pg_temp 0.0p1 [1,0]
pg_temp 0.1p1 [1,0]
pg_temp 1.0 [1,0]
pg_temp 1.1 [1,0]
pg_temp 1.2 [1,0]
pg_temp 1.3 [1,0]
pg_temp 1.4 [1,0]
pg_temp 1.5 [1,0]
pg_temp 1.6 [1,0]
pg_temp 1.7 [1,0]
pg_temp 1.0p0 [1,0]
pg_temp 1.1p0 [1,0]
pg_temp 1.0p1 [1,0]
pg_temp 1.1p1 [1,0]
pg_temp 2.0 [1,0]
pg_temp 2.1 [1,0]
pg_temp 2.2 [1,0]
pg_temp 2.3 [1,0]
pg_temp 2.4 [1,0]
pg_temp 2.5 [1,0]
pg_temp 2.6 [1,0]
pg_temp 2.7 [1,0]
pg_temp 2.0p0 [1,0]
pg_temp 2.1p0 [1,0]
pg_temp 2.0p1 [1,0]
pg_temp 2.1p1 [1,0]
2012-02-22 18:56:23.154864 mon <- [pg,dump]
2012-02-22 18:56:23.155765 mon.0 -> 'dumped all in format plain' (0)
version 443
last_osdmap_epoch 364
last_pg_scan 1
full_ratio 0.95
nearfull_ratio 0.85
pg_stat    objects    mip    degr    unf    bytes    log    disklog    state    state_stamp    v    reported    up    acting    last_scrub    scrub_stamp
1.0p1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193199    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:46:53.289635
1.1p0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193130    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:46:52.289457
0.1p1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193165    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:46:29.286840
0.0p0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193092    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:46:25.286337
1.1p1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193344    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:46:56.290013
1.0p0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193267    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:46:51.289321
0.0p1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193304    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:46:28.286759
0.1p0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193233    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:46:26.286477
2.1p1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193415    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:55.528783
2.0p0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193380    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:47:39.999936
2.0p1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193485    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:50.222973
2.1p0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193450    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:47:49.387450
2.5    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193599    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:24.293020
1.6    1    0    0    0    205    116    116    active+clean+remapped    2012-02-22 18:55:35.193561    9'1    360'429    [1]    [1,0]    9'1    2012-02-22 18:46:49.289771
0.7    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193521    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:46:24.286268
2.4    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193730    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:21.292609
1.7    5    0    0    0    1395    585    585    active+clean+remapped    2012-02-22 18:55:35.193691    9'5    360'433    [1]    [1,0]    9'5    2012-02-22 18:46:50.290696
0.6    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:56:10.882835    0'0    360'430    [1]    [1,0]    0'0    2012-02-22 18:56:10.882807
2.7    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193869    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:35.347991
1.4    5    0    0    0    1724    816    816    active+clean+remapped    2012-02-22 18:55:35.193831    9'7    360'432    [1]    [1,0]    9'7    2012-02-22 18:46:37.289327
0.5    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.193769    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:46:20.285813
2.6    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194000    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:29.293549
1.5    3    0    0    0    451    349    349    active+clean+remapped    2012-02-22 18:55:35.193960    9'3    360'431    [1]    [1,0]    9'3    2012-02-22 18:46:46.290116
0.4    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:56:09.883158    0'0    360'430    [1]    [1,0]    0'0    2012-02-22 18:56:09.883131
2.1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194120    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:47:14.291870
1.2    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194079    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:46:33.287326
0.3    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:56:16.883499    0'0    360'429    [1]    [1,0]    0'0    2012-02-22 18:56:16.883472
2.0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194269    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:02.290673
1.3    3    0    0    0    4135    352    352    active+clean+remapped    2012-02-22 18:55:35.194221    9'3    360'431    [1]    [1,0]    9'3    2012-02-22 18:46:36.288397
0.2    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:56:01.881836    0'0    360'430    [1]    [1,0]    0'0    2012-02-22 18:56:01.881809
2.3    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194425    0'0    360'427    [1]    [1,0]    0'0    2012-02-22 18:47:17.292216
1.0    4    0    0    0    820    464    464    active+clean+remapped    2012-02-22 18:55:35.194374    4'4    360'429    [1]    [1,0]    4'4    2012-02-22 18:46:31.288174
0.1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:56:14.883395    0'0    360'432    [1]    [1,0]    0'0    2012-02-22 18:56:14.883369
2.2    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194576    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:47:15.291877
1.1    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:55:35.194524    0'0    360'428    [1]    [1,0]    0'0    2012-02-22 18:46:32.287193
0.0    0    0    0    0    0    0    0    active+clean+remapped    2012-02-22 18:56:07.883842    0'0    360'432    [1]    [1,0]    0'0    2012-02-22 18:56:07.883813
pool 0    0    0    0    0    0    0    0
pool 1    21    0    0    0    8730    2682    2682
pool 2    0    0    0    0    0    0    0
 sum    21    0    0    0    8730    2682    2682
osdstat    kbused    kbavail    kb    hb in    hb out
0    32560448    38491776    74798272    []    [1]
1    29821900    41230324    74798272    [0]    []
 sum    62382348    79722100    149596544

I think this was uncovered by commit 8e6f9ca8ac70b6aafabcce4d96e467bfb46e6505, which dropped the up != acting check before going clean.
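
The mismatch in the dumps above can be checked mechanically. The following is a minimal sketch, not part of Ceph or the dump_stuck test: it assumes the plain-text `osd dump` and `pg dump` layouts shown in this report, and the helper names (`parse_osd_states`, `find_stale_acting`) are hypothetical. It flags every pg whose acting set contains an osd that is not both up and in.

```python
import re

# Assumed formats, taken from the dumps in this report:
#   osd dump: "osd.N up|down in|out weight ..."
#   pg dump:  rows start with a pg id such as "1.6" or "0.0p1"; the first
#             two bracketed fields on the row are the up and acting sets.
OSD_LINE = re.compile(r"osd\.(\d+)\s+(up|down)\s+(in|out)\b")
PG_ID = re.compile(r"\d+\.[0-9a-f]+(p\d+)?")


def parse_osd_states(osd_dump):
    """Return {osd_id: (is_up, is_in)} from plain-text osd dump output."""
    states = {}
    for line in osd_dump.splitlines():
        m = OSD_LINE.match(line.strip())
        if m:
            states[int(m.group(1))] = (m.group(2) == "up", m.group(3) == "in")
    return states


def parse_osd_set(field):
    """Turn "[1,0]" into [1, 0] and "[]" into []."""
    inner = field.strip("[]")
    return [int(x) for x in inner.split(",")] if inner else []


def find_stale_acting(pg_dump, states):
    """Return [(pgid, [bad osd ids])] for pgs acting on a non-up or non-in osd."""
    stale = []
    for line in pg_dump.splitlines():
        cols = line.split()
        # Skip headers, pool/sum totals and osdstat rows (no pg id in column 0).
        if not cols or not PG_ID.fullmatch(cols[0]):
            continue
        sets = [c for c in cols if c.startswith("[") and c.endswith("]")]
        if len(sets) < 2:
            continue
        acting = parse_osd_set(sets[1])
        # Unknown osds are treated as down+out.
        bad = [o for o in acting if states.get(o, (False, False)) != (True, True)]
        if bad:
            stale.append((cols[0], bad))
    return stale


# Lines trimmed from the dumps in this report.
OSD_DUMP = """\
osd.0 down out weight 0 up_from 232 up_thru 122 down_at 234
osd.1 up   in  weight 1 up_from 196 up_thru 232 down_at 195
"""

PG_DUMP = """\
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp
1.6 1 0 0 0 205 116 116 active+clean+remapped 2012-02-22 18:55:35.193561 9'1 360'429 [1] [1,0] 9'1 2012-02-22 18:46:49.289771
0.0p1 0 0 0 0 0 0 0 active+clean+remapped 2012-02-22 18:55:35.193304 0'0 360'428 [1] [1,0] 0'0 2012-02-22 18:46:28.286759
0 32560448 38491776 74798272 [] [1]
"""

if __name__ == "__main__":
    for pgid, bad in find_stale_acting(PG_DUMP, parse_osd_states(OSD_DUMP)):
        print(pgid, "acting set includes down/out osds:", bad)
```

Run against the full output, this kind of check would report every pg listed above, since each one has acting [1,0] while osd.0 is down and out.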
