sess.log

Session log with annotations. - Frank Schilder, 08/11/2020 12:04 PM

The transcript below shows a test on a cluster with new OSDs during rebalancing.
The test shows the effect of

- stopping+starting a new OSD (osd-phy6, ID 289),
- stopping+starting an old OSD (osd-phy9, ID 74).

In each test, we wait for peering to complete before taking "ceph status".

We set noout and norebalance to avoid disturbances during the test and wait for
recovery to cease.
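For reference, these flags are managed with the commands below. They are not part
of the captured session; this is only a sketch of the usual procedure around such
a test:

# ceph osd set noout
# ceph osd set norebalance
... run the test ...
# ceph osd unset norebalance
# ceph osd unset noout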

Status at this point:

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
            8235970/1498751416 objects misplaced (0.550%)
            1 pools nearfull

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 272 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     8235970/1498751416 objects misplaced (0.550%)
             3163 active+clean
             40   active+remapped+backfill_wait
             6    active+remapped+backfilling
             5    active+clean+scrubbing+deep
             1    active+clean+snaptrim

  io:
    client:   74 MiB/s rd, 42 MiB/s wr, 1.19 kop/s rd, 889 op/s wr

# docker stop osd-phy6
osd-phy6

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
            1 osds down
            8342724/1498792326 objects misplaced (0.557%)
            Degraded data redundancy: 5717609/1498792326 objects degraded (0.381%), 74 pgs degraded
            1 pools nearfull

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 271 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     5717609/1498792326 objects degraded (0.381%)
             8342724/1498792326 objects misplaced (0.557%)
             3089 active+clean
             74   active+undersized+degraded
             31   active+remapped+backfill_wait
             11   active+remapped+backfilling
             5    active+clean+scrubbing+deep
             4    active+clean+remapped+snaptrim
             1    active+clean+scrubbing

  io:
    client:   69 MiB/s rd, 45 MiB/s wr, 1.28 kop/s rd, 838 op/s wr

# ceph health detail
HEALTH_WARN noout,norebalance flag(s) set; 1 osds down; 8342692/1498794289 objects misplaced (0.557%); Degraded data redundancy: 5717610/1498794289 objects degraded (0.381%), 74 pgs degraded, 74 pgs undersized; 1 pools nearfull
OSDMAP_FLAGS noout,norebalance flag(s) set
OSD_DOWN 1 osds down
    osd.289 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-05) is down
OBJECT_MISPLACED 8342692/1498794289 objects misplaced (0.557%)
PG_DEGRADED Degraded data redundancy: 5717610/1498794289 objects degraded (0.381%), 74 pgs degraded, 74 pgs undersized
    pg 11.2 is stuck undersized for 70.197385, current state active+undersized+degraded, last acting [87,292,2147483647,296,229,168,0,263]
    pg 11.16 is stuck undersized for 70.178478, current state active+undersized+degraded, last acting [2147483647,181,60,233,237,294,293,292]
    pg 11.1f is stuck undersized for 70.190040, current state active+undersized+degraded, last acting [230,238,182,292,84,2147483647,86,239]
    pg 11.39 is stuck undersized for 70.193683, current state active+undersized+degraded, last acting [158,148,293,73,168,2,2147483647,236]
    pg 11.3b is stuck undersized for 70.200823, current state active+undersized+degraded, last acting [2147483647,85,229,145,170,172,0,230]
    pg 11.47 is stuck undersized for 70.196419, current state active+undersized+degraded, last acting [3,296,2147483647,0,233,84,182,238]
    pg 11.59 is stuck undersized for 70.190002, current state active+undersized+degraded, last acting [2147483647,76,73,235,156,263,234,172]
    pg 11.63 is stuck undersized for 70.160846, current state active+undersized+degraded, last acting [0,146,1,156,2147483647,228,172,238]
    pg 11.66 is stuck undersized for 70.086237, current state active+undersized+degraded, last acting [291,159,296,233,2147483647,293,170,145]
    pg 11.6d is stuck undersized for 70.210387, current state active+undersized+degraded, last acting [84,235,73,290,295,2147483647,0,183]
    pg 11.7b is stuck undersized for 70.202578, current state active+undersized+degraded, last acting [2147483647,146,293,294,296,181,0,263]
    pg 11.7d is stuck undersized for 70.178488, current state active+undersized+degraded, last acting [294,2,263,2147483647,170,237,292,235]
    pg 11.7f is active+undersized+degraded, acting [148,232,2147483647,230,87,236,168,72]
    pg 11.146 is stuck undersized for 70.197744, current state active+undersized+degraded, last acting [235,183,156,295,2147483647,294,146,260]
    pg 11.155 is stuck undersized for 70.203091, current state active+undersized+degraded, last acting [73,72,170,259,260,63,84,2147483647]
    pg 11.15d is stuck undersized for 70.135909, current state active+undersized+degraded, last acting [259,182,0,63,234,294,233,2147483647]
    pg 11.171 is stuck undersized for 70.209391, current state active+undersized+degraded, last acting [170,168,232,72,231,172,2147483647,237]
    pg 11.176 is stuck undersized for 70.202583, current state active+undersized+degraded, last acting [146,237,181,2147483647,294,72,236,293]
    pg 11.177 is stuck undersized for 70.192564, current state active+undersized+degraded, last acting [156,146,236,235,63,2147483647,3,291]
    pg 11.179 is stuck undersized for 70.190284, current state active+undersized+degraded, last acting [87,156,233,86,2147483647,172,259,158]
    pg 11.17e is stuck undersized for 70.188938, current state active+undersized+degraded, last acting [3,231,290,260,76,183,2147483647,293]
    pg 11.181 is stuck undersized for 70.175985, current state active+undersized+degraded, last acting [2147483647,290,239,148,1,228,145,2]
    pg 11.188 is stuck undersized for 70.208638, current state active+undersized+degraded, last acting [2147483647,170,237,172,291,168,232,85]
    pg 11.18b is stuck undersized for 70.186336, current state active+undersized+degraded, last acting [233,148,228,87,2147483647,182,235,0]
    pg 11.18f is stuck undersized for 70.197416, current state active+undersized+degraded, last acting [73,237,238,2147483647,156,0,292,182]
    pg 11.19d is stuck undersized for 70.083071, current state active+undersized+degraded, last acting [291,172,146,145,238,2147483647,296,231]
    pg 11.1a5 is stuck undersized for 70.184859, current state active+undersized+degraded, last acting [293,145,2,230,159,239,85,2147483647]
    pg 11.1a6 is stuck undersized for 70.209851, current state active+undersized+degraded, last acting [229,145,158,296,0,292,2147483647,239]
    pg 11.1ac is stuck undersized for 70.192130, current state active+undersized+degraded, last acting [234,84,2147483647,86,239,183,294,232]
    pg 11.1b0 is stuck undersized for 70.180993, current state active+undersized+degraded, last acting [168,293,290,2,2147483647,159,296,73]
    pg 11.1b1 is stuck undersized for 70.175329, current state active+undersized+degraded, last acting [172,259,168,260,73,2147483647,146,263]
    pg 11.1b7 is stuck undersized for 70.208713, current state active+undersized+degraded, last acting [263,172,2147483647,259,0,87,145,228]
    pg 11.1c1 is stuck undersized for 70.170314, current state active+undersized+degraded, last acting [182,148,263,293,2,2147483647,228,294]
    pg 11.1c3 is stuck undersized for 70.192088, current state active+undersized+degraded, last acting [234,290,63,239,85,156,76,2147483647]
    pg 11.1c7 is stuck undersized for 70.192194, current state active+undersized+degraded, last acting [1,2147483647,263,232,86,234,84,172]
    pg 11.1dd is stuck undersized for 70.183525, current state active+undersized+degraded, last acting [293,172,295,156,170,237,2147483647,86]
    pg 11.1de is stuck undersized for 69.972952, current state active+undersized+degraded, last acting [296,293,76,63,231,146,2147483647,168]
    pg 11.1e8 is stuck undersized for 70.172003, current state active+undersized+degraded, last acting [172,3,290,229,236,156,2147483647,228]
    pg 11.1f2 is stuck undersized for 70.196870, current state active+undersized+degraded, last acting [234,0,159,2147483647,232,73,290,181]
    pg 11.1f5 is stuck undersized for 70.190841, current state active+undersized+degraded, last acting [238,234,73,2147483647,158,291,172,168]
    pg 11.1fc is stuck undersized for 70.181133, current state active+undersized+degraded, last acting [172,86,85,230,182,2147483647,238,233]
    pg 11.1fd is stuck undersized for 70.221124, current state active+undersized+degraded, last acting [72,145,237,293,2147483647,60,87,172]
    pg 11.203 is stuck undersized for 70.193700, current state active+undersized+degraded, last acting [2147483647,235,168,60,87,63,295,230]
    pg 11.20d is stuck undersized for 70.197909, current state active+undersized+degraded, last acting [236,172,73,182,228,168,2147483647,293]
    pg 11.20f is stuck undersized for 70.196571, current state active+undersized+degraded, last acting [85,84,76,60,238,233,159,2147483647]
    pg 11.211 is stuck undersized for 70.197522, current state active+undersized+degraded, last acting [156,2147483647,170,234,0,238,1,231]
    pg 11.212 is stuck undersized for 70.201683, current state active+undersized+degraded, last acting [148,2147483647,85,182,84,232,86,230]
    pg 11.21e is stuck undersized for 70.202044, current state active+undersized+degraded, last acting [146,156,159,2147483647,230,238,239,2]
    pg 11.224 is stuck undersized for 70.095494, current state active+undersized+degraded, last acting [291,148,237,2147483647,170,1,156,233]
    pg 11.22d is stuck undersized for 70.195735, current state active+undersized+degraded, last acting [3,168,296,158,292,236,0,2147483647]
    pg 11.22f is stuck undersized for 70.192480, current state active+undersized+degraded, last acting [1,2147483647,292,60,296,231,259,72]
POOL_NEAR_FULL 1 pools nearfull
    pool 'sr-rbd-data-one-hdd' has 164 TiB (max 200 TiB)
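
# As a quick cross-check (not part of the captured session), the undersized PGs
# reported above can be counted straight from the health output, e.g.:

# ceph health detail | grep -c 'undersized+degraded'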

# ceph pg 11.2 query | jq ".acting,.up,.recovery_state"
[
  87,
  292,
  2147483647,
  296,
  229,
  168,
  0,
  263
]
[
  87,
  292,
  2147483647,
  296,
  229,
  168,
  0,
  263
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:35:49.109129",
    "might_have_unfound": [],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:35:48.137595"
  }
]
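
# Note: 2147483647 (0x7fffffff) in the acting/up sets above is CRUSH's "none"
# placeholder, i.e. no OSD is assigned to that shard. Not part of the session,
# but the number of such holes in a PG can be extracted with the same query:

# ceph pg 11.2 query | jq '[.acting[] | select(. == 2147483647)] | length'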

# docker start osd-phy6
osd-phy6

# After starting the OSD again, the cluster almost recovers. The PG showing
# up as backfill_toofull is due to a known bug (fixed in 13.2.10?). There are no
# degraded objects, just misplaced ones.

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_ERR
            noout,norebalance flag(s) set
            8181843/1498795556 objects misplaced (0.546%)
            Degraded data redundancy (low space): 1 pg backfill_toofull
            1 pools nearfull

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 272 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     8181843/1498795556 objects misplaced (0.546%)
             3163 active+clean
             39   active+remapped+backfill_wait
             6    active+remapped+backfilling
             5    active+clean+scrubbing+deep
             1    active+remapped+backfill_toofull
             1    active+clean+snaptrim

  io:
    client:   35 MiB/s rd, 23 MiB/s wr, 672 op/s rd, 686 op/s wr

# docker stop osd-phy9
osd-phy9

# After stopping an old OSD, we observe immediate degradation. The cluster seems
# to lose track of objects already at this point. In contrast to the situation
# when stopping a new OSD (see above), where recovery shows up only temporarily,
# the recovery operation does not stop here.

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
            1 osds down
            7967641/1498798381 objects misplaced (0.532%)
            Degraded data redundancy: 5763425/1498798381 objects degraded (0.385%), 75 pgs degraded
            1 pools nearfull

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 271 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     5763425/1498798381 objects degraded (0.385%)
             7967641/1498798381 objects misplaced (0.532%)
             3092 active+clean
             70   active+undersized+degraded
             41   active+remapped+backfill_wait
             4    active+clean+scrubbing+deep
             4    active+undersized+degraded+remapped+backfilling
             2    active+clean+scrubbing
             1    active+undersized+degraded+remapped+backfill_wait
             1    active+clean+snaptrim

  io:
    client:   76 MiB/s rd, 76 MiB/s wr, 736 op/s rd, 881 op/s wr
    recovery: 93 MiB/s, 23 objects/s

# ceph health detail
HEALTH_WARN noout,norebalance flag(s) set; 1 osds down; 7966306/1498798501 objects misplaced (0.532%); Degraded data redundancy: 5762977/1498798501 objects degraded (0.385%), 75 pgs degraded; 1 pools nearfull
OSDMAP_FLAGS noout,norebalance flag(s) set
OSD_DOWN 1 osds down
    osd.74 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-05) is down
OBJECT_MISPLACED 7966306/1498798501 objects misplaced (0.532%)
PG_DEGRADED Degraded data redundancy: 5762977/1498798501 objects degraded (0.385%), 75 pgs degraded
    pg 11.4 is active+undersized+degraded, acting [86,2147483647,237,235,182,63,231,84]
    pg 11.5 is active+undersized+degraded, acting [1,2147483647,183,0,239,145,293,170]
    pg 11.a is active+undersized+degraded+remapped+backfilling, acting [170,156,148,2147483647,234,86,236,232]
    pg 11.b is active+undersized+degraded, acting [3,228,0,146,239,292,145,2147483647]
    pg 11.e is active+undersized+degraded, acting [237,181,73,183,72,290,2147483647,295]
    pg 11.1b is active+undersized+degraded, acting [85,183,72,2147483647,156,232,263,146]
    pg 11.1d is active+undersized+degraded, acting [290,296,183,86,293,2147483647,236,3]
    pg 11.37 is active+undersized+degraded, acting [60,233,181,183,2147483647,296,87,86]
    pg 11.50 is active+undersized+degraded+remapped+backfill_wait, acting [231,259,228,87,182,156,2147483647,172]
    pg 11.52 is active+undersized+degraded, acting [237,60,1,2147483647,233,232,292,86]
    pg 11.57 is active+undersized+degraded, acting [231,259,230,170,72,87,181,2147483647]
    pg 11.5a is active+undersized+degraded, acting [290,2147483647,237,183,293,84,295,1]
    pg 11.5c is active+undersized+degraded, acting [84,2147483647,259,0,85,234,146,148]
    pg 11.62 is active+undersized+degraded, acting [182,294,293,2147483647,63,234,181,0]
    pg 11.68 is active+undersized+degraded, acting [158,296,229,168,76,3,159,2147483647]
    pg 11.6a is active+undersized+degraded, acting [2147483647,288,238,172,1,237,0,290]
    pg 11.73 is active+undersized+degraded, acting [236,234,259,2147483647,170,63,3,0]
    pg 11.77 is active+undersized+degraded, acting [2147483647,72,87,183,236,156,290,293]
    pg 11.78 is active+undersized+degraded, acting [172,230,236,156,294,60,2147483647,76]
    pg 11.84 is active+undersized+degraded, acting [86,84,239,296,294,182,2147483647,293]
    pg 11.87 is active+undersized+degraded+remapped+backfilling, acting [148,60,231,260,235,87,2147483647,181]
    pg 11.8b is active+undersized+degraded, acting [263,170,2147483647,259,296,172,73,76]
    pg 11.15a is active+undersized+degraded, acting [2147483647,260,182,0,263,73,159,288]
    pg 11.15f is active+undersized+degraded+remapped+backfilling, acting [146,233,2147483647,76,234,172,181,229]
    pg 11.162 is active+undersized+degraded, acting [84,294,230,2,293,290,2147483647,295]
    pg 11.16d is active+undersized+degraded, acting [236,230,2147483647,183,0,1,235,181]
    pg 11.172 is active+undersized+degraded, acting [181,148,237,3,231,293,76,2147483647]
    pg 11.185 is active+undersized+degraded, acting [296,0,236,238,2147483647,294,181,146]
    pg 11.18a is active+undersized+degraded, acting [0,2147483647,159,145,293,233,85,146]
    pg 11.192 is active+undersized+degraded, acting [148,76,170,296,295,2147483647,3,235]
    pg 11.193 is active+undersized+degraded, acting [2147483647,148,295,230,232,168,76,290]
    pg 11.198 is active+undersized+degraded, acting [260,76,87,2147483647,145,183,229,239]
    pg 11.19a is active+undersized+degraded, acting [146,294,230,238,2147483647,0,295,288]
    pg 11.1a1 is active+undersized+degraded, acting [84,183,294,2147483647,234,170,263,238]
    pg 11.1a7 is active+undersized+degraded, acting [63,236,158,84,86,237,87,2147483647]
    pg 11.1ae is active+undersized+degraded, acting [296,172,238,2147483647,170,288,294,295]
    pg 11.1c5 is active+undersized+degraded, acting [76,172,236,232,2147483647,296,288,170]
    pg 11.1c6 is active+undersized+degraded, acting [236,72,230,170,2147483647,238,181,148]
    pg 11.1d1 is active+undersized+degraded, acting [259,170,291,3,156,2147483647,292,296]
    pg 11.1d4 is active+undersized+degraded, acting [263,228,182,84,2,2147483647,259,87]
    pg 11.1e1 is active+undersized+degraded, acting [158,145,233,1,259,296,2,2147483647]
    pg 11.1ea is active+undersized+degraded, acting [84,183,260,259,85,60,2,2147483647]
    pg 11.1ec is active+undersized+degraded, acting [292,293,233,2,2147483647,85,288,146]
    pg 11.1ed is active+undersized+degraded, acting [156,237,293,233,148,2147483647,291,85]
    pg 11.1ee is active+undersized+degraded, acting [1,229,0,63,228,2147483647,233,156]
    pg 11.201 is active+undersized+degraded, acting [229,239,296,63,76,294,182,2147483647]
    pg 11.206 is active+undersized+degraded, acting [235,288,76,158,296,263,85,2147483647]
    pg 11.20a is active+undersized+degraded, acting [158,1,263,232,0,230,292,2147483647]
    pg 11.218 is active+undersized+degraded, acting [0,296,87,2147483647,263,148,156,232]
    pg 11.21a is active+undersized+degraded, acting [2147483647,230,159,231,60,235,73,291]
    pg 11.21b is active+undersized+degraded, acting [84,159,238,87,291,230,2147483647,182]
POOL_NEAR_FULL 1 pools nearfull
    pool 'sr-rbd-data-one-hdd' has 164 TiB (max 200 TiB)

# ceph pg 11.4 query | jq ".acting,.up,.recovery_state"
[
  86,
  2147483647,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  86,
  2147483647,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:41:55.983304",
    "might_have_unfound": [],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": [
          {
            "tid": 8418509,
            "to_read": "{11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head=read_request_t(to_read=[1646592,24576,0], need={63(5)=[0,1],86(0)=[0,1],182(4)=[0,1],231(6)=[0,1],235(3)=[0,1],237(2)=[0,1]}, want_attrs=0)}",
            "complete": "{11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head=read_result_t(r=0, errors={}, noattrs, returned=(1646592, 24576, [63(5),4096, 86(0),4096, 231(6),4096, 235(3),4096, 237(2),4096]))}",
            "priority": 127,
            "obj_to_source": "{11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head=63(5),86(0),182(4),231(6),235(3),237(2)}",
            "source_to_obj": "{63(5)=11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head,86(0)=11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head,182(4)=11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head,231(6)=11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head,235(3)=11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head,237(2)=11:2003536f:::rbd_data.1.a508f96b8b4567.000000000000182c:head}",
            "in_progress": "182(4)"
          }
        ]
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:41:55.202575"
  }
]

# docker start osd-phy9
osd-phy9

# After starting the old OSD, a lot of objects remain degraded. It looks like PGs that were
# in state "...+remapped+backfilling" are affected. All others seem to recover, see below.
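
# A way to list exactly those PGs (not captured in this session) is to filter the
# health output for the combined state:

# ceph health detail | grep 'undersized+degraded+remapped+backfilling'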

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_ERR
            noout,norebalance flag(s) set
            7954306/1498800854 objects misplaced (0.531%)
            Degraded data redundancy: 208493/1498800854 objects degraded (0.014%), 3 pgs degraded, 3 pgs undersized
            Degraded data redundancy (low space): 4 pgs backfill_toofull
            1 pools nearfull

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 272 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     208493/1498800854 objects degraded (0.014%)
             7954306/1498800854 objects misplaced (0.531%)
             3162 active+clean
             39   active+remapped+backfill_wait
             6    active+clean+scrubbing+deep
             4    active+remapped+backfill_toofull
             3    active+undersized+degraded+remapped+backfilling
             1    active+clean+snaptrim

  io:
    client:   111 MiB/s rd, 42 MiB/s wr, 763 op/s rd, 750 op/s wr
    recovery: 66 MiB/s, 16 objects/s

# ceph health detail
HEALTH_ERR noout,norebalance flag(s) set; 7953632/1498800881 objects misplaced (0.531%); Degraded data redundancy: 208184/1498800881 objects degraded (0.014%), 3 pgs degraded, 3 pgs undersized; Degraded data redundancy (low space): 4 pgs backfill_toofull; 1 pools nearfull
OSDMAP_FLAGS noout,norebalance flag(s) set
OBJECT_MISPLACED 7953632/1498800881 objects misplaced (0.531%)
PG_DEGRADED Degraded data redundancy: 208184/1498800881 objects degraded (0.014%), 3 pgs degraded, 3 pgs undersized
    pg 11.a is stuck undersized for 311.488352, current state active+undersized+degraded+remapped+backfilling, last acting [170,156,148,2147483647,234,86,236,232]
    pg 11.87 is stuck undersized for 311.487625, current state active+undersized+degraded+remapped+backfilling, last acting [148,60,231,260,235,87,2147483647,181]
    pg 11.ed is stuck undersized for 311.465765, current state active+undersized+degraded+remapped+backfilling, last acting [233,2147483647,156,259,159,182,230,85]
PG_DEGRADED_FULL Degraded data redundancy (low space): 4 pgs backfill_toofull
    pg 11.8 is active+remapped+backfill_wait+backfill_toofull, acting [86,158,237,85,159,259,144,263]
    pg 11.5d is active+remapped+backfill_wait+backfill_toofull, acting [263,158,230,73,183,84,2,169]
    pg 11.165 is active+remapped+backfill_wait+backfill_toofull, acting [60,148,234,73,2,229,84,180]
    pg 11.1f0 is active+remapped+backfill_wait+backfill_toofull, acting [237,148,2,238,169,231,60,87]
POOL_NEAR_FULL 1 pools nearfull
    pool 'sr-rbd-data-one-hdd' has 164 TiB (max 200 TiB)

# ceph pg 11.4 query | jq ".acting,.up,.recovery_state"
[
  86,
  74,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  86,
  74,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:46:20.882683",
    "might_have_unfound": [],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:46:19.736862"
  }
]

# ceph pg 11.a query | jq ".acting,.up,.recovery_state"
[
  170,
  156,
  148,
  2147483647,
  234,
  86,
  236,
  232
]
[
  170,
  156,
  292,
  289,
  234,
  86,
  236,
  232
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:41:55.982261",
    "might_have_unfound": [
      {
        "osd": "74(3)",
        "status": "already probed"
      }
    ],
    "recovery_progress": {
      "backfill_targets": [
        "289(3)",
        "292(2)"
      ],
      "waiting_on_backfill": [],
      "last_backfill_started": "11:500368d3:::rbd_data.1.af0b536b8b4567.000000000062aef2:head",
      "backfill_info": {
        "begin": "11:5003695e:::rbd_data.1.318f016b8b4567.000000000004aa48:head",
        "end": "11:5003e7ba:::rbd_data.1.ac314b6b8b4567.00000000000a3032:head",
        "objects": [
          {
            "object": "11:5003695e:::rbd_data.1.318f016b8b4567.000000000004aa48:head",
            "version": "66191'195037"
          },

          [... many many similar entries removed ...]

          {
            "object": "11:5003e76b:::rbd_data.1.b023c26b8b4567.0000000000b8efd1:head",
            "version": "181835'908372"
          }
        ]
      },
      "peer_backfill_info": [
        "289(3)",
        {
          "begin": "MAX",
          "end": "MAX",
          "objects": []
        },
        "292(2)",
        {
          "begin": "MAX",
          "end": "MAX",
          "objects": []
        }
      ],
      "backfills_in_flight": [
        "11:500368d3:::rbd_data.1.af0b536b8b4567.000000000062aef2:head"
      ],
      "recovering": [
        "11:500368d3:::rbd_data.1.af0b536b8b4567.000000000062aef2:head"
      ],
      "pg_backend": {
        "recovery_ops": [
          {
            "hoid": "11:500368d3:::rbd_data.1.af0b536b8b4567.000000000062aef2:head",
            "v": "178993'819609",
            "missing_on": "289(3),292(2)",
            "missing_on_shards": "2,3",
            "recovery_info": "ObjectRecoveryInfo(11:500368d3:::rbd_data.1.af0b536b8b4567.000000000062aef2:head@178993'819609, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:{})",
            "recovery_progress": "ObjectRecoveryProgress(!first, data_recovered_to:4202496, data_complete:true, omap_recovered_to:, omap_complete:true, error:false)",
            "state": "WRITING",
            "waiting_on_pushes": "289(3),292(2)",
            "extent_requested": "0,8404992"
          }
        ],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:41:55.090085"
  }
]
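
# The unfound-probing information alone can be pulled out with a narrower jq
# filter (same query as above, just a different selector):

# ceph pg 11.a query | jq '.recovery_state[0].might_have_unfound'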

# Below we temporarily change the placement of the new OSDs back and forth. The
# peering initiated by this leads to re-discovery of all objects.

# ceph osd crush move osd.288 host=bb-04
moved item id 288 name 'osd.288' to location {host=bb-04} in crush map
# ceph osd crush move osd.289 host=bb-05
moved item id 289 name 'osd.289' to location {host=bb-05} in crush map
# ceph osd crush move osd.290 host=bb-06
moved item id 290 name 'osd.290' to location {host=bb-06} in crush map
# ceph osd crush move osd.291 host=bb-21
moved item id 291 name 'osd.291' to location {host=bb-21} in crush map
# ceph osd crush move osd.292 host=bb-07
moved item id 292 name 'osd.292' to location {host=bb-07} in crush map
# ceph osd crush move osd.293 host=bb-18
moved item id 293 name 'osd.293' to location {host=bb-18} in crush map
# ceph osd crush move osd.295 host=bb-19
moved item id 295 name 'osd.295' to location {host=bb-19} in crush map
# ceph osd crush move osd.294 host=bb-22
moved item id 294 name 'osd.294' to location {host=bb-22} in crush map
# ceph osd crush move osd.296 host=bb-20
moved item id 296 name 'osd.296' to location {host=bb-20} in crush map
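
# The nine moves above could equally be scripted; a sketch (not part of the
# session) using the same OSD-to-host mapping:

# for pair in 288=bb-04 289=bb-05 290=bb-06 291=bb-21 292=bb-07 293=bb-18 295=bb-19 294=bb-22 296=bb-20; do
>   ceph osd crush move osd.${pair%=*} host=${pair#*=}
> done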

# All objects are found at this point. Notice that a slow op shows up for
# one of the mons (see the very end). It does not clear by itself; a restart
# is required.

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
            59942033/1498816658 objects misplaced (3.999%)
            1 pools nearfull
            1 slow ops, oldest one blocked for 62 sec, mon.ceph-03 has slow ops

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 272 up, 272 in; 419 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     59942033/1498816658 objects misplaced (3.999%)
             2747 active+clean
             348  active+remapped+backfill_wait
             71   active+remapped+backfilling
             34   active+clean+snaptrim
             12   active+clean+snaptrim_wait
             3    active+clean+scrubbing+deep

  io:
    client:   130 MiB/s rd, 113 MiB/s wr, 1.38 kop/s rd, 1.53 kop/s wr

# ceph pg 11.4 query | jq ".acting,.up,.recovery_state"
[
  86,
  74,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  86,
  74,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:50:26.411524",
    "might_have_unfound": [],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:50:20.931555"
  }
]

# ceph pg 11.a query | jq ".acting,.up,.recovery_state"
[
  170,
  156,
  148,
  74,
  234,
  86,
  236,
  232
]
[
  170,
  156,
  148,
  74,
  234,
  86,
  236,
  232
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:50:18.781335",
    "might_have_unfound": [
      {
        "osd": "74(3)",
        "status": "already probed"
      },
      {
        "osd": "86(5)",
        "status": "already probed"
      },
      {
        "osd": "148(2)",
        "status": "already probed"
      },
      {
        "osd": "156(1)",
        "status": "already probed"
      },
      {
        "osd": "232(7)",
        "status": "already probed"
      },
      {
        "osd": "234(4)",
        "status": "already probed"
      },
      {
        "osd": "236(6)",
        "status": "already probed"
      },
      {
        "osd": "289(3)",
        "status": "not queried"
      },
      {
        "osd": "292(2)",
        "status": "not queried"
      }
    ],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:50:18.043630"
  }
]

# ceph osd crush move osd.288 host=ceph-04
moved item id 288 name 'osd.288' to location {host=ceph-04} in crush map
# ceph osd crush move osd.289 host=ceph-05
moved item id 289 name 'osd.289' to location {host=ceph-05} in crush map
# ceph osd crush move osd.290 host=ceph-06
moved item id 290 name 'osd.290' to location {host=ceph-06} in crush map
# ceph osd crush move osd.291 host=ceph-21
moved item id 291 name 'osd.291' to location {host=ceph-21} in crush map
# ceph osd crush move osd.292 host=ceph-07
moved item id 292 name 'osd.292' to location {host=ceph-07} in crush map
# ceph osd crush move osd.293 host=ceph-18
moved item id 293 name 'osd.293' to location {host=ceph-18} in crush map
# ceph osd crush move osd.295 host=ceph-19
moved item id 295 name 'osd.295' to location {host=ceph-19} in crush map
# ceph osd crush move osd.294 host=ceph-22
moved item id 294 name 'osd.294' to location {host=ceph-22} in crush map
# ceph osd crush move osd.296 host=ceph-20
moved item id 296 name 'osd.296' to location {host=ceph-20} in crush map
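
# To confirm that the new OSDs are back under their original hosts (not part of
# the recorded session), the crush location can be checked per OSD or in the tree:

# ceph osd find 289
# ceph osd tree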


# After these placement operations, we start observing slow ops. Not sure what
# is going on here, but something seems not to work the way it should. We recorded
# two ceph status reports here to show the transition. In between these two, a PG
# went down as well; it was shown as "1 pg inactive" for a short time. We didn't
# manage to catch that for the record.

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
            8630330/1498837232 objects misplaced (0.576%)
            1 pools nearfull
            8 slow ops, oldest one blocked for 212 sec, daemons [osd.169,osd.234,osd.288,osd.63,mon.ceph-03] have slow ops.

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 272 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     0.156% pgs not active
             8630330/1498837232 objects misplaced (0.576%)
             3158 active+clean
             41   active+remapped+backfill_wait
             6    active+clean+scrubbing+deep
             4    active+remapped+backfilling
             4    activating
             1    activating+remapped
             1    active+clean+snaptrim

  io:
    client:   85 MiB/s rd, 127 MiB/s wr, 534 op/s rd, 844 op/s wr

# ceph status
  cluster:
    id:     xxx-x-xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
            8630330/1498844491 objects misplaced (0.576%)
            1 pools nearfull
            1 slow ops, oldest one blocked for 247 sec, mon.ceph-03 has slow ops

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 297 osds: 272 up, 272 in; 46 remapped pgs
         flags noout,norebalance

  data:
    pools:   11 pools, 3215 pgs
    objects: 178.0 M objects, 491 TiB
    usage:   685 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     8630330/1498844491 objects misplaced (0.576%)
             3162 active+clean
             42   active+remapped+backfill_wait
             6    active+clean+scrubbing+deep
             4    active+remapped+backfilling
             1    active+clean+snaptrim

  io:
    client:   51 MiB/s rd, 66 MiB/s wr, 1.22 kop/s rd, 1.18 kop/s wr
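
# The per-OSD slow ops flagged in the first status above could be inspected the
# same way as the mon ops dump at the end of this log, via the admin socket of
# the respective daemon (run on that OSD's host, or inside its container), e.g.:

# ceph daemon osd.169 ops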

# ceph pg 11.4 query | jq ".acting,.up,.recovery_state"
[
  86,
  74,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  86,
  74,
  237,
  235,
  182,
  63,
  231,
  84
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:53:20.059512",
    "might_have_unfound": [],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:53:08.947343"
  }
]

# ceph pg 11.a query | jq ".acting,.up,.recovery_state"
[
  170,
  156,
  148,
  74,
  234,
  86,
  236,
  232
]
[
  170,
  156,
  292,
  289,
  234,
  86,
  236,
  232
]
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-08-11 10:53:08.842425",
    "might_have_unfound": [],
    "recovery_progress": {
      "backfill_targets": [
        "289(3)",
        "292(2)"
      ],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-08-11 10:53:04.962749"
  }
]

# This operation got stuck. In other experiments I saw hundreds of stuck ops and
# always needed a mon restart.

# ceph daemon mon.ceph-03 ops
{
    "ops": [
        {
            "description": "osd_pgtemp(e193916 {11.43=[236,168,85,86,228,169,60,148]} v193916)",
            "initiated_at": "2020-08-11 10:50:20.712996",
            "age": 310.065493,
            "duration": 310.065506,
            "type_data": {
                "events": [
                    {
                        "time": "2020-08-11 10:50:20.712996",
                        "event": "initiated"
                    },
                    {
                        "time": "2020-08-11 10:50:20.712996",
                        "event": "header_read"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713012",
                        "event": "throttled"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713015",
                        "event": "all_read"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713110",
                        "event": "dispatched"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713113",
                        "event": "mon:_ms_dispatch"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713113",
                        "event": "mon:dispatch_op"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713113",
                        "event": "psvc:dispatch"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713125",
                        "event": "osdmap:preprocess_query"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713155",
                        "event": "forward_request_leader"
                    },
                    {
                        "time": "2020-08-11 10:50:20.713184",
                        "event": "forwarded"
                    }
                ],
                "info": {
                    "seq": 56785507,
                    "src_is_mon": false,
                    "source": "osd.85 192.168.32.69:6820/2538607",
                    "forwarded_to_leader": true
                }
            }
        }
    ],
    "num_ops": 1
}
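
# A quick way to see whether such forwarded ops keep piling up on the mon is to
# watch just the counter from the same dump (not captured in this session):

# ceph daemon mon.ceph-03 ops | jq .num_ops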