Bug #23439
Crashing OSDs after 'ceph pg repair'
Description
Yesterday, Ceph reported scrub errors.
cluster [ERR] overall HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
After further investigation we found that pg 0.103 is the placement group in question and tried to repair it:
ceph1:~# ceph pg repair 0.103
instructing pg 0.103 on osd.11 to repair
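For reference, this is roughly how we narrowed it down to pg 0.103 (commands reconstructed from memory, output omitted):

ceph1:~# ceph health detail                                       # lists the damaged pg under PG_DAMAGED
ceph1:~# rados list-inconsistent-obj 0.103 --format=json-pretty   # shows which copies of which objects have errors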
After a while osd.11 crashed. Please find the corresponding log file under http://af.janno.io/ceph-osd.11.log.1.gz
This morning, 5 more OSDs had crashed:
root@head1:~# ceph -s
  cluster:
    id:     c59e56df-2043-4c92-9492-25f05f268d9f
    health: HEALTH_ERR
            29756/10190445 objects misplaced (0.292%)
            5 scrub errors
            Possible data damage: 1 pg inconsistent
            Degraded data redundancy: 30563/10190445 objects degraded (0.300%), 10 pgs degraded, 10 pgs undersized

  services:
    mon: 3 daemons, quorum head1,head2,head3
    mgr: head3(active), standbys: head1, head2
    osd: 40 osds: 34 up, 34 in; 10 remapped pgs

  data:
    pools:   1 pools, 768 pgs
    objects: 3317k objects, 12158 GB
    usage:   37814 GB used, 89230 GB / 124 TB avail
    pgs:     30563/10190445 objects degraded (0.300%)
             29756/10190445 objects misplaced (0.292%)
             758 active+clean
             5   active+undersized+degraded+remapped+backfill_wait
             4   active+undersized+degraded+remapped+backfilling
             1   active+undersized+degraded+remapped+inconsistent+backfill_wait

  io:
    client:   46312 kB/s rd, 7226 kB/s wr, 378 op/s rd, 44 op/s wr
    recovery: 57960 kB/s, 14 objects/s
root@head1:~# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       145.93274 root default
 -2        29.08960     host ceph1
  0   hdd   3.63620         osd.0      up  1.00000 1.00000
  1   hdd   3.63620         osd.1    down        0 1.00000
  2   hdd   3.63620         osd.2      up  1.00000 1.00000
  3   hdd   3.63620         osd.3      up  1.00000 1.00000
  4   hdd   3.63620         osd.4      up  1.00000 1.00000
  5   hdd   3.63620         osd.5      up  1.00000 1.00000
  6   hdd   3.63620         osd.6      up  1.00000 1.00000
  7   hdd   3.63620         osd.7      up  1.00000 1.00000
 -3        29.13217     host ceph2
  8   hdd   3.63620         osd.8    down        0 1.00000
  9   hdd   3.63620         osd.9    down        0 1.00000
 10   hdd   3.63620         osd.10   down        0 1.00000
 11   hdd   3.65749         osd.11   down        0 1.00000
 12   hdd   3.63620         osd.12     up  1.00000 1.00000
 13   hdd   3.65749         osd.13     up  1.00000 1.00000
 14   hdd   3.63620         osd.14     up  1.00000 1.00000
 15   hdd   3.63620         osd.15     up  1.00000 1.00000
 -4        29.11258     host ceph3
 16   hdd   3.63620         osd.16     up  1.00000 1.00000
 18   hdd   3.63620         osd.18     up  1.00000 1.00000
 19   hdd   3.63620         osd.19     up  1.00000 1.00000
 20   hdd   3.65749         osd.20     up  1.00000 1.00000
 21   hdd   3.63620         osd.21     up  1.00000 1.00000
 22   hdd   3.63620         osd.22     up  1.00000 1.00000
 23   hdd   3.63620         osd.23     up  1.00000 1.00000
 24   hdd   3.63789         osd.24     up  1.00000 1.00000
 -9        29.29919     host ceph4
 17   hdd   3.66240         osd.17     up  1.00000 1.00000
 25   hdd   3.66240         osd.25     up  1.00000 1.00000
 26   hdd   3.66240         osd.26     up  1.00000 1.00000
 27   hdd   3.66240         osd.27     up  1.00000 1.00000
 28   hdd   3.66240         osd.28   down        0 1.00000
 29   hdd   3.66240         osd.29     up  1.00000 1.00000
 30   hdd   3.66240         osd.30     up  1.00000 1.00000
 31   hdd   3.66240         osd.31     up  1.00000 1.00000
-11        29.29919     host ceph5
 32   hdd   3.66240         osd.32     up  1.00000 1.00000
 33   hdd   3.66240         osd.33     up  1.00000 1.00000
 34   hdd   3.66240         osd.34     up  1.00000 1.00000
 35   hdd   3.66240         osd.35     up  1.00000 1.00000
 36   hdd   3.66240         osd.36     up  1.00000 1.00000
 37   hdd   3.66240         osd.37     up  1.00000 1.00000
 38   hdd   3.66240         osd.38     up  1.00000 1.00000
 39   hdd   3.66240         osd.39     up  1.00000 1.00000
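If the linked file is unavailable, the crash backtraces were pulled out of the rotated OSD logs roughly like this (the 'FAILED assert' grep pattern and the default log path are assumptions; the exact assert message is in the uploaded log):

root@ceph2:~# zgrep -B 5 -A 40 'FAILED assert' /var/log/ceph/ceph-osd.11.log.1.gz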
History
#1 Updated by Jan Marquardt about 6 years ago
In #23258 we already had a similar issue, and I am wondering if this is something you always have to expect with Ceph, or if this might be a hardware/setup issue or something completely different.
#2 Updated by Jan Marquardt about 6 years ago
And the next three OSDs crashed:
root@head1:~# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       145.93274 root default
 -2        29.08960     host ceph1
  0   hdd   3.63620         osd.0      up  1.00000 1.00000
  1   hdd   3.63620         osd.1    down        0 1.00000
  2   hdd   3.63620         osd.2      up  1.00000 1.00000
  3   hdd   3.63620         osd.3      up  1.00000 1.00000
  4   hdd   3.63620         osd.4      up  1.00000 1.00000
  5   hdd   3.63620         osd.5    down        0 1.00000
  6   hdd   3.63620         osd.6      up  1.00000 1.00000
  7   hdd   3.63620         osd.7      up  1.00000 1.00000
 -3        29.13217     host ceph2
  8   hdd   3.63620         osd.8    down        0 1.00000
  9   hdd   3.63620         osd.9    down        0 1.00000
 10   hdd   3.63620         osd.10   down        0 1.00000
 11   hdd   3.65749         osd.11   down        0 1.00000
 12   hdd   3.63620         osd.12     up  1.00000 1.00000
 13   hdd   3.65749         osd.13     up  1.00000 1.00000
 14   hdd   3.63620         osd.14     up  1.00000 1.00000
 15   hdd   3.63620         osd.15     up  1.00000 1.00000
 -4        29.11258     host ceph3
 16   hdd   3.63620         osd.16     up  1.00000 1.00000
 18   hdd   3.63620         osd.18     up  1.00000 1.00000
 19   hdd   3.63620         osd.19     up  1.00000 1.00000
 20   hdd   3.65749         osd.20     up  1.00000 1.00000
 21   hdd   3.63620         osd.21     up  1.00000 1.00000
 22   hdd   3.63620         osd.22     up  1.00000 1.00000
 23   hdd   3.63620         osd.23     up  1.00000 1.00000
 24   hdd   3.63789         osd.24     up  1.00000 1.00000
 -9        29.29919     host ceph4
 17   hdd   3.66240         osd.17     up  1.00000 1.00000
 25   hdd   3.66240         osd.25     up  1.00000 1.00000
 26   hdd   3.66240         osd.26     up  1.00000 1.00000
 27   hdd   3.66240         osd.27     up  1.00000 1.00000
 28   hdd   3.66240         osd.28   down        0 1.00000
 29   hdd   3.66240         osd.29     up  1.00000 1.00000
 30   hdd   3.66240         osd.30     up  1.00000 1.00000
 31   hdd   3.66240         osd.31     up  1.00000 1.00000
-11        29.29919     host ceph5
 32   hdd   3.66240         osd.32   down        0 1.00000
 33   hdd   3.66240         osd.33     up  1.00000 1.00000
 34   hdd   3.66240         osd.34     up  1.00000 1.00000
 35   hdd   3.66240         osd.35     up  1.00000 1.00000
 36   hdd   3.66240         osd.36   down        0 1.00000
 37   hdd   3.66240         osd.37     up  1.00000 1.00000
 38   hdd   3.66240         osd.38     up  1.00000 1.00000
 39   hdd   3.66240         osd.39     up  1.00000 1.00000
#3 Updated by Jan Marquardt about 6 years ago
And the next three ...
root@head1:~# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       145.93274 root default
 -2        29.08960     host ceph1
  0   hdd   3.63620         osd.0      up  1.00000 1.00000
  1   hdd   3.63620         osd.1    down        0 1.00000
  2   hdd   3.63620         osd.2      up  1.00000 1.00000
  3   hdd   3.63620         osd.3      up  1.00000 1.00000
  4   hdd   3.63620         osd.4      up  1.00000 1.00000
  5   hdd   3.63620         osd.5    down        0 1.00000
  6   hdd   3.63620         osd.6      up  1.00000 1.00000
  7   hdd   3.63620         osd.7      up  1.00000 1.00000
 -3        29.13217     host ceph2
  8   hdd   3.63620         osd.8    down        0 1.00000
  9   hdd   3.63620         osd.9    down        0 1.00000
 10   hdd   3.63620         osd.10   down        0 1.00000
 11   hdd   3.65749         osd.11   down        0 1.00000
 12   hdd   3.63620         osd.12     up  1.00000 1.00000
 13   hdd   3.65749         osd.13   down        0 1.00000
 14   hdd   3.63620         osd.14     up  1.00000 1.00000
 15   hdd   3.63620         osd.15     up  1.00000 1.00000
 -4        29.11258     host ceph3
 16   hdd   3.63620         osd.16     up  1.00000 1.00000
 18   hdd   3.63620         osd.18     up  1.00000 1.00000
 19   hdd   3.63620         osd.19     up  1.00000 1.00000
 20   hdd   3.65749         osd.20   down        0 1.00000
 21   hdd   3.63620         osd.21     up  1.00000 1.00000
 22   hdd   3.63620         osd.22     up  1.00000 1.00000
 23   hdd   3.63620         osd.23     up  1.00000 1.00000
 24   hdd   3.63789         osd.24     up  1.00000 1.00000
 -9        29.29919     host ceph4
 17   hdd   3.66240         osd.17     up  1.00000 1.00000
 25   hdd   3.66240         osd.25     up  1.00000 1.00000
 26   hdd   3.66240         osd.26     up  1.00000 1.00000
 27   hdd   3.66240         osd.27     up  1.00000 1.00000
 28   hdd   3.66240         osd.28   down        0 1.00000
 29   hdd   3.66240         osd.29     up  1.00000 1.00000
 30   hdd   3.66240         osd.30     up  1.00000 1.00000
 31   hdd   3.66240         osd.31     up  1.00000 1.00000
-11        29.29919     host ceph5
 32   hdd   3.66240         osd.32   down        0 1.00000
 33   hdd   3.66240         osd.33   down  1.00000 1.00000
 34   hdd   3.66240         osd.34     up  1.00000 1.00000
 35   hdd   3.66240         osd.35     up  1.00000 1.00000
 36   hdd   3.66240         osd.36   down        0 1.00000
 37   hdd   3.66240         osd.37     up  1.00000 1.00000
 38   hdd   3.66240         osd.38     up  1.00000 1.00000
 39   hdd   3.66240         osd.39     up  1.00000 1.00000
#4 Updated by Greg Farnum almost 6 years ago
That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
It's not clear what happened, but scrub errors are usually rooted in some kind of hardware issue. They may be propagating just because everything's corrupted but only a small set of your OSDs can get scrub reservations at once, or there may be a piece of broken metadata which is getting moved around, but it's not something likely to be solved in an urgent fashion on the tracker.
#5 Updated by Jan Marquardt almost 6 years ago
Hi Greg,
thanks for your response.
> That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
Yeah, damn, you're right, sorry. I didn't know about ceph-post-file until now. I just uploaded the file: d15bc45c-eb2a-4d92-8d57-94288f0b5490
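For completeness, the upload was done roughly like this (invocation from memory; the text after -d is just a free-form description):

root@ceph2:~# ceph-post-file -d 'OSD crashes after pg repair (tracker #23439)' /var/log/ceph/ceph-osd.11.log.1.gz

ceph-post-file then prints the tag I pasted above.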
> It's not clear what happened, but scrub errors are usually rooted in some kind of hardware issue. They may be propagating just because everything's corrupted but only a small set of your OSDs can get scrub reservations at once, or there may be a piece of broken metadata which is getting moved around, but it's not something likely to be solved in an urgent fashion on the tracker.
So the mailing list would be the better place to get support for such cases?
At the moment we also suspect that the hardware is the problem in some way, and we will therefore probably replace it with new hardware.
#6 Updated by Greg Farnum almost 6 years ago
You'll definitely get more attention and advice if somebody else has hit this issue before.