Bug #47929
Huge RAM Usage on OSD recovery
Description
Hi, today my infrastructure provider had a blackout. Ceph then tried to recover, but the cluster is stuck in an inconsistent state. Many OSDs cannot recover because the kernel kills them (OOM). Even an OSD that had been fine until now went down after being OOM-killed.
Even on a server with 32 GB of RAM, the OSD uses all of it and never recovers. I think this may be a memory leak. Ceph version: Octopus 15.2.3.
In https://pastebin.pl/view/59089adc you can see that buffer_anon grows to 32 GB, but why? My whole cluster is down because of this.
The dump at https://pastebin.pl/view/59089adc was taken just before the OSD was killed by the OOM killer.
The PG log was trimmed successfully, but the behavior is the same. Log:
https://pastebin.ubuntu.com/p/dwbXtX7wTP/
Updated by Neha Ojha over 3 years ago
- Project changed from Ceph to RADOS
Can you please provide the output of "ceph -s" and "ceph pg dump"?
Updated by Luis Felipe Domínguez Vega over 3 years ago
ceph -s: https://pastebin.ubuntu.com/p/3rjd435Sdh/
ceph pg dump: https://pastebin.ubuntu.com/p/THsSd2J33s/
Updated by Luis Felipe Domínguez Vega over 3 years ago
I tried the following:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd id> --pgid "<stuck_pg get from log>" --force --op remove
and now the OSD is running, so something is wrong with that PG.
Updated by Luis Felipe Domínguez Vega over 3 years ago
I changed approach: I used --op export-remove and then --op import of ceph-objectstore-tool for the failing PG, and now the OSD is running fine.
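The export-remove/import sequence described above can be sketched as a small shell script. This is a hedged sketch, not the exact commands used in this report: the OSD ids, the PG id, the export path, and the choice to import into a different (also stopped) OSD are all assumptions. The `run` and `move_pg` helpers are hypothetical, and `DRY_RUN` defaults to 1, so the script only prints the commands.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical placeholders: replace with the real OSD ids and PG id.
SRC_OSD="${SRC_OSD:-0}"
DST_OSD="${DST_OSD:-1}"
PGID="${PGID:-1.0}"
EXPORT_FILE="${EXPORT_FILE:-/tmp/pg-${PGID}.export}"
# DRY_RUN=1 (the default) only prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Default OSD data directory layout, as used in the command above.
data_path() { echo "/var/lib/ceph/osd/ceph-$1"; }

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

move_pg() {
  # ceph-objectstore-tool needs exclusive access, so both OSDs must be stopped.
  run systemctl stop "ceph-osd@${SRC_OSD}" "ceph-osd@${DST_OSD}"
  # Write the PG to a file and delete it from the source OSD in one step.
  run ceph-objectstore-tool --data-path "$(data_path "$SRC_OSD")" \
      --pgid "$PGID" --op export-remove --file "$EXPORT_FILE"
  # Load the exported PG into the destination OSD (the PG id is read from the file).
  run ceph-objectstore-tool --data-path "$(data_path "$DST_OSD")" \
      --op import --file "$EXPORT_FILE"
  run systemctl start "ceph-osd@${SRC_OSD}" "ceph-osd@${DST_OSD}"
}

move_pg
```

Leaving `DRY_RUN=1` lets you review the exact commands before running them for real with `DRY_RUN=0`. Keep the export file until the PG is confirmed healthy elsewhere.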
Updated by Luis Felipe Domínguez Vega over 3 years ago
There is some strange behavior: on another failing OSD this does not work at all, even though I ran the export-remove and import again. The exported file is about 17 GB of data.
Updated by Luis Felipe Domínguez Vega over 3 years ago
Nope, the export/import approach does not work: during recovery, when that PG needs to be recovered, the OSD gets OOM-killed.
Updated by Neha Ojha over 3 years ago
Can you export and upload a copy of the problematic PG via ceph-post-file?
Updated by Luis Felipe Domínguez Vega over 3 years ago
Neha Ojha wrote:
Can you export and upload a copy of the problematic PG via ceph-post-file?
There are different problematic PGs on different OSDs... I will try with one, but each PG is about 17 GB.
Updated by Luis Felipe Domínguez Vega over 3 years ago
Neha Ojha wrote:
Can you export and upload a copy of the problematic PG via ceph-post-file?
ceph-post-file: 7639cc22-79eb-426d-ac79-e704bc34ef0f