OSDs cannot start after shutdown, killed by the OOM killer during PG load
After a shutdown, none of the OSDs can start. During the load_pgs stage each ceph-osd process consumes all available virtual memory (RAM + swap), so the OOM killer terminates it.
root@osd001:~# dpkg -l | grep -i ceph
ii ceph-base 12.2.2-1xenial amd64 common ceph daemon libraries and management tools
ii ceph-common 12.2.2-1xenial amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fuse 12.2.2-1xenial amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds 12.2.2-1xenial amd64 metadata server for the ceph distributed file system
ii ceph-osd 12.2.2-1xenial amd64 OSD server for the ceph storage system
ii libcephfs2 12.2.2-1xenial amd64 Ceph distributed file system client library
ii python-cephfs 12.2.2-1xenial amd64 Python 2 libraries for the Ceph libcephfs library
ii python-rados 12.2.2-1xenial amd64 Python 2 libraries for the Ceph librados library
ii python-rbd 12.2.2-1xenial amd64 Python 2 libraries for the Ceph librbd library
ii python-rgw 12.2.2-1xenial amd64 Python 2 libraries for the Ceph librgw library
root@osd001:~# apt-cache policy ceph-osd
Version table:
 *** 12.2.2-1xenial 1100
        1100 https://download.ceph.com/debian-luminous xenial/main amd64 Packages
#3 Updated by Volodymyr Blokhin over 3 years ago
Unfortunately we could not wait that long and re-deployed the Ceph cluster on 12/30/2017.
We managed to get ceph-osd started (the PG load finished) by adding 100 GB of swap on each OSD node.
But the PGs never came online (we waited 36 hours), and we had to re-deploy the cluster.
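For readers hitting the same symptom: memory use during load_pgs is driven largely by the per-PG log, and its size can be capped in ceph.conf. The values below are illustrative, not from this thread; the Luminous defaults (osd_min_pg_log_entries = 3000, osd_max_pg_log_entries = 10000) are usually fine, so shrinking them is a trade-off against recovery efficiency and should be tested first.

```ini
; ceph.conf fragment — hypothetical reduced limits, not the values used here
[osd]
osd min pg log entries = 1000
osd max pg log entries = 3000
```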
Sage Weil wrote:
The mempool dump shows 58GB (!) of pg logs. Can you restart the osd with 'debug bluestore = 20' so we can see if it is reading real, valid log entries?
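A sketch of how that debug level could be applied, assuming a hypothetical OSD id of 0; since the daemon here dies during startup, the command-line or ceph.conf route is the relevant one rather than the admin socket:

```shell
# If the daemon were still running, raise the level via the admin socket:
ceph daemon osd.0 config set debug_bluestore 20/20

# For a daemon that dies during startup, add to the [osd] section of
# ceph.conf and restart the unit:
#   [osd]
#   debug bluestore = 20
systemctl restart ceph-osd@0

# Or run the OSD in the foreground with the override passed directly:
ceph-osd -f --cluster ceph -i 0 --debug-bluestore 20
```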