Bug #52261
OSD takes all memory and crashes, after pg_num increase
Description
After increasing a pool's pg_num from 256 to 512, all OSDs are down.
On startup, each OSD consumes all available memory. After disabling the oom-killer and increasing swap, the OSD eventually dies with this stack trace:
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
1: (()+0x12b20) [0x7f258613db20]
2: (pthread_kill()+0x35) [0x7f258613a8d5]
3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x258) [0x55f84c76e808]
4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned long, unsigned long)+0x262) [0x55f84c76ee52]
5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x7a3) [0x55f84c15d613]
6: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x55f84c15f2d4]
7: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x55f84c391396]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x55f84c15217f]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55f84c7906f4]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55f84c793354]
11: (()+0x814a) [0x7f258613314a]
12: (clone()+0x43) [0x7f2584e63dc3]
Related issues
History
#1 Updated by Josh Durgin over 2 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSD)
Can you attach a 'ceph osd dump' and a 'ceph pg dump', plus a log of one of the OSDs starting up and leading up to the crash, with debug_osd=20 debug_bluestore=10 debug_ms=1 set for that OSD in ceph.conf?
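For reference, the requested debug levels can be set in ceph.conf for the affected daemon; a minimal sketch (the OSD id `osd.3` is a placeholder):

```ini
# ceph.conf fragment for the affected OSD (osd.3 is a placeholder id)
[osd.3]
debug_osd = 20
debug_bluestore = 10
debug_ms = 1
```

On Octopus and later the same levels can also be applied at runtime with `ceph config set osd.3 debug_osd 20`, but for a daemon that crashes during startup, setting them in ceph.conf before boot is the reliable route.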
#2 Updated by Neha Ojha over 2 years ago
- Status changed from New to Need More Info
#3 Updated by Loïc Dachary over 2 years ago
- Target version deleted (v15.2.15)
#4 Updated by Aldo Briessmann over 2 years ago
Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. Moving the cluster with the in-progress PG split to 16.2.6 unfortunately did not change anything. According to the logs, the issue started when the active PG autoscaler decided to split an RBD pool from 64 PGs to 256 PGs, a few days after 2-3 large files were loaded into a volume within the pool. Our cluster is rather small: 4 servers, each with two 4TB disks and 16GB of RAM. While attempting to fix it, we tried (among other things) activating zram and zswap, with which we could go up to 256GB(!) of swap, and even then the OSD daemon would attempt to allocate all that storage (indicating that it's all 0s) until the system was out of memory and the OOM-killer stopped the OSD daemon again.
The only way to get the cluster into a usable state again was to remove the problematic PGs from all OSDs, which of course resulted in data loss. Fortunately, we had daily backups of all important files outside the Ceph cluster, so we did not lose much data.
Unfortunately this incident happened a few weeks back (it's a student project), but if you are interested in any specific logs, I may be able to provide them.
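For the record, removing a PG from a stopped OSD is typically done with ceph-objectstore-tool; a hedged sketch, where the data path and pgid are placeholders and an export is taken first so the data could be re-imported later:

```shell
# Placeholders: adjust the OSD data path and pgid for your cluster.
# Run only while the OSD daemon is stopped.
OSD_PATH=/var/lib/ceph/osd/ceph-0
PGID=2.1a

# Export the PG first so it can be re-imported if needed.
ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" \
    --op export --file "/root/${PGID}.export"

# Then remove it from the OSD's store.
ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" \
    --op remove --force
```

Whether this avoids data loss depends on having a surviving or exported copy of each PG; in the scenario described above the split-related PG state itself was the problem, so re-import may not have been viable.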
#5 Updated by Neha Ojha over 2 years ago
Aldo Briessmann wrote:
[...]
Will you be able to provide the information requested in https://tracker.ceph.com/issues/52261#note-1?
#6 Updated by Dan van der Ster about 2 years ago
- Related to Bug #53729: ceph-osd takes all memory before oom on boot added