Bug #52261

OSD takes all memory and crashes after pg_num increase

Added by Marius Leustean over 2 years ago. Updated over 2 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: osd
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-ansible
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After increasing a pool's pg_num from 256 to 512, all OSDs are down.

On startup they consume all of the memory. Even after disabling the OOM killer and increasing swap, the OSDs eventually die with this stack trace:

ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
1: (()+0x12b20) [0x7f258613db20]
2: (pthread_kill()+0x35) [0x7f258613a8d5]
3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x258) [0x55f84c76e808]
4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned long, unsigned long)+0x262) [0x55f84c76ee52]
5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x7a3) [0x55f84c15d613]
6: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x55f84c15f2d4]
7: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x55f84c391396]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x55f84c15217f]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55f84c7906f4]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55f84c793354]
11: (()+0x814a) [0x7f258613314a]
12: (clone()+0x43) [0x7f2584e63dc3]
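
For context, a minimal sketch of the kind of change described above and a quick way to check the resulting OSD state; the pool name 'volumes' is hypothetical, not taken from this report:

    # pg_num increase of the kind described above (pool name is an assumption)
    ceph osd pool set volumes pg_num 512

    # Check cluster health and which OSDs are down afterwards
    ceph -s
    ceph osd tree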

Related issues

Related to RADOS - Bug #53729: ceph-osd takes all memory before oom on boot Resolved

History

#1 Updated by Josh Durgin over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)

Can you attach a 'ceph osd dump' and a 'ceph pg dump', plus a log of one of the OSDs starting up and leading up to the crash, with debug_osd=20 debug_bluestore=10 debug_ms=1 set for that OSD in ceph.conf?
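
A sketch of one way to collect what is asked for above, assuming a non-containerized deployment and an affected daemon osd.3; the OSD id, log path and restart command are assumptions, not taken from this ticket:

    # ceph.conf on the host of the affected OSD (id 3 is hypothetical)
    [osd.3]
        debug osd = 20
        debug bluestore = 10
        debug ms = 1

    # Capture the requested cluster state
    ceph osd dump > osd-dump.txt
    ceph pg dump > pg-dump.txt

    # Restart the OSD and keep its log up to the crash,
    # typically /var/log/ceph/ceph-osd.3.log
    systemctl restart ceph-osd@3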

#2 Updated by Neha Ojha over 2 years ago

  • Status changed from New to Need More Info

#3 Updated by Loïc Dachary over 2 years ago

  • Target version deleted (v15.2.15)

#4 Updated by Aldo Briessmann over 2 years ago

Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. Moving the cluster with the in-progress PG split to 16.2.6 unfortunately did not change anything. According to the logs, the issue started when the enabled PG autoscaler decided to split an RBD pool from 64 PGs to 256 PGs, a few days after 2-3 large files were loaded into a volume within the pool (see the autoscaler commands sketched after this comment). Our cluster is rather small: 4 servers with two 4 TB disks and 16 GB of RAM each. While trying to fix it we, among other things, activated zram and zswap, which let us go up to 256 GB(!) of swap; even then the OSD daemon would attempt to allocate all of that memory (indicating that it was all zeros) until the system ran out of memory and the OOM killer stopped the OSD daemon again.

The only solution to get the cluster back into a usable state was to remove the problematic PGs from all OSDs, which of course resulted in data loss. Fortunately, we had daily backups of all important files outside the Ceph cluster, so we did not lose much data.

Unfortunately this incident happened a few weeks back (it's a student project), but if you are interested in any specific logs, I may be able to provide them.
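
For reference, a minimal sketch of checking and, if needed, freezing the autoscaler behaviour described in the previous comment; the pool name 'rbd' is an assumption, not taken from this report:

    # Show the autoscaler's per-pool status and recommendations
    ceph osd pool autoscale-status

    # Stop the autoscaler from (re)splitting the affected pool while debugging
    ceph osd pool set rbd pg_autoscale_mode off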

#5 Updated by Neha Ojha over 2 years ago

Aldo Briessmann wrote:

Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. [...]

Will you be able to provide the information requested in https://tracker.ceph.com/issues/52261#note-1?

#6 Updated by Dan van der Ster about 2 years ago

  • Related to Bug #53729: ceph-osd takes all memory before oom on boot added
