Bug #41037
Containerized cluster failure due to osd_memory_target not being set to ratio of cgroup_limit per osd_memory_target_cgroup_limit_ratio
Status: Closed
Description
Under heavy I/O workload (generated against multiple Postgres databases, backed by Ceph RBD, via the pgbench utility), we have experienced a condition where the memory limit for the OSD pods is reached, OOM kills occur, and the cluster then becomes unresponsive. This is a critical failure, apparently caused by the osd_memory_target parameter not being sized with headroom below the cgroup_limit. Mark Nelson has repeatedly said that 10-20% of headroom is needed to ensure that the OSD can trim its memory before the OOM killer is triggered.

Note that because Bluestore computes osd_memory_target for us based on the cgroup limit, there is no way to override this, so this is a high-priority problem!
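For illustration, the sizing rule we expected works out as in this minimal sketch (the expected_osd_memory_target function is our own stand-in for the behavior, not actual Ceph code; 0.8 is the default ratio discussed below):

# Minimal sketch of the expected sizing rule; not actual Ceph source.
def expected_osd_memory_target(cgroup_limit_bytes: int, ratio: float = 0.8) -> int:
    # Leave (1 - ratio) headroom below the cgroup limit so the OSD
    # can trim its caches before the kernel OOM killer fires.
    return int(cgroup_limit_bytes * ratio)

limit = 6 * 1024 ** 3                     # 6Gi pod memory limit = 6442450944 bytes
print(expected_osd_memory_target(limit))  # 5153960755 (~4.8Gi), not 6442450944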
Deploying containerized Ceph via Rook.io, we apply a memory limit to the OSDs via cluster.yaml, but not a memory request:
resources:
  osd:
    requests:
      cpu: "2"
    limits:
      cpu: "2"
      memory: "6Gi"
Per fc3bdad [1], our expectation is that, without a memory request, the value of osd_memory_target should default to cgroup_limit * osd_memory_target_cgroup_limit_ratio (0.8 by default). However, the deployed cluster shows:
sh-4.2# ceph daemon /var/lib/rook/osd0/ceph-osd.0.asok config show | grep osd_memory
    "osd_memory_base": "805306368",
    "osd_memory_cache_min": "134217728",
    "osd_memory_cache_resize_interval": "1.000000",
    "osd_memory_expected_fragmentation": "0.150000",
    "osd_memory_target": "6442450944",                    <-- Target = limit instead of (limit * 0.8)
    "osd_memory_target_cgroup_limit_ratio": "0.800000",   <-- Ratio looks correct
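For reference, 6Gi = 6442450944 bytes, so with the 0.8 ratio the target should have been int(6442450944 * 0.8) = 5153960755 bytes (~4.8Gi), matching the sketch above; instead the target equals the full cgroup limit.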
The cluster status during the failure:

  cluster:
    id:     ffa396e4-7874-472d-95fa-692d754f5e6e
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            2 osds down
            15/76461 objects unfound (0.020%)
            Reduced data availability: 309 pgs inactive, 309 pgs peering, 103 pgs stale
            Possible data damage: 10 pgs recovery_unfound
            Degraded data redundancy: 73406/229339 objects degraded (32.008%), 669 pgs degraded, 644 pgs undersized

  services:
    mon: 3 daemons, quorum a,b,c (age 2w)
    mgr: a(active, since 2d)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 12 osds: 6 up (since 24m), 8 in (since 23m); 104 remapped pgs

  data:
    pools:   3 pools, 1224 pgs
    objects: 76.46k objects, 296 GiB
    usage:   638 GiB used, 11 TiB / 12 TiB avail
    pgs:     25.245% pgs not active
             73406/229339 objects degraded (32.008%)
             15/76461 objects unfound (0.020%)
             548 active+undersized+degraded
             253 peering
             145 active+clean
             100 stale+active+clean
             56  remapped+peering
             40  active+recovery_wait+undersized+degraded
             36  active+undersized+degraded+remapped+backfill_wait
             23  active+recovery_wait+degraded
             9   active+recovery_wait+undersized+degraded+remapped
             6   active+recovery_unfound+undersized+degraded+remapped
             4   active+recovery_unfound+undersized+degraded
             2   stale+active+recovery_wait+degraded
             1   stale+active+undersized+degraded
             1   active+recovery_wait