Bug #37289


Issue with overfilled OSD for cache-tier pools

Added by Oleksandr Mykhalskyi over 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Tiering
Target version: -
% Done: 0%
Regression: No
Severity: 2 - major

Description

We have a serious issue in our Ceph cluster.

CentOS 7.5 (3.10.0-862.3.2.el7.x86_64)
Luminous 12.2.5, bluestore OSDs, using the cache-tier feature
OpenStack Pike (qemu-kvm-ev-2.10.0, libvirt-daemon-3.9.0-14)
Affected clients in the cloud run various OSes (CentOS 7.2, CentOS 7.5, Red Hat 6.7)

When one of the OSDs (device class ssd, where our cache-tier pools are located) reached 95% utilization, all cache-tier pools became blocked, as expected. I added more OSDs to resolve the overflow and expected the clients to unfreeze and continue working, as they do when a regular replicated pool overflows (or reaches its quota).
But they did not.
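
The 95% figure matches the default full ratio in Luminous. As a rough sketch, assuming the stock thresholds (nearfull 0.85, backfillfull 0.90, full 0.95; a cluster may override them), the state of a single OSD can be classified like this. The function name and constants are illustrative and not part of any Ceph API.

```python
# Assumed stock Luminous thresholds, as fractions of OSD capacity.
# A cluster may override them, so treat these values as an assumption.
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90
FULL_RATIO = 0.95


def classify_osd(used_bytes, total_bytes):
    """Return 'full', 'backfillfull', 'nearfull' or 'ok' for one OSD."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    utilization = used_bytes / total_bytes
    if utilization >= FULL_RATIO:
        return "full"          # writes to pools on this OSD are blocked
    if utilization >= BACKFILLFULL_RATIO:
        return "backfillfull"  # backfill onto this OSD is refused
    if utilization >= NEARFULL_RATIO:
        return "nearfull"      # health warning only
    return "ok"
```

A single OSD in the "full" state is enough to block writes to every pool that maps to it, which is why one overfilled ssd OSD stalled all cache-tier pools here.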

The clients stayed hung and we had to reboot them. After the reboot there were errors like:
[ 9.551419] blk_update_request: I/O error, dev vdb, sector 20973600
[ 9.555494] Buffer I/O error on device vdb2, logical block 4
[ 9.559532] lost page write due to I/O error on vdb2
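
To collect the affected devices and sectors from guest kernel logs like these, a small parser along the following lines may help. It is a sketch that assumes the blk_update_request message format shown above; the function name is hypothetical.

```python
import re

# Matches lines such as:
# "[ 9.551419] blk_update_request: I/O error, dev vdb, sector 20973600"
BLK_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def parse_blk_error(line):
    """Return (device, sector) for a blk_update_request error line, else None."""
    match = BLK_ERROR.search(line)
    if match is None:
        return None
    return match.group("dev"), int(match.group("sector"))
```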

We fixed it by running “rbd object-map rebuild” on the affected volumes.
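
For several volumes, the rebuild commands can be generated in one go. This is a minimal sketch: the image specs are hypothetical placeholders, and the code only builds and prints the command lines rather than running them.

```python
import shlex


def rebuild_commands(image_specs):
    """Build one 'rbd object-map rebuild <pool>/<image>' argv list per image."""
    return [["rbd", "object-map", "rebuild", spec] for spec in image_specs]


# Hypothetical pool/image names, for illustration only.
affected = ["volumes/volume-a", "volumes/volume-b"]
for argv in rebuild_commands(affected):
    # Print a shell-safe command line; review it before running anything.
    print(" ".join(shlex.quote(arg) for arg in argv))
```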

From the Ceph documentation:
“When a pool quota is reached, librados operations now block indefinitely, the same way they do when the cluster fills up. (Previously they would return -ENOSPC.)
By default, a full cluster or pool will now block. If your librados application can handle ENOSPC or EDQUOT errors gracefully,
you can get error returns instead by using the new librados OPERATION_FULL_TRY flag”

It seems that this behavior does not work for cache-tier pools?
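
To make the expectation concrete, here is a toy model of what the quoted text describes: a write to a full pool blocks by default and returns ENOSPC only when the caller opts in. It does not call librados. The function and its parameters are hypothetical and only mirror the documented semantics.

```python
import errno


def submit_write(pool_is_full, full_try=False):
    """Toy model of the documented full-pool semantics (not a librados call).

    Returns 'written' when there is space and 'blocked' when the pool is
    full and the op waits for space. Raises OSError(ENOSPC) when the pool
    is full and the caller asked for an error instead of blocking.
    """
    if not pool_is_full:
        return "written"
    if full_try:
        raise OSError(errno.ENOSPC, "pool is full")
    return "blocked"
```

Under this model a blocked op should complete once space is freed. That is what we see for regular replicated pools, and what we did not see for the cache-tier pools.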

Details of my test Ceph cluster, created to reproduce the issue, are in the attachment.

P.S. I tried this on Luminous 12.2.9, with the same results.

Thank you


Files

details_ceph_cluster.txt (5.67 KB), Oleksandr Mykhalskyi, 11/16/2018 12:43 PM