Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballooning memory usage - RADOS - Ceph

Bug #22050

closed

Added by mingxin liu over 6 years ago. Updated about 6 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Josh Durgin

Category:

Performance/Resource Usage

Target version:

% Done:

Source:

Tags:

Backport:

luminous

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(RADOS):

OSD

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

We use the rbd discard API to zero the whole range of a very big volume. Many extents of this volume have not been written before the discard operation, so they map to nonexistent objects, and when the OSD executes the DELETE op issued by rbd_discard for those objects it gets -ENOENT. For the sake of dup op detection, the OSD records an ERROR log entry through a separate I/O path (see PrimaryLogPG::submit_log_entries) that does not update last_complete_ondisk. In the normal pglog update path, each replica updates its last_complete_ondisk (ReplicatedBackend::sub_op_commit) and informs the primary (ReplicatedBackend::sub_op_modify_reply); the primary then uses min_last_complete_ondisk as the lower bound for trimming the pglog, which keeps the number of entries in check. So if a PG continuously receives this kind of DELETE op and no successful write occurs in the meantime, the primary has no chance to update min_last_complete_ondisk and trim the pglog, and these ERROR type entries keep accumulating.
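The accumulation described above can be illustrated with a small toy model. This is only a sketch and not Ceph code: the class `ToyPGLog`, its methods, and the `max_entries` limit are hypothetical names made up for illustration, and the model assumes a single watermark that stands in for min_last_complete_ondisk and advances only on successful writes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPGLog:
    """Toy model of a primary's PG log (hypothetical, not Ceph code)."""
    max_entries: int = 5                                # hypothetical target log length
    entries: List[int] = field(default_factory=list)   # entry versions, oldest first
    head: int = 0                                       # version of the newest entry
    min_last_complete_ondisk: int = 0                   # trim watermark

    def _append(self) -> int:
        self.head += 1
        self.entries.append(self.head)
        return self.head

    def write(self) -> None:
        """Successful write: replicas commit and report back, so the watermark advances."""
        version = self._append()
        self.min_last_complete_ondisk = version
        self.trim()

    def failed_delete(self) -> None:
        """DELETE returning -ENOENT: an ERROR entry is logged, the watermark is untouched."""
        self._append()
        self.trim()

    def trim(self) -> None:
        """Drop the oldest entries beyond max_entries, but never past the watermark."""
        excess = len(self.entries) - self.max_entries
        while excess > 0 and self.entries[0] <= self.min_last_complete_ondisk:
            self.entries.pop(0)
            excess -= 1


log = ToyPGLog(max_entries=5)
for _ in range(1000):
    log.failed_delete()
print(len(log.entries))  # 1000: nothing could be trimmed

log.write()
print(len(log.entries))  # 5: one successful write lets the backlog be trimmed
```

In this model a single successful write is enough to release the backlog, which matches the report that the growth only shows up in PGs that see failed DELETE ops and no successful writes.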

Related issues 2 (0 open — 2 closed)

Updated by Greg Farnum over 6 years ago

Project changed from Ceph to RADOS
Subject changed from ERROR type entries of pglog cannot be trimmed timely caused a large memory usage to ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballooning memory usage
Category changed from OSD to Performance/Resource Usage
Component(RADOS) OSD added

This one's tricky; I'm not sure we want to trim based on error entries in the general case. If a broken client submits an error op constantly, it could go through much more quickly than real ops and trimming based on that might cause issues if an OSD is rebooting at the same time...

Updated by Greg Farnum over 6 years ago

Status changed from New to Triaged

Updated by Greg Farnum over 6 years ago

Josh thinks we still want to trim since it's a write to disk.

Updated by Josh Durgin about 6 years ago

Priority changed from Normal to Urgent

This seems to be biting rgw's usage pools when radosgw-admin usage trim occurs in PGs with little other activity.

Updated by Orit Wasserman about 6 years ago

Related to Bug #22963: radosgw-admin usage show loops indefinitly - again added

Updated by Josh Durgin about 6 years ago

Assignee set to Josh Durgin

Running a fix through testing: http://pulpito.ceph.com/joshd-2018-03-09_00:39:29-rados-wip-pg-log-trim-errors-distro-basic-smithi

Updated by Josh Durgin about 6 years ago

Status changed from Triaged to Fix Under Review
Backport set to luminous

https://github.com/ceph/ceph/pull/20827

Backport only needed to luminous since error pg log entries did not exist before that.

Updated by Josh Durgin about 6 years ago

Copied to Backport #23323: luminous: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballooning memory usage added
