Fix #5268

mds: fix/clean up file size/mtime recovery code

Added by Sage Weil almost 11 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, MDS, osdc
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From diagnosing #4832 (see the attached log), it looks like this code needs an overhaul:

  • I don't think we should be triggering recovery when transitioning from stable states, but explicitly sometime earlier
  • we should hold a wrlock while gathering, and avoid the maxsize/size force_wrlock flag at the end
  • we should have well-defined behavior for when a client goes stale, resumes, goes stale again, etc., and races with file size recovery

Related issues (1 open, 0 closed)

Related to CephFS - Feature #4485: Improve "needsrecover" handling (New, 03/17/2013)

Actions #1

Updated by Greg Farnum almost 11 years ago

See also #4485.

Actions #2

Updated by Greg Farnum about 9 years ago

From #10875:

A very sparse file slightly longer than 1 GB had received a few scattered writes when the MDS restarted.

Recovery decided to scan all 512 objects from 0 to 2GB.

This takes a very long time on my cluster. Each object stat is taking a few seconds, presumably because of the ongoing migration of data to an EC pool.

The only information the mds logs is the delayed attempts to obtain rdlocks when I access the file.

If we probed multiple objects in parallel, I think it would go much faster, but it's statting only one object at a time, going backwards. Starting the search so far away from the actual size surely doesn't help either.

Regardless, recovery might still take a long time in particularly pathological cases, so it would be nice if the mds would log long-running probe aggregate operations, just as it logs delayed client requests. This would at least give users a clue on what is going on when accessing a file takes a very, very long time.

So: parallel object checks, and more visibility into ongoing recovery operations. Unfortunately, scanning backwards from max_size is necessary, since we need to demonstrate that we know the last object. :(
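The probing strategy described above can be sketched as follows. This is a hypothetical illustration, not Ceph code: it assumes the default 4 MiB CephFS object size (so a 2 GiB max_size spans 2 GiB / 4 MiB = 512 objects, matching the report), and `stat_object` is a stand-in for an OSD stat that returns an object's byte count, or None if the object does not exist.

```python
# Sketch: recover a file's size by statting objects backwards from
# max_size, batching probes so they overlap instead of running one
# round trip at a time.
from concurrent.futures import ThreadPoolExecutor

OBJECT_SIZE = 4 * 1024 * 1024        # 4 MiB default object size (assumption)
MAX_SIZE = 2 * 1024 * 1024 * 1024    # 2 GiB max_size from the report


def recover_size(stat_object, max_size=MAX_SIZE, batch=16):
    """Return the recovered file size in bytes.

    Scans object indices from the top down.  The serial behaviour in
    the report probes one object per round trip; here each round stats
    `batch` objects concurrently.
    """
    nobjects = (max_size + OBJECT_SIZE - 1) // OBJECT_SIZE  # 512 here
    with ThreadPoolExecutor(max_workers=batch) as pool:
        for start in range(nobjects - batch, -batch, -batch):
            idxs = range(max(start, 0), min(start + batch, nobjects))
            sizes = list(pool.map(stat_object, idxs))
            # Walk this batch from its high end; the first object that
            # exists is the file's last object.
            for idx, size in sorted(zip(idxs, sizes), reverse=True):
                if size is not None:
                    return idx * OBJECT_SIZE + size
    return 0  # no objects exist: empty file
```

For the report's case (512 objects in batches of 16), this issues 32 batched rounds instead of 512 serial stats, while still scanning from max_size downward so the first hit is provably the last object.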

Actions #3

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to Performance/Resource Usage
  • Component(FS) MDS added
Actions #4

Updated by Greg Farnum almost 8 years ago

  • Related to Feature #4485: Improve "needsrecover" handling added
Actions #5

Updated by Greg Farnum almost 8 years ago

  • Status changed from 12 to New
Actions #6

Updated by Patrick Donnelly about 6 years ago

  • Assignee set to Zheng Yan
  • Target version set to v13.0.0
  • Component(FS) Client, osdc added
Actions #7

Updated by Zheng Yan about 6 years ago

  • Status changed from New to Closed

Current code does parallel object checks.
