Project

General

Profile

Bug #10117

OSD crashes if xattr "_" is absent for the file when doing backfill scanning (ReplicatedPG::scan_range)

Added by Guang Yang almost 5 years ago. Updated over 2 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
Start date:
11/16/2014
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

We observed a OSD crash pattern which is due to xattr "_" is absent for the file (on filesystem) which result in an assertion failure, corresponding code snippet is:

void ReplicatedPG::scan_range(...) {
for (vector<hobject_t>::iterator p = ls.begin(); p != ls.end(); ++p) {
    handle.reset_tp_timeout();
    ObjectContextRef obc;
    if (is_primary())
      obc = object_contexts.lookup(*p);
    if (obc) {
      bi->objects[*p] = obc->obs.oi.version;
      dout(20) << "  " << *p << " " << obc->obs.oi.version << dendl;
    } else {
      bufferlist bl;
      int r = pgbackend->objects_get_attr(*p, OI_ATTR, &bl);
      assert(r >= 0);
      ...
    }
...
}

I am not sure if there is a better way to handle such data corruption, but a better way might be to tolerate such corruption and use redundancy to construct the list.

Ceph version: v0.80.4
Platform: RHEL6.5


Related issues

Related to Ceph - Bug #10150: osd/ReplicatedPG.cc: 10853: FAILED assert(r >= 0) (in _scan_range) Resolved 11/20/2014

History

#1 Updated by Samuel Just almost 5 years ago

This should probably be a feature request for the backlog. We need a test reproducing it and some code to tolerate it. I'm not sure what the correct solution is, it might be to delete the object, add it to the divergent_priors set in the log (so that it remains missing), notify the primary, and recover the object. Such machinery is probably how we want to handle corrupt objects in general.

#2 Updated by Wenjun Huang almost 5 years ago

Samuel Just wrote:

This should probably be a feature request for the backlog. We need a test reproducing it and some code to tolerate it. I'm not sure what the correct solution is, it might be to delete the object, add it to the divergent_priors set in the log (so that it remains missing), notify the primary, and recover the object. Such machinery is probably how we want to handle corrupt objects in general.

Hi, Samuel:

I think there is apparent flaw of this code snippet, as data corruption is common case in commodity hardware. If the object's "_" attr corrupts, and when backfilling occurs, the corresponding osd will crash. I reproduce the issue in my cluster when take the below steps:
1. Use XFS'a attr command to remove an object's "_" attr.(The corresponding physical file's attr is "ceph._")
2. Trigger an backfilling of the cluster.
3. There will be osd crash for the attr absence.

In my opinion, there might be some ways to enhance the code:
1. When the "_" attr absence detected, we can repair the object instantly through its replicated peer.
2. We can delete the object, and it will recover in next self-healing triggers(scrub or deep scrub).

#3 Updated by Wenjun Huang almost 5 years ago

Wenjun Huang wrote:

Samuel Just wrote:

This should probably be a feature request for the backlog. We need a test reproducing it and some code to tolerate it. I'm not sure what the correct solution is, it might be to delete the object, add it to the divergent_priors set in the log (so that it remains missing), notify the primary, and recover the object. Such machinery is probably how we want to handle corrupt objects in general.

Hi, Samuel:

I think there is apparent flaw of this code snippet, as data corruption is common case in commodity hardware. If the object's "_" attr corrupts, and when backfilling occurs, the corresponding osd will crash. I reproduce the issue in my cluster when take the below steps:
1. Use XFS'a attr command to remove an object's "_" attr.(The corresponding physical file's attr is "ceph._")
2. Trigger an backfilling of the cluster.
3. There will be osd crash for the attr absence.

In my opinion, there might be some ways to enhance the code:
1. When the "_" attr absence detected, we can repair the object instantly through its replicated peer.
2. We can delete the object, and it will recover in next self-healing triggers(scrub or deep scrub).

I prefer the second way, because the first way seems very complicated, which may bring much effort and hidden more bugs.

#4 Updated by Samuel Just almost 5 years ago

Right, the deleting the object portion is what I was talking about. I think that's the right way.

#5 Updated by Wenjun Huang almost 5 years ago

Samuel Just wrote:

Right, the deleting the object portion is what I was talking about. I think that's the right way.

Hi, Samuel

I have gone through you fix about this bug. [[http://tracker.ceph.com/projects/ceph/repository/revisions/643c173377f9c489c6c035516967c710dc15b8a4/diff/src/osd/ReplicatedPG.cc]]

But I think it is not enough for this issue. Diving into the flow of pgbackend->objects_get_attr(*p, OI_ATTR, &bl), I think ceph can handle two failures itself:

1) ENOENT : may be returned when opening the file. (The fix handled)
2) ENOATTR : may be returned when invoke fgetxattr. (The fix missed, and I think it is the cause of this issue)

So, in my opinion, if we can change the fix like this:

if (r -ENOENT || r -ENOATTR)
continue;

Thank you!

#6 Updated by Wenjun Huang almost 5 years ago

Sorry for my carelessness, I meant:
if (r -ENOENT || r -ENOATTR)
continue;

#7 Updated by Wenjun Huang almost 5 years ago

Wenjun Huang wrote:

Sorry for my carelessness, I meant:
if (r -ENOENT || r -ENOATTR)
continue;

Sorry for my carelessness, I meant:
if (r -ENOENT || r -ENOATTR)
    continue;

#8 Updated by Wenjun Huang almost 5 years ago

I'm sorry, that I forgot use correct format before, so it omitted some character:

if (r == -ENOENT || r == -ENOATTR)
    continue;

#9 Updated by Guang Yang almost 5 years ago

Wenjun Huang wrote:

I'm sorry, that I forgot use correct format before, so it omitted some character:
[...]

Just had a quick chat with Sam:
  1. We should not handle ENOATTR the same way as ENOENT, Sam's fix was to fix the general race.
  2. In order to fix this issue (data corruption), we will need a way to general repair the object, Loic is working on this and we can further discuss.

#10 Updated by Sage Weil over 2 years ago

  • Status changed from New to Won't Fix

Also available in: Atom PDF