Bug #18914

cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)

Added by Zheng Yan 2 months ago. Updated 6 days ago.

Status: Pending Backport
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 02/13/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
Needs Doc: No


Related issues

Copied to Backport #19675: jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) In Progress
Copied to Backport #19676: kraken: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) New

History

#1 Updated by John Spray 2 months ago

Hmm, so this is happening because volume client creates a pool, then tries to use it in a layout before its libcephfs Client instance's Objecter has seen the osdmap epoch that contains the pool. The Server::check_layout_vxattr logic waits only for the client's osdmap epoch before returning ENOENT -- if it waited for the globally latest osdmap epoch instead, this would work.

I'm not sure the Server behaviour makes sense: setting up pools is almost always an out-of-band thing with respect to cephfs clients, so why would we assume that a client setting a layout xattr has the right osdmap epoch for the newly set up pool?
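The epoch comparison described above can be sketched in a few lines. This is a hypothetical Python model, not Ceph code; `check_layout` and the epoch values are purely illustrative:

```python
# Illustrative model (not Ceph code) of the MDS-side layout check:
# the MDS waits until its map is at least as new as the client's epoch,
# then decides whether the pool exists in that map.

def check_layout(mds_epoch, client_epoch, pool_created_epoch):
    """Return True if the pool is visible after the MDS catches up
    to the client's osdmap epoch (hypothetical simplification)."""
    effective_epoch = max(mds_epoch, client_epoch)  # MDS waits for client's epoch
    return effective_epoch >= pool_created_epoch    # pool visible in that map?

# Pool created at epoch 10, but both MDS and client are still at epoch 9:
assert check_layout(mds_epoch=9, client_epoch=9, pool_created_epoch=10) is False
# Waiting for the latest epoch (>= 10) would have made the pool visible:
assert check_layout(mds_epoch=10, client_epoch=9, pool_created_epoch=10) is True
```

The failing case is the first assertion: the check passes only if someone in the chain has actually observed the epoch that created the pool.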

#2 Updated by Greg Farnum 2 months ago

That's odd; I thought clients validated pools before passing them to the mds. Maybe that's wrong or undesirable for other reasons, though.

#3 Updated by John Spray 2 months ago

Oh yeah, the client does have code that's meant to be doing that, and on the client side it's a wait_for_latest. So I don't know why this is happening.

#4 Updated by John Spray 2 months ago

  • Assignee set to Zheng Yan

#5 Updated by Zheng Yan 2 months ago

  • Status changed from New to Verified
  • Assignee deleted (Zheng Yan)

The error occurs because the MDS had an outdated osdmap and thought the newly created pool did not exist. (The MDS has code that makes sure its osdmap is the same as or newer than the fs client's osdmap.) In this case, it seems both the MDS and the fs client had outdated osdmaps. Pool creation went through self.rados; self.rados had the newest osdmap, but self.fs might have had an outdated one.

We saw this type of error before: http://www.spinics.net/lists/ceph-devel/msg31902.html. Not sure how to fix it.
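The two-handle race described here can be modelled in a short Python sketch. The classes are illustrative only, not the librados/libcephfs API; they just show how a pool created through one handle stays invisible to another handle that caches an older map:

```python
# Illustrative sketch (not the Ceph API): each client handle caches its
# own osdmap epoch, so a pool created through one handle may be
# invisible to another until that handle refreshes its map.

class Cluster:
    def __init__(self):
        self.epoch = 1
        self.pools = {}          # pool name -> epoch it appeared in

    def create_pool(self, name):
        self.epoch += 1
        self.pools[name] = self.epoch

class Handle:
    def __init__(self, cluster):
        self.cluster = cluster
        self.epoch = cluster.epoch       # cached osdmap epoch

    def refresh(self):
        self.epoch = self.cluster.epoch  # wait_for_latest_osdmap analogue

    def pool_visible(self, name):
        created = self.cluster.pools.get(name)
        return created is not None and created <= self.epoch

cluster = Cluster()
rados, fs = Handle(cluster), Handle(cluster)  # like self.rados and self.fs
cluster.create_pool("volume-pool")
rados.refresh()                               # self.rados sees the new map
assert rados.pool_visible("volume-pool")
assert not fs.pool_visible("volume-pool")     # self.fs is still stale
fs.refresh()
assert fs.pool_visible("volume-pool")
```

The middle assertion is the bug scenario: self.fs (and the MDS bounded by its epoch) both act on a map that predates the pool.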

#6 Updated by John Spray 2 months ago

The thing that's confusing me is that Client::ll_setxattr has this block:

    if (r == -ENOENT) {
      C_SaferCond ctx;
      objecter->wait_for_latest_osdmap(&ctx);
      ctx.wait();
    }

Which should be handling this case, but isn't for some reason.

#7 Updated by Zheng Yan 2 months ago

I think the cephfs Python binding calls ceph_setxattr instead of ceph_ll_setxattr. There is no such code in Client::setxattr.
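The retry pattern missing from Client::setxattr could, in principle, mirror the ll_setxattr block quoted in comment #6. A minimal Python sketch of that pattern follows (hypothetical names, not the actual Ceph fix):

```python
# Hypothetical sketch of the retry-after-map-refresh pattern: on ENOENT,
# wait for the latest osdmap and try once more, as Client::ll_setxattr does.

ENOENT = -2  # the errno value, negated as in the Ceph client code

def setxattr_with_retry(do_setxattr, wait_for_latest_osdmap):
    """Run the setxattr path; if the pool looks missing, refresh the
    cached osdmap and retry once (illustrative callables, not Ceph APIs)."""
    r = do_setxattr()
    if r == ENOENT:
        wait_for_latest_osdmap()   # the pool may exist in a newer map
        r = do_setxattr()          # retry against the refreshed map
    return r

# Simulate a call that fails until the map has been refreshed:
calls = {"n": 0}
def fake_setxattr():
    calls["n"] += 1
    return ENOENT if calls["n"] == 1 else 0

assert setxattr_with_retry(fake_setxattr, lambda: None) == 0
```

A single retry suffices here because wait_for_latest_osdmap guarantees the second attempt sees every pool that existed when the call began.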

#8 Updated by John Spray 2 months ago

Of course, you're right.

#9 Updated by Zheng Yan 2 months ago

  • Status changed from Verified to Need Review

#10 Updated by John Spray 6 days ago

  • Status changed from Need Review to Pending Backport
  • Backport set to jewel, kraken

#11 Updated by Nathan Cutler 6 days ago

  • Subject changed from jewel:Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) to cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)

#12 Updated by Nathan Cutler 6 days ago

  • Copied to Backport #19675: jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) added

#13 Updated by Nathan Cutler 6 days ago

  • Copied to Backport #19676: kraken: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) added
