Bug #18914

closed

cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)

Added by Zheng Yan about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Related issues: 2 (0 open, 2 closed)

Copied to CephFS - Backport #19675: jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) (Resolved, Nathan Cutler)
Copied to CephFS - Backport #19676: kraken: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) (Resolved, Nathan Cutler)
#1

Updated by John Spray about 7 years ago

Hmm, so this is happening because volume client creates a pool, then tries to use it as a layout at a time before its libcephfs Client instance's Objecter has seen the osdmap epoch that contains the pool. The Server::check_layout_vxattr logic waits for the client's osdmap epoch before concluding ENOENT -- if it waited for the global latest osdmap epoch then it would work.
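
The race described here can be reduced to a toy model. All names below (OsdMapHistory, check_pool_client_epoch, check_pool_latest_epoch) are hypothetical, integer epochs stand in for osdmap versions, and assigning an epoch stands in for waiting on that map. It is a sketch of the rule as stated in this comment, not the actual Server::check_layout_vxattr code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy model (not Ceph code): the cluster's osdmap history,
// mapping each epoch to the set of pool names that exist in that epoch.
struct OsdMapHistory {
  std::map<uint32_t, std::set<std::string>> pools_at;

  // Newest epoch published so far (caller must have published at least one).
  uint32_t latest() const { return pools_at.rbegin()->first; }

  // Publish a new epoch; epochs only move forward.
  void publish(uint32_t epoch, std::set<std::string> pools) {
    assert(pools_at.empty() || epoch > latest());
    pools_at[epoch] = std::move(pools);
  }

  bool has_pool(uint32_t epoch, const std::string& pool) const {
    return pools_at.at(epoch).count(pool) > 0;
  }
};

// Hypothetical: the MDS catches up to the client's epoch (never further)
// before deciding whether the pool exists. Returns 0 or -ENOENT.
int check_pool_client_epoch(const OsdMapHistory& h, uint32_t& mds_epoch,
                            uint32_t client_epoch, const std::string& pool) {
  if (mds_epoch < client_epoch)
    mds_epoch = client_epoch;  // stands in for waiting on that osdmap epoch
  return h.has_pool(mds_epoch, pool) ? 0 : -ENOENT;
}

// Hypothetical alternative: catch up to the latest epoch before deciding.
int check_pool_latest_epoch(const OsdMapHistory& h, uint32_t& mds_epoch,
                            const std::string& pool) {
  mds_epoch = h.latest();  // stands in for fetching the newest osdmap
  return h.has_pool(mds_epoch, pool) ? 0 : -ENOENT;
}
```

For example, if a pool first appears in epoch 11 because it was created through a separate RADOS handle, while both the client and the MDS are still at epoch 10, check_pool_client_epoch returns -ENOENT, whereas check_pool_latest_epoch advances the MDS to epoch 11 and finds the pool.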

I'm not sure the Server behaviour makes sense: setting up pools is almost always an out-of-band thing with respect to cephfs clients, so why would we assume that a client setting a layout xattr has the right osdmap epoch for the newly set up pool?

#2

Updated by Greg Farnum about 7 years ago

That's odd; I thought clients validated pools before passing them to the mds. Maybe that's wrong or undesirable for other reasons, though.

#3

Updated by John Spray about 7 years ago

Oh yeah, the client does have code that's meant to be doing that, and on the client side it's a wait_for_latest. So I don't know why this is happening.

#4

Updated by John Spray about 7 years ago

  • Assignee set to Zheng Yan
#5

Updated by Zheng Yan about 7 years ago

  • Status changed from New to 12
  • Assignee deleted (Zheng Yan)

The error occurs because the MDS had an outdated osdmap and thought the newly created pool did not exist. (The MDS has code that makes sure its osdmap is the same as or newer than the fs client's osdmap.) In this case, it seems both the MDS and the fs client had an outdated osdmap. The pool was created through self.rados; self.rados had the newest osdmap, but self.fs might have had an outdated one.

We have seen this type of error before (http://www.spinics.net/lists/ceph-devel/msg31902.html). Not sure how to fix it.

#6

Updated by John Spray about 7 years ago

The thing that's confusing me is that Client::ll_setxattr has this block:

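    if (r == -ENOENT) {
      // ENOENT here means the requested pool was not found in this client's
      // current osdmap; block until the Objecter has fetched the latest one.
      C_SaferCond ctx;
      objecter->wait_for_latest_osdmap(&ctx);
      ctx.wait();
    }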

That should handle this case, but for some reason it doesn't.

#7

Updated by Zheng Yan about 7 years ago

I think the cephfs Python binding calls ceph_setxattr instead of ceph_ll_setxattr, and there is no such code in Client::setxattr.
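
Why the guard quoted from Client::ll_setxattr did not help can be shown with a second toy sketch. As before, every name here (History, Views, mds_set_layout_pool, ll_setxattr_path, setxattr_path) is hypothetical, integer epochs stand in for osdmaps, and this is not Ceph source: the only difference between the two client paths is whether the client refreshes its own osdmap before the MDS compares against the client's epoch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model (not Ceph code): h[e] is the set of pools that
// exist in osdmap epoch e; the last element is the newest epoch.
using History = std::vector<std::set<std::string>>;

// Each side's view: the newest osdmap epoch it has seen.
struct Views {
  uint32_t client_epoch;
  uint32_t mds_epoch;
};

// MDS side: catch up to the client's epoch, then look the pool up.
int mds_set_layout_pool(const History& h, Views& v, const std::string& pool) {
  if (v.mds_epoch < v.client_epoch) v.mds_epoch = v.client_epoch;
  return h.at(v.mds_epoch).count(pool) ? 0 : -ENOENT;
}

// Path with a guard like the one quoted from Client::ll_setxattr: if the
// client's own osdmap lacks the pool, move the client to the newest epoch
// (standing in for wait_for_latest_osdmap) before sending the request.
int ll_setxattr_path(const History& h, Views& v, const std::string& pool) {
  assert(!h.empty());
  if (h.at(v.client_epoch).count(pool) == 0)
    v.client_epoch = static_cast<uint32_t>(h.size() - 1);
  return mds_set_layout_pool(h, v, pool);
}

// Path without that guard, as reported here for Client::setxattr.
int setxattr_path(const History& h, Views& v, const std::string& pool) {
  return mds_set_layout_pool(h, v, pool);
}
```

Starting with the client and the MDS both at epoch 10 and the pool first appearing in epoch 11, setxattr_path returns -ENOENT while ll_setxattr_path succeeds, because only the latter advances the client's epoch before the MDS compares against it. The sketch illustrates the failure mode only, not the shape of the eventual fix.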

#8

Updated by John Spray about 7 years ago

Of course, you're right.

#9

Updated by Zheng Yan about 7 years ago

  • Status changed from 12 to Fix Under Review
#10

Updated by John Spray about 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to jewel, kraken
#11

Updated by Nathan Cutler about 7 years ago

  • Subject changed from jewel:Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) to cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
#12

Updated by Nathan Cutler about 7 years ago

  • Copied to Backport #19675: jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) added
#13

Updated by Nathan Cutler about 7 years ago

  • Copied to Backport #19676: kraken: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) added
#14

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved