Bug #14805

closed

Hadoop tests failing with EPERM

Added by John Spray about 8 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Hadoop/Java
Labels (FS):
Java/Hadoop
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Most recent instance:
http://pulpito.ceph.com/teuthology-2016-02-17_18:12:06-hadoop-jewel---basic-mira/

Here's the earliest instance I could find:
http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-16_22:12:02-hadoop-master---basic-openstack/

Greg & Noah briefly discussed this on the ceph qa list for this run:
http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-18_22:12:01-hadoop-master---basic-openstack/

Failing on jewel and master consistently, but mixed in with infrastructure issues.

Actions #1

Updated by Zheng Yan about 8 years ago

  • Subject changed from Hadoop tests failing with EACCESS to Hadoop tests failing with EPERM

Actions #2

Updated by Zheng Yan about 8 years ago

  • Status changed from New to Fix Under Review

I'm having trouble running the test on my local machine, so let's try disabling client_permissions.

https://github.com/ceph/ceph-qa-suite/pull/827
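For illustration only (the contents of the pull request above are not reproduced here): assuming the option is set through the [client] section of ceph.conf, turning off the client-side permission checks might look roughly like this. How the qa suite actually injects the setting is not shown in this thread.

```ini
# Sketch, not taken from the PR: disables permission checking inside the
# userspace client, so libcephfs callers are no longer rejected locally.
[client]
client permissions = false
```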

Actions #4

Updated by Greg Farnum about 8 years ago

  • Assignee set to Zheng Yan

Do you have any idea what about the client permission checking is busting Hadoop? We want to fix it properly (or at least band-aid it automatically :p), not just swap the qa suite runs.

Actions #5

Updated by Zheng Yan about 8 years ago

Old libcephfs only had a permission check for open. Now it has full permission checks (open, lookup, setattr ...).

Actions #6

Updated by John Spray about 8 years ago

Maybe this has the same issue as the python libcephfs tests did: they were creating files with mode 0, which used to work.
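To make the mode 0 case concrete, here is a minimal sketch of a POSIX-style owner/group/other check for a regular file. It is an illustration only, not the actual Client code in Ceph; the function name `may_access` and its parameters are made up for this example.

```python
def may_access(mode, owner_uid, owner_gid, uid, gids, want):
    """Simplified POSIX-style permission check for a regular file.

    `want` is a bitmask: 4 = read, 2 = write, 1 = execute.
    Hypothetical helper for illustration; not Ceph's implementation.
    """
    if uid == 0:
        # root may always read/write; execute still needs some x bit set
        return not (want & 1) or bool(mode & 0o111)
    if uid == owner_uid:
        granted = (mode >> 6) & 7   # owner bits
    elif owner_gid in gids:
        granted = (mode >> 3) & 7   # group bits
    else:
        granted = mode & 7          # other bits
    return (granted & want) == want


# A file created with mode 0 is unusable by its own (non-root) owner...
owner_can_read = may_access(0o000, 1000, 1000, 1000, [1000], 4)
# ...but root is still allowed, which is why running under sudo hides the problem.
root_can_write = may_access(0o000, 1000, 1000, 0, [0], 6)
```

With checks like this enforced in the client for more than just open, a test that creates a mode 0 file as an ordinary user and then touches it again would start failing, while the same test run as root would keep passing.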

Actions #7

Updated by Greg Farnum about 8 years ago

Wait, can you expand on that John? I wasn't really looking at the python tests, although I know it involved root ownership — didn't you just start running them as sudo?
I think that should generally be VFS-controlled, not something in our Client environment (I mean, I know we're checking mode and uid now, but the VFS is also gating on those first, so if we're disagreeing it's probably a bug?).

Actions #8

Updated by John Spray about 8 years ago

Greg Farnum wrote:

Wait, can you expand on that John? I wasn't really looking at the python tests, although I know it involved root ownership — didn't you just start running them as sudo?

Yeah, it was a s/nosetests/sudo nosetests/

I think that should generally be VFS-controlled, not something in our Client environment (I mean, I know we're checking mode and uid now, but the VFS is also gating on those first, so if we're disagreeing it's probably a bug?).

Right, but there's no VFS in the libcephfs python tests or in hadoop.

On this subject, the C libcephfs API has the ll functions (used by ganesha) that let it pass through UIDs. But the python API and the 'normal' C API don't have that, and just read the UID from the environment. I'm thinking we may want to add uid/gid to the ceph_create function so that users explicitly pick a UID (since getting it from the environment was not enforced anyway).
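A rough sketch of the difference between the two styles, using an in-memory dict in place of a filesystem. Everything here (`Inode`, `create_implicit`, `create_explicit`) is hypothetical and is not the libcephfs API; it only shows where the owner of a new file would come from in each case.

```python
import os
from dataclasses import dataclass


@dataclass
class Inode:
    mode: int
    uid: int
    gid: int


def create_implicit(fs, path, mode):
    # Owner is taken from the calling process, as described above for the
    # python API and the 'normal' C API (illustrative stand-in only).
    fs[path] = Inode(mode, os.geteuid(), os.getegid())
    return fs[path]


def create_explicit(fs, path, mode, uid, gid):
    # Caller states the owner explicitly, similar in spirit to passing
    # credentials through (illustrative stand-in only).
    fs[path] = Inode(mode, uid, gid)
    return fs[path]


fs = {}
a = create_implicit(fs, "/a", 0o644)              # owned by whoever runs the test
b = create_explicit(fs, "/b", 0o644, 1234, 5678)  # owned by the requested uid/gid
```

With the explicit form, the result no longer depends on whether the test happens to run as root.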

Actions #9

Updated by Greg Farnum about 8 years ago

d'oh, right. Okay, I get the problem now. I've run this through a couple of times in my latest integration branch, btw, and it seems fine (although we're still hitting a failure in hadoop sometimes; need to dig into that).

Actions #10

Updated by Greg Farnum about 8 years ago

  • Status changed from Fix Under Review to Resolved

Actions #11

Updated by Greg Farnum almost 8 years ago

  • Component(FS) Hadoop/Java added

Actions #12

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (48)
  • Labels (FS) Java/Hadoop added