Bug #7779

closed

osd: object file can have too many xattrs, get E2BIG

Added by Sage Weil about 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Severity: 3 - minor

Description

If an object has too many xattrs on it, listxattr fails with E2BIG. One such object:

llistxattr("./DIR_B/DIR_C/DIR_D/DIR_A/DIR_4/redacted__head_F514ADCB__5", 0x7fffb9b20150, 65536) = -1 E2BIG (Argument list too long)
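A small Python sketch of why the call fails: listxattr(2) returns every attribute name followed by a NUL byte, and Linux caps that list at XATTR_LIST_MAX (65536 bytes, the same buffer size passed in the strace above). The helper names are illustrative only, not part of Ceph.

```python
# Linux limits the total size of the name list returned by listxattr(2) to
# XATTR_LIST_MAX bytes (65536, defined in <linux/limits.h>).
XATTR_LIST_MAX = 65536


def listxattr_size(names):
    """Bytes listxattr() needs to return `names`: each name plus a trailing NUL."""
    return sum(len(name.encode()) + 1 for name in names)


def would_e2big(names):
    """True if the name list exceeds XATTR_LIST_MAX, so a listxattr() call
    with a maximum-size buffer fails with E2BIG."""
    return listxattr_size(names) > XATTR_LIST_MAX
```

For example, 1,000 attrs with 99-byte names need 100,000 bytes, well over the limit, no matter how small each individual value is.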

This causes backfill to fall over because the object_info_t "_" attr isn't in the getattrs list. (Strangely, it doesn't error out earlier on the E2BIG itself.)

Fixed this manually by getting the ceph._ and ceph.snapset attrs, copying the content to a new file, and setting those attrs on it. This let backfill proceed, although the user object is damaged (its other attrs are lost).
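The manual fix can be sketched in Python. The key point is that getxattr(2) fetches one attribute by name, so it still works when listxattr() on the same file fails with E2BIG. The on-disk names user.ceph._ and user.ceph.snapset are assumptions about how FileStore stores the ceph._ and ceph.snapset attrs, and salvage() is a hypothetical helper; the xattr functions are injectable so the logic can be exercised without a real OSD.

```python
import os
import shutil

# Assumed on-disk FileStore names for the object_info_t ("_") and snapset
# attrs; verify them on your own OSD before relying on this.
KEEP_ATTRS = ("user.ceph._", "user.ceph.snapset")


def salvage(src, dst, keep=KEEP_ATTRS, getxattr=None, setxattr=None):
    """Hypothetical helper: copy src's data to dst, then re-create only `keep` attrs.

    Each attr is fetched by name with getxattr(), which does not depend on
    listxattr() succeeding. All other xattrs on src are left behind.
    """
    getxattr = getxattr or os.getxattr  # Linux-only in the standard library
    setxattr = setxattr or os.setxattr
    shutil.copyfile(src, dst)  # copies file contents only, not xattrs
    for name in keep:
        setxattr(dst, name, getxattr(src, name))
```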

Actions #1

Updated by Greg Farnum about 10 years ago

This exact bug is why we changed the xattr handling so that we no longer store an unlimited number of xattrs in the filesystem. I think the fix has been backported everywhere it should be; what version was this observed on?

Actions #2

Updated by Sage Weil about 10 years ago

Yep. This particular object was written back in November, though, so it predates the fix by some time (and in fact may even have been before Dumpling? I can't remember).

Actions #3

Updated by Greg Farnum about 10 years ago

Sounds like this is a manual repair job to me, then...
I guess we could write tools that know all the xattr name patterns to look for and extract those attrs into a file or something, but a user would still need to do all the cleanup work.
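A rough sketch of such a tool, assuming a hypothetical list of candidate on-disk names (the RGW entries in particular are guesses and would need checking against a real OSD). It probes each name with getxattr(), since listing is exactly what fails, and writes the hits to a JSON file with base64-encoded values.

```python
import base64
import errno
import json
import os

# Hypothetical candidate names: assumptions about how FileStore encodes the
# Ceph and RGW attrs on disk; adjust them for your cluster.
CANDIDATES = (
    "user.ceph._",
    "user.ceph.snapset",
    "user.ceph._user.rgw.manifest",
    "user.ceph._user.rgw.acl",
)


def dump_known_xattrs(path, out_path, candidates=CANDIDATES, getxattr=None):
    """Probe each candidate name with getxattr() and save the hits as JSON.

    Values are base64-encoded because xattr values are arbitrary bytes.
    Absent names (ENODATA) are skipped; any other error propagates.
    Returns the sorted list of names written.
    """
    getxattr = getxattr or os.getxattr
    found = {}
    for name in candidates:
        try:
            value = getxattr(path, name)
        except OSError as e:
            if e.errno != errno.ENODATA:
                raise
            continue
        found[name] = base64.b64encode(value).decode("ascii")
    with open(out_path, "w") as f:
        json.dump(found, f, indent=2, sort_keys=True)
    return sorted(found)
```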

Actions #4

Updated by Sage Weil about 10 years ago

Good news: scrub on a PG containing an object with too many xattrs reports:

2014-03-24 16:40:42.632996 osd.2 [ERR] 2.14 shard (5,255): soid 27292a34/passwd/head//2 missing attr _, missing attr snapset
2014-03-24 16:40:42.633116 osd.2 [ERR] 2.14 scrub 0 missing, 1 inconsistent objects

(Good because the OSD doesn't crash, and we can go back and repair these objects without disrupting a running system.)
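To find the affected objects, these scrub errors can be filtered out of the cluster log. A minimal Python sketch, assuming the message format shown above (the wording may differ in other Ceph versions); the function name is hypothetical.

```python
import re

# Pattern for per-object scrub errors like the one above; an assumption
# based on that single log line, not on a documented format.
SCRUB_ERR = re.compile(
    r"\[ERR\] (?P<pgid>\S+) shard \((?P<shard>[^)]*)\): "
    r"soid (?P<soid>\S+) (?P<detail>.*)"
)
MISSING_ATTR = re.compile(r"missing attr ([^,\s]+)")


def objects_missing_attrs(log_lines):
    """Yield (pgid, soid, [missing attr names]) for lines reporting missing attrs."""
    for line in log_lines:
        m = SCRUB_ERR.search(line)
        if not m:
            continue  # e.g. the per-PG summary line
        missing = MISSING_ATTR.findall(m.group("detail"))
        if missing:
            yield m.group("pgid"), m.group("soid"), missing
```

Fed the two log lines above, it yields one entry for 27292a34/passwd/head//2 in PG 2.14 with missing attrs _ and snapset; the summary line is skipped.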

Actions #5

Updated by Sage Weil about 10 years ago

  • Status changed from New to 12

wip-7779 has a reproducer program, and a 'salvage' target that will copy the known important ceph xattrs to a replacement object (but lose all the user's rados attrs).

Actions #6

Updated by Sage Weil about 10 years ago

Actually, the salvage tool is useless. Since we are throwing out the RADOS user attrs anyway, we can just do:

rados -p <pool> get object /tmp/foo
rados -p <pool> rm object
rados -p <pool> put object /tmp/foo

(as long as there is no omap data!)
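The same get/rm/put sequence can be scripted with an omap guard. This Python sketch shells out to the rados CLI and refuses to touch an object that has omap keys; note that it checks keys only (via rados listomapkeys), not an omap header. The function name is hypothetical, and run is injectable for testing.

```python
import subprocess


def rewrite_without_xattrs(pool, obj, tmp_path, run=subprocess.run):
    """Hypothetical helper: re-create a RADOS object from its data alone.

    Runs `rados get`, `rados rm`, `rados put` in order, dropping all xattrs.
    Raises RuntimeError first if the object has omap keys, since put would
    not restore them.
    """
    def rados(*args):
        # check=True makes any failing rados command raise CalledProcessError
        return run(["rados", "-p", pool, *args],
                   check=True, capture_output=True, text=True)

    if rados("listomapkeys", obj).stdout.strip():
        raise RuntimeError(f"{obj} has omap data; get/rm/put would lose it")
    rados("get", obj, tmp_path)
    rados("rm", obj)
    rados("put", obj, tmp_path)
```

Checking omap before the rm matters: once the object is removed, any omap data is gone.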

Actions #7

Updated by Greg Farnum about 10 years ago

What we'll actually need to repair these objects will involve pulling the rgw xattrs out as well as the RADOS ones — without those manifest xattrs you might as well just delete the object.

Actions #8

Updated by Ian Colle about 10 years ago

  • Priority changed from Urgent to High

Actions #9

Updated by Sage Weil almost 10 years ago

  • Status changed from 12 to Resolved