Bug #7779
closed
osd: object file can have too many xattrs, get E2BIG
Added by Sage Weil about 10 years ago.
Updated almost 10 years ago.
Description
if an object has too many xattrs on it, you get E2BIG from listxattr. one such object:
llistxattr("./DIR_B/DIR_C/DIR_D/DIR_A/DIR_4/redacted__head_F514ADCB__5", 0x7fffb9b20150, 65536) = -1 E2BIG (Argument list too long)
this causes backfill to fall over because the object_info_t _ attr isn't in the getattrs list. (strangely, it doesn't error out before that from the E2BIG)
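A minimal sketch of how one might check whether a given object file hits this condition, assuming Linux and Python 3. The helper name xattr_list_overflows and its injectable listxattr parameter are hypothetical and only for illustration; this is not how the OSD itself does it.

```python
import errno
import os


def xattr_list_overflows(path, listxattr=None):
    """Return True if listing xattrs on `path` fails with E2BIG.

    `listxattr` is injectable for testing; by default it is os.listxattr
    (Linux only). Any other OSError is re-raised unchanged.
    """
    listxattr = listxattr or os.listxattr
    try:
        listxattr(path)
    except OSError as e:
        if e.errno == errno.E2BIG:
            # The kernel refused to return the name list: too many/too large.
            return True
        raise
    return False
```

Run against the object file from the strace line above, this should return True on an affected filesystem; a healthy object returns False.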
fixed this manually by getting the ceph._ and ceph.snapset attrs, copying the content to a new file, and setting those attrs on it. this let backfill proceed, although the user object is damaged (lost attrs)
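The manual repair described here could be sketched roughly as follows. This is an illustration under assumptions, not the procedure that was actually run: the on-disk attribute names user.ceph._ and user.ceph.snapset are assumed (the description only says ceph._ and ceph.snapset), and the function name salvage and its injectable getxattr/setxattr parameters are hypothetical. Reading an attribute by name does not go through listxattr, which is why it can still work on an object whose name list is too big.

```python
import errno
import os
import shutil

# Assumed on-disk names of the two attrs worth keeping.
KEEP = ("user.ceph._", "user.ceph.snapset")


def salvage(src, dst, names=KEEP, getxattr=None, setxattr=None):
    """Copy file data from src to dst and carry over only the named xattrs.

    All other xattrs (including the user's rados attrs) are lost.
    Returns the sorted list of attr names that were carried over.
    """
    getxattr = getxattr or os.getxattr  # Linux only
    setxattr = setxattr or os.setxattr  # Linux only
    saved = {}
    for name in names:
        try:
            saved[name] = getxattr(src, name)
        except OSError as e:
            if e.errno != errno.ENODATA:  # ENODATA: attr not present
                raise
    shutil.copyfile(src, dst)  # copies data only, no xattrs
    for name, value in saved.items():
        setxattr(dst, name, value)
    return sorted(saved)
```

The new file would still have to be moved into place with the OSD stopped; that step is left out here.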
This exact bug is why we made the changes to xattr handling and no longer store unlimited numbers of xattrs in the filesystem. I think it's been backported everywhere it should be; what version was this observed on?
Yep. This particular object was written back in November, though, so it predates the fix by some time (and in fact may have even been before dumpling? I can't remember).
Sounds like this is a manual repair job to me, then...
I guess we could write tools that know all the xattr name patterns to look for and extract them into a file or something, but a user would still need to do all the cleanup work.
good news: scrub on a pg with an object with too many xattrs:
2014-03-24 16:40:42.632996 osd.2 [ERR] 2.14 shard (5,255): soid 27292a34/passwd/head//2 missing attr _, missing attr snapset
2014-03-24 16:40:42.633116 osd.2 [ERR] 2.14 scrub 0 missing, 1 inconsistent objects
(good because the osd doesn't crash or anything and we can go back and repair these without breaking a running system)
- Status changed from New to 12
wip-7779 has a reproducer program, and a 'salvage' target that will copy the known important ceph xattrs to a replacement object (but lose all the user's rados attrs).
actually, the salvage tool is useless. since we are throwing out the rados user attrs, we can just do
rados -p <pool> get object /tmp/foo
rados -p <pool> rm object
rados -p <pool> put object /tmp/foo
(as long as there is no omap data!)
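The three commands plus the omap caveat could be wrapped up roughly like this. It is a sketch under assumptions: the function name rewrite_object and the injectable run parameter are hypothetical, and it relies on the rados CLI's listomapkeys subcommand to detect omap data. It refuses to touch the object if any omap keys are listed, since get/put would not carry them over.

```python
import subprocess


def rewrite_object(pool, obj, tmp_path, run=None):
    """Re-create `obj` in `pool` via rados get / rm / put.

    Drops all xattrs on the object. Raises RuntimeError if the object
    has omap keys. `run` takes an argv list and returns stdout as text.
    """
    if run is None:
        def run(argv):
            done = subprocess.run(argv, check=True, capture_output=True, text=True)
            return done.stdout

    base = ["rados", "-p", pool]
    if run(base + ["listomapkeys", obj]).strip():
        raise RuntimeError("object has omap data; get/rm/put would lose it")
    run(base + ["get", obj, tmp_path])  # fails loudly before anything is removed
    run(base + ["rm", obj])
    run(base + ["put", obj, tmp_path])
```

Because the default runner uses check=True, a failed get aborts before the rm is issued.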
What we'll actually need to repair these objects will involve pulling the rgw xattrs out as well as the RADOS ones — without those manifest xattrs you might as well just delete the object.
- Priority changed from Urgent to High
- Status changed from 12 to Resolved