Bug #8625

closed

EC pool - OSD creates an empty file for op with 'create 0~0, writefull 0~xxx, setxattr' transactions

Added by Guang Yang almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When we tested radosgw against an EC pool, we found that for each head stripe written by radosgw, the OSD always creates two physical files in the file system: one with the correct length (generation ffffffffffffffff) and another with zero length (and a different generation).

-rw-r--r-- 1 root root    0 Jun 17 14:35 default.5344.122\utest__head_E47F0944__3_f4f_a
-rw-r--r-- 1 root root 800K Jun 17 14:35 default.5344.122\utest__head_E47F0944__3_ffffffffffffffff_a

While checking the OSD log, we found the following:

2014-06-18 10:45:42.828385 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) touch 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0
2014-06-18 10:45:42.828688 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) touch 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 = 0
2014-06-18 10:45:42.828737 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) setattrs 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0
2014-06-18 10:45:42.828853 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) setattrs 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 = 0
2014-06-18 10:45:42.828877 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) _collection_move_rename 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/5561/0 from 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0
2014-06-18 10:45:42.828905 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _set_replay_guard 1719416.0.6 START
2014-06-18 10:45:42.829158 7f9d64a2e700 20 filestore dbobjectmap: seq is 446
2014-06-18 10:45:42.829878 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _set_replay_guard 1719416.0.6 done
2014-06-18 10:45:42.830213 7f9d64a2e700 20 filestore(/home/y/var/lib/ceph/osd/ceph-96) lfn_unlink: clearing omap on 831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 in cid 3.ebs0_head
2014-06-18 10:45:42.830451 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _close_replay_guard 1719416.0.6
2014-06-18 10:45:42.830729 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _close_replay_guard 1719416.0.6 done
2014-06-18 10:45:42.830745 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _collection_move_rename 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/5561/0 from 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 = 0
2014-06-18 10:45:42.830769 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) write 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 0~81920

It looks like the OSD creates the right file and then duplicates it as an empty file. Is this on purpose for EC pools? This approach (if by design) would double the number of files and thus make the dentry/inode cache less efficient from the system's perspective.
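To gauge how many such files an OSD is holding, something like the following sketch could be used. It assumes, based only on the two file names in the listing above, that the last two `_`-separated fields of an EC object file name are the generation and the shard, and that the directory being scanned holds only EC-pool object files. The function names are made up for this example and are not part of Ceph.

```python
import os

# Generation value shown on the full-length file in the listing above.
NO_GEN = "ffffffffffffffff"


def split_generation(filename):
    """Return (prefix, generation_hex, shard_hex).

    Assumption: the last two '_'-separated fields are generation and shard,
    as in '..._3_f4f_a' from the listing.
    """
    prefix, gen, shard = filename.rsplit("_", 2)
    return prefix, gen, shard


def is_empty_generation_file(filename, size):
    """True for a zero-length file whose generation is not NO_GEN."""
    _, gen, _ = split_generation(filename)
    return size == 0 and gen != NO_GEN


def count_empty_generation_files(root):
    """Walk `root` and count zero-length files that carry a generation."""
    count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.count("_") < 2:
                continue  # does not fit the assumed naming scheme
            size = os.path.getsize(os.path.join(dirpath, name))
            if is_empty_generation_file(name, size):
                count += 1
    return count
```

Calling `count_empty_generation_files()` on a PG directory gives a rough number; file names are written as raw strings in Python because they contain backslashes.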

Actions #1

Updated by Guang Yang almost 10 years ago

We further located the root cause of this issue. For the head object, radosgw sends several op transactions to the OSD. Here is an example: [create 0~0, setxattr user.rgw.idtag (15), writefull 0~8, setxattr user.rgw.manifest (341), setxattr user.rgw.acl (127), setxattr user.rgw.etag (33)]. The OSD handles the ops one by one: when it handles the CREATE op, it creates the file, and the following WRITEFULL op is then treated as an update. With an EC pool, an update leaves the original file in place for a while (there seems to be a pending backport that removes those files). As a result, we see such empty files (with a generation).
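The sequence described above can be illustrated with a toy model. This is not Ceph code: the store is a plain dict keyed by (name, generation), `apply_ops_naive` is a hypothetical name, and xattrs are ignored. It only shows why handling the ops one by one leaves an empty file behind under a separate generation.

```python
# Generation of the live object (ffffffffffffffff in the listing,
# 18446744073709551615 in the OSD log).
NO_GEN = 0xFFFFFFFFFFFFFFFF


def apply_ops_naive(store, name, ops, version):
    """Apply ops one by one to a toy store.

    Assumption for illustration: any writefull that finds an existing live
    object first moves it aside under generation `version`, much like the
    _collection_move_rename seen in the log.
    """
    live = (name, NO_GEN)
    for op, data in ops:
        if op == "create":
            store.setdefault(live, b"")  # like 'touch': empty file
        elif op == "writefull":
            if live in store:  # object exists, so this counts as an update
                store[(name, version)] = store.pop(live)
            store[live] = data
        elif op == "setxattr":
            pass  # attributes are not modelled here
    return store


ops = [("create", None), ("setxattr", None), ("writefull", b"12345678")]
result = apply_ops_naive({}, "head_obj", ops, version=5561)
```

In this model `result` holds two entries: the empty copy under generation 5561 and the 8-byte live object, which mirrors the pair of files observed on disk.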

With the fix to aggressively remove files with a generation, those files can be reclaimed. However, is there a chance we can avoid writing those files in the first place?

Actions #2

Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to High

Actions #3

Updated by Samuel Just over 9 years ago

It's the create 0~0 followed by a writefull. Arguably, we still shouldn't version the object; I'll take a look.

Actions #4

Updated by Samuel Just over 9 years ago

  • Status changed from New to 7

wip-8625: versioning should never be necessary after a create (it will be necessary before the create if the object already existed).
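The rule stated above can be sketched as a small decision function. This is a hypothetical illustration, not the wip-8625 change itself: `versioning_points` is a made-up name, and the sketch additionally assumes that preserving the old state once per transaction is enough.

```python
def versioning_points(existed_before, ops):
    """Return the indices of ops before which the old object must be kept.

    Toy rule: old state only needs preserving if the object existed before
    this transaction, and only before the first create or writefull.
    """
    points = []
    exists = existed_before
    safe = False  # True once old state is preserved or nothing old is left
    for i, op in enumerate(ops):
        if op in ("create", "writefull"):
            if exists and not safe:
                points.append(i)
            safe = True
            exists = True
    return points


ops = ["create", "setxattr", "writefull", "setxattr"]
new_object = versioning_points(False, ops)       # object did not exist
existing_object = versioning_points(True, ops)   # object already existed
```

Under these assumptions a brand-new object needs no versioning at all, while an existing object is versioned once, before the create.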

Actions #5

Updated by Samuel Just over 9 years ago

  • Project changed from rgw to Ceph
  • Assignee set to Samuel Just

Making it not an rgw bug.

Actions #6

Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Pending Backport

Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved