Bug #8625
closed
EC pool - OSD creates an empty file for op with 'create 0~0, writefull 0~xxx, setxattr' transactions
Description
When we tested radosgw against an EC pool, we found that for the head strip written by radosgw, the OSD always creates two physical files in the file system: one with the correct length (and generation ffffffffffffffff) and another with length 0 (and a different generation).
-rw-r--r-- 1 root root 0 Jun 17 14:35 default.5344.122\utest__head_E47F0944__3_f4f_a
-rw-r--r-- 1 root root 800K Jun 17 14:35 default.5344.122\utest__head_E47F0944__3_ffffffffffffffff_a
While checking the OSD log, we found the following:
2014-06-18 10:45:42.828385 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) touch 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0
2014-06-18 10:45:42.828688 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) touch 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 = 0
2014-06-18 10:45:42.828737 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) setattrs 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0
2014-06-18 10:45:42.828853 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) setattrs 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 = 0
2014-06-18 10:45:42.828877 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) _collection_move_rename 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/5561/0 from 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0
2014-06-18 10:45:42.828905 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _set_replay_guard 1719416.0.6 START
2014-06-18 10:45:42.829158 7f9d64a2e700 20 filestore dbobjectmap: seq is 446
2014-06-18 10:45:42.829878 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _set_replay_guard 1719416.0.6 done
2014-06-18 10:45:42.830213 7f9d64a2e700 20 filestore(/home/y/var/lib/ceph/osd/ceph-96) lfn_unlink: clearing omap on 831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 in cid 3.ebs0_head
2014-06-18 10:45:42.830451 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _close_replay_guard 1719416.0.6
2014-06-18 10:45:42.830729 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _close_replay_guard 1719416.0.6 done
2014-06-18 10:45:42.830745 7f9d64a2e700 10 filestore(/home/y/var/lib/ceph/osd/ceph-96) _collection_move_rename 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/5561/0 from 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 = 0
2014-06-18 10:45:42.830769 7f9d64a2e700 15 filestore(/home/y/var/lib/ceph/osd/ceph-96) write 3.ebs0_head/831d70eb/default.5338.1_12m/head//3/18446744073709551615/0 0~81920
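For readers matching the directory listing against the log: the file names above show the generation in hex, while the log paths show it in decimal. The sketch below is only an illustration. It assumes the last two underscore-separated fields of the file name are the generation and the shard, and the helper name `generation_from_filename` is hypothetical, not part of Ceph.

```python
# Largest 64-bit unsigned value; it is written as ffffffffffffffff in the
# file names and as 18446744073709551615 in the log paths.
MAX_GENERATION = 0xFFFFFFFFFFFFFFFF


def generation_from_filename(filename):
    """Return the generation encoded in an on-disk file name.

    Assumption (not confirmed by the ticket): the name ends in
    "_<generation in hex>_<shard>".
    """
    _prefix, gen_hex, _shard = filename.rsplit("_", 2)
    return int(gen_hex, 16)


# Raw strings keep the literal backslash that appears in the file names.
full_file = r"default.5344.122\utest__head_E47F0944__3_ffffffffffffffff_a"
empty_file = r"default.5344.122\utest__head_E47F0944__3_f4f_a"

print(generation_from_filename(full_file) == MAX_GENERATION)  # True
print(generation_from_filename(empty_file))                   # 3919
```

Under that assumption, the 800K file carries the same generation value that the log prints as 18446744073709551615, and the empty file carries an ordinary numbered generation.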
It looks like the OSD creates the right file and then duplicates an empty file. Is this on purpose for EC pools? This approach (if by design) would double the number of files and thus make the dentry/inode cache less efficient from the system's perspective.
Updated by Guang Yang almost 10 years ago
We further located the root cause of this issue. For the head object, radosgw sends several op transactions to the OSD; here is an example: [create 0~0, setxattr user.rgw.idtag (15), writefull 0~8, setxattr user.rgw.manifest (341), setxattr user.rgw.acl (127), setxattr user.rgw.etag (33)]. The OSD handles the ops one by one: when it handles the CREATE transaction, it creates the file, and the following WRITEFULL op is then treated as an update. With an EC pool, an update leaves the original file in place for a while (it seems there is a pending backport that will remove those files); as a result, we see such empty files (with a generation).
With the fix to aggressively remove files with a generation, those files can be reclaimed. However, is there a chance we can avoid writing those files in the first place?
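The sequence described in this comment can be reproduced with a toy model. This is a rough sketch, not OSD code: the dictionary of files, the `apply_ops` helper and the starting generation number are all hypothetical, and setxattr is assumed not to affect the number of files.

```python
# Value used for the "current" copy of the object in this toy model
# (printed as 18446744073709551615 in the OSD log).
CURRENT = 2**64 - 1


def apply_ops(ops, next_generation):
    """Apply ops one by one and return {generation: file length}.

    Toy model only: a writefull on an object that already exists is
    treated as an update, so the old copy is first renamed to a
    numbered generation.
    """
    files = {}
    for name, length in ops:
        if name == "create":
            files[CURRENT] = 0  # empty file, like the "touch" in the log
        elif name == "writefull":
            if CURRENT in files:
                # Update path: keep the old copy under a generation.
                files[next_generation] = files.pop(CURRENT)
                next_generation += 1
            files[CURRENT] = length
        # setxattr: assumed to leave the set of files unchanged
    return files


ops = [
    ("create", 0),
    ("setxattr", 15),
    ("writefull", 8),
    ("setxattr", 341),
    ("setxattr", 127),
    ("setxattr", 33),
]
print(apply_ops(ops, next_generation=5561))
# {5561: 0, 18446744073709551615: 8}
```

In this model a single client op ends up with two files: the empty one left over from the create, now under a generation, and the real one holding the data.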
Updated by Samuel Just over 9 years ago
It's the create 0~0 followed by a writefull. Arguably, we still shouldn't version the object, I'll take a look.
Updated by Samuel Just over 9 years ago
- Status changed from New to 7
wip-8625, versioning should never be necessary after a create (it will be necessary before the create if the object already existed).
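One way to read this rule, as a sketch only: the old copy has to be preserved at most once, and only if the object existed before the op started. The code below is not the wip-8625 change itself. The `stash_points` helper is hypothetical, as is the assumption that only create and writefull count as ops that replace the object.

```python
# Ops assumed to replace object contents in this sketch.
REPLACING_OPS = {"create", "writefull"}


def stash_points(op_names, existed_before):
    """Return indices of ops before which the old copy must be versioned.

    Sketch of the stated rule: versioning is needed before the first
    replacing op if the object already existed, and never after a
    create inside the same op.
    """
    points = []
    needs_stash = existed_before  # only a pre-existing copy needs saving
    for index, name in enumerate(op_names):
        if name in REPLACING_OPS and needs_stash:
            points.append(index)
            needs_stash = False  # the old copy is preserved only once
    return points


ops = ["create", "setxattr", "writefull", "setxattr"]
print(stash_points(ops, existed_before=False))  # []
print(stash_points(ops, existed_before=True))   # [0]
```

With a new object nothing is versioned, so no empty generation file would be written; with an existing object the old copy is versioned once, before the create.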
Updated by Samuel Just over 9 years ago
- Project changed from rgw to Ceph
- Assignee set to Samuel Just
Making it not an rgw bug.
Updated by Sage Weil over 9 years ago
- Status changed from Pending Backport to Resolved