Project

General

Profile

Bug #53859

qa: Test failure: test_pool_perm (tasks.cephfs.test_pool_perm.TestPoolPerm)

Added by Venky Shankar about 1 year ago. Updated 12 months ago.

Status:
Fix Under Review
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

History

#1 Updated by Patrick Donnelly about 1 year ago

Relevant bits from the log:

2022-01-13T05:23:31.208+0000 7f32cffff700  3 client.5688 ll_open 0x10000000000.head 2 = 0 (0x7f32c8002d00)
2022-01-13T05:23:31.209+0000 7f32cffff700  8 client.5688 _ll_forget 0x10000000000 1
2022-01-13T05:23:31.209+0000 7f32cffff700 20 client.5688 _ll_put 0x7f32e0009670 0x10000000000 1 -> 1
2022-01-13T05:23:31.209+0000 7f32e77fe700 20 client.5688 put_inode on 0x10000000000.head(faked_ino=0 nref=13 ll_ref=1 cap_refs={} open={3=1} mode=100664 size=0/0 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) n = 1
2022-01-13T05:23:31.209+0000 7f32e77fe700 20 client.5688 put_inode on 0x10000000000.head(faked_ino=0 nref=13 ll_ref=1 cap_refs={} open={3=1} mode=100664 size=0/0 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) n = 1
2022-01-13T05:23:31.209+0000 7f32e4ff9700 20 client.5688 _ll_get 0x7f32e0009670 0x10000000000 -> 2
2022-01-13T05:23:31.209+0000 7f32e4ff9700  8 client.5688 _ll_getattr 0x10000000000.head
2022-01-13T05:23:31.209+0000 7f32e4ff9700 10 client.5688 _getattr mask pAsLsXsFs issued=1
2022-01-13T05:23:31.209+0000 7f32e4ff9700 10 client.5688 fill_stat on 0x10000000000 snap/devhead mode 0100664 mtime 2022-01-13T05:23:18.645267+0000 ctime 2022-01-13T05:23:18.646363+0000
2022-01-13T05:23:31.209+0000 7f32e4ff9700  3 client.5688 ll_getattr 0x10000000000.head = 0
2022-01-13T05:23:31.209+0000 7f32e4ff9700  8 client.5688 _ll_forget 0x10000000000 1
2022-01-13T05:23:31.209+0000 7f32e4ff9700 20 client.5688 _ll_put 0x7f32e0009670 0x10000000000 1 -> 1
2022-01-13T05:23:31.212+0000 7f32e77fe700  1 -- 192.168.0.1:0/2718057660 <== mds.0 v2:172.21.15.153:6835/2865775467 6 ==== client_caps(grant ino 0x10000000000 5 seq 3 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2022-01-13T05:23:18.645267+0000 tws 2) v11 ==== 252+0+0 (crc 0 0 0) 0x7f32f400e040 con 0x560ab8f738a0
2022-01-13T05:23:31.212+0000 7f32e77fe700  5 client.5688 set_cap_epoch_barrier epoch = 57
2022-01-13T05:23:31.212+0000 7f32e77fe700 10 client.5688  mds.0 seq now 1
2022-01-13T05:23:31.212+0000 7f32e77fe700  5 client.5688 handle_cap_grant on in 0x10000000000 mds.0 seq 3 caps now pAsxLsXsxFsxcrwb was pAsxLsXsxFsxcrwb
2022-01-13T05:23:31.212+0000 7f32e77fe700 10 client.5688 update_inode_file_time 0x10000000000.head(faked_ino=0 nref=13 ll_ref=1 cap_refs={} open={3=1} mode=100664 size=0/0 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) pAsxLsXsxFsxcrwb ctime 2022-01-13T05:23:18.646363+0000 mtime 2022-01-13T05:23:18.645267+0000
2022-01-13T05:23:31.212+0000 7f32e77fe700 10 client.5688 max_size 0 -> 4194304
2022-01-13T05:23:31.212+0000 7f32e77fe700 10 client.5688   caps unchanged at pAsxLsXsxFsxcrwb
2022-01-13T05:23:31.213+0000 7f32cffff700  3 client.5688 ll_flush 0x7f32c8002d00 0x10000000000
2022-01-13T05:23:31.213+0000 7f32cffff700 10 client.5688 _flush: 0x7f32c8002d00 on inode 0x10000000000.head(faked_ino=0 nref=13 ll_ref=1 cap_refs={} open={3=1} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) no async_err state
2022-01-13T05:23:31.213+0000 7f32e4ff9700  3 client.5688 ll_release (fh)0x7f32c8002d00 0x10000000000
2022-01-13T05:23:31.213+0000 7f32e4ff9700  8 client.5688 _release_fh 0x7f32c8002d00 mode 3 on 0x10000000000.head(faked_ino=0 nref=13 ll_ref=1 cap_refs={} open={3=1} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670)
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688 _flush 0x10000000000.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={} open={3=0} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670)
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688  nothing to flush
2022-01-13T05:23:31.213+0000 7f32e4ff9700 20 client.5688 put_inode on 0x10000000000.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={} open={3=0} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) n = 1
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688 check_caps on 0x10000000000.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={} open={3=0} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) wanted - used - issued pAsxLsXsxFsxcrwb revoking - flags=0
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688  cap mds.0 issued pAsxLsXsxFsxcrwb implemented pAsxLsXsxFsxcrwb revoking -
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688 delaying cap release
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688 cap_delay_requeue on 0x10000000000.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={} open={3=0} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670)
2022-01-13T05:23:31.213+0000 7f32e4ff9700 10 client.5688 _release_fh 0x7f32c8002d00 on inode 0x10000000000.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={} open={3=0} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) no async_err state
2022-01-13T05:23:31.213+0000 7f32e4ff9700 20 client.5688 put_inode on 0x10000000000.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={} open={3=0} mode=100664 size=0/4194304 nlink=1 btime=2022-01-13T05:23:18.640986+0000 mtime=2022-01-13T05:23:18.645267+0000 ctime=2022-01-13T05:23:18.646363+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["test_file"] 0x7f32e0009670) n = 1

From: /ceph/teuthology-archive/vshankar-2022-01-13_04:42:40-fs-wip-vshankar-testing-20220106-145819-testing-default-smithi/6612000/remote/smithi052/log/ceph-client.0.51171.log.gz

Don't see a read call... maybe it's being optimized out since the file is known to be empty? If so, not sure how this ever worked. Would suggest experimenting with a file with contents to see if that makes a difference.

#2 Updated by Xiubo Li about 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 44679

#3 Updated by Xiubo Li about 1 year ago

When the 'test_file' is empty the kernel won't trigger go do the real read, so the libcephfs won't get a chance to call read. Seems will call a getattr before calling read do decide whether to call read or not.

#4 Updated by Xiubo Li about 1 year ago

  • Assignee set to Xiubo Li

#5 Updated by Xiubo Li about 1 year ago

From kernel fs and fuse code:

1567 static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
1568 {
1569         struct file *file = iocb->ki_filp;
1570         struct fuse_file *ff = file->private_data;
1571         struct inode *inode = file_inode(file);
1572 
1573         if (fuse_is_bad(inode))
1574                 return -EIO;
1575 
1576         if (FUSE_IS_DAX(inode))
1577                 return fuse_dax_read_iter(iocb, to);
1578 
1579         if (!(ff->open_flags & FOPEN_DIRECT_IO))
1580                 return fuse_cache_read_iter(iocb, to);
1581         else
1582                 return fuse_direct_read_iter(iocb, to);
1583 }

In Line#1580 above when doing the cache read it will update the attribute first, which will fire a 'getattr' request to libcephfs:

 989 static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
 990 {
 991         struct inode *inode = iocb->ki_filp->f_mapping->host;
 992         struct fuse_conn *fc = get_fuse_conn(inode);
 993 
 994         /*
 995          * In auto invalidate mode, always update attributes on read.
 996          * Otherwise, only update if we attempt to read past EOF (to ensure                                                                       
 997          * i_size is up to date).
 998          */
 999         if (fc->auto_inval_data ||
1000             (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
1001                 int err;
1002                 err = fuse_update_attributes(inode, iocb->ki_filp, STATX_SIZE);
1003                 if (err)
1004                         return err;
1005         }
1006 
1007         return generic_file_read_iter(iocb, to);
1008 }

The 'getattr' will get the inode size related fields, which is zero.

And then when doing the generic_file_read_iter()-->filemap_read():

2628         do {
2629                 cond_resched();
2630 
2631                 /*
2632                  * If we've already successfully copied some data, then we
2633                  * can no longer safely return -EIOCBQUEUED. Hence mark
2634                  * an async read NOWAIT at that point.
2635                  */
2636                 if ((iocb->ki_flags & IOCB_WAITQ) && already_read)
2637                         iocb->ki_flags |= IOCB_NOWAIT;
2638 
2639                 if (unlikely(iocb->ki_pos >= i_size_read(inode)))
2640                         break;
2641 
2642                 error = filemap_get_pages(iocb, iter, &pvec);
2643                 if (error < 0)
2644                         break;

When it see that "iocb->ki_pos >= i_size_read(inode)" it will return directly without fire real read to libcephfs.

This is why when the read offset >= file size, we won't see the 'read' in libcephfs.

I have checked the direct IO in kernel & fuse code, also has the same logic.

#6 Updated by Patrick Donnelly about 1 year ago

That explains the current behavior but why did this test not fail in the past?

#7 Updated by Xiubo Li about 1 year ago

Patrick Donnelly wrote:

That explains the current behavior but why did this test not fail in the past?

Is it possible that the 'test_file' was created by other test cases and wasn't removed ? Then the 'test_file' was not empty ?

#8 Updated by Venky Shankar about 1 year ago

Xiubo Li wrote:

Patrick Donnelly wrote:

That explains the current behavior but why did this test not fail in the past?

Is it possible that the 'test_file' was created by other test cases and wasn't removed ? Then the 'test_file' was not empty ?

That's possible. I've run into such things earlier where a previous test does not cleanup files and the subsequent test (depending on what its expecting) fails.

#9 Updated by Xiubo Li 12 months ago

Venky Shankar wrote:

Xiubo Li wrote:

Patrick Donnelly wrote:

That explains the current behavior but why did this test not fail in the past?

Is it possible that the 'test_file' was created by other test cases and wasn't removed ? Then the 'test_file' was not empty ?

That's possible. I've run into such things earlier where a previous test does not cleanup files and the subsequent test (depending on what its expecting) fails.

Okay, before I thought the previous test files should be cleaned up anyway.

Also available in: Atom PDF