Bug #12971

closed

TestQuotaFull fails

Added by Zheng Yan over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/teuthology-2015-09-01_23:04:01-fs-infernalis---basic-multi/1041649/teuthology.log

2015-09-04T08:46:56.854 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2015-09-04T08:46:56.854 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/cephfs/test_full.py", line 202, in test_full_different_file
2015-09-04T08:46:56.854 INFO:tasks.cephfs_test_runner:    self._test_full(True)
2015-09-04T08:46:56.854 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/cephfs/test_full.py", line 187, in _test_full
2015-09-04T08:46:56.855 INFO:tasks.cephfs_test_runner:    timeout=osd_mon_report_interval_max * 5)
2015-09-04T08:46:56.855 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/cephfs/cephfs_test_case.py", line 205, in wait_until_true
2015-09-04T08:46:56.855 INFO:tasks.cephfs_test_runner:    raise RuntimeError("Timed out after {0} seconds".format(elapsed))
2015-09-04T08:46:56.855 INFO:tasks.cephfs_test_runner:RuntimeError: Timed out after 25 seconds
2015-09-04T08:46:56.855 INFO:tasks.cephfs_test_runner:
2015-09-04T08:46:56.856 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2015-09-04T08:46:56.856 INFO:tasks.cephfs_test_runner:Ran 7 tests in 518.185s
2015-09-04T08:46:56.856 INFO:tasks.cephfs_test_runner:
2015-09-04T08:46:56.856 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2015-09-04T08:46:56.856 INFO:tasks.cephfs_test_runner:
2015-09-04T08:46:56.857 INFO:tasks.cephfs_test_runner:======================================================================
2015-09-04T08:46:56.857 INFO:tasks.cephfs_test_runner:ERROR: test_full_different_file (tasks.cephfs.test_full.TestQuotaFull)
2015-09-04T08:46:56.857 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2015-09-04T08:46:56.857 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2015-09-04T08:46:56.857 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/cephfs/test_full.py", line 202, in test_full_different_file
2015-09-04T08:46:56.857 INFO:tasks.cephfs_test_runner:    self._test_full(True)
2015-09-04T08:46:56.858 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/cephfs/test_full.py", line 187, in _test_full
2015-09-04T08:46:56.858 INFO:tasks.cephfs_test_runner:    timeout=osd_mon_report_interval_max * 5)
2015-09-04T08:46:56.858 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/cephfs/cephfs_test_case.py", line 205, in wait_until_true
2015-09-04T08:46:56.858 INFO:tasks.cephfs_test_runner:    raise RuntimeError("Timed out after {0} seconds".format(elapsed))
2015-09-04T08:46:56.858 INFO:tasks.cephfs_test_runner:RuntimeError: Timed out after 25 seconds
Actions #1

Updated by Zheng Yan over 8 years ago

2015-09-04 11:46:30.042620 7f00e7929700 10 mds.0.cache.strays _consume: allocating allowance 5 to 0 in flight
2015-09-04 11:46:30.042624 7f00e7929700 10 mds.0.cache.strays purge [dentry #100/stray1/10000000000 [2,head] auth (dversion lock) v=23 inode=0x7f00f439c320 state=new | request=0 lock=0 inodepin=1 purging=1 dirty=1 authpin=0 0x7f00f43d12c0] [inode 10000000000 [2,head] ~mds0/stray1/10000000000 auth v23 s=16777216 nl=0 n(v0 b16777216 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 caps=0 dirtyparent=0 dirty=1 waiter=0 authpin=0 0x7f00f439c320]
2015-09-04 11:46:30.042658 7f00e7929700 10 mds.0.cache.strays  realm snaprealm(100 seq 1 lc 0 cr 0 cps 1 snaps={} 0x7f00f421ba80)
2015-09-04 11:46:30.042662 7f00e7929700 10 mds.0.cache.strays purge 0~16777216 objects 0~4 snapc 0x7f00f421bc00 on [inode 10000000000 [2,head] ~mds0/stray1/10000000000 auth v23 s=16777216 nl=0 n(v0 b16777216 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 caps=0 dirtyparent=0 dirty=1 waiter=0 authpin=0 0x7f00f439c320]
2015-09-04 11:46:30.042696 7f00e7929700  0 mds.0.objecter  FULL, paused modify 0x7f00f421e300 tid 55
2015-09-04 11:46:30.042709 7f00e7929700  0 mds.0.objecter  FULL, paused modify 0x7f00f421e540 tid 56
2015-09-04 11:46:30.042719 7f00e7929700  0 mds.0.objecter  FULL, paused modify 0x7f00f4219f80 tid 57
2015-09-04 11:46:30.042728 7f00e7929700  0 mds.0.objecter  FULL, paused modify 0x7f00f421e0c0 tid 58

Looks like the objecter pauses the OSD requests that delete the objects.

Actions #2

Updated by Zheng Yan over 8 years ago

  • Assignee set to John Spray

The objecter pauses all write OSD requests (including delete requests) when the cluster is full or the pool quota is reached. The test case 'test_full_different_file' first writes data to files until the pool quota is reached, then deletes the files and expects the pool's full flag to be cleared. That can't work, because the MDS's objecter pauses the very requests that purge the strays.

Actions #3

Updated by Greg Farnum over 8 years ago

That's not quite the problem. The MDS calls Objecter::unset_honor_osdmap_full(), at which point the Objecter should let write operations pass through. Perhaps we're not setting the flag early enough in startup and this was during a replay or something?
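
For illustration, here is a minimal sketch (under assumed names, not the actual Ceph source) of how the honor_osdmap_full setting is meant to gate write pausing; the Objecter struct, its fields, and should_pause_write() are stand-ins, while unset_honor_osdmap_full() is the real call mentioned above:

#include <cstdio>

// Toy stand-in for the client-side Objecter, just to show the intended gating.
struct Objecter {
  bool honor_osdmap_full = true;   // default: writes respect the FULL state
  bool osdmap_full = false;        // current map reports FULL / pool quota full

  void unset_honor_osdmap_full() { honor_osdmap_full = false; }

  // A write should only be paused when the map says FULL *and* this client
  // has asked to honor that flag.  The MDS clears the flag so its
  // stray-purge deletes keep flowing and can free up space.
  bool should_pause_write() const { return osdmap_full && honor_osdmap_full; }
};

int main() {
  Objecter mds_objecter;
  mds_objecter.osdmap_full = true;
  std::printf("pause before unset: %d\n", mds_objecter.should_pause_write());  // 1
  mds_objecter.unset_honor_osdmap_full();
  std::printf("pause after unset:  %d\n", mds_objecter.should_pause_write());  // 0
  return 0;
}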

Actions #4

Updated by John Spray over 8 years ago

Greg's theory sounds plausible; will take a look.

Actions #5

Updated by John Spray over 8 years ago

  • Status changed from New to In Progress

Looks like this commit might have broken it:

commit dbcf2e40d3e8a92f280879c74b7ca954a902b2d1
Author: xinxin shu <xinxin.shu@intel.com>
Date:   Thu Jun 18 09:44:47 2015 +0800

    Fixes : #12018

    resend writes after pool loses full flag

    Signed-off-by: xinxin shu <xinxin.shu@intel.com>

as it introduced an unguarded read of the FULL flag that didn't respect the honor_osdmap_full setting.
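
For comparison, a hypothetical before/after sketch (again not the actual Ceph code; MapState and the stay_paused_* helpers are invented for illustration) of the kind of unguarded check described above:

#include <cstdio>

struct MapState {                  // stand-in for the relevant OSDMap state
  bool cluster_full = true;        // map-level FULL flag set
  bool pool_quota_full = true;     // per-pool quota reached
};

// Buggy shape: the pause/resend decision reads the FULL state directly,
// so even a client that called unset_honor_osdmap_full() stays paused.
bool stay_paused_buggy(const MapState& m) {
  return m.cluster_full || m.pool_quota_full;
}

// Guarded shape: a client that opted out (e.g. the MDS) is never held back.
bool stay_paused_guarded(const MapState& m, bool honor_osdmap_full) {
  if (!honor_osdmap_full)
    return false;
  return m.cluster_full || m.pool_quota_full;
}

int main() {
  MapState m;
  bool mds_honors_full = false;    // the MDS cleared honor_osdmap_full
  std::printf("buggy:   paused=%d\n", stay_paused_buggy(m));                    // 1: purge ops stuck
  std::printf("guarded: paused=%d\n", stay_paused_guarded(m, mds_honors_full)); // 0: purge ops proceed
  return 0;
}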

Actions #6

Updated by John Spray over 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to Resolved