Project

General

Profile

Actions

Bug #49318

closed

qa: racy session evicted check

Added by Patrick Donnelly about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
pacific,octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
qa-suite
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-02-16T05:22:00.182 DEBUG:teuthology.orchestra.run.gibba006:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 900 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok session ls
...
2021-02-16T05:22:00.247 INFO:tasks.cephfs.fuse_mount.ceph-fuse.1.gibba023.stderr:2021-02-16 05:22:00.243157 7f833abd8700 -1 client.4224 I was blacklisted at osd epoch 26
2021-02-16T05:22:00.248 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.gibba014.stderr:2021-02-16 05:22:00.241533 7f95afa82700 -1 client.4212 I was blacklisted at osd epoch 26
...
2021-02-16T05:22:00.590 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_64d5f18051c2fa3ad001b1c48f945134f60a8f76/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_64d5f18051c2fa3ad001b1c48f945134f60a8f76/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_ca4a449b0ba4ec9b17626a2cec228d6d46e1e824/qa/tasks/fs.py", line 48, in clients_evicted
    raise RuntimeError("client still has session: {}".format(str(session)))
RuntimeError: client still has session: {'id': 4212, 'entity': {'name': {'type': 'client', 'num': 4212}, 'addr': {'type': 'v1', 'addr': '172.21.2.114:0', 'nonce': 3397793124}}, 'state': 'open', 'num_leases': 0, 'num_caps': 1, 'request_load_avg': 0, 'uptime': 18.248637877, 'requests_in_flight': 0, 'completed_requests': [], 'reconnecting': False, 'recall_caps': {'value': 0, 'halflife': 60}, 'release_caps': {'value': 0, 'halflife': 60}, 'recall_caps_throttle': {'value': 0, 'halflife': 1.5}, 'recall_caps_throttle2o': {'value': 0, 'halflife': 0.5}, 'session_cache_liveness': {'value': 0.972016034454625, 'halflife': 300}, 'cap_acquisition': {'value': 0, 'halflife': 10}, 'inst': 'client.4212 v1:172.21.2.114:0/3397793124', 'prealloc_inos': [{'start': 1099511654821, 'length': 7}, {'start': 1099511655329, 'length': 501}], 'used_inos': [], 'client_metadata': {'features': '0x00000000000000ff', 'ceph_sha1': '956ceb853a58f6b6847b31fac34f2f0228a70579', 'ceph_version': 'ceph version 12.2.13-54-g956ceb853a (956ceb853a58f6b6847b31fac34f2f0228a70579) luminous (stable)', 'entity_id': '0', 'hostname': 'gibba014', 'mount_point': '/home/ubuntu/cephtest/mnt.0', 'pid': '12595', 'root': '/'}}
2021-02-16T05:22:00.872 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=de7b418330614bd0852dbc45715c7da0
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_64d5f18051c2fa3ad001b1c48f945134f60a8f76/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_64d5f18051c2fa3ad001b1c48f945134f60a8f76/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_ca4a449b0ba4ec9b17626a2cec228d6d46e1e824/qa/tasks/fs.py", line 48, in clients_evicted
    raise RuntimeError("client still has session: {}".format(str(session)))
RuntimeError: client still has session: {'id': 4212, 'entity': {'name': {'type': 'client', 'num': 4212}, 'addr': {'type': 'v1', 'addr': '172.21.2.114:0', 'nonce': 3397793124}}, 'state': 'open', 'num_leases': 0, 'num_caps': 1, 'request_load_avg': 0, 'uptime': 18.248637877, 'requests_in_flight': 0, 'completed_requests': [], 'reconnecting': False, 'recall_caps': {'value': 0, 'halflife': 60}, 'release_caps': {'value': 0, 'halflife': 60}, 'recall_caps_throttle': {'value': 0, 'halflife': 1.5}, 'recall_caps_throttle2o': {'value': 0, 'halflife': 0.5}, 'session_cache_liveness': {'value': 0.972016034454625, 'halflife': 300}, 'cap_acquisition': {'value': 0, 'halflife': 10}, 'inst': 'client.4212 v1:172.21.2.114:0/3397793124', 'prealloc_inos': [{'start': 1099511654821, 'length': 7}, {'start': 1099511655329, 'length': 501}], 'used_inos': [], 'client_metadata': {'features': '0x00000000000000ff', 'ceph_sha1': '956ceb853a58f6b6847b31fac34f2f0228a70579', 'ceph_version': 'ceph version 12.2.13-54-g956ceb853a (956ceb853a58f6b6847b31fac34f2f0228a70579) luminous (stable)', 'entity_id': '0', 'hostname': 'gibba014', 'mount_point': '/home/ubuntu/cephtest/mnt.0', 'pid': '12595', 'root': '/'}}

/ceph/teuthology-archive/yuriw-2021-02-15_21:26:29-fs-wip-yuri5-testing-2021-02-15-1156-nautilus-distro-basic-gibba/5885484/teuthology.log

Client will not close a new reconnection until it observes that it's been blocklisted. The session will be torn down when that happens. The client also never got a chance to try sending a session open message which would have resulted in a reject due to blocklisting.

The fix is probably to simply add a sleep before doing this task.


Related issues 3 (0 open3 closed)

Copied to CephFS - Backport #49611: octopus: qa: racy session evicted checkResolvedNathan CutlerActions
Copied to CephFS - Backport #49612: pacific: qa: racy session evicted checkResolvedPatrick DonnellyActions
Copied to CephFS - Backport #49613: nautilus: qa: racy session evicted checkResolvedNathan CutlerActions
Actions #1

Updated by Patrick Donnelly about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Patrick Donnelly
Actions #3

Updated by Patrick Donnelly about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 39502
Actions #4

Updated by Patrick Donnelly about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Backport Bot about 3 years ago

Actions #6

Updated by Backport Bot about 3 years ago

Actions #7

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49613: nautilus: qa: racy session evicted check added
Actions #8

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Also available in: Atom PDF