Bug #64389

open

client: check if pools are full when mounting

Added by Dhairya Parmar 3 months ago. Updated 3 months ago.

Status: Triaged
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Tags:
Backport: quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The client should check whether the pools are full when mounting; otherwise the mount stalls:

2024-02-12T16:20:27.609+0530 7f03e4e7a9c0 10 client.0 osdmap pool full0
2024-02-12T16:20:27.609+0530 7f03dca4d6c0 10 client.0 ms_handle_connect on v2:127.0.0.1:40206/0
2024-02-12T16:20:27.609+0530 7f03e4e7a9c0 10 client.4513 Subscribing to map 'mdsmap'
2024-02-12T16:20:27.610+0530 7f03c4ff96c0 20 client.4513 tick
2024-02-12T16:20:27.610+0530 7f03c4ff96c0 20 client.4513 collect_and_send_metrics
2024-02-12T16:20:27.610+0530 7f03c4ff96c0 20 client.4513 collect_and_send_global_metrics
2024-02-12T16:20:27.610+0530 7f03c4ff96c0  5 client.4513 collect_and_send_global_metrics MDS rank 0 is not ready yet -- not sending metric
2024-02-12T16:20:27.610+0530 7f03c4ff96c0 20 client.4513 trim_cache size 0 max 16384
2024-02-12T16:20:27.610+0530 7f03c4ff96c0 20 client.4513 upkeep thread waiting interval 1.000000000s
2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 _handle_full_flag: FULL: cancelling outstanding operations on 1
2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 _handle_full_flag: FULL: cancelling outstanding operations on 2
2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 _handle_full_flag: FULL: cancelling outstanding operations on 3
2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 handle_mds_map epoch 533
2024-02-12T16:20:27.610+0530 7f03e4e7a9c0 20 client.4513 populate_metadata read hostname 'li-d7acf5cc-234b-11b2-a85c-8f0e65e32dfd.ibm.com'
2024-02-12T16:20:27.610+0530 7f03e4e7a9c0 10 client.4513 did not get mds through better means, so chose random mds 0
2024-02-12T16:20:27.610+0530 7f03e4e7a9c0 20 client.4513 mds is 0
2024-02-12T16:20:27.610+0530 7f03e4e7a9c0 10 client.4513 _open_mds_session mds.0
2024-02-12T16:20:27.610+0530 7f03e4e7a9c0 10 client.4513 waiting for session to mds.0 to open
2024-02-12T16:20:27.610+0530 7f03dca4d6c0 10 client.4513 ms_handle_connect on v2:127.0.0.1:6830/1345341561
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 20 client.4513 tick
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 10 client.4513 renew_caps()
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 15 client.4513 renew_caps requesting from mds.0
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 10 client.4513 renew_caps mds.0
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 20 client.4513 collect_and_send_metrics
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 20 client.4513 collect_and_send_global_metrics
2024-02-12T16:20:28.610+0530 7f03c4ff96c0  5 client.4513 collect_and_send_global_metrics: no session with rank=0 -- not sending metric
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 20 client.4513 trim_cache size 0 max 16384
2024-02-12T16:20:28.610+0530 7f03c4ff96c0 20 client.4513 upkeep thread waiting interval 1.000000000s
2024-02-12T16:20:29.048+0530 7f03dca4d6c0  1 client.4513 handle_mds_map epoch 534
2024-02-12T16:20:29.610+0530 7f03c4ff96c0 20 client.4513 tick
2024-02-12T16:20:29.610+0530 7f03c4ff96c0 20 client.4513 collect_and_send_metrics
2024-02-12T16:20:29.610+0530 7f03c4ff96c0 20 client.4513 collect_and_send_global_metrics
2024-02-12T16:20:29.610+0530 7f03c4ff96c0  5 client.4513 collect_and_send_global_metrics: no session with rank=0 -- not sending metric
2024-02-12T16:20:29.610+0530 7f03c4ff96c0 20 client.4513 trim_cache size 0 max 16384
2024-02-12T16:20:29.610+0530 7f03c4ff96c0 20 client.4513 upkeep thread waiting interval 1.000000000s
2024-02-12T16:20:30.610+0530 7f03c4ff96c0 20 client.4513 tick
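
To spot this condition quickly in a long client log, something like the following could be used. It is only a sketch: the function name `summarize_client_log` is hypothetical, the sample lines are copied from the log above, and it assumes the number at the end of the `_handle_full_flag` message is the pool id.

```python
import re

# Message formats taken from the client log excerpt in this report.
# Assumption: the trailing number in the FULL message is the pool id.
FULL_RE = re.compile(
    r"_handle_full_flag: FULL: cancelling outstanding operations on (\d+)"
)
WAIT_RE = re.compile(r"waiting for session to mds\.(\d+) to open")


def summarize_client_log(lines):
    """Return (full_pool_ids, waiting_mds_ranks) seen in the log lines."""
    full_pools = set()
    waiting_mds = set()
    for line in lines:
        match = FULL_RE.search(line)
        if match:
            full_pools.add(int(match.group(1)))
        match = WAIT_RE.search(line)
        if match:
            waiting_mds.add(int(match.group(1)))
    return sorted(full_pools), sorted(waiting_mds)


# Sample lines copied from the log above.
sample = [
    "2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 _handle_full_flag: FULL: cancelling outstanding operations on 1",
    "2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 _handle_full_flag: FULL: cancelling outstanding operations on 2",
    "2024-02-12T16:20:27.610+0530 7f03dca4d6c0  1 client.4513 _handle_full_flag: FULL: cancelling outstanding operations on 3",
    "2024-02-12T16:20:27.610+0530 7f03e4e7a9c0 10 client.4513 waiting for session to mds.0 to open",
    "2024-02-12T16:20:28.610+0530 7f03c4ff96c0 20 client.4513 tick",
]

full_pools, waiting_mds = summarize_client_log(sample)
print(f"full pools: {full_pools}, waiting on mds ranks: {waiting_mds}")
```

A log that shows both full pools and a session wait that never completes matches the stall described here.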

health status:

    health: HEALTH_ERR
            2 client(s) laggy due to laggy OSDs
            1 MDSs report slow metadata IOs
            3 full osd(s)
            3 pool(s) full

from mon logs:

mon.a.log:1432535:            "message": "2 slow ops, oldest one blocked for 3318 sec, mon.a has slow ops",
mon.a.log:1434391:            "message": "2 slow ops, oldest one blocked for 3318 sec, mon.a has slow ops",
mon.a.log:980570:            "message": "0 slow ops, oldest one blocked for 34 sec, osd.0 has slow ops",
mon.a.log:981411:            "message": "0 slow ops, oldest one blocked for 34 sec, osd.0 has slow ops",
mon.a.log:982393:            "message": "0 slow ops, oldest one blocked for 34 sec, osd.0 has slow ops" 

mds log:

mds.c.log:331991:2024-02-12T16:35:03.624+0530 7f8a13ef46c0 20 mds.beacon.c 9 slow metadata IOs found

It seems there is no room left for the client, so the mount hangs.
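
The behaviour requested in the title could look roughly like the sketch below: check the pools' full state before waiting on the MDS session and fail fast instead of hanging. This is not the Ceph client code. `PoolState`, `check_pools_before_mount`, and the pool names are hypothetical, and returning `ENOSPC` is an assumption about what the fix might report.

```python
import errno
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical, simplified view of one pool from the OSD map."""
    pool_id: int
    name: str
    full: bool


def check_pools_before_mount(pools, required_pool_ids):
    """Raise OSError(ENOSPC) if any pool the file system needs is full.

    Assumption: failing the mount early is preferable to stalling while
    waiting for an MDS session that cannot make progress.
    """
    full = [p for p in pools if p.pool_id in required_pool_ids and p.full]
    if full:
        names = ", ".join(f"{p.name} (id {p.pool_id})" for p in full)
        raise OSError(errno.ENOSPC, f"refusing to mount: pool(s) full: {names}")


# Illustrative data only; pool names are made up, ids mirror the log above.
pools = [
    PoolState(1, "example_pool_1", full=True),
    PoolState(2, "example_metadata", full=True),
    PoolState(3, "example_data", full=True),
]

try:
    check_pools_before_mount(pools, required_pool_ids={2, 3})
except OSError as err:
    print(f"mount failed early: {err}")
```

Whether the mount should be refused outright or allowed in some restricted mode is a design decision for the actual fix; the sketch only illustrates the early check.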

Actions #1

Updated by Venky Shankar 3 months ago

  • Status changed from New to Triaged
  • Assignee set to Neeraj Pratap Singh
  • Target version set to v19.0.0
  • Backport set to quincy,reef