Bug #17413

objecter read request stuck for 2 days from OSD

Added by Henrik Korkuc over 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: librados
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While testing http://tracker.ceph.com/issues/17275, I came across a read request that had been stuck for more than 2 days. The client is ceph-fuse 0.94.9 with the patch from #17275; the cluster is 10.2.1.

Example of request:

        {
            "tid": 629298,
            "pg": "1.1ab4db54",
            "osd": 44,
            "object_id": "1000024655f.00025010",
            "object_locator": "@1",
            "target_object_id": "1000024655f.00025010",
            "target_object_locator": "@1",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "last_sent": "2016-09-24 04:13:37.639395",
            "attempts": 1,
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "0.000000",
            "osd_ops": [
                "read 131072~131072" 
            ]
        },
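For anyone triaging a similar dump, here is a minimal sketch of how such long-outstanding requests could be picked out automatically. The field names (`tid`, `osd`, `last_sent`) are taken from the request above; the top-level `ops` key, the one-hour threshold, and the `stuck_requests` helper name are assumptions made for illustration, not part of Ceph.

```python
import json
from datetime import datetime, timedelta

# Sample shaped like the request above. The top-level "ops" key is an
# assumption about how the objecter_requests dump is laid out.
SAMPLE = """
{"ops": [{"tid": 629298, "osd": 44,
          "object_id": "1000024655f.00025010",
          "last_sent": "2016-09-24 04:13:37.639395",
          "attempts": 1,
          "osd_ops": ["read 131072~131072"]}]}
"""


def stuck_requests(dump, now, max_age):
    """Return (tid, osd, age) for every op whose last_sent is older than max_age.

    Hypothetical helper: timestamps are treated as naive local times.
    """
    stuck = []
    for op in dump.get("ops", []):
        sent = datetime.strptime(op["last_sent"], "%Y-%m-%d %H:%M:%S.%f")
        age = now - sent
        if age > max_age:
            stuck.append((op["tid"], op["osd"], age))
    return stuck


# "now" is fixed to roughly the time the files were attached to this report.
now = datetime(2016, 9, 27, 12, 23)
result = stuck_requests(json.loads(SAMPLE), now, timedelta(hours=1))
for tid, osd, age in result:
    print(f"tid {tid} on osd.{osd} outstanding for {age}")
```

The OSD number reported for each stuck request identifies which OSD to inspect next.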

I am attaching some debug info. Additional uploads:
client cache: ceph-post-file: cd539449-cca1-4b36-bf32-54f0353c0815
mds cache: ceph-post-file: 477d14b0-dcc9-4291-a7fd-fe654483b6b8

ceph-fuse is no longer running, so I cannot collect additional debug output from it. If you give me instructions, I can collect it if the issue occurs again.


Files

client.27841966_gdb.txt (118 KB), Henrik Korkuc, 09/27/2016 12:23 PM
client.27841966_objecter_requests (3.22 KB), Henrik Korkuc, 09/27/2016 12:23 PM
mds-dump_blocked_ops (318 KB), Henrik Korkuc, 09/27/2016 12:23 PM
#1

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Closed

The helpful piece of info here would have been the output from 'ceph daemon osd.nnn ops' (for the OSD where the request is blocked). I'm guessing this is a dup of the rare hung op that occurs when it collides with snap trimming, but it's hard to say here! Closing this since there isn't any info here that will help with that one.
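As a rough sketch of collecting that output next time, the command quoted above could be wrapped as follows for the OSD named in the stuck request (osd 44 in the description). The helper names are hypothetical, and the sketch assumes the command is run on the host where that OSD's admin socket lives and that it prints JSON.

```python
import json
import subprocess


def osd_ops_command(osd_id):
    """Build the admin-socket command quoted above for one OSD."""
    return ["ceph", "daemon", f"osd.{osd_id}", "ops"]


def dump_osd_ops(osd_id, run=subprocess.run):
    """Run the command and parse its output.

    Assumptions: executed on the OSD's own host, and the output is JSON.
    The `run` parameter exists only so the call can be stubbed in tests.
    """
    proc = run(osd_ops_command(osd_id), capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


# Show the command for the OSD from the stuck request in the description.
print(" ".join(osd_ops_command(44)))  # prints "ceph daemon osd.44 ops"
```

Saving the parsed result alongside the client's objecter_requests dump would let the in-flight ops on the OSD be matched against the stuck tid.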
