Project

General

Profile

Actions

Bug #17660

closed

objecter_requests 'mtime": "1970-01-01 00:00:00.000000s"' and '"last_sent": "4.76809e+06s"'

Added by Henrik Korkuc over 7 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
Administration/Usability
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Objecter
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I got bunch of stuck objecter_requests for ceph-fuse 10.2.3. They also have 'mtime": "1970-01-01 00:00:00.000000s"' and '"last_sent": "4.76809e+06s"'
example request:

        {
            "tid": 2140662,
            "pg": "1.5eea92dd",
            "osd": 201,
            "object_id": "10000151f19.00000c46",
            "object_locator": "@1",
            "target_object_id": "10000151f19.00000c46",
            "target_object_locator": "@1",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "last_sent": "4.76828e+06s",
            "attempts": 1,
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "1970-01-01 00:00:00.000000s",
            "osd_ops": [
                "read 3104768~4096" 
            ]
        }

this client has patches from http://tracker.ceph.com/issues/17275 and https://github.com/ceph/ceph/pull/10921

ceph-post-file: 24c0e687-3bea-4a05-b7bc-d43fa6cd1556 ceph-fuse cache dump
ceph-post-file: 6619548a-b8ce-4eb9-b629-a860a708f88c gdb theads
ceph-post-file: a7d9ae6b-7d8c-4302-9e8f-e85e7b72eb61 objecter requests

Actions #1

Updated by Greg Farnum over 7 years ago

  • Subject changed from ceph-fuse 10.2.3 objecter_requests 'mtime": "1970-01-01 00:00:00.000000s"' and '"last_sent": "4.76809e+06s"' to objecter_requests 'mtime": "1970-01-01 00:00:00.000000s"' and '"last_sent": "4.76809e+06s"'
  • Category changed from librados to Objecter
  • Priority changed from Normal to High

So for some reason the Objecter is outputting its last_sent timestamp as a time period instead of a date, and it's printing mtime for read operations based on a default-initialized date stamp.

This is probably because of the recent changes to internal timestamp handling.

Actions #2

Updated by Henrik Korkuc over 7 years ago

what about them being stuck? I just came back to that server and it still has same requests stuck. cluster is in healthy state, these requests became stuck after cluster received too much load for 1-2 days.

Actions #3

Updated by Samuel Just over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (Objecter)
Actions #4

Updated by Greg Farnum over 7 years ago

  • Project changed from CephFS to Ceph
  • Category set to Objecter

This is an Objecter issue with its admin socket interfaces. Just because it's the Client's objecter doesn't make it an FS bug. :)

Actions #5

Updated by Henrik Korkuc over 7 years ago

I can semi-reliably reproduce stucking of objecter requests. I do large file writing to Cephfs from multiple clients: a script which mounts ceph-fuse, writes large file and unmounts ceph-fuse from multiple servers and multiple clients. And I have a script on one of the servers to find older files than X and delete them. Write is larger than cluster can handle, so I have slow requests warning, trimming is very slow. After cluster runs out of disk space mds trims some data which was due to be deleted, cluster writes resumes, it fills up again, rinse and repeat. After some time there will be these stuck requests which won't clear even when cluster is idle for long time. OSD restart clears that stuck request. Hammer is affecter too (#17413)

Actions #6

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category changed from Objecter to Administration/Usability
  • Component(RADOS) Objecter added
Actions #7

Updated by Sage Weil over 6 years ago

  • Status changed from New to Can't reproduce

My guess is this is related to some issues with full handling that are fixe din luminous and were backported to later jewel releases. If you can still see this on latest jewel or luminous, please reopen!

Actions

Also available in: Atom PDF