Bug #37673

latency between "initiated" and "queued_for_pg"

Added by Kefu Chai over 5 years ago. Updated over 5 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  • 6-OSD cluster
  • separate cluster network on eth0 and public network on eth1
  • a rados client accessing the cluster from the public network
  1. Inject a 2-second delay on eth0 using tc, e.g. tc qdisc add dev eth0 root netem delay 2000ms
  2. Put a 128 MB file into a 3-replica pool

Slow requests are observed; quite a few of them show over 20 seconds of delay between "initiated" and "queued_for_pg":

{
    "time": "2018-12-17 05:36:11.977528",
    "event": "initiated"
},
{
    "time": "2018-12-17 05:36:47.991316",
    "event": "queued_for_pg"
},

Found on luminous; this still needs to be reproduced on master.
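
For reference, the gap can be measured straight off the op tracker. A rough sketch (not part of the original report), assuming jq, GNU date, and bc are available, and that the historic-ops JSON keeps its array under .ops as on luminous:

# run on the host of the slow op's primary OSD; osd.0 is a placeholder
ev=$(ceph daemon osd.0 dump_historic_slow_ops |
     jq -r '.ops[0].type_data.events[]
            | select(.event == "initiated" or .event == "queued_for_pg")
            | .time')
t0=$(date -d "$(sed -n 1p <<< "$ev")" +%s.%N)   # initiated
t1=$(date -d "$(sed -n 2p <<< "$ev")" +%s.%N)   # queued_for_pg
echo "initiated -> queued_for_pg: $(bc <<< "$t1 - $t0") s"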

Actions #1

Updated by Kefu Chai over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Kefu Chai over 5 years ago

  • Description updated (diff)
Actions #3

Updated by Kefu Chai over 5 years ago

  • Status changed from New to Won't Fix

After testing locally, I think this is expected behavior.

test settings:

  • docker container osd-host.alpha, 172.17.0.3: hosting osd.0, osd.1, osd.2
  • docker container osd-host.beta, 172.17.0.4: hosting osd.3, osd.4, osd.5
  • docker container mon-host, 172.17.0.2: mon.a

test steps:

  1. on mon-host: ceph osd pool create test-pool 16 16
  2. on mon-host: dd if=/dev/zero of=/tmp/4MB count=1 bs=4MB
  3. on osd-host.alpha: tc qdisc add dev eth0 root netem delay 2000ms
  4. on mon-host: time bin/rados put -p test-pool test-obj3 /tmp/4MB
    real    1m8.079s
    user    0m0.048s
    sys     0m0.022s
    

It took 68 seconds to write a 4 MB object.
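
A quick way to double-check the injected delay when reproducing this (my own sanity check, not part of the test steps above), on osd-host.alpha:

tc qdisc show dev eth0             # should list the netem qdisc with "delay 2.0s"
ping -c 3 172.17.0.4               # RTT to osd-host.beta should be ~2000 ms
tc qdisc del dev eth0 root netem   # tear the delay down once done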

$ ceph osd map test-pool test-obj3
osdmap e55 pool 'test-pool' (1) object 'test-obj3' -> pg 1.6958e5c3 (1.3) -> up ([1,5,3], p1) acting ([1,5,3], p1)

on osd-host.alpha:

$ ceph daemon osd.1 dump_historic_slow_ops
...
            "description": "osd_op(client.14194.0:1 1.3 1:c3a71a96:::test-obj3:head [writefull 0~4000000] snapc 0=[] ondisk+write+known_if_redirected e55)",
            "initiated_at": "2018-12-19 17:12:21.041033",
            "age": 232.282686,
            "duration": 58.024624,
...
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.14194",
                    "client_addr": "172.17.0.2:36030/1920072373",
                    "tid": 1
                },
                "events": [
                    {
                        "time": "2018-12-19 17:12:21.041033",
                        "event": "initiated" 
                    },
                    {
                        "time": "2018-12-19 17:12:21.041033",
                        "event": "header_read" 
                    },
                    {
                        "time": "2018-12-19 17:12:21.041038",
                        "event": "throttled" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996248",
                        "event": "all_read" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996308",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996317",
                        "event": "queued_for_pg" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996392",
                        "event": "reached_pg" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996502",
                        "event": "started" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996617",
                        "event": "waiting for subops from 3,5" 
                    },
                    {
                        "time": "2018-12-19 17:12:42.277594",
                        "event": "op_commit" 
                    },
                    {
                        "time": "2018-12-19 17:13:13.621318",
                        "event": "sub_op_commit_rec" 
                    },
                    {
                        "time": "2018-12-19 17:13:19.065579",
                        "event": "sub_op_commit_rec" 
                    },
                    {
                        "time": "2018-12-19 17:13:19.065635",
                        "event": "commit_sent" 
                    },
                    {
                        "time": "2018-12-19 17:13:19.065656",
                        "event": "done" 
                    }
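
Two gaps dominate the trace: "throttled" -> "all_read" (~21 s to receive the 4 MB request) and "waiting for subops from 3,5" -> the second "sub_op_commit_rec" (~37 s to replicate over the delayed eth0). A quick way to compute them from the timestamps above (GNU date and bc assumed):

delta() { bc <<< "$(date -d "$2" +%s.%N) - $(date -d "$1" +%s.%N)"; }
delta "2018-12-19 17:12:21.041038" "2018-12-19 17:12:41.996248"   # throttled -> all_read: ~20.96 s
delta "2018-12-19 17:12:41.996617" "2018-12-19 17:13:19.065579"   # waiting for subops -> last sub_op_commit_rec: ~37.07 s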

on osd-host.alpha, iperf shows

$ time iperf -c 172.17.0.4 -n 4M -f KBytes
------------------------------------------------------------
Client connecting to 172.17.0.4, TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  3] local 172.17.0.3 port 47388 connected with 172.17.0.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-16.8 sec  4096 KBytes   243 KBytes/sec

real    0m18.878s
user    0m0.014s
sys     0m0.024s

on mon-host:

$ time iperf -c 172.17.0.3 -n 4M -f KBytes
------------------------------------------------------------
Client connecting to 172.17.0.3, TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  3] local 172.17.0.2 port 57882 connected with 172.17.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-21.3 sec  4096 KBytes   192 KBytes/sec

real    0m23.310s
user    0m0.007s
sys     0m0.033s
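
As a sanity check, the intervals iperf reports are exactly what the measured bandwidth implies for a 4096 KB transfer, so the raw links really are that slow:

bc <<< "scale=1; 4096 / 243"   # ~16.8 s, the alpha -> beta run
bc <<< "scale=1; 4096 / 192"   # ~21.3 s, the mon-host -> alpha run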

I suspect that, somehow, both ingress and egress traffic are delayed on osd-host.alpha. The 4 MB request is segmented at the IP layer into smaller packets, and those packets pile up in the netem qdisc; that is why it took the primary OSD 58 seconds to handle the write request. And from the client's point of view, more than 2 * 2 seconds of delay were added, because both the first packet of the write request from the client and the ack message from the primary OSD had to cross the delayed interface.
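
If that explanation is right, the packets sitting in the 2-second delay line should be directly visible in the qdisc statistics while a transfer is in flight (my own check; the output looks roughly like the comments below):

tc -s qdisc show dev eth0
# qdisc netem 8001: root refcnt 2 limit 1000 delay 2.0s
#  Sent ... bytes ... pkt (dropped 0, overlimits 0 requeues 0)
#  backlog ...b ...p                 <- the queued packets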
