Bug #37673: latency between "initiated" and "queued_for_pg" - RADOS - Ceph

Actions

Copy link

Bug #37673

closed

latency between "initiated" and "queued_for_pg"

Added by Kefu Chai over 5 years ago. Updated over 5 years ago.

Status:

Won't Fix

Priority:

Normal

Assignee:

Category:

Performance/Resource Usage

Target version:

% Done:

Source:

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(RADOS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

6 osd cluster
separated cluster network on eth0, and public network on eth1.
a rados client accessing from public network

inject 2 seconds delay using tc to eth0. like tc qdisc add dev eth0 root netem delay 2000ms
put a 128 MB file to into 3 replica pool

slow requests are observed. quite a few of them had over 20 seconds' delay between "initiated" and "queued_for_pg"

{
                       "time": "2018-12-17 05:36:11.977528",
                       "event": "initiated" 
},                    
{
                        "time": "2018-12-17 05:36:47.991316",
                        "event": "queued_for_pg" 
},

found on luminous. need to reproduce this on master.

Actions

Copy link

Updated by Kefu Chai over 5 years ago

Description updated (diff)

Actions

Copy link

Updated by Kefu Chai over 5 years ago

Description updated (diff)

Actions

Copy link

Updated by Kefu Chai over 5 years ago

Status changed from New to Won't Fix

after testing locally, i think this is expected behavior.

test settings:

docker container osd-host.alpha, 172.17.0.3: hosting osd.0, osd.1, osd.2
docker container osd-host.beta, 172.17.0.4: hosting osd.3, osd.4, osd.5
docker container mon-host, 172.17.0.2: mon.a

test steps:

on mon-host: ceph osd pool create test-pool 16 16
on mon-host: dd if=/dev/zero of=/tmp/4MB count=1 bs=4MB
on osd-host.alpha: tc qdisc add dev eth0 root netem delay 2000ms
on mon-host: time bin/rados put -p test-pool test-obj3 /tmp/4MB
```
real    1m8.079s
user    0m0.048s
sys     0m0.022s
```

it took 68 seconds to copy a 4MB object

$ ceph osd map test-pool test-obj3
osdmap e55 pool 'test-pool' (1) object 'test-obj3' -> pg 1.6958e5c3 (1.3) -> up ([1,5,3], p1) acting ([1,5,3], p1)

on osd-host.alpha:

$ ceph daemon osd.1 dump_historic_slow_ops
...
            "description": "osd_op(client.14194.0:1 1.3 1:c3a71a96:::test-obj3:head [writefull 0~4000000] snapc 0=[] ondisk+write+known_if_redirected e55)",
            "initiated_at": "2018-12-19 17:12:21.041033",
            "age": 232.282686,
            "duration": 58.024624,
...
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.14194",
                    "client_addr": "172.17.0.2:36030/1920072373",
                    "tid": 1
                },
                "events": [
                    {
                        "time": "2018-12-19 17:12:21.041033",
                        "event": "initiated" 
                    },
                    {
                        "time": "2018-12-19 17:12:21.041033",
                        "event": "header_read" 
                    },
                    {
                        "time": "2018-12-19 17:12:21.041038",
                        "event": "throttled" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996248",
                        "event": "all_read" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996308",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996317",
                        "event": "queued_for_pg" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996392",
                        "event": "reached_pg" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996502",
                        "event": "started" 
                    },
                    {
                        "time": "2018-12-19 17:12:41.996617",
                        "event": "waiting for subops from 3,5" 
                    },
                    {
                        "time": "2018-12-19 17:12:42.277594",
                        "event": "op_commit" 
                    },
                    {
                        "time": "2018-12-19 17:13:13.621318",
                        "event": "sub_op_commit_rec" 
                    },
                    {
                        "time": "2018-12-19 17:13:19.065579",
                        "event": "sub_op_commit_rec" 
                    },
                    {
                        "time": "2018-12-19 17:13:19.065635",
                        "event": "commit_sent" 
                    },
                    {
                        "time": "2018-12-19 17:13:19.065656",
                        "event": "done" 
                    }

on osd-host.alpha, iperf shows

time iperf -c 172.17.0.4 -n 4M -f KBytes
------------------------------------------------------------
Client connecting to 172.17.0.4, TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  3] local 172.17.0.3 port 47388 connected with 172.17.0.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-16.8 sec  4096 KBytes   243 KBytes/sec

real    0m18.878s
user    0m0.014s
sys     0m0.024s

on mon-host:

$ time iperf -c 172.17.0.3 -n 4M -f KBytes
------------------------------------------------------------
Client connecting to 172.17.0.3, TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  3] local 172.17.0.2 port 57882 connected with 172.17.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-21.3 sec  4096 KBytes   192 KBytes/sec

real    0m23.310s
user    0m0.007s
sys     0m0.033s

i am wondering somehow both ingress and egress traffic are delayed on osd-host.alpha. and the 4M request was segmented at IP layer into smaller packets, and the packets were queued in the queue in qdisc. that's why it took the primary osd 58 seconds to handle the write request. and from the client's point of view more than 2 * 2 seconds delay were added because of the first packet of the write request from it and ack message from the primary osd.

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » RADOS

Custom queries

Bug #37673

latency between "initiated" and "queued_for_pg"

Updated by Kefu Chai over 5 years ago

Updated by Kefu Chai over 5 years ago

Updated by Kefu Chai over 5 years ago