Project

General

Profile

Actions

Bug #62965

open

osd perf error

Added by zhipeng li 7 months ago. Updated 7 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

test RGW with cosbench with 4k files,

get osd slow ops info as bellow:
ceph daemon osd.14 dump_historic_slow_ops |grep dura
"duration": 15.016597959,
"duration": 15.019391707,
"duration": 14.823621810000001,
"duration": 14.826146726999999,
"duration": 13.366448235,
"duration": 13.119884395,
"duration": 13.122062842,
"duration": 11.476077675999999,
"duration": 10.016261663,
"duration": 4294967295.4891562,
"duration": 4294967295.489171,
"duration": 4294967295.4829359,
"duration": 4294967295.2711391
show detail info {
"description": "osd_op(client.13240035.0:240243317 10.13 10:c9c7d664:::6d4f3da9-b06d-42a0-b9d3-b7a4d20b55f7.5520.7_object-1m1840:head [getxattrs,stat,read 0~4000] snapc 0=[] ondisk+read+known_if_redirected e3130)",
"initiated_at": "2023-09-22 13:44:08.514075",
"age": 245817.01734406501,
"duration": 4294967295.489171,
"type_data": {
"flag_point": "started",
"client_info": {
"client": "client.13240035",
"client_addr": "10.10.52.81:0/191604539",
"tid": 240243317
},
"events": [ {
"time": "2023-09-22 13:44:08.514075",
"event": "initiated"
}, {
"time": "2023-09-22 13:44:08.514075",
"event": "header_read"
}, {
"time": "2023-09-22 13:44:08.514071",
"event": "throttled"
}, {
"time": "2023-09-22 13:44:08.514077",
"event": "all_read"
}, {
"time": "2023-09-22 13:44:08.514078",
"event": "dispatched"
}, {
"time": "2023-09-22 13:44:08.514081",
"event": "queued_for_pg"
}, {
"time": "2023-09-22 13:44:08.002997",
"event": "reached_pg"
}, {
"time": "2023-09-22 13:44:08.003145",
"event": "started"
}, {
"time": "2023-09-22 13:44:08.003246",
"event": "done"
}

]
}
}, {
"description": "osd_op(client.13245211.0:260000668 9.1c 9:3cc78a1c:::.dir.6d4f3da9-b06d-42a0-b9d3-b7a4d20b55f7.5520.3.157:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e3130)",
"initiated_at": "2023-09-22 13:44:08.517542",
"age": 245817.01387675601,
"duration": 4294967295.4829359,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.13245211",
"client_addr": "10.10.52.77:0/1529571248",
"tid": 260000668
},
"events": [ {
"time": "2023-09-22 13:44:08.517542",
"event": "initiated"
}, {
"time": "2023-09-22 13:44:08.517542",
"event": "header_read"
}, {
"time": "2023-09-22 13:44:08.517534",
"event": "throttled"
}, {
"time": "2023-09-22 13:44:08.517545",
"event": "all_read"
}, {
"time": "2023-09-22 13:44:08.517546",
"event": "dispatched"
}, {
"time": "2023-09-22 13:44:08.517548",
"event": "queued_for_pg"
}, {
"time": "2023-09-22 13:44:08.517567",
"event": "reached_pg"
}, {
"time": "2023-09-22 13:44:08.517612",
"event": "started"
}, {
"time": "2023-09-22 13:44:08.517797",
"event": "waiting for subops from 25,35"
}, {
"time": "2023-09-22 13:44:08.000211",
"event": "op_commit"
}, {
"time": "2023-09-22 13:44:08.000378",
"event": "sub_op_commit_rec"
}, {
"time": "2023-09-22 13:44:08.000434",
"event": "sub_op_commit_rec"
}, {
"time": "2023-09-22 13:44:08.000467",
"event": "commit_sent"
}, {
"time": "2023-09-22 13:44:08.000478",
"event": "done"
}
]
}
},
Actions #1

Updated by Neha Ojha 7 months ago

  • Description updated (diff)
Actions #2

Updated by Radoslaw Zarzynski 7 months ago

  • Description updated (diff)

Is the junky duration

            "duration": 4294967295.4829359,

the problem here? If so, there is was a fix: https://github.com/ceph/ceph/pull/51668.

Actions #3

Updated by Radoslaw Zarzynski 7 months ago

  • Pull request ID set to 53643
Actions #4

Updated by zhipeng li 7 months ago

Radoslaw Zarzynski wrote:

Is the junky duration

[...]

the problem here? If so, there is was a fix: https://github.com/ceph/ceph/pull/51668.

maybe not same issue, I post part logs.

@
"events": [ {
"time": "2023-09-22 13:44:08.514075",
"event": "initiated"
}, {
"time": "2023-09-22 13:44:08.514075",
"event": "header_read"
}, {
"time": "2023-09-22 13:44:08.514071",
"event": "throttled"
}, {
"time": "2023-09-22 13:44:08.514077",
"event": "all_read"
}, {
"time": "2023-09-22 13:44:08.514078",
"event": "dispatched"
}, {
"time": "2023-09-22 13:44:08.514081",
"event": "queued_for_pg"
}, {
"time": "2023-09-22 13:44:08.002997",
"event": "reached_pg"
}, {
"time": "2023-09-22 13:44:08.003145",
"event": "started"
}, {
"time": "2023-09-22 13:44:08.003246",
"event": "done"
}
]
@

even queued_for_pg time > reached_pg

Actions #5

Updated by zhipeng li 7 months ago

maybe we can skip it like this https://github.com/ceph/ceph/pull/53643

Actions #6

Updated by zhipeng li 7 months ago

zhipeng li wrote:

in performance testing ,many small file read latency may bigger then 1000s for system clock issue.

maybe we can skip perf noise like this https://github.com/ceph/ceph/pull/53643

Actions #7

Updated by zhipeng li 7 months ago

zhipeng li wrote:

Radoslaw Zarzynski wrote:

Is the junky duration

[...]

the problem here? If so, there is was a fix: https://github.com/ceph/ceph/pull/51668.

maybe not same issue, I post part logs.

"events": [ {
"time": "2023-09-22 13:44:08.514075",
"event": "initiated"
}, {
"time": "2023-09-22 13:44:08.514075",
"event": "header_read"
}, {
"time": "2023-09-22 13:44:08.514071",
"event": "throttled"
}, {
"time": "2023-09-22 13:44:08.514077",
"event": "all_read"
}, {
"time": "2023-09-22 13:44:08.514078",
"event": "dispatched"
}, {
"time": "2023-09-22 13:44:08.514081",
"event": "queued_for_pg"
}, {
"time": "2023-09-22 13:44:08.002997",
"event": "reached_pg"
}, {
"time": "2023-09-22 13:44:08.003145",
"event": "started"
}, {
"time": "2023-09-22 13:44:08.003246",
"event": "done"
}
]

even queued_for_pg time > reached_pg

Actions #8

Updated by zhipeng li 7 months ago

Radoslaw Zarzynski wrote:

Is the junky duration

[...]

the problem here? If so, there is was a fix: https://github.com/ceph/ceph/pull/51668.

not op tracker issue

Actions

Also available in: Atom PDF