Bug #48818

open

rbd client upload image stuck indefinitely

Added by Norman Shen over 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are running OpenStack Pike with Ceph 12.2.8. When uploading an image to Glance, the upload blocks after about 15 minutes.

Inspecting the network connections, I found:

root@ctl05:~# ss -nti | sort -nk3 | tail -n5
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 768 10.201.0.25:39418 10.201.0.24:9292
ESTAB 0 768 10.201.0.25:54400 10.201.0.25:9292
ESTAB 0 4920864 10.201.16.25:35320 10.201.17.1:6800
ESTAB 0 5427408 10.201.16.25:33048 10.201.17.19:6815
root@ctl05:~# ss -nt^C| sort -nk3 | tail -n5
root@ctl05:~# ss -niptpo 'dport = :33048'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
root@ctl05:~# ss -niptpo 'sport = :33048'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 5459664 10.201.16.25:33048 10.201.17.19:6815
users:(("cinder-volume",pid=21229,fd=617)) timer:(persist,072ms,0) ts sack cubic wscale:7,8 rto:204 rtt:0.105/0.007 ato:40 mss:8948 cwnd:10 ssthresh:156 bytes_acked:3224418080 bytes_received:3238855435 segs_out:6396522 segs_in:5665100 send 6817.5Mbps lastsnd:132 lastrcv:1332724 lastack:132 pacing_rate 13570.4Mbps reordering:76
rcv_rtt:0.753 rcv_space:3171936

This indicates that the connections to some OSDs are stuck. Dumping the historic slow ops on the target OSD gives:

{
"num to keep": 20,
"threshold to keep": 10,
"Ops": [ {
"description": "osd_op(client.828268208.0:654831 4.11c 4:3880e257:::rbd_data.59bdf143f18422.0000000000023a15:head [call rbd.copyup,set-alloc-hint object_size 4194304 write_size 4194304,write 0~0] snapc 0=[] ondisk+write+known_if_redirected e64275)",
"initiated_at": "2021-01-10 21:06:51.584543",
"age": 47470.013532,
"duration": 2261.981690,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.828268208",
"client_addr": "10.201.16.25:0/4041839941",
"tid": 654831
},
"events": [ {
"time": "2021-01-10 21:06:51.584543",
"event": "initiated"
}, {
"time": "2021-01-10 21:44:33.437332",
"event": "queued_for_pg"
}, {
"time": "2021-01-10 21:44:33.437354",
"event": "reached_pg"
}, {
"time": "2021-01-10 21:44:33.437575",
"event": "started"
}, {
"time": "2021-01-10 21:44:33.438253",
"event": "waiting for subops from 105,203"
}, {
"time": "2021-01-10 21:44:33.535935",
"event": "sub_op_commit_rec from 105"
}, {
"time": "2021-01-10 21:44:33.550442",
"event": "op_commit"
}, {
"time": "2021-01-10 21:44:33.550445",
"event": "op_applied"
}, {
"time": "2021-01-10 21:44:33.566179",
"event": "sub_op_commit_rec from 203"
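The event timestamps in the dump above can be compared pairwise to see where the op spent its time. The following is a minimal sketch, not part of the original report: the helper names `event_gaps` and `largest_gap` are hypothetical, and it assumes the timestamp format shown in this 12.2.8 dump (other Ceph releases may print timestamps differently). Only the first four events are copied in for brevity.

```python
from datetime import datetime

# Timestamp layout as printed in the dump above (assumed; may differ by release).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def event_gaps(events):
    """Return (previous_event, next_event, seconds_between) for consecutive events."""
    parsed = [(e["event"], datetime.strptime(e["time"], TIME_FORMAT)) for e in events]
    return [
        (prev_name, name, (t - prev_t).total_seconds())
        for (prev_name, prev_t), (name, t) in zip(parsed, parsed[1:])
    ]


def largest_gap(events):
    """Return the consecutive event pair with the longest delay between them."""
    return max(event_gaps(events), key=lambda gap: gap[2])


# First events copied from the slow-op dump in this report.
events = [
    {"time": "2021-01-10 21:06:51.584543", "event": "initiated"},
    {"time": "2021-01-10 21:44:33.437332", "event": "queued_for_pg"},
    {"time": "2021-01-10 21:44:33.437354", "event": "reached_pg"},
    {"time": "2021-01-10 21:44:33.437575", "event": "started"},
]

prev_name, name, seconds = largest_gap(events)
print(f"{prev_name} -> {name}: {seconds:.3f} s")
# prints: initiated -> queued_for_pg: 2261.853 s
```

For this op, almost all of the reported duration (2261.98 s) falls between "initiated" and "queued_for_pg"; the remaining steps complete in well under a second. The sketch only locates the delay and does not explain its cause.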

Why does this happen, and how can I work around it? Thank you.
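The Send-Q check done by hand with `ss -nti | sort -nk3` earlier in this report could also be scripted. The sketch below is illustrative only: the function name `large_send_queues` and the 1,000,000-byte threshold are hypothetical choices, and it assumes the column layout shown in the report (State, Recv-Q, Send-Q, local address, peer address). For established sockets, Send-Q is the number of bytes not yet acknowledged by the peer.

```python
def large_send_queues(ss_output, min_bytes=1_000_000):
    """Return (peer_address, send_q) pairs with Send-Q >= min_bytes, largest first.

    min_bytes is an arbitrary illustrative threshold, not a Ceph or kernel limit.
    """
    rows = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Socket rows look like: State Recv-Q Send-Q Local Peer.
        # The header and the extra detail lines printed by -i are skipped here.
        if len(fields) < 5 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        send_q = int(fields[2])
        if send_q >= min_bytes:
            rows.append((fields[4], send_q))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Sample rows copied from the ss output in this report.
sample = """\
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 768 10.201.0.25:39418 10.201.0.24:9292
ESTAB 0 768 10.201.0.25:54400 10.201.0.25:9292
ESTAB 0 4920864 10.201.16.25:35320 10.201.17.1:6800
ESTAB 0 5427408 10.201.16.25:33048 10.201.17.19:6815
"""

for peer, send_q in large_send_queues(sample):
    print(peer, send_q)
```

On the sample rows this lists only the two connections to ports 6800 and 6815, which are the ones with several megabytes queued.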

Actions #1

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to rbd