Support #20356
cluster health ok, but rados hang (status: closed)
Description
One day I noticed that I/O utilization inside the VMs was very high, but the load on the Ceph cluster itself was not. The cluster health was OK and all PGs were active+clean. However, when I ran: rados -p block ls (block is my RBD pool), the command hung partway through, and the default logs recorded nothing specific. Restarting all OSDs resolved the problem, but after running for a while it came back. I tried upgrading Ceph from 10.2.5 to 10.2.7; the problem was still there. It seems to have appeared after I modified the configuration.
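When rados ls hangs like this, the OSD admin sockets are usually the fastest way to see where a request is stuck. A sketch of the commands I would run next time (osd.3 and the object name are placeholders, not from this report):

```shell
# Map an object from the affected pool to its acting OSDs
# ("block" is the pool from this report; the object name is a placeholder).
ceph osd map block some-object

# Dump in-flight and recent slow ops on a suspect OSD via its admin socket.
ceph daemon osd.3 dump_ops_in_flight
ceph daemon osd.3 dump_historic_ops

# Temporarily raise messenger logging on that OSD to capture more detail.
ceph tell osd.3 injectargs '--debug_ms 1'
```

With the async messenger suspected, the debug_ms output in particular would show whether messages are stalling in dispatch.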
The ceph.conf changes were:
ms_type = async
ms_async_op_threads = 5
ms_dispatch_throttle_bytes = 104857600000
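One thing worth noting (my observation, assuming the Jewel default of 100 MiB for ms_dispatch_throttle_bytes): the value above is 1000x the default, which effectively disables the dispatch throttle. A quick sanity check of the numbers:

```python
# Compare the configured dispatch throttle to the (assumed) Jewel default.
default_dispatch_throttle = 100 * 2**20   # 104857600 bytes = 100 MiB (assumed default)
configured = 104857600000                 # value from the ceph.conf change above

print(configured // default_dispatch_throttle)  # 1000: three extra zeros
print(round(configured / 2**30, 1))             # ~97.7 GiB of dispatch backlog allowed
```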
Finally, I rolled back the ceph config. Two days have now passed without problems.
ceph.conf:
[global]
fsid = e608c0d4-9e92-4e96-a4f1-a2c319dd8435
mon_initial_members = ip-10-26-67-10, ip-10-26-67-11, ip-10-26-67-12
mon_host = 10.26.67.10,10.26.67.11,10.26.67.12
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
osd pool min size = 1
osd pool default pg num = 2048
osd pool default pgp num = 2048
public network = 10.26.0.0/16
cluster network = 192.168.1.0/24
max open files = 131072
#message
#ms_type = async
#ms_async_op_threads = 5
#ms_dispatch_throttle_bytes = 104857600000
#osd crush update on start = false
mon_pg_warn_max_per_osd = 600
[osd]
osd data = /var/lib/ceph/osd/ceph-$id
osd journal size = 20000
osd mkfs type = xfs
osd mkfs options xfs = -f
filestore min sync interval = 10
filestore max sync interval = 15
filestore queue max ops = 5000
filestore queue max bytes = 10485760
filestore queue committing max ops = 5000
filestore queue committing max bytes = 10485760000
journal max write bytes = 1073714824
journal max write entries = 1000
journal queue max ops = 3000
journal queue max bytes = 10485760000
osd max write size = 512
osd client message size cap = 2048
osd deep scrub stride = 131072
osd op threads = 8
osd disk threads = 4
osd map cache size = 1024
osd map cache bl size = 128
osd mount options xfs = "rw,nodev,noexec,noatime,nodiratime,inode64"
osd recovery op priority = 4
osd recovery max active = 10
osd recovery max backfills = 4
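Two byte values in the [osd] section above also look suspicious (my observation, assuming 1 GiB was intended): journal max write bytes = 1073714824 appears to be 1073741824 (1 GiB) with digits swapped, and filestore queue max bytes (10 MiB) is three orders of magnitude below filestore queue committing max bytes (~9.8 GB). Checking the arithmetic:

```python
# Check the byte values pasted in the [osd] section.
one_gib = 2**30                     # 1073741824
journal_max_write = 1073714824      # from the config: note 714 vs 741, digits swapped
print(one_gib - journal_max_write)  # 27000 bytes short of 1 GiB

queue_max = 10485760                # filestore queue max bytes: 10 MiB
committing_max = 10485760000        # filestore queue committing max bytes: ~9.77 GiB
print(committing_max // queue_max)  # 1000x gap between the two queue limits
```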
[client]
rbd cache = true
rbd cache size = 67108864
rbd cache max dirty = 50331648
rbd cache target dirty = 33554432
rbd cache max dirty age = 1
rbd cache writethrough until flush = false
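For what it's worth, the rbd cache values in [client] do satisfy the ordering librbd expects (target dirty <= max dirty <= cache size), so the client cache looks sane. A quick check using the values above:

```python
# Verify the rbd cache sizing relationships from the [client] section.
cache_size   = 67108864   # rbd cache size: 64 MiB
max_dirty    = 50331648   # rbd cache max dirty: 48 MiB
target_dirty = 33554432   # rbd cache target dirty: 32 MiB

assert target_dirty <= max_dirty <= cache_size
print(cache_size // 2**20, max_dirty // 2**20, target_dirty // 2**20)  # 64 48 32
```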
Thanks.