Support #20356
cluster health ok, but rados hang (status: closed)
Description
One day I noticed that I/O utilization inside the VMs was very high, but the load on the Ceph cluster itself was not. The cluster health was OK and all PGs were active+clean. However, when I ran: rados -p block ls (block is my RBD pool), the command hung partway through, and the default logs recorded nothing specific. Restarting all OSDs resolved the problem, but after running for a while it came back. I tried upgrading Ceph from 10.2.5 to 10.2.7; the problem was still there. It seems to have appeared after I modified the configuration.
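When rados ls hangs like this, the OSD admin sockets are usually the fastest way to see where a request is stuck. A sketch of the commands I would run next time (osd.3 and the object name are placeholders, not from this report):

```shell
# Map an object from the affected pool to its acting OSDs
# ("block" is the pool from this report; the object name is a placeholder).
ceph osd map block some-object

# Dump in-flight and recent slow ops on a suspect OSD via its admin socket.
ceph daemon osd.3 dump_ops_in_flight
ceph daemon osd.3 dump_historic_ops

# Temporarily raise messenger logging on that OSD to capture more detail.
ceph tell osd.3 injectargs '--debug_ms 1'
```

With the async messenger suspected, the debug_ms output in particular would show whether messages are stalling in dispatch.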
The ceph.conf changes were:
ms_type = async
ms_async_op_threads = 5
ms_dispatch_throttle_bytes = 104857600000
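One thing worth noting (my observation, assuming the Jewel default of 100 MiB for ms_dispatch_throttle_bytes): the value above is 1000x the default, which effectively disables the dispatch throttle. A quick sanity check of the numbers:

```python
# Compare the configured dispatch throttle to the (assumed) Jewel default.
default_dispatch_throttle = 100 * 2**20   # 104857600 bytes = 100 MiB (assumed default)
configured = 104857600000                 # value from the ceph.conf change above

print(configured // default_dispatch_throttle)  # 1000: three extra zeros
print(round(configured / 2**30, 1))             # ~97.7 GiB of dispatch backlog allowed
```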
Finally, I rolled back the ceph config. Two days have now passed without problems.
ceph.conf:
[global]
fsid = e608c0d4-9e92-4e96-a4f1-a2c319dd8435
mon_initial_members = ip-10-26-67-10, ip-10-26-67-11, ip-10-26-67-12
mon_host = 10.26.67.10,10.26.67.11,10.26.67.12
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
osd pool min size = 1
osd pool default pg num = 2048
osd pool default pgp num = 2048
public network = 10.26.0.0/16
cluster network = 192.168.1.0/24
max open files = 131072
#message
#ms_type = async
#ms_async_op_threads = 5
#ms_dispatch_throttle_bytes = 104857600000
#osd crush update on start = false
mon_pg_warn_max_per_osd = 600
[osd]
osd data = /var/lib/ceph/osd/ceph-$id
osd journal size = 20000
osd mkfs type = xfs
osd mkfs options xfs = -f
filestore min sync interval = 10
filestore max sync interval = 15
filestore queue max ops = 5000
filestore queue max bytes = 10485760
filestore queue committing max ops = 5000
filestore queue committing max bytes = 10485760000
journal max write bytes = 1073714824
journal max write entries = 1000
journal queue max ops = 3000
journal queue max bytes = 10485760000
osd max write size = 512
osd client message size cap = 2048
osd deep scrub stride = 131072
osd op threads = 8
osd disk threads = 4
osd map cache size = 1024
osd map cache bl size = 128
osd mount options xfs = "rw,nodev,noexec,noatime,nodiratime,inode64"
osd recovery op priority = 4
osd recovery max active = 10
osd recovery max backfills = 4
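Two byte values in the [osd] section above also look suspicious (my observation, assuming 1 GiB was intended): journal max write bytes = 1073714824 appears to be 1073741824 (1 GiB) with digits swapped, and filestore queue max bytes (10 MiB) is three orders of magnitude below filestore queue committing max bytes (~9.8 GB). Checking the arithmetic:

```python
# Check the byte values pasted in the [osd] section.
one_gib = 2**30                     # 1073741824
journal_max_write = 1073714824      # from the config: note 714 vs 741, digits swapped
print(one_gib - journal_max_write)  # 27000 bytes short of 1 GiB

queue_max = 10485760                # filestore queue max bytes: 10 MiB
committing_max = 10485760000        # filestore queue committing max bytes: ~9.77 GiB
print(committing_max // queue_max)  # 1000x gap between the two queue limits
```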
[client]
rbd cache = true
rbd cache size = 67108864
rbd cache max dirty = 50331648
rbd cache target dirty = 33554432
rbd cache max dirty age = 1
rbd cache writethrough until flush = false
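For what it's worth, the rbd cache values in [client] do satisfy the ordering librbd expects (target dirty <= max dirty <= cache size), so the client cache looks sane. A quick check using the values above:

```python
# Verify the rbd cache sizing relationships from the [client] section.
cache_size   = 67108864   # rbd cache size: 64 MiB
max_dirty    = 50331648   # rbd cache max dirty: 48 MiB
target_dirty = 33554432   # rbd cache target dirty: 32 MiB

assert target_dirty <= max_dirty <= cache_size
print(cache_size // 2**20, max_dirty // 2**20, target_dirty // 2**20)  # 64 48 32
```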
Thanks.