Bug #10381
health HEALTH_WARN mds ceph239 is laggy
Status: Closed
Description
Hi there.
Today I ran a script to do some tests on my Ceph cluster via a CephFS client, including dd/rm/cp of files smaller than 10K.
After an hour the CephFS client froze, so I checked my Ceph health, which showed:
[root@MON_137 ceph-deploy]# ceph -s
cluster fe614861-e6fb-426f-90f7-682fd6f2def3
health HEALTH_WARN mds ceph239 is laggy
monmap e3: 3 mons at {MON_137=10.118.202.137:6789/0,MON_156=10.118.202.156:6789/0,MON_9=10.118.202.9:6789/0}, election epoch 130, quorum 0,1,2 MON_9,MON_137,MON_156
mdsmap e106: 1/1/1 up {0=ceph239=up:active(laggy or crashed)}
osdmap e381: 11 osds: 11 up, 11 in
pgmap v12714: 384 pgs, 3 pools, 2284 MB data, 1022 objects
10317 MB used, 2386 GB / 2396 GB avail
384 active+clean
I also found a coredump log, in the attachment.
The MDS can't work any more.
BTW, my max_mds was 1, not 2.
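For context, the workload was roughly this shape (a minimal sketch only; the directory, file sizes, and loop count are assumptions, not the original script — on a real reproduction you would point it at a directory on the CephFS mount):

```shell
#!/bin/bash
# Hypothetical reconstruction of the small-file stress loop described above.
# The target directory is an assumption; in the report it would be a path
# on the CephFS mount. Defaults to a temp dir so it runs anywhere.
DIR=${1:-$(mktemp -d)}
mkdir -p "$DIR"

for i in $(seq 1 20); do
    # Write a file smaller than 10K, copy it, then remove both.
    dd if=/dev/zero of="$DIR/f$i" bs=1K count=$(( (i % 9) + 1 )) 2>/dev/null
    cp "$DIR/f$i" "$DIR/f$i.copy"
    rm -f "$DIR/f$i" "$DIR/f$i.copy"
done
echo "completed $i iterations in $DIR"
```

The report says the client hung under exactly this kind of metadata-heavy small-file churn, which is what drives MDS load.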
Updated by Greg Farnum over 9 years ago
Can you provide the output of "ceph mds dump" and "ceph osd dump"?
It looks like the MDS is trying to access a pool that doesn't exist.
Updated by science luo over 9 years ago
- File ceph mds dump.txt ceph mds dump.txt added
- File ceph osd dump.txt ceph osd dump.txt added
Greg Farnum wrote:
Can you provide the output of "ceph mds dump" and "ceph osd dump"?
It looks like the MDS is trying to access a pool that doesn't exist.
Updated by Greg Farnum over 9 years ago
- Status changed from New to Resolved
Whoops, this fell through the cracks.
Anyway, the MDS map has pool 0 set as its data pool, but the OSDMap doesn't have such a pool (it has a "data" pool, but its ID is 3). I think there were several sequences of (unwise) monitor commands that could be used to achieve this effect in the past, but we cover it appropriately now: the OSDMonitor won't let you delete pools in use by the MDS, and the MDS won't let you insert pool IDs that don't exist.
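The mismatch is easy to spot mechanically once you have both dumps. A minimal sketch of the check, using hand-built sample structures modelled on this report (the field names `data_pools`, `pools`, `pool`, and `pool_name` mirror the shape of the JSON dump output, but the data here is illustrative, not taken from the attachments):

```python
def missing_data_pools(mds_dump, osd_dump):
    """Return MDS data-pool IDs that have no matching pool in the OSD map."""
    osd_pool_ids = {p["pool"] for p in osd_dump["pools"]}
    return [pid for pid in mds_dump.get("data_pools", []) if pid not in osd_pool_ids]

# Sample data modelled on this bug: the MDS map points at pool 0,
# but the OSD map only contains pool 3 (named "data").
mds_dump = {"data_pools": [0]}
osd_dump = {"pools": [{"pool": 3, "pool_name": "data"}]}

print(missing_data_pools(mds_dump, osd_dump))  # [0] -> pool 0 is dangling
```

A non-empty result is the condition Greg describes: the MDS tries to read or write objects in a pool the OSDMap no longer knows about.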