Bug #5823

closed

CPU load on cluster node is very high, client can't get data on PG from primary node (CPU high) ...

Added by Khanh Nguyen Dang Quoc almost 11 years ago. Updated about 10 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)

env: 3 cluster nodes (10 OSDs per node), using a dedicated private network (IP over InfiniBand)

I hit a serious issue on the cluster: only one cluster node has very high CPU load (100% on all 24 cores).

-> all attached block devices become unreachable.

The log files show slow requests and about 100 PGs stuck unclean. I waited a long time, but the system did not recover automatically, so the whole system hangs. I had to take down the cluster node with the high CPU load.

See my logs for more detail:

2013-08-01 08:48:14.269903 osd.5 xx.xx.xx:6805/11563 95 : [WRN] 2 slow requests, 2 included below; oldest blocked for > 85.363643 secs
2013-08-01 08:48:14.269909 osd.5 xx.xx.xx:6805/11563 96 : [WRN] slow request 85.363643 seconds old, received at 2013-08-01 08:46:48.906210: osd_op(client.50425.0:128 rbd_header.c4f62ae8944a [watch add cookie 1 ver 0] 3.1662d8c3 e4048) v4 currently reached pg
2013-08-01 08:48:14.269915 osd.5 xx.xx.xx:6805/11563 97 : [WRN] slow request 85.363507 seconds old, received at 2013-08-01 08:46:48.906346: osd_op(client.50425.0:129 rbd_header.c4f62ae8944a [watch add cookie 1 ver 0] 3.1662d8c3 e4048) v4 currently reached pg
2013-08-01 08:48:16.554671 osd.5 xx.xx.xx:6805/11563 98 : [WRN] 2 slow requests, 2 included below; oldest blocked for > 87.648414 secs
2013-08-01 08:48:16.554679 osd.5 xx.xx.xx:6805/11563 99 : [WRN] slow request 87.648414 seconds old, received at 2013-08-01 08:46:48.906210: osd_op(client.50425.0:128 rbd_header.c4f62ae8944a [watch add cookie 1 ver 0] 3.1662d8c3 e4048) v4 currently reached pg
2013-08-01 08:48:16.554684 osd.5 xx.xx.xx:6805/11563 100 : [WRN] slow request 87.648278 seconds old, received at 2013-08-01 08:46:48.906346: osd_op(client.50425.0:129 rbd_header.c4f62ae8944a [watch add cookie 1 ver 0] 3.1662d8c3 e4048) v4 currently reached pg
2013-08-01 08:48:25.400571 mon.0 xx.xx.xx:6789/0 13039 : [INF] pgmap v781157: 12096 pgs: 4037 active+clean, 3544 peering, 4515 active+degraded; 3380 GB data, 6095 GB used, 8956 GB / 15051 GB avail; 5900B/s wr, 1op/s; 313884/1681762 degraded (18.664%)
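
For anyone triaging similar output, here is a minimal Python sketch (not part of the original report) that pulls the key fields out of log lines in the format shown above. The regular expressions and the function names `parse_slow_request` and `parse_pgmap` are assumptions made for illustration, inferred only from these sample lines; other Ceph versions may format these messages differently.

```python
import re

# Sample lines copied from the log excerpt above (addresses already redacted).
SLOW_LINE = (
    "2013-08-01 08:48:16.554679 osd.5 xx.xx.xx:6805/11563 99 : [WRN] "
    "slow request 87.648414 seconds old, received at 2013-08-01 08:46:48.906210: "
    "osd_op(client.50425.0:128 rbd_header.c4f62ae8944a [watch add cookie 1 ver 0] "
    "3.1662d8c3 e4048) v4 currently reached pg"
)
PGMAP_LINE = (
    "2013-08-01 08:48:25.400571 mon.0 xx.xx.xx:6789/0 13039 : [INF] pgmap v781157: "
    "12096 pgs: 4037 active+clean, 3544 peering, 4515 active+degraded; "
    "3380 GB data, 6095 GB used, 8956 GB / 15051 GB avail; 5900B/s wr, 1op/s; "
    "313884/1681762 degraded (18.664%)"
)

# Hypothetical patterns, written only against the sample lines above.
SLOW_RE = re.compile(
    r"(?P<osd>osd\.\d+) .*\[WRN\] slow request (?P<age>[\d.]+) seconds old, "
    r".*osd_op\((?P<client>client\.[\d.]+:\d+) (?P<obj>\S+) "
    r".* currently (?P<state>.+)$"
)
PGMAP_RE = re.compile(
    r"pgmap v\d+: (?P<total>\d+) pgs: (?P<states>[^;]+); "
    r".* (?P<deg>\d+)/(?P<objs>\d+) degraded"
)


def parse_slow_request(line):
    """Return the OSD, age, client op, object and state of a slow request line."""
    m = SLOW_RE.search(line)
    if m is None:
        return None
    return {
        "osd": m.group("osd"),
        "age_secs": float(m.group("age")),
        "client_op": m.group("client"),
        "object": m.group("obj"),
        "state": m.group("state"),
    }


def parse_pgmap(line):
    """Return total PGs, per-state PG counts and the degraded object percentage."""
    m = PGMAP_RE.search(line)
    if m is None:
        return None
    states = {}
    for part in m.group("states").split(", "):
        count, name = part.split(" ", 1)
        states[name] = int(count)
    return {
        "total_pgs": int(m.group("total")),
        "states": states,
        "degraded_pct": 100.0 * int(m.group("deg")) / int(m.group("objs")),
    }


if __name__ == "__main__":
    print(parse_slow_request(SLOW_LINE))
    print(parse_pgmap(PGMAP_LINE))
```

Applied to the pgmap line above, the per-state counts add up to the reported 12096 PGs, of which only 4037 are active+clean, and the degraded ratio recomputes to the reported 18.664%.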


Files

highload_ceph_osd.png (109 KB), Khanh Nguyen Dang Quoc, 08/08/2013 07:23 PM
ceph-osd.15.log.zip (234 KB), Khanh Nguyen Dang Quoc, 09/10/2013 12:49 AM