Bug #15724

closed

Bluestore OSD high CPU usage on idle cluster, OSD does not start after restart

Added by Henrik Korkuc about 8 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hey,

It is possible to fairly consistently trigger high CPU usage on all bluestore OSDs.

Steps to reproduce (I was using 3 OSD servers with 12x4TB disks):
  • fill the cluster with data using "rados bench" until it is full
  • bump the PG full ratio to clear the full error
  • delete the pool and recreate a pool with the same name (I am not sure if recreating is necessary)
  • watch the load average grow to ~1 per OSD
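The steps above can be sketched as a shell sequence. The pool name "rbd", the PG count, and the full-ratio value are assumptions for illustration; on Jewel-era releases the full ratio was adjusted via `ceph pg set_full_ratio` rather than the later `ceph osd set-full-ratio`:

```shell
# 1. Fill the cluster: write 4 MB objects with no cleanup until full.
#    (86400 s is an arbitrary long duration; the run stops when OSDs fill up.)
rados bench -p rbd 86400 write --no-cleanup

# 2. Raise the full ratio slightly to clear the OSD_FULL error
#    (hypothetical value; Jewel-era command form):
ceph pg set_full_ratio 0.97

# 3. Delete the pool and recreate one with the same name:
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
ceph osd pool create rbd 1024

# 4. Watch the load average climb to roughly 1 per OSD on each OSD host:
uptime
```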

I left the cluster in that state for ~2 days and the load didn't drop (the data was written over roughly half a day, so a >2-day cleanup is abnormal). The disks are idle.

Stopping an OSD with systemd took 1 min 30 sec and generated 600 MB of logs like:

2016-05-04 06:01:36.069561 7f3e717f3700 20 snap_mapper.remove_oid 1:2f8092b0:::benchmark_data_cephbench-osd07-dca1_44861_object107666:head
2016-05-04 06:01:36.069570 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809399:::benchmark_data_cephbench-osd07-dca1_13126_object1742442:head
2016-05-04 06:01:36.069578 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80952d:::benchmark_data_cephbench-osd08-dca1_47842_object1637214:head
2016-05-04 06:01:36.069588 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809685:::benchmark_data_cephbench-osd07-dca1_44861_object948128:head
2016-05-04 06:01:36.069596 7f3e717f3700 20 snap_mapper.remove_oid 1:2f8098e8:::benchmark_data_cephbench-osd07-dca1_44861_object1075356:head
2016-05-04 06:01:36.069605 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809ac2:::benchmark_data_cephbench-osd07-dca1_44861_object908852:head
2016-05-04 06:01:36.069613 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a0c2:::benchmark_data_cephbench-osd08-dca1_47842_object626327:head
2016-05-04 06:01:36.069623 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a13a:::benchmark_data_cephbench-osd07-dca1_13126_object949563:head
2016-05-04 06:01:36.069631 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a314:::benchmark_data_cephbench-osd07-dca1_44861_object1035979:head
2016-05-04 06:01:36.069641 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a5ad:::benchmark_data_cephbench-osd07-dca1_44861_object455718:head
2016-05-04 06:01:36.069649 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80ab4a:::benchmark_data_cephbench-osd07-dca1_44861_object332394:head
2016-05-04 06:01:36.069658 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b1bd:::benchmark_data_cephbench-osd07-dca1_44861_object618103:head
2016-05-04 06:01:36.069666 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b3ad:::benchmark_data_cephbench-osd07-dca1_44861_object828820:head
2016-05-04 06:01:36.069676 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b466:::benchmark_data_cephbench-osd07-dca1_13126_object1954098:head
2016-05-04 06:01:36.069684 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b4eb:::benchmark_data_cephbench-osd07-dca1_44861_object671049:head

After that, the OSD doesn't start:

2016-05-04 06:10:06.089776 7f7a8c383800 -1 bluefs _replay file with link count 0: file(ino 20 size 3867836 mtime 2016-05-01 17:52:40.543611 bdev 1 extents [1:49283072+1048576,1:56623104+1048576,1:61865984+2097152])
2016-05-04 06:10:06.089798 7f7a8c383800 -1 bluefs mount failed to replay log: (5) Input/output error
2016-05-04 06:10:06.089814 7f7a8c383800 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db failed bluefs mount: (5) Input/output error
2016-05-04 06:10:06.089821 7f7a8c383800  1 bdev(/var/lib/ceph/osd/ceph-0/block) close
2016-05-04 06:10:06.338050 7f7a8c383800  1 bdev(/var/lib/ceph/osd/ceph-0/block) close
2016-05-04 06:10:06.594479 7f7a8c383800 -1 osd.0 0 OSD:init: unable to mount object store
2016-05-04 06:10:06.594498 7f7a8c383800 -1  ** ERROR: osd init failed: (5) Input/output error

OSD shutdown log posted with ceph-post-file: 99e2bbb1-61a1-4fa1-bf1b-b965a9f048fa

perf record of OSD before shutdown: 0c3af881-5c7d-4508-b291-7c1b12c3acde
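For reference, the profile and uploads above can be reproduced roughly as follows. The 30-second sampling window and the log path are assumptions; `ceph-post-file` prints an id like the ones quoted above, which can then be referenced in the ticket:

```shell
# Sample call stacks of a running ceph-osd process for 30 seconds
# (picks the first ceph-osd pid on the host):
perf record -g -p "$(pidof ceph-osd | awk '{print $1}')" -- sleep 30

# Summarize the hottest functions from the recorded profile:
perf report --stdio | head -n 50

# Upload the OSD log and the perf data for the Ceph developers;
# the command prints an upload id to paste into the tracker:
ceph-post-file /var/log/ceph/ceph-osd.0.log perf.data
```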

Also attaching osd startup log.

Currently I am leaving this cluster as is, so I can collect more debugging info if needed.


Files

ceph-osd.0.log (7.92 KB) — Henrik Korkuc, 05/04/2016 07:20 AM