Actions
Bug #15724
closedBluestore OSD high CPU usage on idle cluster, OSD does not start after restart
Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Description
Hey,
it is possible fairly consistently trigger high CPU usage on all bluestore OSDs.
Steps to reproduce (I was using 3 OSD servers with 12x4TB disks):- fill the cluster with data using "rados bench" until full
- bump pg full to clear the error
- delete pool and recreate pool with same name (I am not sure if recreating is necessary)
- Watch load avg grow to ~1 per OSD.
I left this cluster for ~2 days in that state. Load didn't drop (data was written per like half a day, so > 2day cleanup is abnormal). Disks are idle.
When stopping OSD with systemd took me 1min30sec. It generated 600MB of logs with:
2016-05-04 06:01:36.069561 7f3e717f3700 20 snap_mapper.remove_oid 1:2f8092b0:::benchmark_data_cephbench-osd07-dca1_44861_object107666:head 2016-05-04 06:01:36.069570 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809399:::benchmark_data_cephbench-osd07-dca1_13126_object1742442:head 2016-05-04 06:01:36.069578 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80952d:::benchmark_data_cephbench-osd08-dca1_47842_object1637214:head 2016-05-04 06:01:36.069588 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809685:::benchmark_data_cephbench-osd07-dca1_44861_object948128:head 2016-05-04 06:01:36.069596 7f3e717f3700 20 snap_mapper.remove_oid 1:2f8098e8:::benchmark_data_cephbench-osd07-dca1_44861_object1075356:head 2016-05-04 06:01:36.069605 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809ac2:::benchmark_data_cephbench-osd07-dca1_44861_object908852:head 2016-05-04 06:01:36.069613 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a0c2:::benchmark_data_cephbench-osd08-dca1_47842_object626327:head 2016-05-04 06:01:36.069623 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a13a:::benchmark_data_cephbench-osd07-dca1_13126_object949563:head 2016-05-04 06:01:36.069631 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a314:::benchmark_data_cephbench-osd07-dca1_44861_object1035979:head 2016-05-04 06:01:36.069641 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a5ad:::benchmark_data_cephbench-osd07-dca1_44861_object455718:head 2016-05-04 06:01:36.069649 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80ab4a:::benchmark_data_cephbench-osd07-dca1_44861_object332394:head 2016-05-04 06:01:36.069658 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b1bd:::benchmark_data_cephbench-osd07-dca1_44861_object618103:head 2016-05-04 06:01:36.069666 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b3ad:::benchmark_data_cephbench-osd07-dca1_44861_object828820:head 2016-05-04 06:01:36.069676 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b466:::benchmark_data_cephbench-osd07-dca1_13126_object1954098:head 2016-05-04 06:01:36.069684 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b4eb:::benchmark_data_cephbench-osd07-dca1_44861_object671049:head
after that it doesn't start.
2016-05-04 06:10:06.089776 7f7a8c383800 -1 bluefs _replay file with link count 0: file(ino 20 size 3867836 mtime 2016-05-01 17:52:40.543611 bdev 1 extents [1:49283072+1048576,1:56623104+1048576,1:61865984+2097152]) 2016-05-04 06:10:06.089798 7f7a8c383800 -1 bluefs mount failed to replay log: (5) Input/output error 2016-05-04 06:10:06.089814 7f7a8c383800 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db failed bluefs mount: (5) Input/output error 2016-05-04 06:10:06.089821 7f7a8c383800 1 bdev(/var/lib/ceph/osd/ceph-0/block) close 2016-05-04 06:10:06.338050 7f7a8c383800 1 bdev(/var/lib/ceph/osd/ceph-0/block) close 2016-05-04 06:10:06.594479 7f7a8c383800 -1 osd.0 0 OSD:init: unable to mount object store 2016-05-04 06:10:06.594498 7f7a8c383800 -1 ESC[0;31m ** ERROR: osd init failed: (5) Input/output errorESC[0m
OSD shutdown posted with ceph-post-file: 99e2bbb1-61a1-4fa1-bf1b-b965a9f048fa
perf record of OSD before shutdown: 0c3af881-5c7d-4508-b291-7c1b12c3acde
Also attaching osd startup log.
Currently I am leaving this cluster as is. So I can take some more debugging info if needed
Files
Actions