Bug #15724 (closed)

Bluestore OSD high CPU usage on idle cluster, OSD does not start after restart

Added by Henrik Korkuc almost 8 years ago. Updated about 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hey,

it is possible to fairly consistently trigger high CPU usage on all BlueStore OSDs.

Steps to reproduce (I was using 3 OSD servers with 12x4TB disks):
  • fill the cluster with data using "rados bench" until full
  • bump the PG full ratio to clear the error
  • delete the pool and recreate a pool with the same name (I am not sure if recreating is necessary)
  • Watch load avg grow to ~1 per OSD.

I left the cluster in that state for ~2 days. The load didn't drop (the data was written in roughly half a day, so a cleanup taking more than 2 days is abnormal). The disks are idle.

Stopping an OSD with systemd took 1 min 30 sec. It generated 600 MB of logs like:

2016-05-04 06:01:36.069561 7f3e717f3700 20 snap_mapper.remove_oid 1:2f8092b0:::benchmark_data_cephbench-osd07-dca1_44861_object107666:head
2016-05-04 06:01:36.069570 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809399:::benchmark_data_cephbench-osd07-dca1_13126_object1742442:head
2016-05-04 06:01:36.069578 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80952d:::benchmark_data_cephbench-osd08-dca1_47842_object1637214:head
2016-05-04 06:01:36.069588 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809685:::benchmark_data_cephbench-osd07-dca1_44861_object948128:head
2016-05-04 06:01:36.069596 7f3e717f3700 20 snap_mapper.remove_oid 1:2f8098e8:::benchmark_data_cephbench-osd07-dca1_44861_object1075356:head
2016-05-04 06:01:36.069605 7f3e717f3700 20 snap_mapper.remove_oid 1:2f809ac2:::benchmark_data_cephbench-osd07-dca1_44861_object908852:head
2016-05-04 06:01:36.069613 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a0c2:::benchmark_data_cephbench-osd08-dca1_47842_object626327:head
2016-05-04 06:01:36.069623 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a13a:::benchmark_data_cephbench-osd07-dca1_13126_object949563:head
2016-05-04 06:01:36.069631 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a314:::benchmark_data_cephbench-osd07-dca1_44861_object1035979:head
2016-05-04 06:01:36.069641 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80a5ad:::benchmark_data_cephbench-osd07-dca1_44861_object455718:head
2016-05-04 06:01:36.069649 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80ab4a:::benchmark_data_cephbench-osd07-dca1_44861_object332394:head
2016-05-04 06:01:36.069658 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b1bd:::benchmark_data_cephbench-osd07-dca1_44861_object618103:head
2016-05-04 06:01:36.069666 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b3ad:::benchmark_data_cephbench-osd07-dca1_44861_object828820:head
2016-05-04 06:01:36.069676 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b466:::benchmark_data_cephbench-osd07-dca1_13126_object1954098:head
2016-05-04 06:01:36.069684 7f3e717f3700 20 snap_mapper.remove_oid 1:2f80b4eb:::benchmark_data_cephbench-osd07-dca1_44861_object671049:head

After that it doesn't start:

2016-05-04 06:10:06.089776 7f7a8c383800 -1 bluefs _replay file with link count 0: file(ino 20 size 3867836 mtime 2016-05-01 17:52:40.543611 bdev 1 extents [1:49283072+1048576,1:56623104+1048576,1:61865984+2097152])
2016-05-04 06:10:06.089798 7f7a8c383800 -1 bluefs mount failed to replay log: (5) Input/output error
2016-05-04 06:10:06.089814 7f7a8c383800 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db failed bluefs mount: (5) Input/output error
2016-05-04 06:10:06.089821 7f7a8c383800  1 bdev(/var/lib/ceph/osd/ceph-0/block) close
2016-05-04 06:10:06.338050 7f7a8c383800  1 bdev(/var/lib/ceph/osd/ceph-0/block) close
2016-05-04 06:10:06.594479 7f7a8c383800 -1 osd.0 0 OSD:init: unable to mount object store
2016-05-04 06:10:06.594498 7f7a8c383800 -1 ** ERROR: osd init failed: (5) Input/output error

OSD shutdown posted with ceph-post-file: 99e2bbb1-61a1-4fa1-bf1b-b965a9f048fa

perf record of OSD before shutdown: 0c3af881-5c7d-4508-b291-7c1b12c3acde

Also attaching osd startup log.

Currently I am leaving this cluster as is, so I can gather more debugging info if needed.


Files

ceph-osd.0.log (7.92 KB) Henrik Korkuc, 05/04/2016 07:20 AM
Actions #1

Updated by Star Guo almost 8 years ago

I also hit this error with Ceph 10.2.2:

2016-06-20 01:09:23.885106 7f5dea281800 0 pidfile_write: ignore empty --pid-file
2016-06-20 01:09:23.898878 7f5dea281800 -1 WARNING: the following dangerous and experimental features are enabled:
2016-06-20 01:09:23.905258 7f5dea281800 1 bluestore(/var/lib/ceph/osd/ceph-0) mount path /var/lib/ceph/osd/ceph-0
2016-06-20 01:09:23.905308 7f5dea281800 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel
2016-06-20 01:09:23.905673 7f5dea281800 1 bdev(/var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2016-06-20 01:09:23.905731 7f5dea281800 1 bdev(/var/lib/ceph/osd/ceph-0/block) open size 107268259328 (102298 MB) block_size 4096 (4096 B)
2016-06-20 01:09:23.907550 7f5dea281800 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel
2016-06-20 01:09:23.907925 7f5dea281800 1 bdev(/var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2016-06-20 01:09:23.907962 7f5dea281800 1 bdev(/var/lib/ceph/osd/ceph-0/block) open size 107268259328 (102298 MB) block_size 4096 (4096 B)
2016-06-20 01:09:23.907976 7f5dea281800 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block size 102298 MB
2016-06-20 01:09:23.907990 7f5dea281800 1 bluefs mount
2016-06-20 01:09:23.955159 7f5dea281800 -1 bluefs _replay file with link count 0: file(ino 35 size 3963965 mtime 2016-06-17 23:18:01.985723 bdev 1 extents [1:68157440+2097152,1:74448896+1048576,1:79691776+1048576])
2016-06-20 01:09:23.955191 7f5dea281800 -1 bluefs mount failed to replay log: (5) Input/output error
2016-06-20 01:09:23.955204 7f5dea281800 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db failed bluefs mount: (5) Input/output error
2016-06-20 01:09:23.955207 7f5dea281800 1 bdev(/var/lib/ceph/osd/ceph-0/block) close
2016-06-20 01:09:24.163883 7f5dea281800 1 bdev(/var/lib/ceph/osd/ceph-0/block) close
2016-06-20 01:09:24.409654 7f5dea281800 -1 osd.0 0 OSD:init: unable to mount object store
2016-06-20 01:09:24.409673 7f5dea281800 -1 ** ERROR: osd init failed: (5) Input/output error

Actions #2

Updated by Anonymous almost 8 years ago

I have seen the same problem, running with "ceph -v" 85bb43e111692989d2296a389ce45377d2297d6f.
There are actually three such inodes in my transaction log trace, inodes 113, 109, and 102,
see below. In all three cases, their files were unlinked while they were open, and then
later the file for each inode was closed. It is the close of the file after the file was
deleted that causes the problem.

Routine _drop_link() gets called when the file is deleted and the inode link count
goes to zero. It releases the inode and the file's disk extents, and it sets the
'deleted' variable to true, then returns. The inode and disk extent fields in the
FileRef->fnode are left looking like they are in use.

The BlueRocksWritableFile::Close() routine calls fs->truncate() if last_allocated_block
is greater than zero. Routine truncate() calls _truncate(), which lowers the file's size
in the FileRef->fnode structure, then issues an op_file_update call to log the fnode
change to the transaction log. There is no check of the 'deleted' variable here, so the
extraneous op_file_update after the op_file_remove means that each inode will have a
link count of zero at the end of the transaction log.
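
To make that ordering concrete, here is a minimal, self-contained C++ sketch of the sequence (the type and function names are illustrative stand-ins, not the actual BlueFS code): deleting the file logs op_file_remove and marks it deleted, and an unguarded truncate on close then logs a stale op_file_update, the same remove-then-update pattern visible in the transaction log entries quoted below.

// Minimal standalone model of the sequence described above; all names
// (File, drop_link, truncate_on_close, txn_log) are illustrative stand-ins,
// not the real BlueFS types or functions.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct File {
  int ino;
  uint64_t size;
  bool deleted = false;   // mirrors the 'deleted' flag set by _drop_link()
};

std::vector<std::string> txn_log;   // stands in for the bluefs transaction log

void log_op(const std::string& op) { txn_log.push_back(op); }

// Called when the last directory link to the file is removed; in the real
// code this also releases the inode and the file's disk extents.
void drop_link(File& f) {
  log_op("op_file_remove ino " + std::to_string(f.ino));
  f.deleted = true;
}

// Truncate-on-close path. Without the 'deleted' guard it re-logs the fnode
// after the file has already been removed, producing the stale update.
void truncate_on_close(File& f, uint64_t new_size, bool check_deleted) {
  if (check_deleted && f.deleted)
    return;                                    // guard: skip the extraneous update
  f.size = new_size;
  log_op("op_file_update ino " + std::to_string(f.ino) +
         " size " + std::to_string(new_size));
}

int main() {
  File f{113, 71359362};
  drop_link(f);                                        // deleted while still open
  truncate_on_close(f, 71354938, /*check_deleted=*/false);
  for (const auto& op : txn_log)
    std::cout << op << "\n";   // remove followed by update: at replay, ino 113
                               // exists again but has link count 0
}

In this model, calling truncate_on_close with check_deleted=true suppresses the extraneous op_file_update, which is the kind of guard the paragraph above says is missing from the real truncate path.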

More background on this can be found on the ceph-devel mailing list; look for the
subject "bluestore bug#15724: bluefs _replay file with link count 0:".
Here are the relevant transaction log entries for the three inodes exhibiting the problem.

...

2016-07-13 16:03:03.925186 7fb6617c1800 20 bluefs _replay 0x5b4000: op_file_update file(ino 113 size 71359362 mtime 2016-07-12 17:05:53.738683 bdev 1 extents [1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+2097152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:960495616+58720256])
2016-07-13 16:03:03.925197 7fb6617c1800 30 bluefs _get_file ino 113 = 0x7fb66aafbb90

...

2016-07-13 16:03:03.926040 7fb6617c1800 20 bluefs _replay 0x5b9000: op_file_update file(ino 102 size 71359405 mtime 2016-07-12 17:05:54.615190 bdev 1 extents [1:78643200+1048576,1:191889408+1048576,1:197132288+1048576,1:202375168+2097152,1:208666624+2097152,1:214958080+2097152,1:221249536+2097152,1:227540992+1048576,1:232783872+2097152,1:239075328+2097152,1:245366784+2097152,1:251658240+2097152,1:257949696+1048576,1:263192576+2097152,1:269484032+2097152,1:275775488+2097152,1:282066944+2097152,1:288358400+2097152,1:294649856+1048576,1:299892736+2097152,1:306184192+2097152,1:312475648+2097152,1:318767104+1048576,1:324009984+2097152,1:330301440+2097152,1:336592896+2097152,1:342884352+2097152,1:349175808+1048576,1:354418688+2097152,1:360710144+2097152,1:367001600+1048576,1:372244480+1048576,1:377487360+2097152,1:383778816+1048576,1:389021696+1048576,1:394264576+2097152,1:400556032+1048576,1:405798912+3145728,1:507510784+5242880])
2016-07-13 16:03:03.926048 7fb6617c1800 30 bluefs _get_file ino 102 = 0x7fb66aafbe60

...

2016-07-13 16:03:03.926101 7fb6617c1800 20 bluefs _replay 0x5ba000: op_file_update file(ino 109 size 71354939 mtime 2016-07-12 22:41:33.028030 bdev 1 extents [1:512753664+31457280,1:545259520+4194304,1:550502400+36700160])
2016-07-13 16:03:03.926103 7fb6617c1800 30 bluefs _get_file ino 109 = 0x7fb66aafc910

...

2016-07-13 16:03:03.929676 7fb6617c1800 20 bluefs _read got 32768
2016-07-13 16:03:03.929679 7fb6617c1800 10 bluefs _replay 0x5be000: txn(seq 77752 len 35269 crc 1796157826)
2016-07-13 16:03:03.929681 7fb6617c1800 20 bluefs _replay 0x5be000: op_dir_unlink db/000154.log
2016-07-13 16:03:03.929683 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_remove 109
2016-07-13 16:03:03.929684 7fb6617c1800 20 bluefs _replay 0x5be000: op_dir_unlink db/000153.log
2016-07-13 16:03:03.929685 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_remove 102
2016-07-13 16:03:03.929686 7fb6617c1800 20 bluefs _replay 0x5be000: op_dir_unlink db/000151.log
2016-07-13 16:03:03.929686 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_remove 113
2016-07-13 16:03:03.929687 7fb6617c1800 20 bluefs _replay 0x5be000: op_dir_unlink db/000150.log
2016-07-13 16:03:03.929688 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_remove 111
2016-07-13 16:03:03.929689 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_update file(ino 113 size 71354938 mtime 2016-07-12 17:05:53.738683 bdev 1 extents [1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+2097152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:960495616+58720256])
2016-07-13 16:03:03.929693 7fb6617c1800 30 bluefs _get_file ino 113 = 0x7fb66aafc880 (new)
2016-07-13 16:03:03.929695 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_update file(ino 102 size 68557043 mtime 2016-07-12 17:05:54.615190 bdev 1 extents [1:78643200+1048576,1:191889408+1048576,1:197132288+1048576,1:202375168+2097152,1:208666624+2097152,1:214958080+2097152,1:221249536+2097152,1:227540992+1048576,1:232783872+2097152,1:239075328+2097152,1:245366784+2097152,1:251658240+2097152,1:257949696+1048576,1:263192576+2097152,1:269484032+2097152,1:275775488+2097152,1:282066944+2097152,1:288358400+2097152,1:294649856+1048576,1:299892736+2097152,1:306184192+2097152,1:312475648+2097152,1:318767104+1048576,1:324009984+2097152,1:330301440+2097152,1:336592896+2097152,1:342884352+2097152,1:349175808+1048576,1:354418688+2097152,1:360710144+2097152,1:367001600+1048576,1:372244480+1048576,1:377487360+2097152,1:383778816+1048576,1:389021696+1048576,1:394264576+2097152,1:400556032+1048576,1:405798912+3145728,1:507510784+5242880])
2016-07-13 16:03:03.929702 7fb6617c1800 30 bluefs _get_file ino 102 = 0x7fb66aafbb90 (new)
2016-07-13 16:03:03.929703 7fb6617c1800 20 bluefs _replay 0x5be000: op_file_update file(ino 109 size 64421913 mtime 2016-07-12 22:41:33.028030 bdev 1 extents [1:512753664+31457280,1:545259520+4194304,1:550502400+36700160])
2016-07-13 16:03:03.929705 7fb6617c1800 30 bluefs _get_file ino 109 = 0x7fb66aafbe60 (new)

...

2016-07-13 16:03:03.933669 7fb6617c1800 20 bluefs _read got 4096
2016-07-13 16:03:03.933671 7fb6617c1800 10 bluefs _replay 0x5cf000: stop: uuid 4ef0859e-06e8-51c2-9612-add7269b0100 != super.uuid 7eb30434-317a-4844-b8d1-6f8cb37585d1
2016-07-13 16:03:03.933678 7fb6617c1800 10 bluefs _replay log file size was 0x5cf000
2016-07-13 16:03:03.933731 7fb6617c1800 -1 bluefs _replay file with link count 0: file(ino 113 size 71354938 mtime 2016-07-12 17:05:53.738683 bdev 1 extents [1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+2097152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:960495616+58720256])
2016-07-13 16:03:03.933739 7fb6617c1800 -1 bluefs mount failed to replay log: (5) Input/output error
2016-07-13 16:03:03.933753 7fb6617c1800 20 bluefs _stop_alloc
2016-07-13 16:03:03.933754 7fb6617c1800 10 bitmapalloc:shutdown instance 140421450426752
2016-07-13 16:03:03.933924 7fb6617c1800 -1 bluestore(/var/lib/ceph/osd/ceph-20) _open_db failed bluefs mount: (5) Input/output error
2016-07-13 16:03:03.933931 7fb6617c1800 1 bdev(/var/lib/ceph/osd/ceph-20/block) close
2016-07-13 16:03:03.962063 7fb6617c1800 1 bdev(/var/lib/ceph/osd/ceph-20/block) close
2016-07-13 16:03:04.212082 7fb6617c1800 -1 osd.20 0 OSD:init: unable to mount object store
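
For reference, the mount failure in the log above corresponds to a consistency pass after the transaction log has been replayed: a file that still exists in the rebuilt file table but has no remaining directory links is treated as corruption, and the mount aborts with EIO. Below is a rough, self-contained sketch of that kind of check, assuming a simple ino-to-file map; the names are assumptions, not the exact BlueFS code.

// Rough model of the post-replay link-count check implied by the error above;
// names (ReplayedFile, link_count, verify_after_replay) are assumptions, not
// the exact BlueFS code.
#include <cerrno>
#include <iostream>
#include <map>

struct ReplayedFile {
  int link_count = 0;   // bumped by directory link ops, dropped by op_dir_unlink
};

int verify_after_replay(const std::map<int, ReplayedFile>& files) {
  for (const auto& [ino, f] : files) {
    if (f.link_count == 0) {
      std::cerr << "_replay file with link count 0: ino " << ino << "\n";
      return -EIO;   // surfaces as "bluefs mount failed to replay log:
                     // (5) Input/output error" and aborts OSD startup
    }
  }
  return 0;
}

int main() {
  // Ino 113 was removed (op_file_remove) but then re-created by a later
  // op_file_update, so it is present in the table with zero links.
  std::map<int, ReplayedFile> files = {{113, ReplayedFile{0}}};
  std::cout << "verify -> " << verify_after_replay(files) << "\n";
}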

Actions #3

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce

If you see this on Kraken or later, please reopen!
