Project

General

Profile

Actions

Bug #23333

closed

bluestore: ENODATA on aio

Added by Robert Sander about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since 3 days one of 18 BlueStore OSDs is constantly crashing:

2018-03-10 04:01:45.366202 mon.ceph01 mon.0 192.168.44.65:6789/0 3977 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.6)
2018-03-11 01:53:06.019025 mon.ceph01 mon.0 192.168.44.65:6789/0 8132 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.10)
2018-03-11 03:33:24.455042 mon.ceph01 mon.0 192.168.44.65:6789/0 8477 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.17)
2018-03-12 03:36:50.437396 mon.ceph01 mon.0 192.168.44.65:6789/0 12976 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.6)
2018-03-12 04:32:44.685349 mon.ceph01 mon.0 192.168.44.65:6789/0 13237 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-12 05:39:14.787894 mon.ceph01 mon.0 192.168.44.65:6789/0 13498 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.10)
2018-03-12 06:31:09.084123 mon.ceph01 mon.0 192.168.44.65:6789/0 13753 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-12 07:22:27.192456 mon.ceph01 mon.0 192.168.44.65:6789/0 13968 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-12 08:18:47.625200 mon.ceph01 mon.0 192.168.44.65:6789/0 14172 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.6)
2018-03-12 09:15:47.241273 mon.ceph01 mon.0 192.168.44.65:6789/0 14455 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.15)
2018-03-12 10:10:27.787580 mon.ceph01 mon.0 192.168.44.65:6789/0 14654 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.10)
2018-03-12 11:04:01.766949 mon.ceph01 mon.0 192.168.44.65:6789/0 14923 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.13)
2018-03-12 11:57:24.455534 mon.ceph01 mon.0 192.168.44.65:6789/0 15232 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.17)
2018-03-12 12:49:38.911668 mon.ceph01 mon.0 192.168.44.65:6789/0 15474 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-12 13:45:46.821370 mon.ceph01 mon.0 192.168.44.65:6789/0 15727 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.1)
2018-03-12 14:37:54.494178 mon.ceph01 mon.0 192.168.44.65:6789/0 15904 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-12 15:31:14.487131 mon.ceph01 mon.0 192.168.44.65:6789/0 16122 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.13)
2018-03-12 16:22:38.799025 mon.ceph01 mon.0 192.168.44.65:6789/0 16353 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.16)
2018-03-12 17:14:04.971014 mon.ceph01 mon.0 192.168.44.65:6789/0 16560 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-12 18:07:00.722340 mon.ceph01 mon.0 192.168.44.65:6789/0 16789 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.15)
2018-03-12 18:59:43.827737 mon.ceph01 mon.0 192.168.44.65:6789/0 17047 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.15)
2018-03-12 19:50:38.287630 mon.ceph01 mon.0 192.168.44.65:6789/0 17296 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.9)
2018-03-12 20:55:07.389005 mon.ceph01 mon.0 192.168.44.65:6789/0 17594 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.17)
2018-03-12 21:54:15.907260 mon.ceph01 mon.0 192.168.44.65:6789/0 17865 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.1)
2018-03-12 22:52:19.808324 mon.ceph01 mon.0 192.168.44.65:6789/0 18064 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.16)
2018-03-12 23:46:47.508301 mon.ceph01 mon.0 192.168.44.65:6789/0 18317 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.2)
2018-03-13 00:39:21.711974 mon.ceph01 mon.0 192.168.44.65:6789/0 18516 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.16)
2018-03-13 01:31:55.385104 mon.ceph01 mon.0 192.168.44.65:6789/0 18691 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.6)
2018-03-13 02:26:23.183917 mon.ceph01 mon.0 192.168.44.65:6789/0 18912 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.9)
2018-03-13 04:01:22.455514 mon.ceph01 mon.0 192.168.44.65:6789/0 19206 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.10)
2018-03-13 05:09:33.791115 mon.ceph01 mon.0 192.168.44.65:6789/0 19493 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.1)
2018-03-13 06:08:01.085420 mon.ceph01 mon.0 192.168.44.65:6789/0 19726 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.10)
2018-03-13 07:47:49.104523 mon.ceph01 mon.0 192.168.44.65:6789/0 20058 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-13 08:42:45.958985 mon.ceph01 mon.0 192.168.44.65:6789/0 20280 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-13 09:38:24.619503 mon.ceph01 mon.0 192.168.44.65:6789/0 20508 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.1)
2018-03-13 10:32:51.563617 mon.ceph01 mon.0 192.168.44.65:6789/0 20723 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.11)
2018-03-13 11:25:48.127581 mon.ceph01 mon.0 192.168.44.65:6789/0 20987 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.16)
2018-03-13 12:20:24.128393 mon.ceph01 mon.0 192.168.44.65:6789/0 21255 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.13)
2018-03-13 13:15:43.325775 mon.ceph01 mon.0 192.168.44.65:6789/0 21443 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.6)
2018-03-13 14:09:19.403727 mon.ceph01 mon.0 192.168.44.65:6789/0 21687 : cluster [INF] osd.12 failed (root=default,host=ceph02,ssd=sdh02) (connection refused reported by osd.15)

The latest logfile of that OSD is attached. At the time of the last crash it contains:

1> 2018-03-13 14:09:18.997826 7f9ef880a700 -1 bdev(0x558ab92fafc0 /var/lib/ceph/osd/ceph-12/block) _aio_thread got (61) No data available
0> 2018-03-13 14:09:18.999284 7f9ef880a700 -1 /build/ceph-12.2.4/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f9ef880a700 time 2018-03

13 14:09:18.997856
/build/ceph-12.2.4/src/os/bluestore/KernelDevice.cc: 379: FAILED assert(0 == "got unexpected error from io_getevents")
ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558aae2fc872]
2: (KernelDevice::_aio_thread()+0x16ca) [0x558aae29c6ea]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x558aae2a19cd]
4: (()+0x76ba) [0x7f9f01b746ba]
5: (clone()+0x6d) [0x7f9f00beb41d]

Files

ceph-osd.12.log.gz (306 KB) ceph-osd.12.log.gz Robert Sander, 03/13/2018 02:05 PM

Related issues 2 (0 open2 closed)

Related to bluestore - Bug #23426: aio thread got No space left on deviceWon't FixRadoslaw Zarzynski03/20/2018

Actions
Copied to bluestore - Backport #23672: luminous: bluestore: ENODATA on aioResolvedPrashant DActions
Actions

Also available in: Atom PDF