Bug #23426

aio thread got No space left on device

Added by Yuri Weinstein over 2 years ago. Updated over 2 years ago.

Status: Won't Fix
Priority: Urgent
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: smoke
Pull request ID:
Crash signature:

Description

Seems reproducible on all distros

Runs:

http://pulpito.ceph.com/teuthology-2018-03-20_05:02:01-smoke-master-testing-basic-ovh/ xenial
http://pulpito.ceph.com/teuthology-2018-03-20_07:02:02-smoke-master-testing-basic-ovh/ centos

Jobs:

['2308443', '2308423'] xenial

['2307549', '2307569'] centos

Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2018-03-20_07:02:02-smoke-master-testing-basic-ovh/2308423/teuthology.log

2018-03-20T09:06:06.125 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:06:36.430 INFO:teuthology.orchestra.run.ovh007:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:06:36.552 INFO:teuthology.orchestra.run.ovh076:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:06:36.669 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:06.856 INFO:teuthology.orchestra.run.ovh007:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:06.974 INFO:teuthology.orchestra.run.ovh076:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:07.087 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:37.342 INFO:teuthology.orchestra.run.ovh007:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:37.402 INFO:teuthology.orchestra.run.ovh076:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:37.510 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:52.297 INFO:tasks.ceph.osd.0.ovh076.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.1-3181-g820dac9/rpm/el7/BUILD/ceph-13.0.1-3181-g820dac9/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7fa42ae61700 time 2018-03-20 09:07:52.286519
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.1-3181-g820dac9/rpm/el7/BUILD/ceph-13.0.1-3181-g820dac9/src/os/bluestore/KernelDevice.cc: 417: FAILED assert(0 == "got unexpected error from io_getevents")
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr:2018-03-20 09:07:52.285 7fa42ae61700 -1 bdev(0x56304ec1a000 /var/lib/ceph/osd/ceph-0/block) _aio_thread got (28) No space left on device
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: ceph version 13.0.1-3181-g820dac9 (820dac980e9416fe05998d50cac633c81a87b9e3) mimic (dev)
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7fa43774bc8f]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 2: (()+0x278e77) [0x7fa43774be77]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 3: (KernelDevice::_aio_thread()+0xd71) [0x56304c85fe51]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 4: (KernelDevice::AioCompletionThread::entry()+0xd) [0x56304c86366d]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 5: (()+0x7e25) [0x7fa434d25e25]
2018-03-20T09:07:52.299 INFO:tasks.ceph.osd.0.ovh076.stderr: 6: (clone()+0x6d) [0x7fa433e1934d]
2018-03-20T09:07:52.299 INFO:tasks.ceph.osd.0.ovh076.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
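
For triage, the key line in the trace above is the `bdev(...) _aio_thread got (28) No space left on device` message. The following is a minimal sketch of pulling the errno out of such a line and mapping it to its symbolic name with Python's standard `errno` module. The regular expression and the function name `parse_aio_error` are assumptions made for illustration; they are not part of teuthology or Ceph, and other Ceph versions may format the message differently.

```python
import errno
import re

# Assumed pattern for the bdev error line seen in this log excerpt.
AIO_ERROR_RE = re.compile(
    r"bdev\((?P<ptr>0x[0-9a-f]+) (?P<path>\S+)\) "
    r"_aio_thread got \((?P<code>\d+)\) (?P<msg>.+)$"
)


def parse_aio_error(line):
    """Return (device_path, errno_number, errno_name, message), or None."""
    match = AIO_ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    # errno.errorcode maps numeric values to names such as 'ENOSPC'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group("path"), code, name, match.group("msg").strip()


if __name__ == "__main__":
    sample = (
        "2018-03-20 09:07:52.285 7fa42ae61700 -1 "
        "bdev(0x56304ec1a000 /var/lib/ceph/osd/ceph-0/block) "
        "_aio_thread got (28) No space left on device"
    )
    print(parse_aio_error(sample))
```

On Linux, errno 28 is `ENOSPC`, which matches the "No space left on device" text in the log.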

Related issues

Related to bluestore - Bug #23333: bluestore: ENODATA on aio Resolved 03/13/2018

History

#1 Updated by Yuri Weinstein over 2 years ago

  • Related to Bug #23333: bluestore: ENODATA on aio added

#2 Updated by Yuri Weinstein over 2 years ago

Might be a duplicate of #23333.

#3 Updated by Radoslaw Zarzynski over 2 years ago

  • Assignee set to Radoslaw Zarzynski

#4 Updated by Radoslaw Zarzynski over 2 years ago

  • Status changed from New to 12

Yeah, the assertion came from aio_t::get_return_value. It might be because, e.g., the driver returned BLK_STS_NOSPC, or because the node is really running out of space. To know more, I would need to look at dmesg, which the linked directory lacks. Is it obtainable?

#5 Updated by Sage Weil over 2 years ago

see remote/*/log/syslog/*
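
A rough sketch of how the files under that glob could be searched for out-of-space messages follows. The helper name `find_enospc_lines`, the `archive_dir` argument, and the search strings are assumptions for illustration only; actual kernel messages vary by driver and filesystem, and compressed logs would need to be decompressed first.

```python
import glob
import os

# Substrings assumed to indicate an out-of-space condition; adjust as needed.
NEEDLES = ("No space left on device", "ENOSPC")


def find_enospc_lines(archive_dir):
    """Yield (file_path, line_number, line) for matching syslog lines."""
    pattern = os.path.join(archive_dir, "remote", "*", "log", "syslog", "*")
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps the scan going on undecodable bytes.
        with open(path, "r", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if any(needle in line for needle in NEEDLES):
                    yield path, lineno, line.rstrip("\n")
```

Here `archive_dir` would be a local copy of the job's archive directory, for example `list(find_enospc_lines("/path/to/job/archive"))`.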

#6 Updated by Sage Weil over 2 years ago

  • Subject changed from "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102)" in smoke to aio thread got No space left on device
  • Status changed from 12 to Won't Fix

If we're getting ENOSPC, this looks like a provisioning/test error, not a bug.
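
To rule out a provisioning problem of this kind before a run, free space on a test node can be checked up front. A minimal sketch, assuming the OSD data sits on an ordinary filesystem path; if `block` is a symlink to a raw device, this reports the filesystem holding the device node, and the device size would have to be queried another way. The function name and the threshold are hypothetical.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem containing `path` has enough free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # Hypothetical threshold of 10 GiB for the current directory's filesystem.
    print(has_free_space(".", 10 * 1024**3))
```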
