Bug #46434 (closed)

osdc: FAILED ceph_assert(bh->waitfor_read.empty())

Added by Ramana Raja almost 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
High
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
multimds
Component(FS):
osdc
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During Yuri's nautilus backport test run, the following assertion failure was hit in the multimds suite's fsx tests, running in an Ubuntu xenial environment with FUSE clients.

2020-07-08T19:24:00.514 INFO:tasks.workunit.client.0.smithi124.stderr:+ ./fsx -z 1MB -N 50000 -p 10000 -l 1048576
2020-07-08T19:24:00.570 INFO:tasks.workunit.client.0.smithi124.stderr:fsx: main: filesystem does not support fallocate mode 0x8, disabling!
2020-07-08T19:24:00.570 INFO:tasks.workunit.client.0.smithi124.stderr:: Operation not supported
2020-07-08T19:24:00.571 INFO:tasks.workunit.client.0.smithi124.stderr:fsx: main: filesystem does not support fallocate mode 0x20, disabling!
2020-07-08T19:24:00.571 INFO:tasks.workunit.client.0.smithi124.stderr:: Operation not supported
2020-07-08T19:24:00.571 INFO:tasks.workunit.client.0.smithi124.stdout:skipping zero size read
2020-07-08T19:24:00.573 INFO:tasks.workunit.client.0.smithi124.stdout:skipping zero length punch hole
2020-07-08T19:24:00.574 INFO:tasks.workunit.client.0.smithi124.stdout:skipping zero size read
2020-07-08T19:24:00.574 INFO:tasks.workunit.client.0.smithi124.stdout:fallocating to largest ever: 0xbc839
2020-07-08T19:24:00.632 INFO:tasks.workunit.client.0.smithi124.stdout:fallocating to largest ever: 0xdc566
2020-07-08T19:24:00.652 INFO:tasks.workunit.client.0.smithi124.stdout:truncating to largest ever: 0xe437a
2020-07-08T19:24:00.787 INFO:tasks.workunit.client.0.smithi124.stdout:truncating to largest ever: 0xf5939
2020-07-08T19:24:00.801 INFO:tasks.workunit.client.0.smithi124.stdout:fallocating to largest ever: 0x100000
2020-07-08T19:24:10.300 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: In function 'void ObjectCacher::Object::discard(loff_t, loff_t, C_GatherBuilder*)' thread 7f3d2f7fe700 time 2020-07-08 19:24:10.305526
2020-07-08T19:24:10.300 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
2020-07-08T19:24:10.301 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: ceph version 14.2.10-29-gaf4ccda (af4ccdaffeffda4b9071601ab3a6baccd01ead14) nautilus (stable)
2020-07-08T19:24:10.302 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f3d8ca52732]
2020-07-08T19:24:10.302 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f3d8ca5290d]
2020-07-08T19:24:10.302 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 3: (ObjectCacher::Object::discard(long, long, C_GatherBuilderBase<Context, C_GatherBase<Context, Context> >*)+0x715) [0x529085]
2020-07-08T19:24:10.302 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 4: (ObjectCacher::_discard(ObjectCacher::ObjectSet*, std::vector<ObjectExtent, std::allocator<ObjectExtent> > const&, C_GatherBuilderBase<Context, C_GatherBase<Context, Context> >*)+0x166) [0x529256]
2020-07-08T19:24:10.302 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 5: (ObjectCacher::discard_writeback(ObjectCacher::ObjectSet*, std::vector<ObjectExtent, std::allocator<ObjectExtent> > const&, Context*)+0x6f) [0x53013f]
2020-07-08T19:24:10.303 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 6: (Client::_invalidate_inode_cache(Inode*, long, long)+0xfd) [0x46e1cd]
2020-07-08T19:24:10.303 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 7: (Client::_fallocate(Fh*, int, long, long)+0x275) [0x4cd0a5]
2020-07-08T19:24:10.303 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 8: (Client::ll_fallocate(Fh*, int, long, long)+0x209) [0x4ce219]
2020-07-08T19:24:10.303 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 9: ceph-fuse() [0x44ffab]
2020-07-08T19:24:10.303 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 10: (()+0x13a42) [0x7f3d95758a42]
2020-07-08T19:24:10.304 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 11: (()+0x15679) [0x7f3d9575a679]
2020-07-08T19:24:10.304 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 12: (()+0x11e38) [0x7f3d95756e38]
2020-07-08T19:24:10.304 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 13: (()+0x76ba) [0x7f3d8c56d6ba]
2020-07-08T19:24:10.304 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr: 14: (clone()+0x6d) [0x7f3d8bd9641d]
2020-07-08T19:24:10.304 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:2020-07-08 19:24:10.302 7f3d2f7fe700 -1 /build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: In function 'void ObjectCacher::Object::discard(loff_t, loff_t, C_GatherBuilder*)' thread 7f3d2f7fe700 time 2020-07-08 19:24:10.305526
2020-07-08T19:24:10.305 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
2020-07-08T19:24:10.305 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:

More info here:
https://pulpito.ceph.com/yuriw-2020-07-08_17:27:06-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5209626/

Yuri ran the multimds suite twice. Both times, the issue above was observed only in Ubuntu xenial environments.
https://pulpito.ceph.com/yuriw-2020-07-06_17:30:41-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/
https://pulpito.ceph.com/yuriw-2020-07-08_17:27:06-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/


Related issues 2 (0 open, 2 closed)

Copied to CephFS - Backport #48097: nautilus: osdc: FAILED ceph_assert(bh->waitfor_read.empty()) (Resolved, Wei-Chung Cheng)
Copied to CephFS - Backport #48098: octopus: osdc: FAILED ceph_assert(bh->waitfor_read.empty()) (Resolved, Wei-Chung Cheng)
Actions #1

Updated by Ramana Raja almost 4 years ago

  • Description updated (diff)
Actions #2

Updated by Ramana Raja almost 4 years ago

  • Description updated (diff)
Actions #3

Updated by Patrick Donnelly almost 4 years ago

  • Subject changed from nautilus: xfs test failure in multi-mds suite to nautilus: osdc: FAILED ceph_assert(bh->waitfor_read.empty())
  • Description updated (diff)
  • Priority changed from Normal to High
  • Target version set to v16.0.0
  • Backport set to octopus,nautilus
  • Component(FS) osdc added

Interesting that it was reproducible for two runs. We've not seen this before but I'm suspicious that it probably exists on master too.

Actions #7

Updated by Patrick Donnelly over 3 years ago

  • Subject changed from nautilus: osdc: FAILED ceph_assert(bh->waitfor_read.empty()) to osdc: FAILED ceph_assert(bh->waitfor_read.empty())

Seems this started sometime around June 10th:

$ zgrep 'FAILED ceph_assert(bh->waitfor_read.empty())' *-2020-[01][0-9]-*multimds*/*/teu*log.gz
teuthology-2020-07-08_04:15:02-multimds-master-testing-basic-smithi/5208385/teuthology.log.gz:2020-07-09T13:07:42.236 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi120.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-3255-g02fc51deb7/rpm/el8/BUILD/ceph-16.0.0-3255-g02fc51deb7/src/osdc/ObjectCacher.cc: 583: FAILED ceph_assert(bh->waitfor_read.empty())
teuthology-2020-07-08_04:15:02-multimds-master-testing-basic-smithi/5208385/teuthology.log.gz:2020-07-09T13:07:42.274 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi120.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-3255-g02fc51deb7/rpm/el8/BUILD/ceph-16.0.0-3255-g02fc51deb7/src/osdc/ObjectCacher.cc: 583: FAILED ceph_assert(bh->waitfor_read.empty())
teuthology-2020-07-08_04:15:02-multimds-master-testing-basic-smithi/5208385/teuthology.log.gz:2020-07-09T13:07:42.755 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi120.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-3255-g02fc51deb7/rpm/el8/BUILD/ceph-16.0.0-3255-g02fc51deb7/src/osdc/ObjectCacher.cc: 583: FAILED ceph_assert(bh->waitfor_read.empty())
teuthology-2020-07-08_04:15:02-multimds-master-testing-basic-smithi/5208385/teuthology.log.gz:2020-07-09T13:07:43.219 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi120.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-3255-g02fc51deb7/rpm/el8/BUILD/ceph-16.0.0-3255-g02fc51deb7/src/osdc/ObjectCacher.cc: 583: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-06-10_18:00:48-multimds-wip-yuri3-testing-2020-06-10-1530-nautilus-distro-basic-smithi/5134213/teuthology.log.gz:2020-06-10T20:27:29.170 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr:/build/ceph-14.2.9-806-g042da806d81/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-06-10_18:00:48-multimds-wip-yuri3-testing-2020-06-10-1530-nautilus-distro-basic-smithi/5134213/teuthology.log.gz:2020-06-10T20:27:29.177 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr:/build/ceph-14.2.9-806-g042da806d81/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-06-10_18:00:48-multimds-wip-yuri3-testing-2020-06-10-1530-nautilus-distro-basic-smithi/5134213/teuthology.log.gz:2020-06-10T20:27:29.201 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr:/build/ceph-14.2.9-806-g042da806d81/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-06-10_18:00:48-multimds-wip-yuri3-testing-2020-06-10-1530-nautilus-distro-basic-smithi/5134213/teuthology.log.gz:2020-06-10T20:27:29.213 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr:/build/ceph-14.2.9-806-g042da806d81/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-06_17:30:41-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5204031/teuthology.log.gz:2020-07-06T23:25:36.798 INFO:tasks.cephfs.fuse_mount.ceph-fuse.1.smithi119.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-06_17:30:41-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5204031/teuthology.log.gz:2020-07-06T23:25:36.802 INFO:tasks.cephfs.fuse_mount.ceph-fuse.1.smithi119.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-06_17:30:41-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5204031/teuthology.log.gz:2020-07-06T23:25:36.814 INFO:tasks.cephfs.fuse_mount.ceph-fuse.1.smithi119.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-06_17:30:41-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5204031/teuthology.log.gz:2020-07-06T23:25:36.826 INFO:tasks.cephfs.fuse_mount.ceph-fuse.1.smithi119.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-08_17:27:06-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5209626/teuthology.log.gz:2020-07-08T19:24:10.300 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-08_17:27:06-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5209626/teuthology.log.gz:2020-07-08T19:24:10.305 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-08_17:27:06-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5209626/teuthology.log.gz:2020-07-08T19:24:10.317 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-07-08_17:27:06-multimds-wip-yuri3-testing-2020-07-01-1707-nautilus-distro-basic-smithi/5209626/teuthology.log.gz:2020-07-08T19:24:10.328 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi124.stderr:/build/ceph-14.2.10-29-gaf4ccda/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-08-07_15:05:02-multimds-wip-yuri4-testing-2020-08-07-1350-nautilus-distro-basic-smithi/5308731/teuthology.log.gz:2020-08-07T19:22:55.917 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi164.stderr:/build/ceph-14.2.10-268-g8aa51f1562d/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-08-07_15:05:02-multimds-wip-yuri4-testing-2020-08-07-1350-nautilus-distro-basic-smithi/5308731/teuthology.log.gz:2020-08-07T19:22:55.921 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi164.stderr:/build/ceph-14.2.10-268-g8aa51f1562d/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-08-07_15:05:02-multimds-wip-yuri4-testing-2020-08-07-1350-nautilus-distro-basic-smithi/5308731/teuthology.log.gz:2020-08-07T19:22:55.941 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi164.stderr:/build/ceph-14.2.10-268-g8aa51f1562d/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())
yuriw-2020-08-07_15:05:02-multimds-wip-yuri4-testing-2020-08-07-1350-nautilus-distro-basic-smithi/5308731/teuthology.log.gz:2020-08-07T19:22:55.949 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi164.stderr:/build/ceph-14.2.10-268-g8aa51f1562d/src/osdc/ObjectCacher.cc: 636: FAILED ceph_assert(bh->waitfor_read.empty())

It also shows up in master (but only July 8th??) so it's not just a Nautilus regression.

Actions #8

Updated by Patrick Donnelly over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Patrick Donnelly
Actions #9

Updated by Patrick Donnelly over 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 37732
Actions #10

Updated by Patrick Donnelly over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #11

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #48097: nautilus: osdc: FAILED ceph_assert(bh->waitfor_read.empty()) added
Actions #12

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #48098: octopus: osdc: FAILED ceph_assert(bh->waitfor_read.empty()) added
Actions #13

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
