Project

General

Profile

Actions

Bug #22624

closed

filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)

Added by Patrick Donnelly over 6 years ago. Updated over 4 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
luminous,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
FileStore, OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-01-06 22:52:00.916 7f6acd131700 20 filestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys(5545): 3.1_head/#3:80000000::::head# = 0
2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_transaction(2645): on 0x55e1a6b958d0
2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) error opening file /var/lib/ceph/osd/ceph-2/current/3.1_head/10000000161.00000014__head_D58A1871__3 with flags=2: (2) No such file or directory
2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) _setattrs(4638): 3.1_head/#3:8e1851ab:::10000000161.00000014:head# = -2
2018-01-06 22:52:00.916 7f6acd131700 -1 filestore(/var/lib/ceph/osd/ceph-2)  error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)
2018-01-06 22:52:00.916 7f6acd131700  0 filestore(/var/lib/ceph/osd/ceph-2) unexpected error code
2018-01-06 22:52:00.916 7f6acd131700  0 filestore(/var/lib/ceph/osd/ceph-2)  transaction dump:
...
2018-01-06 22:52:00.920 7f6acd131700 -1 /build/ceph-13.0.0-4469-gbb8574a/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f6acd131700 time 2018-01-06 22:52:00.920962
/build/ceph-13.0.0-4469-gbb8574a/src/os/filestore/FileStore.cc: 3180: FAILED assert(0 == "unexpected error")

 ceph version 13.0.0-4469-gbb8574a (bb8574ac361a11c6680c21c2c6f9e9c01854551e) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x55e19b334635]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xb02) [0x55e19b1056d2]
 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x55e19b10ae2b]
 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x31a) [0x55e19b10b17a]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb6e) [0x55e19b339a2e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x55e19b33a710]
 7: (()+0x76ba) [0x7f6adb2546ba]
 8: (clone()+0x6d) [0x7f6ada8633dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

From: /ceph/teuthology-archive/pdonnell-2018-01-06_01:09:20-multimds-wip-pdonnell-testing-20180105.234510-testing-basic-smithi/2032459/remote/smithi085/log/ceph-osd.2.log.gz


Related issues 2 (0 open2 closed)

Related to RADOS - Bug #40522: on_local_recover doesn't touch?Can't reproduce

Actions
Is duplicate of RADOS - Bug #37969: ENOENT on setattrsCan't reproduce01/18/2019

Actions
Actions #1

Updated by Sage Weil over 6 years ago

  • Project changed from RADOS to CephFS

I think the problem here is that the object doesn't exist but we're doing omap_setkeys on it.. which doesn't implicitly create the object. This is probably an MDS problem where it didn't create the directory yet but is trying to write dirty dentries out?

Actions #2

Updated by Patrick Donnelly over 6 years ago

That would be an fs bug, sure.

However, shouldn't the OSD not assert due to an object not existing?

Actions #3

Updated by Zheng Yan over 6 years ago

OSD should reply -ENOENT for that case. should be OSD bug

Actions #4

Updated by Kefu Chai over 6 years ago

i agree it's a bug in osd. but i don't think osd should return -ENOENT in this case. as Sage pointed out, it should create an object implicitly.

PrimaryLogPG::issue_repop =>
  ReplicatedBackend::submit_transaction() =>
    generate_transaction() =>
      PGTransaction::safe_create_traverse() =>
        /// because the the init_type is set to "Create" by PrimaryLogPG::maybe_create_new_object()
        t->touch(coll, goid);

so, the object should be created before we issue omap_setkeys on it.

Actions #5

Updated by Kefu Chai over 6 years ago

i am rerunning the failed test at http://pulpito.ceph.com/kchai-2018-01-12_07:44:06-multimds-wip-pdonnell-testing-20180105.234510-testing-basic-smithi/, in hope to see if this issue is reproduciable. if yes, we might want to enable debug-osd=15, to see if "FileStore::_touch()" gets called or not.

Actions #6

Updated by Joao Eduardo Luis over 6 years ago

FWIW, I am currently reproducing this quite reliably on my dev env, on a quite outdated version of master (cbe78ae6291688de426c2685dec48ab5cfe25c4b).

Is this still worth investigating?

Actions #7

Updated by Joao Eduardo Luis over 6 years ago

Disregard my previous comment; different error message for the same assert was unfortunately buried in the logs. Sorry for the noise.

Actions #8

Updated by jianpeng ma almost 6 years ago

From message: "error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)". op 0 mean setxattrs is the first op of Transaction. So no touch op.

Actions #9

Updated by Patrick Donnelly almost 6 years ago

  • Project changed from CephFS to RADOS
  • Target version set to v13.0.0
  • Source set to Q/A
  • Backport set to luminous
  • Component(RADOS) FileStore, OSD added

Moving this back to RADOS as it seems the new consensus is that it's a RADOS bug.

Actions #10

Updated by Josh Durgin almost 6 years ago

  • Status changed from New to 12
Actions #11

Updated by Patrick Donnelly over 5 years ago

  • Target version changed from v13.0.0 to v14.0.0
  • Backport changed from luminous to luminous,mimic

Saw this again in a luminous QA run:

  -429> 2018-10-09 13:30:54.388272 7fe8d5372700 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/3.6_head/2000000001e.00000000__head_6AE4AD66__3 with flags=2: (2) No such file or directory
  -428> 2018-10-09 13:30:54.388279 7fe8d5372700 10 filestore(/var/lib/ceph/osd/ceph-1) _setattrs(4636): 3.6_head/#3:66b52756:::2000000001e.00000000:head# = -2
  -427> 2018-10-09 13:30:54.388282 7fe8d5372700 -1 filestore(/var/lib/ceph/osd/ceph-1)  error (2) No such file or directory not handled on operation 0x5607d37ee638 (3149.1.0, or op 0, counting from 0)
  -426> 2018-10-09 13:30:54.388286 7fe8d5372700  0 filestore(/var/lib/ceph/osd/ceph-1) unexpected error code

From: /ceph/teuthology-archive/pdonnell-2018-10-09_04:02:38-multimds-luminous-testing-basic-smithi/3121168/remote/smithi047/log/ceph-osd.1.log.gz

Actions #12

Updated by Greg Farnum over 4 years ago

  • Related to Bug #40522: on_local_recover doesn't touch? added
Actions #13

Updated by Greg Farnum over 4 years ago

  • Is duplicate of Bug #37969: ENOENT on setattrs added
Actions #14

Updated by Greg Farnum over 4 years ago

  • Status changed from 12 to Duplicate
Actions

Also available in: Atom PDF