Project

General

Profile

Bug #22624

filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)

Added by Patrick Donnelly 11 months ago. Updated 2 months ago.

Status:
Verified
Priority:
High
Assignee:
-
Category:
-
Target version:
Start date:
01/08/2018
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
luminous,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
FileStore, OSD
Pull request ID:

Description

2018-01-06 22:52:00.916 7f6acd131700 20 filestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys(5545): 3.1_head/#3:80000000::::head# = 0
2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_transaction(2645): on 0x55e1a6b958d0
2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) error opening file /var/lib/ceph/osd/ceph-2/current/3.1_head/10000000161.00000014__head_D58A1871__3 with flags=2: (2) No such file or directory
2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) _setattrs(4638): 3.1_head/#3:8e1851ab:::10000000161.00000014:head# = -2
2018-01-06 22:52:00.916 7f6acd131700 -1 filestore(/var/lib/ceph/osd/ceph-2)  error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)
2018-01-06 22:52:00.916 7f6acd131700  0 filestore(/var/lib/ceph/osd/ceph-2) unexpected error code
2018-01-06 22:52:00.916 7f6acd131700  0 filestore(/var/lib/ceph/osd/ceph-2)  transaction dump:
...
2018-01-06 22:52:00.920 7f6acd131700 -1 /build/ceph-13.0.0-4469-gbb8574a/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f6acd131700 time 2018-01-06 22:52:00.920962
/build/ceph-13.0.0-4469-gbb8574a/src/os/filestore/FileStore.cc: 3180: FAILED assert(0 == "unexpected error")

 ceph version 13.0.0-4469-gbb8574a (bb8574ac361a11c6680c21c2c6f9e9c01854551e) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x55e19b334635]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xb02) [0x55e19b1056d2]
 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x55e19b10ae2b]
 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x31a) [0x55e19b10b17a]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb6e) [0x55e19b339a2e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x55e19b33a710]
 7: (()+0x76ba) [0x7f6adb2546ba]
 8: (clone()+0x6d) [0x7f6ada8633dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

From: /ceph/teuthology-archive/pdonnell-2018-01-06_01:09:20-multimds-wip-pdonnell-testing-20180105.234510-testing-basic-smithi/2032459/remote/smithi085/log/ceph-osd.2.log.gz

History

#1 Updated by Sage Weil 11 months ago

  • Project changed from RADOS to fs

I think the problem here is that the object doesn't exist but we're doing omap_setkeys on it.. which doesn't implicitly create the object. This is probably an MDS problem where it didn't create the directory yet but is trying to write dirty dentries out?

#2 Updated by Patrick Donnelly 11 months ago

That would be an fs bug, sure.

However, shouldn't the OSD not assert due to an object not existing?

#3 Updated by Zheng Yan 11 months ago

OSD should reply -ENOENT for that case. should be OSD bug

#4 Updated by Kefu Chai 11 months ago

i agree it's a bug in osd. but i don't think osd should return -ENOENT in this case. as Sage pointed out, it should create an object implicitly.

PrimaryLogPG::issue_repop =>
  ReplicatedBackend::submit_transaction() =>
    generate_transaction() =>
      PGTransaction::safe_create_traverse() =>
        /// because the the init_type is set to "Create" by PrimaryLogPG::maybe_create_new_object()
        t->touch(coll, goid);

so, the object should be created before we issue omap_setkeys on it.

#5 Updated by Kefu Chai 11 months ago

i am rerunning the failed test at http://pulpito.ceph.com/kchai-2018-01-12_07:44:06-multimds-wip-pdonnell-testing-20180105.234510-testing-basic-smithi/, in hope to see if this issue is reproduciable. if yes, we might want to enable debug-osd=15, to see if "FileStore::_touch()" gets called or not.

#6 Updated by Joao Luis 11 months ago

FWIW, I am currently reproducing this quite reliably on my dev env, on a quite outdated version of master (cbe78ae6291688de426c2685dec48ab5cfe25c4b).

Is this still worth investigating?

#7 Updated by Joao Luis 11 months ago

Disregard my previous comment; different error message for the same assert was unfortunately buried in the logs. Sorry for the noise.

#8 Updated by jianpeng ma 8 months ago

From message: "error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)". op 0 mean setxattrs is the first op of Transaction. So no touch op.

#9 Updated by Patrick Donnelly 8 months ago

  • Project changed from fs to RADOS
  • Target version set to v13.0.0
  • Source set to Q/A
  • Backport set to luminous
  • Component(RADOS) FileStore, OSD added

Moving this back to RADOS as it seems the new consensus is that it's a RADOS bug.

#10 Updated by Josh Durgin 6 months ago

  • Status changed from New to Verified

#11 Updated by Patrick Donnelly 2 months ago

  • Target version changed from v13.0.0 to v14.0.0
  • Backport changed from luminous to luminous,mimic

Saw this again in a luminous QA run:

  -429> 2018-10-09 13:30:54.388272 7fe8d5372700 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/3.6_head/2000000001e.00000000__head_6AE4AD66__3 with flags=2: (2) No such file or directory
  -428> 2018-10-09 13:30:54.388279 7fe8d5372700 10 filestore(/var/lib/ceph/osd/ceph-1) _setattrs(4636): 3.6_head/#3:66b52756:::2000000001e.00000000:head# = -2
  -427> 2018-10-09 13:30:54.388282 7fe8d5372700 -1 filestore(/var/lib/ceph/osd/ceph-1)  error (2) No such file or directory not handled on operation 0x5607d37ee638 (3149.1.0, or op 0, counting from 0)
  -426> 2018-10-09 13:30:54.388286 7fe8d5372700  0 filestore(/var/lib/ceph/osd/ceph-1) unexpected error code

From: /ceph/teuthology-archive/pdonnell-2018-10-09_04:02:38-multimds-luminous-testing-basic-smithi/3121168/remote/smithi047/log/ceph-osd.1.log.gz

Also available in: Atom PDF