Bug #22624
closedfilestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)
0%
Description
2018-01-06 22:52:00.916 7f6acd131700 20 filestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys(5545): 3.1_head/#3:80000000::::head# = 0 2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_transaction(2645): on 0x55e1a6b958d0 2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) error opening file /var/lib/ceph/osd/ceph-2/current/3.1_head/10000000161.00000014__head_D58A1871__3 with flags=2: (2) No such file or directory 2018-01-06 22:52:00.916 7f6acd131700 10 filestore(/var/lib/ceph/osd/ceph-2) _setattrs(4638): 3.1_head/#3:8e1851ab:::10000000161.00000014:head# = -2 2018-01-06 22:52:00.916 7f6acd131700 -1 filestore(/var/lib/ceph/osd/ceph-2) error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0) 2018-01-06 22:52:00.916 7f6acd131700 0 filestore(/var/lib/ceph/osd/ceph-2) unexpected error code 2018-01-06 22:52:00.916 7f6acd131700 0 filestore(/var/lib/ceph/osd/ceph-2) transaction dump: ... 2018-01-06 22:52:00.920 7f6acd131700 -1 /build/ceph-13.0.0-4469-gbb8574a/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f6acd131700 time 2018-01-06 22:52:00.920962 /build/ceph-13.0.0-4469-gbb8574a/src/os/filestore/FileStore.cc: 3180: FAILED assert(0 == "unexpected error") ceph version 13.0.0-4469-gbb8574a (bb8574ac361a11c6680c21c2c6f9e9c01854551e) mimic (dev) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x55e19b334635] 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xb02) [0x55e19b1056d2] 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x55e19b10ae2b] 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x31a) [0x55e19b10b17a] 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb6e) [0x55e19b339a2e] 6: (ThreadPool::WorkThread::entry()+0x10) [0x55e19b33a710] 7: (()+0x76ba) [0x7f6adb2546ba] 8: (clone()+0x6d) [0x7f6ada8633dd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
From: /ceph/teuthology-archive/pdonnell-2018-01-06_01:09:20-multimds-wip-pdonnell-testing-20180105.234510-testing-basic-smithi/2032459/remote/smithi085/log/ceph-osd.2.log.gz
Updated by Sage Weil over 6 years ago
- Project changed from RADOS to CephFS
I think the problem here is that the object doesn't exist but we're doing omap_setkeys on it.. which doesn't implicitly create the object. This is probably an MDS problem where it didn't create the directory yet but is trying to write dirty dentries out?
Updated by Patrick Donnelly over 6 years ago
That would be an fs bug, sure.
However, shouldn't the OSD not assert due to an object not existing?
Updated by Zheng Yan over 6 years ago
OSD should reply -ENOENT for that case. should be OSD bug
Updated by Kefu Chai over 6 years ago
i agree it's a bug in osd. but i don't think osd should return -ENOENT in this case. as Sage pointed out, it should create an object implicitly.
PrimaryLogPG::issue_repop => ReplicatedBackend::submit_transaction() => generate_transaction() => PGTransaction::safe_create_traverse() => /// because the the init_type is set to "Create" by PrimaryLogPG::maybe_create_new_object() t->touch(coll, goid);
so, the object should be created before we issue omap_setkeys on it.
Updated by Kefu Chai over 6 years ago
i am rerunning the failed test at http://pulpito.ceph.com/kchai-2018-01-12_07:44:06-multimds-wip-pdonnell-testing-20180105.234510-testing-basic-smithi/, in hope to see if this issue is reproduciable. if yes, we might want to enable debug-osd=15, to see if "FileStore::_touch()" gets called or not.
Updated by Joao Eduardo Luis over 6 years ago
FWIW, I am currently reproducing this quite reliably on my dev env, on a quite outdated version of master (cbe78ae6291688de426c2685dec48ab5cfe25c4b).
Is this still worth investigating?
Updated by Joao Eduardo Luis over 6 years ago
Disregard my previous comment; different error message for the same assert was unfortunately buried in the logs. Sorry for the noise.
Updated by jianpeng ma almost 6 years ago
From message: "error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, counting from 0)". op 0 mean setxattrs is the first op of Transaction. So no touch op.
Updated by Patrick Donnelly almost 6 years ago
- Project changed from CephFS to RADOS
- Target version set to v13.0.0
- Source set to Q/A
- Backport set to luminous
- Component(RADOS) FileStore, OSD added
Moving this back to RADOS as it seems the new consensus is that it's a RADOS bug.
Updated by Patrick Donnelly over 5 years ago
- Target version changed from v13.0.0 to v14.0.0
- Backport changed from luminous to luminous,mimic
Saw this again in a luminous QA run:
-429> 2018-10-09 13:30:54.388272 7fe8d5372700 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/3.6_head/2000000001e.00000000__head_6AE4AD66__3 with flags=2: (2) No such file or directory -428> 2018-10-09 13:30:54.388279 7fe8d5372700 10 filestore(/var/lib/ceph/osd/ceph-1) _setattrs(4636): 3.6_head/#3:66b52756:::2000000001e.00000000:head# = -2 -427> 2018-10-09 13:30:54.388282 7fe8d5372700 -1 filestore(/var/lib/ceph/osd/ceph-1) error (2) No such file or directory not handled on operation 0x5607d37ee638 (3149.1.0, or op 0, counting from 0) -426> 2018-10-09 13:30:54.388286 7fe8d5372700 0 filestore(/var/lib/ceph/osd/ceph-1) unexpected error code
From: /ceph/teuthology-archive/pdonnell-2018-10-09_04:02:38-multimds-luminous-testing-basic-smithi/3121168/remote/smithi047/log/ceph-osd.1.log.gz
Updated by Greg Farnum over 4 years ago
- Related to Bug #40522: on_local_recover doesn't touch? added
Updated by Greg Farnum over 4 years ago
- Is duplicate of Bug #37969: ENOENT on setattrs added