Bug #3698
closedfilestore: ENOENT on clone
0%
Description
Full logs in teuthology:/a/joshd-3631-12-27-12_22.21/29826
-21> 2012-12-29 01:09:41.660730 7f1e33fc0700 0 filestore(/tmp/cephtest/data/osd.2.data) error (2) No such file or directory not handled on operation 30 (17262.0.9, or op 9, counting from 0) -20> 2012-12-29 01:09:41.660752 7f1e33fc0700 0 filestore(/tmp/cephtest/data/osd.2.data) ENOENT on clone suggests osd bug -19> 2012-12-29 01:09:41.660756 7f1e33fc0700 0 filestore(/tmp/cephtest/data/osd.2.data) transaction dump: { "ops": [ { "op_num": 0, "op_name": "remove", "collection": "0.f_head", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 1, "op_name": "remove", "collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 2, "op_name": "touch", "collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 3, "op_name": "omap_setheader", "collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0", "header_length": "0"}, { "op_num": 4, "op_name": "omap_setkeys", "collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0", "attr_lens": {}}, { "op_num": 5, "op_name": "setattrs", "collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0", "attr_lens": { "_": 231}}, { "op_num": 6, "op_name": "remove", "collection": "0.f_head", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 7, "op_name": "collection_add", "src_collection": "0.f_head", "dst_collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 8, "op_name": "collection_remove", "collection": "0.f_TEMP", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 9, "op_name": "clonerange2", "collection": "0.f_head", "src_oid": "6d8c2d0f\/plana4621311-45\/head\/\/0", "dst_oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0", "src_offset": 0, "len": 2107129, "dst_offset": 0}, { "op_num": 10, "op_name": "collection_setattr", "collection": "0.f_head", "name": "info", "length": 5}, { "op_num": 11, "op_name": "truncate", "collection": "meta", "oid": "dae1b753\/pginfo_0.f\/0\/\/-1", "offset": 0}, { "op_num": 12, "op_name": "write", "collection": "meta", "oid": "dae1b753\/pginfo_0.f\/0\/\/-1", "length": 1650, "offset": 0, "bufferlist length": 1650}, { "op_num": 13, "op_name": "mkcoll", "collection": "0.f_3df"}, { "op_num": 14, "op_name": "collection_add", "src_collection": "0.f_3df", "dst_collection": "0.f_head", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 15, "op_name": "collection_add", "src_collection": "0.f_3d2", "dst_collection": "0.f_head", "oid": "6d8c2d0f\/plana4621311-45\/3df\/\/0"}, { "op_num": 16, "op_name": "collection_setattr", "collection": "0.f_head", "name": "info", "length": 5}, { "op_num": 17, "op_name": "truncate", "collection": "meta", "oid": "dae1b753\/pginfo_0.f\/0\/\/-1", "offset": 0}, { "op_num": 18, "op_name": "write", "collection": "meta", "oid": "dae1b753\/pginfo_0.f\/0\/\/-1", "length": 1650, "offset": 0, "bufferlist length": 1650}]} -18> 2012-12-29 01:09:41.674379 7f1e33fc0700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)' thread 7f1e33fc0700 time 2012-12-29 01:09:41.661118 os/FileStore.cc: 2681: FAILED assert(0 == "unexpected error") ceph version 0.55.1-369-g942c714 (942c71454b11d3bfb09ee4abe65f1e40936d3bf6) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x11acbb1] 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int)+0x2cf3) [0x104fe3b] 3: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0xe8) [0x104aee2] 4: (FileStore::_do_op(FileStore::OpSequencer*)+0x211) [0x1049727] 5: (FileStore::OpWQ::_process(FileStore::OpSequencer*)+0x27) [0x1069907] 6: (ThreadPool::WorkQueue<FileStore::OpSequencer>::_void_process(void*)+0x2e) [0x106fab6] 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x700) [0x11a1e78] 8: (ThreadPool::WorkThread::entry()+0x23) [0x11a3ac3] 9: (Thread::_entry_func(void*)+0x23) [0x119b399] 10: (()+0x7e9a) [0x7f1e3d950e9a] 11: (clone()+0x6d) [0x7f1e3badb4bd]
The job that produced this (once in 200 runs):
nuke-on-error: true overrides: ceph: branch: next conf: client: debug ms: 1 debug objecter: 20 debug rados: 20 global: ms inject socket failures: 5000 fs: xfs log-whitelist: - slow request roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rados: clients: - client.0 objects: 50 op_weights: delete: 50 read: 100 snap_create: 50 snap_remove: 50 snap_rollback: 50 write: 100 ops: 4000
Updated by Josh Durgin over 11 years ago
Happened again in teuthology:/a/joshd-3631-12-28-12_08.53/30681
Updated by Sage Weil over 11 years ago
Can you add 'debug osd = 20' so the job you're running?
Updated by Tamilarasi muthamizhan over 11 years ago
recent log: ubuntu@teuthology:/a/teuthology-2013-01-01_19:00:03-regression-next-testing-basic/33152
Updated by Sage Weil over 11 years ago
another instance with logs: ubuntu@teuthology:/a/sage-a2/33879
Updated by Sage Weil over 11 years ago
- Assignee changed from Sage Weil to Samuel Just
Updated by Samuel Just over 11 years ago
Ok. The recovery_qos stuff can allow a client op to reorder past a push. This is a problem since the push might be using the target of the client op to clone from. In this case:
1) Primary sends a push for snap N which clones from HEAD
2) Client deletes HEAD
3) Primary sends sends sub_op for the delete
4) Replica processes the delete
5) Replica processes the push and finds that the head is missing.
Working on solution.
Updated by Sage Weil over 11 years ago
- Status changed from In Progress to Resolved