Bug #8887
OSD crashes at assert(e.version > info.last_update) in PG::add_log_entry
Description
I have a Ceph cluster with 3 monitors and 3 OSD nodes (3 OSDs on each node).
While I/O was in progress, I rebooted one OSD node, which hosts osd.6, osd.7, and osd.8.
osd.0 and osd.2 then crashed with assert(e.version > info.last_update) in PG::add_log_entry:
2014-07-17 17:54:14.893962 7f91f3660700 -1 osd/PG.cc: In function 'void PG::add_log_entry(pg_log_entry_t&, ceph::bufferlist&)' thread 7f91f3660700 time 2014-07-17 17:54:13.252064
osd/PG.cc: 2619: FAILED assert(e.version > info.last_update)
ceph version andisk-sprint-2-drop-3-390-g2dbd85c (2dbd85c94cf27a1ff0419c5ea9359af7fe30e9b6)
1: (PG::add_log_entry(pg_log_entry_t&, ceph::buffer::list&)+0x481) [0x733a61]
2: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0xdf) [0x74483f]
3: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0xcfe) [0x8193be]
4: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4a6) [0x904586]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2db) [0x7aedcb]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x459) [0x635719]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x346) [0x635ce6]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce) [0xa4a1ce]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa4c420]
10: (()+0x8182) [0x7f920f579182]
11: (clone()+0x6d) [0x7f920d91a30d]
Related issues
History
#1 Updated by Greg Farnum over 9 years ago
- Assignee changed from Sage Weil to Greg Farnum
This error looks familiar to me but we don't have any other tracker entries for it. The commit in question is a part of master now, and although it was never tested in-tree on its own I don't see any changes that I think are relevant. I'll take a look tomorrow.
#2 Updated by Greg Farnum over 9 years ago
- Status changed from New to Duplicate
There isn't much log history to look at here, but the same op is being dequeued twice by the OSD, and the second is hitting this assertion. I believe this is a different instantiation of #8504, which is resolved in-tree after the commit you're running. :)
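For illustration, here is a minimal sketch of why a replayed op would trip this check. It is a toy model, not the Ceph source: `EVersion`, `LogEntry`, `ToyPG`, and `replay_trips_check` are hypothetical stand-ins, and the check throws instead of aborting so the behaviour can be observed. It assumes only what the assertion text states, namely that each appended log entry must carry a version strictly greater than the PG's last_update.

```cpp
#include <cstdint>
#include <stdexcept>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a PG log version: (epoch, version),
// compared lexicographically with epoch first.
struct EVersion {
    uint32_t epoch;
    uint64_t version;
};

inline bool operator>(const EVersion& a, const EVersion& b) {
    return std::tie(a.epoch, a.version) > std::tie(b.epoch, b.version);
}

// Hypothetical stand-in for a log entry; only the version matters here.
struct LogEntry {
    EVersion version;
};

// Toy placement group that tracks only its log and last_update.
struct ToyPG {
    EVersion last_update{0, 0};
    std::vector<LogEntry> log;

    // Mirrors the shape of the failing check: every new entry must be
    // strictly newer than last_update. Throws rather than asserting.
    void add_log_entry(const LogEntry& e) {
        if (!(e.version > last_update)) {
            throw std::logic_error("e.version > info.last_update violated");
        }
        log.push_back(e);
        last_update = e.version;
    }
};

// Applies the same entry twice, as if one op had been dequeued twice.
// Returns true if the second application is rejected by the check.
inline bool replay_trips_check() {
    ToyPG pg;
    LogEntry e{{12, 345}};   // arbitrary illustrative version
    pg.add_log_entry(e);     // first application: accepted
    try {
        pg.add_log_entry(e); // replay: version equals last_update
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this model the first application advances last_update to the entry's version, so the replay arrives with a version equal to last_update, not greater than it, and the strict comparison fails.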
#3 Updated by Pavan Rallabhandi over 9 years ago
Greg, can you please confirm whether #8346 is also the same issue?