Bug #8887

osd crashes at assert(e.version > info.last_update): PG::add_log_entry

Added by Sahana Lokeshappa over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: High
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Regression: No
Severity: 3 - minor

Description

I have a Ceph cluster with 3 monitors and 3 OSD nodes (3 OSDs per node).

While I/O was in progress, I rebooted the OSD node hosting osd.6, osd.7 and osd.8.

osd.0 and osd.2 then crashed with FAILED assert(e.version > info.last_update) in PG::add_log_entry:

2014-07-17 17:54:14.893962 7f91f3660700 -1 osd/PG.cc: In function 'void PG::add_log_entry(pg_log_entry_t&, ceph::bufferlist&)' thread 7f91f3660700 time 2014-07-17 17:54:13.252064
osd/PG.cc: 2619: FAILED assert(e.version > info.last_update)

ceph version andisk-sprint-2-drop-3-390-g2dbd85c (2dbd85c94cf27a1ff0419c5ea9359af7fe30e9b6)
1: (PG::add_log_entry(pg_log_entry_t&, ceph::buffer::list&)+0x481) [0x733a61]
2: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0xdf) [0x74483f]
3: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0xcfe) [0x8193be]
4: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4a6) [0x904586]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2db) [0x7aedcb]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x459) [0x635719]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x346) [0x635ce6]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce) [0xa4a1ce]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa4c420]
10: (()+0x8182) [0x7f920f579182]
11: (clone()+0x6d) [0x7f920d91a30d]

monlogs.gz (3.2 MB) Sahana Lokeshappa, 07/21/2014 05:10 AM

log_osd_012.gz (4.62 MB) Sahana Lokeshappa, 07/21/2014 05:10 AM

log_osd_345.gz (3.39 MB) Sahana Lokeshappa, 07/21/2014 05:10 AM

log_osd_678.gz (4.35 MB) Sahana Lokeshappa, 07/21/2014 05:10 AM


Related issues

Duplicates Messengers - Bug #8504: msgr: FAILED assert(0 == "old msgs despite reconnect_seq feature") Resolved 06/02/2014

History

#1 Updated by Greg Farnum over 9 years ago

  • Assignee changed from Sage Weil to Greg Farnum

This error looks familiar to me but we don't have any other tracker entries for it. The commit in question is a part of master now, and although it was never tested in-tree on its own I don't see any changes that I think are relevant. I'll take a look tomorrow.

#2 Updated by Greg Farnum over 9 years ago

  • Status changed from New to Duplicate

There isn't much log history to look at here, but the same op is being dequeued twice by the OSD, and the second is hitting this assertion. I believe this is a different instantiation of #8504, which is resolved in-tree after the commit you're running. :)

#3 Updated by Pavan Rallabhandi over 9 years ago

Greg, can you please confirm whether #8346 is the same issue?
