
Bug #24650

mark unfound lost revert: out of order trim

Added by Sergey Malinin about 3 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

An OSD crashes within a few seconds of running 'ceph pg X.XX mark_unfound_lost revert'.

10> 2018-06-25 15:52:14.492 7fa14c72b700  1 - 192.168.0.12:6802/1588134 --> 192.168.0.11:6809/380947 -- MOSDPGPushReply(2.27 18340/18338 [PushReplyOp(2:e44bcb41:::1001a677a00.00000000:head)]) v3 -- 0x561df533c480 con 0
9> 2018-06-25 15:52:14.512 7fa15d186700 5 - 192.168.0.12:6802/1588134 >> 192.168.0.11:6809/380947 conn(0x561dec29a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=273 cs=1 l=0). rx osd.1 seq 1126 0x561df533c480 MOSDPGPush(2.27 18340/18338 [PushOp(2:e44bcb4a:::10012d837c3.00000000:head, version: 12732'2689323, data_included: [0~286148], data_size: 286148, omap_header_size: 0, omap_entries_size: 0, attrset_size: 4, recovery_info: ObjectRecoveryInfo(2:e44bcb4a:::10012d837c3.00000000:head@12732'2689323, size: 286148, copy_subset: [0~286148], clone_subset: {}, snapset: 0=[]:{}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:286148, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3
8> 2018-06-25 15:52:14.512 7fa15d186700 1 - 192.168.0.12:6802/1588134 <== osd.1 192.168.0.11:6809/380947 1126 ==== MOSDPGPush(2.27 18340/18338 [PushOp(2:e44bcb4a:::10012d837c3.00000000:head, version: 12732'2689323, data_included: [0~286148], data_size: 286148, omap_header_size: 0, omap_entries_size: 0, attrset_size: 4, recovery_info: ObjectRecoveryInfo(2:e44bcb4a:::10012d837c3.00000000:head@12732'2689323, size: 286148, copy_subset: [0~286148], clone_subset: {}, snapset: 0=[]:{}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:286148, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3 ==== 287335+0+0 (2723386426 0 0) 0x561df533c480 con 0x561dec29a000
7> 2018-06-25 15:52:14.512 7fa15d186700 5 - 192.168.0.12:6802/1588134 >> 192.168.0.11:6809/380947 conn(0x561dec29a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=273 cs=1 l=0). rx osd.1 seq 1127 0x561dde8aab00 pg_backfill(progress 2.27 e 18340/18340 lb 2:e44bcb41:::1001a677a00.00000000:head) v3
6> 2018-06-25 15:52:14.512 7fa15d186700 1 - 192.168.0.12:6802/1588134 <== osd.1 192.168.0.11:6809/380947 1127 ==== pg_backfill(progress 2.27 e 18340/18340 lb 2:e44bcb41:::1001a677a00.00000000:head) v3 ==== 967+0+0 (4167883386 0 0) 0x561dde8aab00 con 0x561dec29a000
5> 2018-06-25 15:52:14.548 7fa14c72b700 1 - 192.168.0.12:6802/1588134 --> 192.168.0.11:6809/380947 -- pg_trim(2.27 18340'5705274 e18340/18340) v2 -- 0x561defeb1180 con 0
4> 2018-06-25 15:52:14.552 7fa14c72b700 1 - 192.168.0.12:6802/1588134 --> 192.168.0.11:6809/380947 -- MOSDPGPushReply(2.27 18340/18338 [PushReplyOp(2:e44bcb4a:::10012d837c3.00000000:head)]) v3 -- 0x561df533c6c0 con 0
3> 2018-06-25 15:52:14.552 7fa15d186700 5 - 192.168.0.12:6802/1588134 >> 192.168.0.11:6809/380947 conn(0x561dec29a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=273 cs=1 l=0). rx osd.1 seq 1128 0x561dfa6ac800 osd_repop(client.587847.0:334731 2.55 e18340/18338) v2
2> 2018-06-25 15:52:14.552 7fa15d186700 1 - 192.168.0.12:6802/1588134 <== osd.1 192.168.0.11:6809/380947 1128 ==== osd_repop(client.587847.0:334731 2.55 e18340/18338) v2 ==== 1015+0+46 (4132989784 0 1629833233) 0x561dfa6ac800 con 0x561dec29a000
-1> 2018-06-25 15:52:14.552 7fa142717700 -1 bad trim to 18141'5564428 when complete_to is 18135'5554429 on log((18135'5554428,18340'5655829], crt=18340'5655829)
0> 2018-06-25 15:52:14.560 7fa142717700 -1 /build/ceph-13.2.0/src/osd/PGLog.cc: In function 'void PGLog::IndexedLog::trim(CephContext*, eversion_t, std::set<eversion_t>*, std::set<std::__cxx11::basic_string<char> >, eversion_t)' thread 7fa142717700 time 2018-06-25 15:52:14.554490
/build/ceph-13.2.0/src/osd/PGLog.cc: 58: FAILED assert(0 == "out of order trim")
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fa1636535e2]
2: (()+0x26b7a7) [0x7fa1636537a7]
3: (()+0x4d8b96) [0x561dbfe63b96]
4: (PGLog::trim(eversion_t, pg_info_t&)+0xa1) [0x561dbfe63c41]
5: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x1f6) [0x561dbfe118d6]
6: (non-virtual thunk to PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x83) [0x561dbff098c3]
7: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0x8f9) [0x561dc00279c9]
8: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1a7) [0x561dc002a1f7]
9: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x561dbff3ec57]
10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6ed) [0x561dbfeed50d]
11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b8) [0x561dbfd482b8]
12: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x561dbffc17f2]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x598) [0x561dbfd67a48]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7fa16365851e]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fa16365a5a0]
16: (()+0x76db) [0x7fa161d4e6db]
17: (clone()+0x3f) [0x7fa160d1288f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

History

#1 Updated by Josh Durgin about 3 years ago

  • Subject changed from "out of order trim" to "mark unfound lost revert: out of order trim"
