Bug #8121


ReplicatedBackend::build_push_op() should handle a short read or assert

Added by David Zafman about 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I noticed that the existing code in build_push_op() may not handle every scenario of fixing up the data_included intervals when a short read happens. It only adjusts the length of the current interval. If the read returned 0 bytes, the interval could be left with a 0 length instead of being removed. Also, any additional intervals beyond the short read are left in place.

Another possibility is that a short read should simply assert, since it means an inconsistency has been found.

This is the untested code for fixing the data_included handling:

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 8910e00..400be80 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -8289,8 +8289,19 @@ int ReplicatedBackend::build_push_op(const ObjectRecoveryInfo &recovery_info,
       dout(10) << " extent " << p.get_start() << "~" << p.get_len()
               << " is actually " << p.get_start() << "~" << bit.length()
               << dendl;
-      p.set_len(bit.length());
+      interval_set<uint64_t>::iterator save = p++;
+      if (bit.length() == 0)
+        out_op->data_included.erase(save);     //Remove this empty interval
+      else
+        save.set_len(bit.length());
+      // Remove any other intervals present
+      while (p != out_op->data_included.end()) {
+        interval_set<uint64_t>::iterator save = p++;
+        out_op->data_included.erase(save);
+      }
       new_progress.data_complete = true;
+      out_op->data.claim_append(bit);
+      break;
     }
     out_op->data.claim_append(bit);
   }

Actions #1

Updated by David Zafman about 10 years ago

  • Description updated (diff)
Actions #2

Updated by Sahana Lokeshappa almost 10 years ago

I got this assert in ReplicatedBackend::build_push_op.
Steps followed:

sudo ceph pg map 3.151
osdmap e1274 pg 3.151 (3.151) -> up [2,9,20] acting [2,9,20]

I removed object file1 (inserted using rados): ran rm -f on /var/lib/ceph/osd/ceph-9/current/3.151/file1* and /var/lib/ceph/osd/ceph-2/current/3.151/file1*

Checked for scrub errors using:
ceph pg scrub 3.151
ceph -w showed scrub errors on 2 and 9

Ran command:
ceph osd repair 2

Got a segfault in osd.2:
2014-06-16 10:33:19.906324 7fb9e9543700 0 log [ERR] : 3.151 shard 2 missing a086551/file1/head//3
2014-06-16 10:33:19.906330 7fb9e9543700 0 log [ERR] : 3.151 shard 9 missing a086551/file1/head//3
2014-06-16 10:33:19.906362 7fb9e9543700 0 log [ERR] : 3.151 repair 1 missing, 0 inconsistent objects
2014-06-16 10:33:19.906378 7fb9e9543700 0 log [ERR] : 3.151 repair 2 errors, 2 fixed
2014-06-16 10:33:19.924977 7fb9e9d44700 -1 *** Caught signal (Segmentation fault) **
in thread 7fb9e9d44700

1: /usr/bin/ceph-osd() [0x974a1f]
2: (()+0x10340) [0x7fba089de340]
3: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*)+0xc1c) [0x7d209c]
4: (ReplicatedBackend::prep_push(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, eversion_t, interval_set<unsigned long>&, std::map<hobject_t, interval_set<unsigned long>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_set<unsigned long> > > >&, PushOp*)+0x3d8) [0x7d2b48]
5: (ReplicatedBackend::prep_push_to_replica(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, PushOp*)+0x3af) [0x7d6b8f]
6: (ReplicatedBackend::start_pushes(hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x1af) [0x7d9c6f]
7: (C_ReplicatedBackend_OnPullComplete::finish(ThreadPool::TPHandle&)+0x143) [0x84b083]
8: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x661a09]
9: (ReplicatedPG::BlessedGenContext<ThreadPool::TPHandle&>::finish(ThreadPool::TPHandle&)+0x95) [0x824f65]
10: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x661a09]
11: (ThreadPool::WorkQueueVal<GenContext<ThreadPool::TPHandle&>, GenContext<ThreadPool::TPHandle&>>::_void_process(void*, ThreadPool::TPHandle&)+0x62) [0x6697d2]
12: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0xa4b351]
13: (ThreadPool::WorkThread::entry()+0x10) [0xa4c440]
14: (()+0x8182) [0x7fba089d6182]
15: (clone()+0x6d) [0x7fba06d7730d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Is this related? Or should I raise another tracker for this?

Actions #3

Updated by Dhiraj Kamble almost 10 years ago

The second issue reported is not the same as the original one.
It is a SEGV due to a NULL omap iterator. Please open another tracker for the second issue.

Actions #4

Updated by Samuel Just over 9 years ago

  • Assignee set to David Zafman
  • Priority changed from Normal to Urgent
Actions #5

Updated by Samuel Just over 9 years ago

  • Priority changed from Urgent to High
Actions #6

Updated by David Zafman over 9 years ago

The new acting set [1,5] causes osd.5 to read 1.3e_head/7fc9453e/plana8413303-32/55e, but the read size is only 413696, not the expected 416448. This later causes osd.1 to crash because a 0-sized interval was sent.

pg 1.3e is active+inconsistent, which isn't surprising.

osd.5
2014-12-16 16:36:34.736853 7f313a510700 10 filestore(/var/lib/ceph/osd/ceph-5) stat 1.3e_head/7fc9453e/plana8413303-32/55e//1 = 0 (size 413696)
2014-12-16 16:36:34.736863 7f313a510700 7 osd.5 pg_epoch: 2198 pg[1.3e( v 2194'1055 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=1 lpr=2196 pi=2154-2195/1 luod=0'0 crt=2191'1051 lcod 2193'1053 active+inconsistent] send_push_op 7fc9453e/plana8413303-32/55e//1 v 1857'870 size 0 recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/55e//1@1857'870, copy_subset: [0~416448], clone_subset: {})
2014-12-16 16:36:34.736888 7f313a510700 15 filestore(/var/lib/ceph/osd/ceph-5) omap_get_header 1.3e_head/7fc9453e/plana8413303-32/55e//1
2014-12-16 16:36:34.737069 7f313a510700 15 filestore(/var/lib/ceph/osd/ceph-5) getattrs 1.3e_head/7fc9453e/plana8413303-32/55e//1
2014-12-16 16:36:34.737143 7f313a510700 20 filestore(/var/lib/ceph/osd/ceph-5) fgetattrs 160 getting '_'
2014-12-16 16:36:34.737160 7f313a510700 20 filestore(/var/lib/ceph/osd/ceph-5) fgetattrs 160 getting '__header'
2014-12-16 16:36:34.737167 7f313a510700 10 filestore(/var/lib/ceph/osd/ceph-5) getattrs no xattr exists in object_map r = 0
2014-12-16 16:36:34.737169 7f313a510700 10 filestore(/var/lib/ceph/osd/ceph-5) getattrs 1.3e_head/7fc9453e/plana8413303-32/55e//1 = 0
2014-12-16 16:36:34.737181 7f313a510700 15 filestore(/var/lib/ceph/osd/ceph-5) get_omap_iterator 1.3e_head/7fc9453e/plana8413303-32/55e//1
2014-12-16 16:36:34.737252 7f313a510700 15 filestore(/var/lib/ceph/osd/ceph-5) read 1.3e_head/7fc9453e/plana8413303-32/55e//1 0~416448
2014-12-16 16:36:34.754544 7f313a510700 10 filestore(/var/lib/ceph/osd/ceph-5) FileStore::read 1.3e_head/7fc9453e/plana8413303-32/55e//1 0~413696/416448
2014-12-16 16:36:34.754565 7f313a510700 10 osd.5 pg_epoch: 2198 pg[1.3e( v 2194'1055 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=1 lpr=2196 pi=2154-2195/1 luod=0'0 crt=2191'1051 lcod 2193'1053 active+inconsistent] extent 0~416448 is actually 0~413696
2014-12-16 16:36:34.754598 7f313a510700 20 osd.5 pg_epoch: 2198 pg[1.3e( v 2194'1055 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=1 lpr=2196 pi=2154-2195/1 luod=0'0 crt=2191'1051 lcod 2193'1053 active+inconsistent] send_pushes: sending push PushOp(7fc9453e/plana8413303-32/2aa//1, version: 984'466, data_included: [0~999800], data_size: 999800, omap_header_size: 160, omap_entries_size: 40, attrset_size: 42, recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/2aa//1@984'466, copy_subset: [0~10267644], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:999800, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)) to osd.1
2014-12-16 16:36:34.754660 7f313a510700 20 osd.5 pg_epoch: 2198 pg[1.3e( v 2194'1055 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=1 lpr=2196 pi=2154-2195/1 luod=0'0 crt=2191'1051 lcod 2193'1053 active+inconsistent] send_pushes: sending push PushOp(7fc9453e/plana8413303-32/55e//1, version: 1857'870, data_included: [0~413696], data_size: 413696, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/55e//1@1857'870, copy_subset: [0~416448], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:413696, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)) to osd.1
2014-12-16 16:36:34.754727 7f313a510700 1 -- 10.214.131.38:6804/10728 --> 10.214.133.32:6807/15618 -- MOSDPGPush(1.3e 2198 [PushOp(7fc9453e/plana8413303-32/2aa//1, version: 984'466, data_included: [0~999800], data_size: 999800, omap_header_size: 160, omap_entries_size: 40, attrset_size: 42, recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/2aa//1@984'466, copy_subset: [0~10267644], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:999800, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(7fc9453e/plana8413303-32/55e//1, version: 1857'870, data_included: [0~413696], data_size: 413696, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/55e//1@1857'870, copy_subset: [0~416448], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:413696, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x5523000 con 0x5c18b00
2014-12-16 16:36:34.754775 7f313a510700 10 osd.5 2198 dequeue_op 0x5449400 finish

osd.1 the new primary crashes:

2014-12-16 16:36:35.289562 7f76ba179700 10 osd.1 2198 dequeue_op 0x4151c00 prio 10 cost 1053328 latency 0.012506 MOSDPGPush(1.3e 2198 [PushOp(7fc9453e/plana8413303-32/2aa//1, version: 984'466, data_included: [999800~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 0, recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/2aa//1@984'466, copy_subset: [0~10267644], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:2048376, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(!first, data_recovered_to:999800, data_complete:false, omap_recovered_to:, omap_complete:true)),PushOp(7fc9453e/plana8413303-32/55e//1, version: 1857'870, data_included: [413696~0], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 0, recovery_info: ObjectRecoveryInfo(7fc9453e/plana8413303-32/55e//1@1857'870, copy_subset: [0~416448], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:413696, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(!first, data_recovered_to:413696, data_complete:true, omap_recovered_to:, omap_complete:true))]) v2 pg pg[1.3e( v 2194'1055 lc 982'465 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=0 lpr=2196 pi=2154-2195/1 rops=2 crt=2194'1055 mlcod 0'0 active+recovering+degraded+inconsistent m=17]
2014-12-16 16:36:35.289633 7f76ba179700 10 osd.1 pg_epoch: 2198 pg[1.3e( v 2194'1055 lc 982'465 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=0 lpr=2196 pi=2154-2195/1 rops=2 crt=2194'1055 mlcod 0'0 active+recovering+degraded+inconsistent m=17] handle_message: 0x4151c00
2014-12-16 16:36:35.289663 7f76ba179700 10 osd.1 pg_epoch: 2198 pg[1.3e( v 2194'1055 lc 982'465 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=0 lpr=2196 pi=2154-2195/1 rops=2 crt=2194'1055 mlcod 0'0 active+recovering+degraded+inconsistent m=17] handle_pull_response ObjectRecoveryInfo(7fc9453e/plana8413303-32/2aa//1@984'466, copy_subset: [0~10267644], clone_subset: {})ObjectRecoveryProgress(!first, data_recovered_to:2048376, data_complete:false, omap_recovered_to:, omap_complete:true) data.size() is 1048576 data_included: [999800~1048576]
2014-12-16 16:36:35.289696 7f76ba179700 10 osd.1 pg_epoch: 2198 pg[1.3e( v 2194'1055 lc 982'465 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=0 lpr=2196 pi=2154-2195/1 rops=2 crt=2194'1055 mlcod 0'0 active+recovering+degraded+inconsistent m=17] new recovery_info ObjectRecoveryInfo(7fc9453e/plana8413303-32/2aa//1@984'466, copy_subset: [0~10267644], clone_subset: {}), new progress ObjectRecoveryProgress(!first, data_recovered_to:2048376, data_complete:false, omap_recovered_to:, omap_complete:true)
2014-12-16 16:36:35.289733 7f76ba179700 10 osd.1 pg_epoch: 2198 pg[1.3e( v 2194'1055 lc 982'465 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=0 lpr=2196 pi=2154-2195/1 rops=2 crt=2194'1055 mlcod 0'0 active+recovering+degraded+inconsistent m=17] submit_push_data: Creating oid 7fc9453e/plana8413303-32/2aa//1 in the temp collection
2014-12-16 16:36:35.289773 7f76ba179700 10 osd.1 pg_epoch: 2198 pg[1.3e( v 2194'1055 lc 982'465 (0'0,2194'1055] local-les=2197 n=18 ec=6 les/c 2197/2160 2196/2196/2196) [1,5] r=0 lpr=2196 pi=2154-2195/1 rops=2 crt=2194'1055 mlcod 0'0 active+recovering+degraded+inconsistent m=17] handle_pull_response ObjectRecoveryInfo(7fc9453e/plana8413303-32/55e//1@1857'870, copy_subset: [0~416448], clone_subset: {})ObjectRecoveryProgress(!first, data_recovered_to:413696, data_complete:true, omap_recovered_to:, omap_complete:true) data.size() is 0 data_included: [413696~0]
2014-12-16 16:36:35.310354 7f76ba179700 -1 osd/ReplicatedPG.cc: In function 'bool ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp&, PullOp*, std::list<hobject_t>*, ObjectStore::Transaction*)' thread 7f76ba179700 time 2014-12-16 16:36:35.289804
osd/ReplicatedPG.cc: 8678: FAILED assert((data_included.empty() && data.length() == 0) || (!data_included.empty() && data.length() > 0))

ceph version 0.89-435-g90e18ae (90e18aed0e2b3b1efa5709811ea9fb4c3b866699)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xb896b5]
2: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp&, PullOp*, std::list<hobject_t, std::allocator<hobject_t> >*, ObjectStore::Transaction*)+0x6d6) [0x87b886]
3: (ReplicatedBackend::_do_pull_response(std::tr1::shared_ptr<OpRequest>)+0x2ea) [0x87bfca]
4: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x3ae) [0x9f13ce]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x26b) [0x81602b]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x683cb5]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x359) [0x684249]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x828) [0xb79328]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb7b450]
10: (()+0x7df3) [0x7f76d8841df3]
11: (clone()+0x6d) [0x7f76d73243dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #7

Updated by David Zafman over 9 years ago

  • Status changed from New to 7
Actions #8

Updated by David Zafman over 9 years ago

  • Status changed from 7 to Fix Under Review
Actions #9

Updated by David Zafman over 9 years ago

  • Status changed from Fix Under Review to Resolved