Bug #8066
Status: Closed
osd/PG.cc: 2826: FAILED assert(r == 0) in update_snap_map (dumpling + firefly)
Description
ubuntu@teuthology:/a/teuthology-2014-04-09_22:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/182710
0> 2014-04-10 13:11:34.589907 7f634e6de700 -1 osd/PG.cc: In function 'void PG::update_snap_map(std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7f634e6de700 time 2014-04-10 13:11:34.186415
osd/PG.cc: 2826: FAILED assert(r == 0)
 ceph version 0.79-42-g010dff1 (010dff12c38882238591bb042f8e497a1f7ba020)
 1: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, ObjectStore::Transaction&)+0x3cd) [0x73358d]
 2: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0x395) [0x739775]
 3: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0xcf4) [0x80b044]
 4: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4a6) [0x8f3916]
 5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2db) [0x7a420b]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x371) [0x605161]
 7: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1f4) [0x61efd4]
 8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x66179c]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0xa37361]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xa38250]
 11: (()+0x8182) [0x7f63635e8182]
 12: (clone()+0x6d) [0x7f636198930d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
 --- logging levels ---
Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/a/teuthology-2014-04-09_22:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/182696
Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/a/teuthology-2014-04-15_22:35:26-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/196331
Updated by Sage Weil almost 10 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-04-22_19:55:02-upgrade:dumpling-x:stress-split-firefly---basic-plana/210198
Updated by Samuel Just almost 10 years ago
2014-04-23 05:24:38.517382 7f7f6945d700 10 osd.0 911 dequeue_op 0x2b4d780 prio 127 cost 0 latency 0.000238 pg_backfill(finish 3.19 e 911/911 lb MAX) v3 pg pg[3.19( v 887'279 (11'2,887'279] lb 87e0a559/plana6731246-25/24b//3 local-les=907 n=1 ec=10 les/c 907/891 904/905/905) [1,0]/[1,0,3] r=1 lpr=905 pi=889-904/6 luod=0'0 crt=887'279 active+remapped]
2014-04-23 05:24:38.517427 7f7f6945d700 10 osd.0 pg_epoch: 911 pg[3.19( v 887'279 (11'2,887'279] lb 87e0a559/plana6731246-25/24b//3 local-les=907 n=1 ec=10 les/c 907/891 904/905/905) [1,0]/[1,0,3] r=1 lpr=905 pi=889-904/6 luod=0'0 crt=887'279 active+remapped] handle_message: 0x2b4d780
2014-04-23 05:24:38.517453 7f7f6945d700 10 osd.0 pg_epoch: 911 pg[3.19( v 887'279 (11'2,887'279] lb 87e0a559/plana6731246-25/24b//3 local-les=907 n=1 ec=10 les/c 907/891 904/905/905) [1,0]/[1,0,3] r=1 lpr=905 pi=889-904/6 luod=0'0 crt=887'279 active+remapped] do_backfill pg_backfill(finish 3.19 e 911/911 lb MAX) v3
2014-04-23 05:24:38.517491 7f7f6945d700 1 -- 10.214.131.15:6811/32041 --> 10.214.131.15:6806/26389 -- pg_backfill(finish_ack 3.19 e 911/911 lb 0//0//-1) v3 -- ?+0 0x34e3800 con 0x24502c0
2014-04-23 05:24:38.517594 7f7f6945d700 10 log is not dirty
...
2014-04-23 05:24:41.519183 7f7f6ec68700 1 -- 10.214.131.15:6811/32041 <== osd.1 10.214.131.15:6806/26389 2362 ==== osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10 ==== 1528+0+1227 (3699701671 0 3164676542) 0x1c52c00 con 0x24502c0
2014-04-23 05:24:41.519219 7f7f6ec68700 10 osd.0 913 do_waiters -- start
2014-04-23 05:24:41.519224 7f7f6ec68700 10 osd.0 913 do_waiters -- finish
2014-04-23 05:24:41.519227 7f7f6ec68700 20 osd.0 913 _dispatch 0x1c52c00 osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10
2014-04-23 05:24:41.519304 7f7f6ec68700 10 osd.0 913 handle_replica_op osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10 epoch 913
2014-04-23 05:24:41.519314 7f7f6ec68700 15 osd.0 913 require_same_or_newer_map 913 (i am 913) 0x1c52c00
2014-04-23 05:24:41.519320 7f7f6ec68700 20 osd.0 913 _share_map_incoming osd.1 10.214.131.15:6806/26389 913
2014-04-23 05:24:41.519331 7f7f6ec68700 15 osd.0 913 enqueue_op 0x3d72000 prio 127 cost 1227 latency 0.000238 osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10
2014-04-23 05:24:41.519358 7f7f6ec68700 10 osd.0 913 do_waiters -- start
2014-04-23 05:24:41.519363 7f7f6ec68700 10 osd.0 913 do_waiters -- finish
2014-04-23 05:24:41.519381 7f7f6945d700 10 osd.0 913 dequeue_op 0x3d72000 prio 127 cost 1227 latency 0.000287 osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10 pg pg[3.19( v 887'279 (11'2,887'279] local-les=913 n=7 ec=10 les/c 913/913 904/912/905) [1,0] r=1 lpr=912 pi=889-911/7 luod=0'0 crt=887'279 active]
2014-04-23 05:24:41.519440 7f7f6945d700 10 osd.0 pg_epoch: 913 pg[3.19( v 887'279 (11'2,887'279] local-les=913 n=7 ec=10 les/c 913/913 904/912/905) [1,0] r=1 lpr=912 pi=889-911/7 luod=0'0 crt=887'279 active] handle_message: 0x3d72000
2014-04-23 05:24:41.519454 7f7f6945d700 10 osd.0 pg_epoch: 913 pg[3.19( v 887'279 (11'2,887'279] local-les=913 n=7 ec=10 les/c 913/913 904/912/905) [1,0] r=1 lpr=912 pi=889-911/7 luod=0'0 crt=887'279 active] sub_op_modify trans 87e0a559/plana6731246-25/24b//3 v 913'281 (transaction) 390
2014-04-23 05:24:41.519515 7f7f6945d700 15 filestore(/var/lib/ceph/osd/ceph-0) omap_get_values meta/a468ec03/snapmapper/0//-1
2014-04-23 05:24:41.520231 7f7f755a3700 20 journal do_write latency 0.005905
...
osd/PG.cc: 2858: FAILED assert(r == 0)
So: a race with backfill, consulting the naughty primary.
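To see why the assert fires at all, here is a minimal model (hypothetical names, not Ceph's actual classes) of the pattern in PG::update_snap_map: for each log entry, the snap mapper is asked to update the snap set recorded for the object, and the return code is asserted to be 0. If the replica never actually received the object, there is no mapping to update, the lookup fails with -ENOENT, and the assert trips.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Minimal stand-in for the snap mapper's object -> snaps index.
// In Ceph this lives in omap (see the omap_get_values line in the log).
struct SnapMapper {
  std::map<std::string, std::set<int>> snaps_by_obj;

  int update_snaps(const std::string& oid, const std::set<int>& snaps) {
    auto it = snaps_by_obj.find(oid);
    if (it == snaps_by_obj.end())
      return -2;  // -ENOENT: no snap mapping recorded for this object
    it->second = snaps;
    return 0;
  }
};

// Caller's pattern: in osd/PG.cc this return value feeds
// assert(r == 0), which is what fails at line 2826/2858.
int apply_log_entry(SnapMapper& m, const std::string& oid,
                    const std::set<int>& snaps) {
  return m.update_snaps(oid, snaps);
}
```

The sub_op for 87e0a559/plana6731246-25 hits exactly the missing-mapping path: the object was never backfilled to osd.0, so the omap lookup comes back empty.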
Updated by Samuel Just almost 10 years ago
Actually, osd.3 prematurely advanced last_backfill; it was still on dumpling.
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-04-22_19:55:02-upgrade:dumpling-x:stress-split-firefly---basic-plana/210198/remote$ grep 'ceph version' ceph-osd.3.log
2014-04-23 05:03:39.003722 7fd9b03bd800 0 ceph version 0.67.7-72-gc049967 (c049967af829497f8a62e0cbbd6031f85ead8a59), process ceph-osd, pid 27059
2014-04-23 05:03:43.585479 7fece83f9800 0 ceph version 0.67.7-72-gc049967 (c049967af829497f8a62e0cbbd6031f85ead8a59), process ceph-osd, pid 27185
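A sketch (hypothetical names) of the invariant the dumpling-era osd.3 broke: last_backfill is a promise that every object sorting at or before it has already been copied to the backfill target, and the primary uses that bound to decide whether the target can be sent a normal replica op. If last_backfill is advanced before the copies actually land, the primary's view and the target's reality diverge:

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of a backfill target's state as the primary sees it
// versus what the target actually holds on disk.
struct BackfillTarget {
  std::set<std::string> local_objects;  // objects actually copied over
  std::string last_backfill;            // claimed progress bound

  // Primary's view: may this target be sent a normal update for oid?
  bool primary_thinks_present(const std::string& oid) const {
    return oid <= last_backfill;
  }

  // Target's reality: does it actually hold the object?
  bool actually_present(const std::string& oid) const {
    return local_objects.count(oid) > 0;
  }
};
```

With last_backfill advanced past an uncopied object, primary_thinks_present() returns true while actually_present() is false; the target then applies a log entry for an object it never received, which is the missing snap-mapper record behind the assert above.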