Bug #8066

closed

osd/PG.cc: 2826: FAILED assert(r == 0) in update_snap_map (dumpling + firefly)

Added by Sage Weil about 10 years ago. Updated almost 10 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/teuthology-2014-04-09_22:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/182710

     0> 2014-04-10 13:11:34.589907 7f634e6de700 -1 osd/PG.cc: In function 'void PG::update_snap_map(std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread 7f634e6de700 time 2014-04-10 13:11:34.186415
osd/PG.cc: 2826: FAILED assert(r == 0)

 ceph version 0.79-42-g010dff1 (010dff12c38882238591bb042f8e497a1f7ba020)
 1: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, ObjectStore::Transaction&)+0x3cd) [0x73358d]
 2: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0x395) [0x739775]
 3: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0xcf4) [0x80b044]
 4: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4a6) [0x8f3916]
 5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2db) [0x7a420b]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x371) [0x605161]
 7: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1f4) [0x61efd4]
 8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x66179c]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0xa37361]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xa38250]
 11: (()+0x8182) [0x7f63635e8182]
 12: (clone()+0x6d) [0x7f636198930d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---


Related issues (0 open, 1 closed)

Related to Ceph - Bug #8162: osd: dumpling advances last_backfill prematurely (Resolved, Samuel Just, 04/19/2014)

Actions #1

Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/teuthology-2014-04-09_22:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/182696

Actions #2

Updated by Sage Weil about 10 years ago

  • Status changed from New to 12
Actions #3

Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/teuthology-2014-04-15_22:35:26-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/196331

Actions #4

Updated by Sage Weil almost 10 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-04-22_19:55:02-upgrade:dumpling-x:stress-split-firefly---basic-plana/210198

Actions #5

Updated by Samuel Just almost 10 years ago

2014-04-23 05:24:38.517382 7f7f6945d700 10 osd.0 911 dequeue_op 0x2b4d780 prio 127 cost 0 latency 0.000238 pg_backfill(finish 3.19 e 911/911 lb MAX) v3 pg pg[3.19( v 887'279 (11'2,887'279] lb 87e0a559/plana6731246-25/24b//3 local-les=907 n=1 ec=10 les/c 907/891 904/905/905) [1,0]/[1,0,3] r=1 lpr=905 pi=889-904/6 luod=0'0 crt=887'279 active+remapped]
2014-04-23 05:24:38.517427 7f7f6945d700 10 osd.0 pg_epoch: 911 pg[3.19( v 887'279 (11'2,887'279] lb 87e0a559/plana6731246-25/24b//3 local-les=907 n=1 ec=10 les/c 907/891 904/905/905) [1,0]/[1,0,3] r=1 lpr=905 pi=889-904/6 luod=0'0 crt=887'279 active+remapped] handle_message: 0x2b4d780
2014-04-23 05:24:38.517453 7f7f6945d700 10 osd.0 pg_epoch: 911 pg[3.19( v 887'279 (11'2,887'279] lb 87e0a559/plana6731246-25/24b//3 local-les=907 n=1 ec=10 les/c 907/891 904/905/905) [1,0]/[1,0,3] r=1 lpr=905 pi=889-904/6 luod=0'0 crt=887'279 active+remapped] do_backfill pg_backfill(finish 3.19 e 911/911 lb MAX) v3
2014-04-23 05:24:38.517491 7f7f6945d700 1 -- 10.214.131.15:6811/32041 --> 10.214.131.15:6806/26389 -- pg_backfill(finish_ack 3.19 e 911/911 lb 0//0//-1) v3 -- ?+0 0x34e3800 con 0x24502c0
2014-04-23 05:24:38.517594 7f7f6945d700 10 log is not dirty
...
2014-04-23 05:24:41.519183 7f7f6ec68700 1 -- 10.214.131.15:6811/32041 <== osd.1 10.214.131.15:6806/26389 2362 ==== osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10 ==== 1528+0+1227 (3699701671 0 3164676542) 0x1c52c00 con 0x24502c0
2014-04-23 05:24:41.519219 7f7f6ec68700 10 osd.0 913 do_waiters -- start
2014-04-23 05:24:41.519224 7f7f6ec68700 10 osd.0 913 do_waiters -- finish
2014-04-23 05:24:41.519227 7f7f6ec68700 20 osd.0 913 _dispatch 0x1c52c00 osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10
2014-04-23 05:24:41.519304 7f7f6ec68700 10 osd.0 913 handle_replica_op osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10 epoch 913
2014-04-23 05:24:41.519314 7f7f6ec68700 15 osd.0 913 require_same_or_newer_map 913 (i am 913) 0x1c52c00
2014-04-23 05:24:41.519320 7f7f6ec68700 20 osd.0 913 _share_map_incoming osd.1 10.214.131.15:6806/26389 913
2014-04-23 05:24:41.519331 7f7f6ec68700 15 osd.0 913 enqueue_op 0x3d72000 prio 127 cost 1227 latency 0.000238 osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10
2014-04-23 05:24:41.519358 7f7f6ec68700 10 osd.0 913 do_waiters -- start
2014-04-23 05:24:41.519363 7f7f6ec68700 10 osd.0 913 do_waiters -- finish
2014-04-23 05:24:41.519381 7f7f6945d700 10 osd.0 913 dequeue_op 0x3d72000 prio 127 cost 1227 latency 0.000287 osd_sub_op(osd.1.0:1594 3.19 87e0a559/plana6731246-25/24b//3 [] v 913'281 snapset=0=[]:[] snapc=0=[]) v10 pg pg[3.19( v 887'279 (11'2,887'279] local-les=913 n=7 ec=10 les/c 913/913 904/912/905) [1,0] r=1 lpr=912 pi=889-911/7 luod=0'0 crt=887'279 active]
2014-04-23 05:24:41.519440 7f7f6945d700 10 osd.0 pg_epoch: 913 pg[3.19( v 887'279 (11'2,887'279] local-les=913 n=7 ec=10 les/c 913/913 904/912/905) [1,0] r=1 lpr=912 pi=889-911/7 luod=0'0 crt=887'279 active] handle_message: 0x3d72000
2014-04-23 05:24:41.519454 7f7f6945d700 10 osd.0 pg_epoch: 913 pg[3.19( v 887'279 (11'2,887'279] local-les=913 n=7 ec=10 les/c 913/913 904/912/905) [1,0] r=1 lpr=912 pi=889-911/7 luod=0'0 crt=887'279 active] sub_op_modify trans 87e0a559/plana6731246-25/24b//3 v 913'281 (transaction) 390
2014-04-23 05:24:41.519515 7f7f6945d700 15 filestore(/var/lib/ceph/osd/ceph-0) omap_get_values meta/a468ec03/snapmapper/0//-1
2014-04-23 05:24:41.520231 7f7f755a3700 20 journal do_write latency 0.005905
...
osd/PG.cc: 2858: FAILED assert(r == 0)

So, race with backfill, consulting the naughty primary.

Actions #6

Updated by Samuel Just almost 10 years ago

Actually, osd.3 prematurely advanced last_backfill; it was still on dumpling.

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-04-22_19:55:02-upgrade:dumpling-x:stress-split-firefly---basic-plana/210198/remote$ grep 'ceph version' ceph-osd.3.log
2014-04-23 05:03:39.003722 7fd9b03bd800 0 ceph version 0.67.7-72-gc049967 (c049967af829497f8a62e0cbbd6031f85ead8a59), process ceph-osd, pid 27059
2014-04-23 05:03:43.585479 7fece83f9800 0 ceph version 0.67.7-72-gc049967 (c049967af829497f8a62e0cbbd6031f85ead8a59), process ceph-osd, pid 27185
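The failure mode above can be sketched as a toy model (assumption: this is illustrative Python, not Ceph code; the class, function names, and the hoid string are stand-ins for the real SnapMapper/PG machinery). A replica processing a DELETE log entry expects a snap mapping to already exist for any object at or below last_backfill; if the dumpling primary advanced last_backfill past an object the replica never received, the removal returns -ENOENT and the assert fires:

```python
# Toy model of why PG::update_snap_map hits FAILED assert(r == 0)
# when last_backfill was advanced prematurely. Not Ceph code.

class SnapMapper:
    """Stand-in for the OSD's omap-backed SnapMapper index."""
    def __init__(self):
        self.entries = {}  # hoid -> set of snap ids

    def add(self, hoid, snaps):
        self.entries[hoid] = set(snaps)

    def remove(self, hoid):
        # Mirrors the r != 0 path: -ENOENT when no mapping exists.
        if hoid not in self.entries:
            return -2  # -ENOENT
        del self.entries[hoid]
        return 0

def update_snap_map(mapper, log_entry):
    """Replica-side handling of a delete entry, as in PG::update_snap_map."""
    if log_entry["op"] == "delete":
        r = mapper.remove(log_entry["hoid"])
        assert r == 0, "osd/PG.cc: FAILED assert(r == 0)"

HOID = "87e0a559/plana6731246-25"  # illustrative object name

# Correct flow: backfill copied the object (and its snap mapping) first,
# so the later delete finds the mapping and r == 0.
ok = SnapMapper()
ok.add(HOID, [0x24b])
update_snap_map(ok, {"op": "delete", "hoid": HOID})

# Buggy flow: the primary advanced last_backfill past HOID before the
# replica received it, so there is no mapping and the assert fires.
bad = SnapMapper()
try:
    update_snap_map(bad, {"op": "delete", "hoid": HOID})
except AssertionError as e:
    print("assert fired:", e)
```

The sketch is only meant to show the invariant: the assert is sound on a correct replica, and the real bug is the dumpling primary's premature last_backfill advance (tracked in #8162), not the assert itself.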

Actions #7

Updated by Samuel Just almost 10 years ago

  • Status changed from 12 to Duplicate
