Project

General

Profile

Actions

Bug #13434

closed

osd/ReplicatedPG.cc: 5734: FAILED assert(rollback_to)

Added by Samuel Just over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

29> 2015-10-08 18:07:57.522785 7f11593c3700 10 osd.5 34 dequeue_op 0x7f117d28a000 prio 63 cost 0 latency 0.847566 osd_op(client.4127.0:2279 plana2027886-126 [rollback 1] 2.199e9b3e ack+ondisk+write+known_if_redirected e34) v6 pg pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27]
-28> 2015-10-08 18:07:57.522872 7f11593c3700 5 -
op tracker -- seq: 4392, time: 2015-10-08 18:07:57.522871, event: reached_pg, op: osd_op(client.4127.0:2279 plana2027886-126 [rollback 1] 2.199e9b3e ack+ondisk+write+known_if_redirected e34)
27> 2015-10-08 18:07:57.522892 7f11593c3700 20 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] op_has_sufficient_caps pool=2 (cache ) owner=0 need_read_cap=0 need_write_cap=1 need_class_read_cap=0 need_class_write_cap=0 -> yes
-26> 2015-10-08 18:07:57.522948 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] handle_message: 0x7f117d28a000
-25> 2015-10-08 18:07:57.522989 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] do_op osd_op(client.4127.0:2279 plana2027886-126 [rollback 1] 2.199e9b3e ack+ondisk+write+known_if_redirected e34) v6 may_write -> write-ordered flags ack+ondisk+write+known_if_redirected
-24> 2015-10-08 18:07:57.523043 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] get_object_context: found obc in cache: 0x7f117cc96c00
-23> 2015-10-08 18:07:57.523080 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] get_object_context: 0x7f117cc96c00 2/199e9b3e/plana2027886-126/head rwstate(write n=1 w=0) oi: 2/199e9b3e/plana2027886-126/head(34'537 osd.5.0:1237 data_digest s 2804201 uv 519 dd 55a5acba) ssc: 0x7f117c0e4a50 snapset: 1=[1]:[1]+head
-22> 2015-10-08 18:07:57.523133 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] find_object_context 2/199e9b3e/plana2027886-126/head @head oi=2/199e9b3e/plana2027886-126/head(34'537 osd.5.0:1237 data_digest s 2804201 uv 519 dd 55a5acba)
-21> 2015-10-08 18:07:57.523176 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] agent_choose_mode flush_mode: low evict_mode: full num_objects: 64 num_bytes: 199744003 num_objects_dirty: 28 num_objects_omap: 0 num_dirty: 28 num_user_objects: 64 num_user_bytes: 199744003 pool.info.target_max_bytes: 0 pool.info.target_max_objects: 250
-20> 2015-10-08 18:07:57.523223 7f11593c3700 20 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] agent_choose_mode dirty 0.451612 full 1.03226
-19> 2015-10-08 18:07:57.523286 7f11593c3700 5 -
op tracker -- seq: 4392, time: 2015-10-08 18:07:57.523286, event: started, op: osd_op(client.4127.0:2279 plana2027886-126 [rollback 1] 2.199e9b3e ack+ondisk+write+known_if_redirected e34)
-18> 2015-10-08 18:07:57.523299 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] execute_ctx 0x7f117e493e00
-17> 2015-10-08 18:07:57.523344 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] do_op 2/199e9b3e/plana2027886-126/head [rollback 1] ov 34'537 av 34'538 snapc 1=[1] snapset 1=[1]:[1]+head
-16> 2015-10-08 18:07:57.523393 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] do_osd_op 2/199e9b3e/plana2027886-126/head [rollback 1]
-15> 2015-10-08 18:07:57.523446 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] do_osd_op rollback 1
-14> 2015-10-08 18:07:57.523488 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] rollback_to 2/199e9b3e/plana2027886-126/head snapid 1
-13> 2015-10-08 18:07:57.523526 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] find_object_context 2/199e9b3e/plana2027886-126/1 @1 snapset 1=[1]:[1]+head
-12> 2015-10-08 18:07:57.523576 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] get_object_context: obc NOT found in cache: 2/199e9b3e/plana2027886-126/1
-11> 2015-10-08 18:07:57.523615 7f11593c3700 15 filestore(/var/lib/ceph/osd/ceph-5) getattr 2.2_head/2/199e9b3e/plana2027886-126/1 '
'
-10> 2015-10-08 18:07:57.523672 7f11593c3700 10 filestore(/var/lib/ceph/osd/ceph-5) error opening file /var/lib/ceph/osd/ceph-5/current/2.2_head/plana2027886-126__1_199E9B3E__2 with flags=2: (2) No such file or directory
-9> 2015-10-08 18:07:57.523683 7f11593c3700 10 filestore(/var/lib/ceph/osd/ceph-5) getattr 2.2_head/2/199e9b3e/plana2027886-126/1 '_' = -2
-8> 2015-10-08 18:07:57.523687 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] get_object_context: no obc for soid 2/199e9b3e/plana2027886-126/1 and !can_create
-7> 2015-10-08 18:07:57.523740 7f11593c3700 20 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] find_object_context missing clone 2/199e9b3e/plana2027886-126/1
-6> 2015-10-08 18:07:57.523778 7f11593c3700 20 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] maybe_handle_cache cache pool full, waiting
-5> 2015-10-08 18:07:57.523818 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] get_object_context: obc NOT found in cache: 2/199e9b3e/plana2027886-126/1
-4> 2015-10-08 18:07:57.523890 7f11593c3700 15 filestore(/var/lib/ceph/osd/ceph-5) getattr 2.2_head/2/199e9b3e/plana2027886-126/1 '_'
-3> 2015-10-08 18:07:57.523926 7f11593c3700 10 filestore(/var/lib/ceph/osd/ceph-5) error opening file /var/lib/ceph/osd/ceph-5/current/2.2_head/plana2027886-126__1_199E9B3E__2 with flags=2: (2) No such file or directory
-2> 2015-10-08 18:07:57.523937 7f11593c3700 10 filestore(/var/lib/ceph/osd/ceph-5) getattr 2.2_head/2/199e9b3e/plana2027886-126/1 '_' = -2
-1> 2015-10-08 18:07:57.523940 7f11593c3700 10 osd.5 pg_epoch: 34 pg[2.2( v 34'537 lc 21'164 (0'0,34'537] local-les=27 n=64 ec=6 les/c/f 27/7/0 26/26/26) [5,1] r=0 lpr=26 pi=6-25/2 luod=34'536 lua=33'531 rops=1 crt=25'274 mlcod 21'164 active+recovery_wait+degraded NIBBLEWISE m=27] get_object_context: no obc for soid 2/199e9b3e/plana2027886-126/1 and !can_create
0> 2015-10-08 18:07:57.571200 7f11593c3700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::_rollback_to(ReplicatedPG::OpContext*, ceph_osd_op&)' thread 7f11593c3700 time 2015-10-08 18:07:57.524000
osd/ReplicatedPG.cc: 5734: FAILED assert(rollback_to)

ceph version 9.0.3-2073-gb1c7dd8 (b1c7dd851223f18c8e247b8fe75c3806cb659278)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f11782deb5b]
2: (ReplicatedPG::_rollback_to(ReplicatedPG::OpContext*, ceph_osd_op&)+0xf46) [0x7f1177f1d036]
3: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x3948) [0x7f1177f20ac8]
4: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x97) [0x7f1177f3c377]
5: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x850) [0x7f1177f3d170]
6: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1d27) [0x7f1177f40297]
7: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6dd) [0x7f1177eddfcd]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f1177d4011d]
9: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f1177d4033d]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8c4) [0x7f1177d65154]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f11782cf57f]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f11782d1480]
13: (()+0x8182) [0x7f1176695182]
14: (clone()+0x6d) [0x7f11749dc47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The bug is that with a full cache, we return true from maybe_handle_cache without promoting. _rollback_to needs to handle this properly. I think that actually maybe_handle_cache needs to handle the blocking stuff, and we need a "head objects blocked on full" map as well.

Actions #1

Updated by Sage Weil over 8 years ago

  • Status changed from New to 7
Actions #2

Updated by Sage Weil over 8 years ago

  • Status changed from 7 to Resolved
Actions

Also available in: Atom PDF