Project

General

Profile

Actions

Bug #9732

closed

ReplicatedPG::hit_set_trim osd/ReplicatedPG.cc: 11006: FAILED assert(obc)

Added by Loïc Dachary over 9 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The timezone of the machine was incorrect CDT instead of CEST. All other machines (MON and OSD) are on CEST.

On a Firefly cluster upgraded from 0.80.5 to 0.80.6 one OSD fails repeatedly. It has been added to the cluster after the upgrade to 0.80.6. The cluster otherwise runs fine.

/dev/sda2 on /var/lib/ceph/osd/ceph-11 type btrfs (rw,noatime,ssd,space_cache,user_subvol_rm_allowed)

with a journal collocated.
The content of the osd is on teuthology:loic/ceph-11 and the full logs on teuthology:loic/ceph-osd.11.log
   -52> 2014-10-10 09:37:22.955811 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.3 192.168.99.251:6804/2024042 3940 ==== osd_map(10854..10854 src has 8642..10854) v3 ==== 219+0+0 (844825721 0 0) 0x7f4f105421c0 con 0x7f4f111e9600
   -51> 2014-10-10 09:37:22.955835 7f4efa184700  3 osd.11 10854 handle_osd_map epochs [10854,10854], i have 10854, src has [8642,10854]
   -50> 2014-10-10 09:37:22.955844 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.3 192.168.99.251:6804/2024042 3941 ==== osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 ==== 1768+0+1155 (4245841136 0 1519041541) 0x7f4f1bc19700 con 0x7f4f111e9600
   -49> 2014-10-10 09:37:22.955857 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955734, event: header_read, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -48> 2014-10-10 09:37:22.955865 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955734, event: throttled, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -47> 2014-10-10 09:37:22.955872 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955797, event: all_read, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -46> 2014-10-10 09:37:22.955877 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955855, event: dispatched, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -45> 2014-10-10 09:37:22.955887 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955887, event: waiting_for_osdmap, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -44> 2014-10-10 09:37:22.956035 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956034, event: reached_pg, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -43> 2014-10-10 09:37:22.956071 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956071, event: started, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -42> 2014-10-10 09:37:22.956276 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956276, event: started, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -41> 2014-10-10 09:37:22.956319 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956318, event: commit_queued_for_journal_write, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -40> 2014-10-10 09:37:22.957720 7f4efe0ab700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.957720, event: sub_op_applied, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -39> 2014-10-10 09:37:22.957752 7f4efe0ab700  1 -- 192.168.99.247:6812/25256 --> osd.3 192.168.99.251:6804/2024042 -- osd_sub_op_reply(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] ack, result = 0) v2 -- ?+0 0x7f4f2359be00
   -38> 2014-10-10 09:37:23.095231 7f4f008b0700  5 -- op tracker -- , seq: 10902, time: 2014-10-10 09:37:23.095231, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714021 64.3a 2e78d23a/rb.0.100889.238e1f29.000000000142/head//64 [] v 10851'309156 snapset=0=[]:[] snapc=0=[]) v11
   -37> 2014-10-10 09:37:23.095269 7f4f008b0700  5 -- op tracker -- , seq: 10903, time: 2014-10-10 09:37:23.095269, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714020 64.61 3d27fb61/rb.0.100889.238e1f29.0000000000c3/head//64 [] v 10851'567301 snapset=0=[]:[] snapc=0=[]) v11
   -36> 2014-10-10 09:37:23.095263 7f4eff8ae700  1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- MOSDPGPushReply(64.3a 10851 [PushReplyOp(a38f6dba/rbd_data.e179f4a979b84.000000000000058f/head//64),PushReplyOp(7bb45eba/rb.0.91ea.2ae8944a.000000001200/head//64)]) v2 -- ?+0 0x7f4f0f863e00 con 0x7f4f111e9340
   -35> 2014-10-10 09:37:23.095294 7f4f008b0700  5 -- op tracker -- , seq: 10904, time: 2014-10-10 09:37:23.095294, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714022 64.3a 2e78d23a/rb.0.100889.238e1f29.000000000142/head//64 [] v 10851'309157 snapset=0=[]:[] snapc=0=[]) v11
   -34> 2014-10-10 09:37:23.095294 7f4eff8ae700  1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- MOSDPGPushReply(64.3a 10851 [PushReplyOp(e87c8fba/rb.0.100889.238e1f29.00000000007b/head//64)]) v2 -- ?+0 0x7f4f0e619840 con 0x7f4f111e9340
   -33> 2014-10-10 09:37:23.095959 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2124 ==== osd_map(10852..10854 src has 8642..10854) v3 ==== 633+0+0 (49899712 0 0) 0x7f4f172766c0 con 0x7f4f111e9340
   -32> 2014-10-10 09:37:23.095991 7f4efa184700  3 osd.11 10854 handle_osd_map epochs [10852,10854], i have 10854, src has [8642,10854]
   -31> 2014-10-10 09:37:23.096260 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2125 ==== pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 ==== 676+0+0 (3895076414 0 0) 0x7f4f1326d000 con 0x7f4f111e9340
   -30> 2014-10-10 09:37:23.096284 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096067, event: header_read, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -29> 2014-10-10 09:37:23.096292 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096069, event: throttled, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -28> 2014-10-10 09:37:23.096297 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096131, event: all_read, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -27> 2014-10-10 09:37:23.096300 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096279, event: dispatched, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -26> 2014-10-10 09:37:23.096304 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096304, event: waiting_for_osdmap, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -25> 2014-10-10 09:37:23.096445 7f4ef4178700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096445, event: reached_pg, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -24> 2014-10-10 09:37:23.096463 7f4ef4178700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096462, event: started, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -23> 2014-10-10 09:37:23.096473 7f4ef4178700  1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- pg_backfill(finish_ack 64.3a e 10854/10854 lb 0//0//-1) v3 -- ?+0 0x7f4f198a3400 con 0x7f4f111e9340
   -22> 2014-10-10 09:37:23.096568 7f4ef4178700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096567, event: done, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -21> 2014-10-10 09:37:23.096597 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.3a( v 10851'309157 (9562'306048,10851'309157] local-les=0 n=42 ec=1709 les/c 10819/10673 10741/10818/10691) [7,11]/[7,6] r=-1 lpr=10819 pi=10671-10817/8 luod=0'0 crt=10844'309155 lcod 10844'309155 active] exit Started/ReplicaActive/RepRecovering 12.134041 28 0.000092
   -20> 2014-10-10 09:37:23.096628 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.3a( v 10851'309157 (9562'306048,10851'309157] local-les=0 n=42 ec=1709 les/c 10819/10673 10741/10818/10691) [7,11]/[7,6] r=-1 lpr=10819 pi=10671-10817/8 luod=0'0 crt=10844'309155 lcod 10844'309155 active] enter Started/ReplicaActive/RepNotRecovering
   -19> 2014-10-10 09:37:23.096698 7f4ef4178700  1 -- 192.168.99.247:6812/25256 --> osd.6 192.168.99.253:6801/4015 -- MBackfillReserve GRANT  pgid: 64.5f, query_epoch: 10854 v3 -- ?+0 0x7f4f118c5800
   -18> 2014-10-10 09:37:23.096716 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.5f( v 10844'483555 (9631'480547,10844'483555] lb 0//0//-1 local-les=0 n=0 ec=1709 les/c 10819/10682 10817/10818/10817) [6,11]/[6,7] r=-1 lpr=10818 pi=9580-10817/51 luod=0'0 crt=10844'483555 lcod 10844'483554 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 32.009482 64 0.000109
   -17> 2014-10-10 09:37:23.096736 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.5f( v 10844'483555 (9631'480547,10844'483555] lb 0//0//-1 local-les=0 n=0 ec=1709 les/c 10819/10682 10817/10818/10817) [6,11]/[6,7] r=-1 lpr=10818 pi=9580-10817/51 luod=0'0 crt=10844'483555 lcod 10844'483554 active+remapped] enter Started/ReplicaActive/RepRecovering
   -16> 2014-10-10 09:37:23.096899 7f4f008b0700  5 -- op tracker -- , seq: 10907, time: 2014-10-10 09:37:23.096899, event: write_thread_in_journal_buffer, request: osd_op(client.1383897.0:1239453 rb.0.76e6d.2ae8944a.0000000004c5 [set-alloc-hint object_size 4194304 write_size 4194304,write 2047488~114688] 64.87f43319 ack+ondisk+write e10850) v4
   -15> 2014-10-10 09:37:23.096932 7f4f008b0700  5 -- op tracker -- , seq: 10908, time: 2014-10-10 09:37:23.096932, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1383897.0:1239449 64.61 ea6fd561/rb.0.76e6d.2ae8944a.000000000706/head//64 [] v 10851'567302 snapset=0=[]:[] snapc=0=[]) v11
   -14> 2014-10-10 09:37:23.097258 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2126 ==== pg_info(1 pgs e10854:64.3a) v4 ==== 740+0+0 (2282156553 0 0) 0x7f4f1054ff80 con 0x7f4f111e9340
   -13> 2014-10-10 09:37:23.097280 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097129, event: header_read, request: pg_info(1 pgs e10854:64.3a) v4
   -12> 2014-10-10 09:37:23.097287 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097131, event: throttled, request: pg_info(1 pgs e10854:64.3a) v4
   -11> 2014-10-10 09:37:23.097291 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097191, event: all_read, request: pg_info(1 pgs e10854:64.3a) v4
   -10> 2014-10-10 09:37:23.097295 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097276, event: dispatched, request: pg_info(1 pgs e10854:64.3a) v4
    -9> 2014-10-10 09:37:23.097300 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097299, event: waiting_for_osdmap, request: pg_info(1 pgs e10854:64.3a) v4
    -8> 2014-10-10 09:37:23.097306 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097306, event: started, request: pg_info(1 pgs e10854:64.3a) v4
    -7> 2014-10-10 09:37:23.097326 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097326, event: done, request: pg_info(1 pgs e10854:64.3a) v4
    -6> 2014-10-10 09:37:23.106323 7f4efb186700  1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 220 ==== osd_map(10850..10854 src has 8642..10854) v3 ==== 1076+0+0 (1334635336 0 0) 0x7f4f0fb9b440 con 0x7f4f0e6ec3c0
    -5> 2014-10-10 09:37:23.106373 7f4efb186700  3 osd.11 10854 handle_osd_map epochs [10850,10854], i have 10854, src has [8642,10854]
    -4> 2014-10-10 09:37:23.116946 7f4efb186700  1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 221 ==== osd_map(10854..10854 src has 8642..10854) v3 ==== 219+0+0 (844825721 0 0) 0x7f4f0fb9a240 con 0x7f4f0e6ec3c0
    -3> 2014-10-10 09:37:23.116969 7f4efb186700  3 osd.11 10854 handle_osd_map epochs [10854,10854], i have 10854, src has [8642,10854]
    -2> 2014-10-10 09:37:23.116978 7f4efb186700  1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 222 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2686169588 0 0) 0x7f4f175d8a00 con 0x7f4f0e6ec3c0
    -1> 2014-10-10 09:37:23.116987 7f4efb186700 10 monclient: handle_subscribe_ack sent 2014-10-10 09:37:22.862066 renew after 2014-10-10 09:39:52.862066
     0> 2014-10-10 09:37:23.168342 7f4ef4979700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 7f4ef4979700 time 2014-10-10 09:37:23.072077
osd/ReplicatedPG.cc: 11006: FAILED assert(obc)

 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)
 1: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x5ff) [0x7f4f0b7e088f]
 2: (ReplicatedPG::hit_set_persist()+0xe37) [0x7f4f0b7e1837]
 3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x17eb) [0x7f4f0b7f90bb]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x6cc) [0x7f4f0b792f5c]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x39e) [0x7f4f0b5cc14e]
 6: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1df) [0x7f4f0b5e8b1f]
 7: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x7f4f0b62d67c]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x13b0) [0x7f4f0ba8c0e0]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x7f4f0ba8cd90]
 10: (()+0x80a4) [0x7f4f0a84b0a4]
 11: (clone()+0x6d) [0x7f4f08faec2d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Related issues 6 (1 open5 closed)

Related to Ceph - Tasks #12797: create the upgrade test suite for gmt and sortbitmap changeIn ProgressKefu Chai08/26/2015

Actions
Related to Ceph - Bug #12968: mon/OSDMonitor.cc: 1864: FAILED assert(osdmap.get_num_up_osds() == 0 || osdmap.get_up_osd_features() & (1ULL<<54))ResolvedKefu Chai09/05/2015

Actions
Has duplicate Ceph - Bug #11386: "osd/ReplicatedPG.cc: 11268: FAILED assert(obc)" in smoke-giant-distro-basic-magnaDuplicateSamuel Just04/13/2015

Actions
Has duplicate Ceph - Bug #11660: "osd/ReplicatedPG.cc: 10405: FAILED assert(obc)" in rados-hammer-distro-basic-magnaDuplicateLoïc Dachary05/16/2015

Actions
Has duplicate Ceph - Bug #14399: "ReplicatedPG.cc: 10483: FAILED assert(obc)" in rados-hammer-distro-basic-miraDuplicateKefu Chai01/18/2016

Actions
Copied to Ceph - Backport #12848: ReplicatedPG::hit_set_trim osd/ReplicatedPG.cc: 11006: FAILED assert(obc)ResolvedKefu ChaiActions
Actions

Also available in: Atom PDF