Bug #9732
ReplicatedPG::hit_set_trim osd/ReplicatedPG.cc: 11006: FAILED assert(obc)
Description
The timezone of the machine was incorrect: CDT instead of CEST. All other machines (MON and OSD) are on CEST.
On a Firefly cluster upgraded from 0.80.5 to 0.80.6, one OSD fails repeatedly. It was added to the cluster after the upgrade to 0.80.6. The cluster otherwise runs fine.
/dev/sda2 on /var/lib/ceph/osd/ceph-11 type btrfs (rw,noatime,ssd,space_cache,user_subvol_rm_allowed)
with a journal collocated.
The content of the osd is on teuthology:loic/ceph-11 and the full logs on teuthology:loic/ceph-osd.11.log
-52> 2014-10-10 09:37:22.955811 7f4efa184700 1 -- 192.168.99.247:6812/25256 <== osd.3 192.168.99.251:6804/2024042 3940 ==== osd_map(10854..10854 src has 8642..10854) v3 ==== 219+0+0 (844825721 0 0) 0x7f4f105421c0 con 0x7f4f111e9600 -51> 2014-10-10 09:37:22.955835 7f4efa184700 3 osd.11 10854 handle_osd_map epochs [10854,10854], i have 10854, src has [8642,10854] -50> 2014-10-10 09:37:22.955844 7f4efa184700 1 -- 192.168.99.247:6812/25256 <== osd.3 192.168.99.251:6804/2024042 3941 ==== osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 ==== 1768+0+1155 (4245841136 0 1519041541) 0x7f4f1bc19700 con 0x7f4f111e9600 -49> 2014-10-10 09:37:22.955857 7f4efa184700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955734, event: header_read, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -48> 2014-10-10 09:37:22.955865 7f4efa184700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955734, event: throttled, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -47> 2014-10-10 09:37:22.955872 7f4efa184700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955797, event: all_read, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -46> 2014-10-10 09:37:22.955877 7f4efa184700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955855, event: dispatched, request: osd_sub_op(osd.3.0:2897 64.70 
70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -45> 2014-10-10 09:37:22.955887 7f4efa184700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955887, event: waiting_for_osdmap, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -44> 2014-10-10 09:37:22.956035 7f4ef4178700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956034, event: reached_pg, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -43> 2014-10-10 09:37:22.956071 7f4ef4178700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956071, event: started, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -42> 2014-10-10 09:37:22.956276 7f4ef4178700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956276, event: started, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -41> 2014-10-10 09:37:22.956319 7f4ef4178700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956318, event: commit_queued_for_journal_write, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -40> 2014-10-10 09:37:22.957720 
7f4efe0ab700 5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.957720, event: sub_op_applied, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 -39> 2014-10-10 09:37:22.957752 7f4efe0ab700 1 -- 192.168.99.247:6812/25256 --> osd.3 192.168.99.251:6804/2024042 -- osd_sub_op_reply(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] ack, result = 0) v2 -- ?+0 0x7f4f2359be00 -38> 2014-10-10 09:37:23.095231 7f4f008b0700 5 -- op tracker -- , seq: 10902, time: 2014-10-10 09:37:23.095231, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714021 64.3a 2e78d23a/rb.0.100889.238e1f29.000000000142/head//64 [] v 10851'309156 snapset=0=[]:[] snapc=0=[]) v11 -37> 2014-10-10 09:37:23.095269 7f4f008b0700 5 -- op tracker -- , seq: 10903, time: 2014-10-10 09:37:23.095269, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714020 64.61 3d27fb61/rb.0.100889.238e1f29.0000000000c3/head//64 [] v 10851'567301 snapset=0=[]:[] snapc=0=[]) v11 -36> 2014-10-10 09:37:23.095263 7f4eff8ae700 1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- MOSDPGPushReply(64.3a 10851 [PushReplyOp(a38f6dba/rbd_data.e179f4a979b84.000000000000058f/head//64),PushReplyOp(7bb45eba/rb.0.91ea.2ae8944a.000000001200/head//64)]) v2 -- ?+0 0x7f4f0f863e00 con 0x7f4f111e9340 -35> 2014-10-10 09:37:23.095294 7f4f008b0700 5 -- op tracker -- , seq: 10904, time: 2014-10-10 09:37:23.095294, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714022 64.3a 2e78d23a/rb.0.100889.238e1f29.000000000142/head//64 [] v 10851'309157 snapset=0=[]:[] snapc=0=[]) v11 -34> 2014-10-10 09:37:23.095294 7f4eff8ae700 1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- MOSDPGPushReply(64.3a 10851 
[PushReplyOp(e87c8fba/rb.0.100889.238e1f29.00000000007b/head//64)]) v2 -- ?+0 0x7f4f0e619840 con 0x7f4f111e9340 -33> 2014-10-10 09:37:23.095959 7f4efa184700 1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2124 ==== osd_map(10852..10854 src has 8642..10854) v3 ==== 633+0+0 (49899712 0 0) 0x7f4f172766c0 con 0x7f4f111e9340 -32> 2014-10-10 09:37:23.095991 7f4efa184700 3 osd.11 10854 handle_osd_map epochs [10852,10854], i have 10854, src has [8642,10854] -31> 2014-10-10 09:37:23.096260 7f4efa184700 1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2125 ==== pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 ==== 676+0+0 (3895076414 0 0) 0x7f4f1326d000 con 0x7f4f111e9340 -30> 2014-10-10 09:37:23.096284 7f4efa184700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096067, event: header_read, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -29> 2014-10-10 09:37:23.096292 7f4efa184700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096069, event: throttled, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -28> 2014-10-10 09:37:23.096297 7f4efa184700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096131, event: all_read, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -27> 2014-10-10 09:37:23.096300 7f4efa184700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096279, event: dispatched, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -26> 2014-10-10 09:37:23.096304 7f4efa184700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096304, event: waiting_for_osdmap, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -25> 2014-10-10 09:37:23.096445 7f4ef4178700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096445, event: reached_pg, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -24> 2014-10-10 09:37:23.096463 7f4ef4178700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096462, event: started, request: 
pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -23> 2014-10-10 09:37:23.096473 7f4ef4178700 1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- pg_backfill(finish_ack 64.3a e 10854/10854 lb 0//0//-1) v3 -- ?+0 0x7f4f198a3400 con 0x7f4f111e9340 -22> 2014-10-10 09:37:23.096568 7f4ef4178700 5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096567, event: done, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 -21> 2014-10-10 09:37:23.096597 7f4ef4178700 5 osd.11 pg_epoch: 10854 pg[64.3a( v 10851'309157 (9562'306048,10851'309157] local-les=0 n=42 ec=1709 les/c 10819/10673 10741/10818/10691) [7,11]/[7,6] r=-1 lpr=10819 pi=10671-10817/8 luod=0'0 crt=10844'309155 lcod 10844'309155 active] exit Started/ReplicaActive/RepRecovering 12.134041 28 0.000092 -20> 2014-10-10 09:37:23.096628 7f4ef4178700 5 osd.11 pg_epoch: 10854 pg[64.3a( v 10851'309157 (9562'306048,10851'309157] local-les=0 n=42 ec=1709 les/c 10819/10673 10741/10818/10691) [7,11]/[7,6] r=-1 lpr=10819 pi=10671-10817/8 luod=0'0 crt=10844'309155 lcod 10844'309155 active] enter Started/ReplicaActive/RepNotRecovering -19> 2014-10-10 09:37:23.096698 7f4ef4178700 1 -- 192.168.99.247:6812/25256 --> osd.6 192.168.99.253:6801/4015 -- MBackfillReserve GRANT pgid: 64.5f, query_epoch: 10854 v3 -- ?+0 0x7f4f118c5800 -18> 2014-10-10 09:37:23.096716 7f4ef4178700 5 osd.11 pg_epoch: 10854 pg[64.5f( v 10844'483555 (9631'480547,10844'483555] lb 0//0//-1 local-les=0 n=0 ec=1709 les/c 10819/10682 10817/10818/10817) [6,11]/[6,7] r=-1 lpr=10818 pi=9580-10817/51 luod=0'0 crt=10844'483555 lcod 10844'483554 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 32.009482 64 0.000109 -17> 2014-10-10 09:37:23.096736 7f4ef4178700 5 osd.11 pg_epoch: 10854 pg[64.5f( v 10844'483555 (9631'480547,10844'483555] lb 0//0//-1 local-les=0 n=0 ec=1709 les/c 10819/10682 10817/10818/10817) [6,11]/[6,7] r=-1 lpr=10818 pi=9580-10817/51 luod=0'0 crt=10844'483555 lcod 10844'483554 active+remapped] enter 
Started/ReplicaActive/RepRecovering -16> 2014-10-10 09:37:23.096899 7f4f008b0700 5 -- op tracker -- , seq: 10907, time: 2014-10-10 09:37:23.096899, event: write_thread_in_journal_buffer, request: osd_op(client.1383897.0:1239453 rb.0.76e6d.2ae8944a.0000000004c5 [set-alloc-hint object_size 4194304 write_size 4194304,write 2047488~114688] 64.87f43319 ack+ondisk+write e10850) v4 -15> 2014-10-10 09:37:23.096932 7f4f008b0700 5 -- op tracker -- , seq: 10908, time: 2014-10-10 09:37:23.096932, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1383897.0:1239449 64.61 ea6fd561/rb.0.76e6d.2ae8944a.000000000706/head//64 [] v 10851'567302 snapset=0=[]:[] snapc=0=[]) v11 -14> 2014-10-10 09:37:23.097258 7f4efa184700 1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2126 ==== pg_info(1 pgs e10854:64.3a) v4 ==== 740+0+0 (2282156553 0 0) 0x7f4f1054ff80 con 0x7f4f111e9340 -13> 2014-10-10 09:37:23.097280 7f4efa184700 5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097129, event: header_read, request: pg_info(1 pgs e10854:64.3a) v4 -12> 2014-10-10 09:37:23.097287 7f4efa184700 5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097131, event: throttled, request: pg_info(1 pgs e10854:64.3a) v4 -11> 2014-10-10 09:37:23.097291 7f4efa184700 5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097191, event: all_read, request: pg_info(1 pgs e10854:64.3a) v4 -10> 2014-10-10 09:37:23.097295 7f4efa184700 5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097276, event: dispatched, request: pg_info(1 pgs e10854:64.3a) v4 -9> 2014-10-10 09:37:23.097300 7f4efa184700 5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097299, event: waiting_for_osdmap, request: pg_info(1 pgs e10854:64.3a) v4 -8> 2014-10-10 09:37:23.097306 7f4efa184700 5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097306, event: started, request: pg_info(1 pgs e10854:64.3a) v4 -7> 2014-10-10 09:37:23.097326 7f4efa184700 5 -- op tracker -- , seq: 
10966, time: 2014-10-10 09:37:23.097326, event: done, request: pg_info(1 pgs e10854:64.3a) v4
-6> 2014-10-10 09:37:23.106323 7f4efb186700 1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 220 ==== osd_map(10850..10854 src has 8642..10854) v3 ==== 1076+0+0 (1334635336 0 0) 0x7f4f0fb9b440 con 0x7f4f0e6ec3c0
-5> 2014-10-10 09:37:23.106373 7f4efb186700 3 osd.11 10854 handle_osd_map epochs [10850,10854], i have 10854, src has [8642,10854]
-4> 2014-10-10 09:37:23.116946 7f4efb186700 1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 221 ==== osd_map(10854..10854 src has 8642..10854) v3 ==== 219+0+0 (844825721 0 0) 0x7f4f0fb9a240 con 0x7f4f0e6ec3c0
-3> 2014-10-10 09:37:23.116969 7f4efb186700 3 osd.11 10854 handle_osd_map epochs [10854,10854], i have 10854, src has [8642,10854]
-2> 2014-10-10 09:37:23.116978 7f4efb186700 1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 222 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2686169588 0 0) 0x7f4f175d8a00 con 0x7f4f0e6ec3c0
-1> 2014-10-10 09:37:23.116987 7f4efb186700 10 monclient: handle_subscribe_ack sent 2014-10-10 09:37:22.862066 renew after 2014-10-10 09:39:52.862066
0> 2014-10-10 09:37:23.168342 7f4ef4979700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 7f4ef4979700 time 2014-10-10 09:37:23.072077
osd/ReplicatedPG.cc: 11006: FAILED assert(obc)
ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)
1: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x5ff) [0x7f4f0b7e088f]
2: (ReplicatedPG::hit_set_persist()+0xe37) [0x7f4f0b7e1837]
3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x17eb) [0x7f4f0b7f90bb]
4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x6cc) [0x7f4f0b792f5c]
5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x39e) [0x7f4f0b5cc14e]
6: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1df) [0x7f4f0b5e8b1f]
7: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x7f4f0b62d67c]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x13b0) [0x7f4f0ba8c0e0]
9: (ThreadPool::WorkThread::entry()+0x10) [0x7f4f0ba8cd90]
10: (()+0x80a4) [0x7f4f0a84b0a4]
11: (clone()+0x6d) [0x7f4f08faec2d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Related issues
Associated revisions
osd: use GMT time for the object name of hitsets
- bump the encoding version of pg_hit_set_info_t to 2, so we can
  tell if the corresponding hit_set is named using localtime or
  GMT
- bump the encoding version of pg_pool_t to 20, so we can know
  if a pool is using GMT to name the hit_set archive or not. and
  we can tell if current cluster allows OSDs not support GMT
  mode or not.
- add an option named `osd_pool_use_gmt_hitset`. if enabled,
  the cluster will try to use GMT mode when creating a new pool
  if all the up OSDs support GMT mode. if any of the
  pools in the cluster is using GMT mode, then only OSDs
  supporting GMT mode are allowed to join the cluster.
Fixes: #9732
Signed-off-by: Kefu Chai <kchai@redhat.com>
mon: add "ceph osd pool set $pool use_gmt_hitset true" cmd
allow "ceph osd pool set $pool use_gmt_hitset <true|1>" as long as
the cluster supports gmt hitset.
Fixes: #9732
Signed-off-by: Kefu Chai <kchai@redhat.com>
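The gating rules described in these two commit messages can be sketched in Python. This is a toy model under stated assumptions, not the monitor code: `FEATURE_GMT_HITSET`, `Osd`, `Cluster` and all method names are hypothetical, and the bit value is arbitrary.

```python
from dataclasses import dataclass, field

# Hypothetical feature bit for illustration only; the real definition
# lives in src/include/ceph_features.h and is not reproduced here.
FEATURE_GMT_HITSET = 1 << 0


@dataclass
class Osd:
    id: int
    features: int
    up: bool = True


@dataclass
class Cluster:
    osds: list = field(default_factory=list)
    pools: dict = field(default_factory=dict)  # pool name -> use_gmt_hitset

    def all_up_osds_support_gmt(self):
        # Only OSDs that are currently up are considered.
        return all(o.features & FEATURE_GMT_HITSET for o in self.osds if o.up)

    def create_pool(self, name, osd_pool_use_gmt_hitset=True):
        # "try to use GMT mode when creating a new pool if all up OSDs support it"
        self.pools[name] = osd_pool_use_gmt_hitset and self.all_up_osds_support_gmt()
        return self.pools[name]

    def set_use_gmt_hitset(self, name):
        # Models "ceph osd pool set $pool use_gmt_hitset true":
        # refused while any up OSD lacks GMT support.
        if not self.all_up_osds_support_gmt():
            return False
        self.pools[name] = True
        return True

    def may_join(self, osd):
        # Once any pool uses GMT mode, only GMT-capable OSDs may join.
        if any(self.pools.values()):
            return bool(osd.features & FEATURE_GMT_HITSET)
        return True
```

In this model a mixed cluster stays in localtime mode and still accepts old OSDs; after the last old OSD is upgraded and a pool is switched, an old OSD is turned away.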
osd: use GMT time for the object name of hitsets
- bump the encoding version of pg_hit_set_info_t to 2, so we can
  tell if the corresponding hit_set is named using localtime or
  GMT
- bump the encoding version of pg_pool_t to 20, so we can know
  if a pool is using GMT to name the hit_set archive or not. and
  we can tell if current cluster allows OSDs not support GMT
  mode or not.
- add an option named `osd_pool_use_gmt_hitset`. if enabled,
  the cluster will try to use GMT mode when creating a new pool
  if all the up OSDs support GMT mode. if any of the
  pools in the cluster is using GMT mode, then only OSDs
  supporting GMT mode are allowed to join the cluster.
Fixes: #9732
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 42f8c5daad16aa849a0b99871d50161673c0c370)
Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/osd_types.cc
src/osd/osd_types.h
fill pg_pool_t with default settings in master branch.
mon: add "ceph osd pool set $pool use_gmt_hitset true" cmd
allow "ceph osd pool set $pool use_gmt_hitset <true|1>" as long as
the cluster supports gmt hitset.
Fixes: #9732
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 03a1a3cf023a9aeb2fa26820e49e5efe3f3b3789)
History
#1 Updated by Loïc Dachary over 9 years ago
- Description updated (diff)
#2 Updated by Loïc Dachary over 9 years ago
- Description updated (diff)
#3 Updated by Samuel Just about 9 years ago
- Status changed from New to Resolved
#4 Updated by Loïc Dachary almost 9 years ago
- Regression set to No
Is there a link to read more about how it was resolved?
#5 Updated by Samuel Just almost 9 years ago
I think I blamed this on the timezones, though I do not remember how.
#6 Updated by Samuel Just almost 9 years ago
- Status changed from Resolved to 12
This...is not resolved! The utime_t->hobject_t mapping is timezone dependent. Needs to be not timezone dependent when generating the archive object names.
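To illustrate the timezone dependence, here is a minimal Python sketch. It assumes, going by the object names in the log in the description, that an archive name embeds the start and end timestamps formatted as wall-clock time. `archive_name` and the fixed-offset zones are illustrative and are not Ceph code.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two machine timezones in this report.
CDT = timezone(timedelta(hours=-5), "CDT")
CEST = timezone(timedelta(hours=2), "CEST")

FMT = "%Y-%m-%d %H:%M:%S.%f"


def archive_name(pgid, start, end, tz):
    """Build a hit set archive name from two aware datetimes, rendered in tz."""
    return "hit_set_{}_archive_{}_{}".format(
        pgid,
        start.astimezone(tz).strftime(FMT),
        end.astimezone(tz).strftime(FMT),
    )


# The same two instants, expressed in UTC.
start = datetime(2014, 10, 10, 14, 36, 20, 384563, tzinfo=timezone.utc)
end = datetime(2014, 10, 10, 14, 37, 22, 956668, tzinfo=timezone.utc)

name_on_cest_osd = archive_name("64.70", start, end, CEST)
name_on_cdt_osd = archive_name("64.70", start, end, CDT)
name_in_gmt = archive_name("64.70", start, end, timezone.utc)
```

The CEST rendering matches the name osd.3 sent in the log (`..._2014-10-10 16:36:20.384563_...`), while an OSD on CDT would compute a name seven hours earlier for the same hit set and fail to find the object. Rendering in GMT gives one name regardless of the machine's timezone.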
#7 Updated by Loïc Dachary almost 9 years ago
- Backport set to hammer
#8 Updated by Sage Weil almost 9 years ago
- Assignee set to Kefu Chai
#9 Updated by Kefu Chai almost 9 years ago
we can simply change ReplicatedPG::get_hit_set_current_object() and ReplicatedPG::get_hit_set_archive_object() to let them use GMT, but we will also need to take care of the migration/upgrade of the hitset objects from local time to GMT. may I bump the cur_struct_v to 9 for this change?
should bump the "history" structure's version number for encoding/decoding instead. see pg_hit_set_info_t::encode()
#10 Updated by Kefu Chai almost 9 years ago
- Status changed from 12 to Fix Under Review
#11 Updated by Kefu Chai over 8 years ago
- Status changed from Fix Under Review to In Progress
sam reminded me:
- we don't want to convert an existing OSD automatically, from localtime to GMT after upgrading to the version with the GMT fix
  - because other osds might come up with the old version
  - might flip a switch on the mon to enable the new behavior
    - or in pg_pool_t use_gmt_hitset = true
    - and setting that changes the min features the mon allows for booting osds
- need to consider the mixed version case
  - you'll have to introduce a GMT feature bit
  - and check the actingset features and either use gmt or local time
    - local time if any of the replicas do not support gmt
  - and then once gmt is being used, you still have to be able to trim the old localtime objects
- need to add this case to upgrade tests (talk to yuriw)
  - should be doing a cache/tiering during upgrade
  - the case is for
    - hammer to later hammer point release tests and also
    - hammer to current master/infernalis
  - if we want to backport this to hammer, we will want to make sure it's tested in the firefly->hammer tests
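The requirement to keep trimming old localtime objects after GMT is enabled can be sketched as a per-entry flag in the hit set history. This Python sketch rests on assumptions: `HitSetInfo`, its `using_gmt` field, `archive_object_name` and `trim` are hypothetical names, and a plain set stands in for the object store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S.%f"


@dataclass
class HitSetInfo:
    begin: datetime   # timezone-aware
    end: datetime     # timezone-aware
    using_gmt: bool   # naming mode recorded when the hit set was archived


def archive_object_name(pgid, info, local_tz):
    # Rebuild the name in the mode the entry was created with,
    # not in whatever mode the pool uses today.
    tz = timezone.utc if info.using_gmt else local_tz
    return "hit_set_{}_archive_{}_{}".format(
        pgid,
        info.begin.astimezone(tz).strftime(FMT),
        info.end.astimezone(tz).strftime(FMT),
    )


def trim(history, store, pgid, local_tz, max_entries):
    """Drop the oldest archives until at most max_entries remain."""
    while len(history) > max_entries:
        oldest = history.pop(0)
        # set.remove raises KeyError when the name is missing,
        # standing in for the failed assert(obc) in this report.
        store.remove(archive_object_name(pgid, oldest, local_tz))
```

The flag only preserves the naming mode of each entry. Localtime entries still resolve correctly only if every OSD handling the PG agrees on the local timezone, which is the condition that was violated in this bug.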
#12 Updated by Kefu Chai over 8 years ago
another occurrence: /a/sage-2015-08-12_14:04:07-rados-wip-sage-testing---basic-multi/1012342/remote/plana91/log/ceph-osd.3.log.gz
2015-08-12 20:41:57.367610 7f2d908a5700 10 osd.3 pg_epoch: 788 pg[128.6( v 788'4 (0'0,788'4] local-les=784 n=2 ec=783 les/c 784/784 783/783/783) [0,3] r=1 lpr=784 luod=0'0 lua=788'3 crt=788'1 lcod 788'2 active] add_log_entry 788'4 (0'0) modify 128/6:.ceph-internal/hit_set_128.6_archive_2015-08-12 23:41:54.252808_2015-08-12 23:41:57.366034/head by unknown.0.0:0 2015-08-12 20:41:57.366417
....
2015-08-12 20:42:08.435416 7f2d930aa700 10 osd.3 pg_epoch: 792 pg[128.6( v 792'16 (0'0,792'16] local-les=790 n=4 ec=783 les/c 790/790 789/789/789) [3,0] r=0 lpr=789 crt=790'14 lcod 791'15 mlcod 791'15 active+clean] get_object_context: no obc for soid 128/6:.ceph-internal/hit_set_128.6_archive_2015-08-12 20:41:54.252808_2015-08-12 20:41:57.366034/head and !can_create
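When comparing such log lines, a small helper can confirm that two archive names refer to the same hit set shifted by a constant offset. This is a Python sketch, assuming the name layout seen in the excerpt above; `parse_archive_name` and `naming_offset` are hypothetical helpers, not part of any Ceph tool.

```python
import re
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
_STAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}"
_NAME = re.compile(
    r"hit_set_(?P<pgid>[0-9a-f.]+)_archive_(?P<begin>%s)_(?P<end>%s)" % (_STAMP, _STAMP)
)


def parse_archive_name(name):
    """Return (pgid, begin, end) from a hit set archive object name."""
    m = _NAME.search(name)
    if m is None:
        raise ValueError("not a hit set archive name: %r" % name)
    return (
        m["pgid"],
        datetime.strptime(m["begin"], FMT),
        datetime.strptime(m["end"], FMT),
    )


def naming_offset(written, looked_up):
    """Offset between two names of the same hit set, or None if they differ otherwise."""
    pg_w, begin_w, end_w = parse_archive_name(written)
    pg_l, begin_l, end_l = parse_archive_name(looked_up)
    if pg_w != pg_l or (begin_w - begin_l) != (end_w - end_l):
        return None
    return begin_w - begin_l
```

Applied to the two names in the excerpt, the object was written with a 23:41 stamp and looked up with a 20:41 stamp, so the helper reports a three-hour difference between the OSD that named the archive and the one that later tried to trim it.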
#13 Updated by Kefu Chai over 8 years ago
Kefu Chai wrote:
sam reminded me:
- we don't want to convert an existing OSD automatically, from localtime to GMT after upgrading to the version with the GMT fix
- because other osds might come up with the old version
- might flip a switch on the mon to enable the new behavior
- or in pg_pool_t use_gmt_hitset = true
- and setting that changes the min features the mon allows for booting osds
can we check whether an OSD will be serving a pool when it is trying to join the cluster, without using the crush rule? in other words, one will need to figure out if this OSD will be involved when writing objects in a pool. is that the approach you are suggesting?
but we can always check the feature bits of the current osdmap against those of the new OSD. that will make GMT a cluster-wide setting, though, and it will basically prevent the coexistence of new and old OSDs in the same cluster.
- need to consider the mixed version case
- you'll have to introduce a GMT feature bit
- and check the actingset features and either use gmt or local time
- local time if any of the replicas do not support gmt
after peering, the acting set might have changed, so a gmt PG could be turned into a localtime PG if a replica OSD does not support gmt. and what if the primary decides to trim/get a hit-set archive? it will be using localtime instead of gmt, but previously this PG was using gmt. that will blow up the OSD again.
so the expected upgrade process is:
- assuming we have a pg [0,1,2], where 0,1,2 are all "old" OSDs using localtime
- add osd.3 which understands GMT
- mark 2 out, backfill to osd.3
- pg is now [0,1,3]
- repeat the steps, until we have all OSDs replaced with new ones
- flip a switch on the mon.
- the GMT feature bit is not set by default on an existing cluster (but enabled on a fresh cluster). i am not sure how to detect it, as osdmap always figures out the features by looking at its osds, erasure plugins, etc. but we can either add a use_gmt_hitset field in osdmap and bump the osdmap struct_v, making it false in lower versions, or we can make it a field in pg_pool_t as you suggested. but it is still a cluster-wide setting.
- monitor will set the GMT feature bit if all quorum and all up osds in osdmap support it
- monitor will not allow an OSD without the GMT feature bit to join the cluster anymore, once the new osdmap with this feature bit is accepted
sam, what do you think?
#14 Updated by Samuel Just over 8 years ago
I think it needs to be cluster wide, and it just means that new and old can't coexist once you have flipped the switch. While the cluster is mixed, you just leave it (either the whole cluster, or all of the pools depending on whether it's a cluster config, or a per-pool config) in localtime mode.
#15 Updated by Kefu Chai over 8 years ago
- Status changed from In Progress to Fix Under Review
#16 Updated by Kefu Chai over 8 years ago
sam, just want to make sure i understand your suggestion correctly, so the test would be something like:
- set up a hammer cluster
- set a cache tier over a pool
- start exercising rados ops on the pool
- upgrade all mons
- upgrade some of the osds to infernalis
  ceph osd set sortbitwise // should fail
  ceph osd pool set $pool use_gmt_hitset true // should fail
- upgrade the remaining osds
  ceph osd set sortbitwise // should succeed
  ceph osd pool set $pool use_gmt_hitset true // should succeed
#17 Updated by Kefu Chai over 8 years ago
- Status changed from Fix Under Review to Pending Backport
#18 Updated by Samuel Just about 8 years ago
- Duplicated by Bug #14399: "ReplicatedPG.cc: 10483: FAILED assert(obc)" in rados-hammer-distro-basic-mira added
#19 Updated by Nathan Cutler almost 8 years ago
- Status changed from Pending Backport to Resolved