Bug #9732

ReplicatedPG::hit_set_trim osd/ReplicatedPG.cc: 11006: FAILED assert(obc)

Added by Loïc Dachary about 8 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The timezone of the machine was incorrect: CDT instead of CEST. All other machines (MON and OSD) are on CEST.

On a Firefly cluster upgraded from 0.80.5 to 0.80.6 one OSD fails repeatedly. It has been added to the cluster after the upgrade to 0.80.6. The cluster otherwise runs fine.

/dev/sda2 on /var/lib/ceph/osd/ceph-11 type btrfs (rw,noatime,ssd,space_cache,user_subvol_rm_allowed)

with a collocated journal.
The content of the OSD is on teuthology:loic/ceph-11 and the full logs are on teuthology:loic/ceph-osd.11.log
   -52> 2014-10-10 09:37:22.955811 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.3 192.168.99.251:6804/2024042 3940 ==== osd_map(10854..10854 src has 8642..10854) v3 ==== 219+0+0 (844825721 0 0) 0x7f4f105421c0 con 0x7f4f111e9600
   -51> 2014-10-10 09:37:22.955835 7f4efa184700  3 osd.11 10854 handle_osd_map epochs [10854,10854], i have 10854, src has [8642,10854]
   -50> 2014-10-10 09:37:22.955844 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.3 192.168.99.251:6804/2024042 3941 ==== osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11 ==== 1768+0+1155 (4245841136 0 1519041541) 0x7f4f1bc19700 con 0x7f4f111e9600
   -49> 2014-10-10 09:37:22.955857 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955734, event: header_read, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -48> 2014-10-10 09:37:22.955865 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955734, event: throttled, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -47> 2014-10-10 09:37:22.955872 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955797, event: all_read, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -46> 2014-10-10 09:37:22.955877 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955855, event: dispatched, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -45> 2014-10-10 09:37:22.955887 7f4efa184700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.955887, event: waiting_for_osdmap, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -44> 2014-10-10 09:37:22.956035 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956034, event: reached_pg, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -43> 2014-10-10 09:37:22.956071 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956071, event: started, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -42> 2014-10-10 09:37:22.956276 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956276, event: started, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -41> 2014-10-10 09:37:22.956319 7f4ef4178700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.956318, event: commit_queued_for_journal_write, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -40> 2014-10-10 09:37:22.957720 7f4efe0ab700  5 -- op tracker -- , seq: 10964, time: 2014-10-10 09:37:22.957720, event: sub_op_applied, request: osd_sub_op(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] v 10854'136362 snapset=0=[]:[] snapc=0=[], has_updated_hit_set_history) v11
   -39> 2014-10-10 09:37:22.957752 7f4efe0ab700  1 -- 192.168.99.247:6812/25256 --> osd.3 192.168.99.251:6804/2024042 -- osd_sub_op_reply(osd.3.0:2897 64.70 70/hit_set_64.70_archive_2014-10-10 16:36:20.384563_2014-10-10 16:37:22.956668/head/.ceph-internal/64 [] ack, result = 0) v2 -- ?+0 0x7f4f2359be00
   -38> 2014-10-10 09:37:23.095231 7f4f008b0700  5 -- op tracker -- , seq: 10902, time: 2014-10-10 09:37:23.095231, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714021 64.3a 2e78d23a/rb.0.100889.238e1f29.000000000142/head//64 [] v 10851'309156 snapset=0=[]:[] snapc=0=[]) v11
   -37> 2014-10-10 09:37:23.095269 7f4f008b0700  5 -- op tracker -- , seq: 10903, time: 2014-10-10 09:37:23.095269, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714020 64.61 3d27fb61/rb.0.100889.238e1f29.0000000000c3/head//64 [] v 10851'567301 snapset=0=[]:[] snapc=0=[]) v11
   -36> 2014-10-10 09:37:23.095263 7f4eff8ae700  1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- MOSDPGPushReply(64.3a 10851 [PushReplyOp(a38f6dba/rbd_data.e179f4a979b84.000000000000058f/head//64),PushReplyOp(7bb45eba/rb.0.91ea.2ae8944a.000000001200/head//64)]) v2 -- ?+0 0x7f4f0f863e00 con 0x7f4f111e9340
   -35> 2014-10-10 09:37:23.095294 7f4f008b0700  5 -- op tracker -- , seq: 10904, time: 2014-10-10 09:37:23.095294, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1351789.0:714022 64.3a 2e78d23a/rb.0.100889.238e1f29.000000000142/head//64 [] v 10851'309157 snapset=0=[]:[] snapc=0=[]) v11
   -34> 2014-10-10 09:37:23.095294 7f4eff8ae700  1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- MOSDPGPushReply(64.3a 10851 [PushReplyOp(e87c8fba/rb.0.100889.238e1f29.00000000007b/head//64)]) v2 -- ?+0 0x7f4f0e619840 con 0x7f4f111e9340
   -33> 2014-10-10 09:37:23.095959 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2124 ==== osd_map(10852..10854 src has 8642..10854) v3 ==== 633+0+0 (49899712 0 0) 0x7f4f172766c0 con 0x7f4f111e9340
   -32> 2014-10-10 09:37:23.095991 7f4efa184700  3 osd.11 10854 handle_osd_map epochs [10852,10854], i have 10854, src has [8642,10854]
   -31> 2014-10-10 09:37:23.096260 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2125 ==== pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3 ==== 676+0+0 (3895076414 0 0) 0x7f4f1326d000 con 0x7f4f111e9340
   -30> 2014-10-10 09:37:23.096284 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096067, event: header_read, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -29> 2014-10-10 09:37:23.096292 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096069, event: throttled, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -28> 2014-10-10 09:37:23.096297 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096131, event: all_read, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -27> 2014-10-10 09:37:23.096300 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096279, event: dispatched, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -26> 2014-10-10 09:37:23.096304 7f4efa184700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096304, event: waiting_for_osdmap, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -25> 2014-10-10 09:37:23.096445 7f4ef4178700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096445, event: reached_pg, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -24> 2014-10-10 09:37:23.096463 7f4ef4178700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096462, event: started, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -23> 2014-10-10 09:37:23.096473 7f4ef4178700  1 -- 192.168.99.247:6812/25256 --> 192.168.99.252:6812/6028125 -- pg_backfill(finish_ack 64.3a e 10854/10854 lb 0//0//-1) v3 -- ?+0 0x7f4f198a3400 con 0x7f4f111e9340
   -22> 2014-10-10 09:37:23.096568 7f4ef4178700  5 -- op tracker -- , seq: 10965, time: 2014-10-10 09:37:23.096567, event: done, request: pg_backfill(finish 64.3a e 10854/10854 lb MAX) v3
   -21> 2014-10-10 09:37:23.096597 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.3a( v 10851'309157 (9562'306048,10851'309157] local-les=0 n=42 ec=1709 les/c 10819/10673 10741/10818/10691) [7,11]/[7,6] r=-1 lpr=10819 pi=10671-10817/8 luod=0'0 crt=10844'309155 lcod 10844'309155 active] exit Started/ReplicaActive/RepRecovering 12.134041 28 0.000092
   -20> 2014-10-10 09:37:23.096628 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.3a( v 10851'309157 (9562'306048,10851'309157] local-les=0 n=42 ec=1709 les/c 10819/10673 10741/10818/10691) [7,11]/[7,6] r=-1 lpr=10819 pi=10671-10817/8 luod=0'0 crt=10844'309155 lcod 10844'309155 active] enter Started/ReplicaActive/RepNotRecovering
   -19> 2014-10-10 09:37:23.096698 7f4ef4178700  1 -- 192.168.99.247:6812/25256 --> osd.6 192.168.99.253:6801/4015 -- MBackfillReserve GRANT  pgid: 64.5f, query_epoch: 10854 v3 -- ?+0 0x7f4f118c5800
   -18> 2014-10-10 09:37:23.096716 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.5f( v 10844'483555 (9631'480547,10844'483555] lb 0//0//-1 local-les=0 n=0 ec=1709 les/c 10819/10682 10817/10818/10817) [6,11]/[6,7] r=-1 lpr=10818 pi=9580-10817/51 luod=0'0 crt=10844'483555 lcod 10844'483554 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 32.009482 64 0.000109
   -17> 2014-10-10 09:37:23.096736 7f4ef4178700  5 osd.11 pg_epoch: 10854 pg[64.5f( v 10844'483555 (9631'480547,10844'483555] lb 0//0//-1 local-les=0 n=0 ec=1709 les/c 10819/10682 10817/10818/10817) [6,11]/[6,7] r=-1 lpr=10818 pi=9580-10817/51 luod=0'0 crt=10844'483555 lcod 10844'483554 active+remapped] enter Started/ReplicaActive/RepRecovering
   -16> 2014-10-10 09:37:23.096899 7f4f008b0700  5 -- op tracker -- , seq: 10907, time: 2014-10-10 09:37:23.096899, event: write_thread_in_journal_buffer, request: osd_op(client.1383897.0:1239453 rb.0.76e6d.2ae8944a.0000000004c5 [set-alloc-hint object_size 4194304 write_size 4194304,write 2047488~114688] 64.87f43319 ack+ondisk+write e10850) v4
   -15> 2014-10-10 09:37:23.096932 7f4f008b0700  5 -- op tracker -- , seq: 10908, time: 2014-10-10 09:37:23.096932, event: write_thread_in_journal_buffer, request: osd_sub_op(client.1383897.0:1239449 64.61 ea6fd561/rb.0.76e6d.2ae8944a.000000000706/head//64 [] v 10851'567302 snapset=0=[]:[] snapc=0=[]) v11
   -14> 2014-10-10 09:37:23.097258 7f4efa184700  1 -- 192.168.99.247:6812/25256 <== osd.7 192.168.99.252:6812/6028125 2126 ==== pg_info(1 pgs e10854:64.3a) v4 ==== 740+0+0 (2282156553 0 0) 0x7f4f1054ff80 con 0x7f4f111e9340
   -13> 2014-10-10 09:37:23.097280 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097129, event: header_read, request: pg_info(1 pgs e10854:64.3a) v4
   -12> 2014-10-10 09:37:23.097287 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097131, event: throttled, request: pg_info(1 pgs e10854:64.3a) v4
   -11> 2014-10-10 09:37:23.097291 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097191, event: all_read, request: pg_info(1 pgs e10854:64.3a) v4
   -10> 2014-10-10 09:37:23.097295 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097276, event: dispatched, request: pg_info(1 pgs e10854:64.3a) v4
    -9> 2014-10-10 09:37:23.097300 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097299, event: waiting_for_osdmap, request: pg_info(1 pgs e10854:64.3a) v4
    -8> 2014-10-10 09:37:23.097306 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097306, event: started, request: pg_info(1 pgs e10854:64.3a) v4
    -7> 2014-10-10 09:37:23.097326 7f4efa184700  5 -- op tracker -- , seq: 10966, time: 2014-10-10 09:37:23.097326, event: done, request: pg_info(1 pgs e10854:64.3a) v4
    -6> 2014-10-10 09:37:23.106323 7f4efb186700  1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 220 ==== osd_map(10850..10854 src has 8642..10854) v3 ==== 1076+0+0 (1334635336 0 0) 0x7f4f0fb9b440 con 0x7f4f0e6ec3c0
    -5> 2014-10-10 09:37:23.106373 7f4efb186700  3 osd.11 10854 handle_osd_map epochs [10850,10854], i have 10854, src has [8642,10854]
    -4> 2014-10-10 09:37:23.116946 7f4efb186700  1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 221 ==== osd_map(10854..10854 src has 8642..10854) v3 ==== 219+0+0 (844825721 0 0) 0x7f4f0fb9a240 con 0x7f4f0e6ec3c0
    -3> 2014-10-10 09:37:23.116969 7f4efb186700  3 osd.11 10854 handle_osd_map epochs [10854,10854], i have 10854, src has [8642,10854]
    -2> 2014-10-10 09:37:23.116978 7f4efb186700  1 -- 192.168.99.247:6811/25256 <== mon.1 192.168.99.252:6789/0 222 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2686169588 0 0) 0x7f4f175d8a00 con 0x7f4f0e6ec3c0
    -1> 2014-10-10 09:37:23.116987 7f4efb186700 10 monclient: handle_subscribe_ack sent 2014-10-10 09:37:22.862066 renew after 2014-10-10 09:39:52.862066
     0> 2014-10-10 09:37:23.168342 7f4ef4979700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 7f4ef4979700 time 2014-10-10 09:37:23.072077
osd/ReplicatedPG.cc: 11006: FAILED assert(obc)

 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)
 1: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x5ff) [0x7f4f0b7e088f]
 2: (ReplicatedPG::hit_set_persist()+0xe37) [0x7f4f0b7e1837]
 3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x17eb) [0x7f4f0b7f90bb]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x6cc) [0x7f4f0b792f5c]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x39e) [0x7f4f0b5cc14e]
 6: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1df) [0x7f4f0b5e8b1f]
 7: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x7f4f0b62d67c]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x13b0) [0x7f4f0ba8c0e0]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x7f4f0ba8cd90]
 10: (()+0x80a4) [0x7f4f0a84b0a4]
 11: (clone()+0x6d) [0x7f4f08faec2d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Related issues

Related to Ceph - Tasks #12797: create the upgrade test suite for gmt and sortbitmap change In Progress 08/26/2015
Related to Ceph - Bug #12968: mon/OSDMonitor.cc: 1864: FAILED assert(osdmap.get_num_up_osds() == 0 || osdmap.get_up_osd_features() & (1ULL<<54)) Resolved 09/05/2015
Duplicated by Ceph - Bug #11386: "osd/ReplicatedPG.cc: 11268: FAILED assert(obc)" in smoke-giant-distro-basic-magna Duplicate 04/13/2015
Duplicated by Ceph - Bug #11660: "osd/ReplicatedPG.cc: 10405: FAILED assert(obc)" in rados-hammer-distro-basic-magna Duplicate 05/16/2015
Duplicated by Ceph - Bug #14399: "ReplicatedPG.cc: 10483: FAILED assert(obc)" in rados-hammer-distro-basic-mira Duplicate 01/18/2016
Copied to Ceph - Backport #12848: ReplicatedPG::hit_set_trim osd/ReplicatedPG.cc: 11006: FAILED assert(obc) Resolved

Associated revisions

Revision 42f8c5da (diff)
Added by Kefu Chai over 7 years ago

osd: use GMT time for the object name of hitsets

  • bump the encoding version of pg_hit_set_info_t to 2, so we can
    tell if the corresponding hit_set is named using localtime or
    GMT
  • bump the encoding version of pg_pool_t to 20, so we can know
    if a pool is using GMT to name the hit_set archive or not, and
    we can tell if the current cluster allows OSDs that do not
    support GMT mode.
  • add an option named `osd_pool_use_gmt_hitset`. if enabled,
    the cluster will try to use GMT mode when creating a new pool
    if all the up OSDs support GMT mode. if any of the
    pools in the cluster is using GMT mode, then only OSDs
    supporting GMT mode are allowed to join the cluster.

Fixes: #9732
Signed-off-by: Kefu Chai <>

Revision 03a1a3cf (diff)
Added by Kefu Chai over 7 years ago

mon: add "ceph osd pool set $pool use_gmt_hitset true" cmd

allow "ceph osd pool set $pool use_gmt_hitset <true|1>" as long as
the cluster supports gmt hitset.

Fixes: #9732
Signed-off-by: Kefu Chai <>

Revision 040e390d (diff)
Added by Kefu Chai about 7 years ago

osd: use GMT time for the object name of hitsets

  • bump the encoding version of pg_hit_set_info_t to 2, so we can
    tell if the corresponding hit_set is named using localtime or
    GMT
  • bump the encoding version of pg_pool_t to 20, so we can know
    if a pool is using GMT to name the hit_set archive or not, and
    we can tell if the current cluster allows OSDs that do not
    support GMT mode.
  • add an option named `osd_pool_use_gmt_hitset`. if enabled,
    the cluster will try to use GMT mode when creating a new pool
    if all the up OSDs support GMT mode. if any of the
    pools in the cluster is using GMT mode, then only OSDs
    supporting GMT mode are allowed to join the cluster.

Fixes: #9732
Signed-off-by: Kefu Chai <>
(cherry picked from commit 42f8c5daad16aa849a0b99871d50161673c0c370)

Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/osd_types.cc
src/osd/osd_types.h
fill pg_pool_t with default settings in master branch.

Revision e8e00dab (diff)
Added by Kefu Chai about 7 years ago

mon: add "ceph osd pool set $pool use_gmt_hitset true" cmd

allow "ceph osd pool set $pool use_gmt_hitset <true|1>" as long as
the cluster supports gmt hitset.

Fixes: #9732
Signed-off-by: Kefu Chai <>
(cherry picked from commit 03a1a3cf023a9aeb2fa26820e49e5efe3f3b3789)

Revision 03924041 (diff)
Added by Kefu Chai over 6 years ago

osd: use GMT time for the object name of hitsets

  • bump the encoding version of pg_hit_set_info_t to 2, so we can
    tell if the corresponding hit_set is named using localtime or
    GMT
  • bump the encoding version of pg_pool_t to 20, so we can know
    if a pool is using GMT to name the hit_set archive or not, and
    we can tell if the current cluster allows OSDs that do not
    support GMT mode.
  • add an option named `osd_pool_use_gmt_hitset`. if enabled,
    the cluster will try to use GMT mode when creating a new pool
    if all the up OSDs support GMT mode. if any of the
    pools in the cluster is using GMT mode, then only OSDs
    supporting GMT mode are allowed to join the cluster.

Fixes: #9732
Signed-off-by: Kefu Chai <>
(cherry picked from commit 42f8c5daad16aa849a0b99871d50161673c0c370)

Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/osd_types.cc
src/osd/osd_types.h
fill pg_pool_t with default settings in master branch.

Revision 87df212c (diff)
Added by Kefu Chai over 6 years ago

mon: add "ceph osd pool set $pool use_gmt_hitset true" cmd

allow "ceph osd pool set $pool use_gmt_hitset <true|1>" as long as
the cluster supports gmt hitset.

Fixes: #9732
Signed-off-by: Kefu Chai <>
(cherry picked from commit 03a1a3cf023a9aeb2fa26820e49e5efe3f3b3789)

History

#1 Updated by Loïc Dachary about 8 years ago

  • Description updated (diff)

#2 Updated by Loïc Dachary about 8 years ago

  • Description updated (diff)

#3 Updated by Samuel Just almost 8 years ago

  • Status changed from New to Resolved

#4 Updated by Loïc Dachary over 7 years ago

  • Regression set to No

Is there a link to read more about how it was resolved?

#5 Updated by Samuel Just over 7 years ago

I think I blamed this on the timezones, though I do not remember how.

#6 Updated by Samuel Just over 7 years ago

  • Status changed from Resolved to 12

This... is not resolved! The utime_t->hobject_t mapping is timezone dependent. It needs to be made timezone independent when generating the archive object names.

#7 Updated by Loïc Dachary over 7 years ago

  • Backport set to hammer

#8 Updated by Sage Weil over 7 years ago

  • Assignee set to Kefu Chai

#9 Updated by Kefu Chai over 7 years ago

we can simply change ReplicatedPG::get_hit_set_current_object() and ReplicatedPG::get_hit_set_archive_object() to make them use GMT, but we will also need to take care of the migration/upgrade of the hitset objects from local time to GMT time. may I bump the cur_struct_v to 9 for this change?


we should bump the ``history'' structure's version number for encoding/decoding instead.

see pg_hit_set_info_t::encode()

#10 Updated by Kefu Chai over 7 years ago

  • Status changed from 12 to Fix Under Review

#11 Updated by Kefu Chai over 7 years ago

  • Status changed from Fix Under Review to In Progress

sam reminded me:

  • we don't want to convert an existing OSD automatically, from localtime to GMT after upgrading to the version with the GMT fix
    • because other osds might come up with the old version
    • might flip a switch on the mon to enable the new behavior
    • or in pg_pool_t use_gmt_hitset = true
    • and setting that changes the min features the mon allows for booting osds
  • need to consider the mixed version case
    1. you'll have to introduce a GMT feature bit
    2. and check the actingset features and either use gmt or local time
    3. local time if any of the replicas do not support gmt
    4. and then once gmt is being used, you still have to be able to trim the old localtime objects
  • need to add this case to upgrade tests (talk to yuriw)
    • should be doing a cache/tiering during upgrade
    • the case is for
      • hammer to later hammer point release tests and also
      • hammer to current master/infernalis
      • if we want to backport this to hammer, we will want to make sure it's tested in the firefly->hammer tests
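
The acting-set check from the list above could be sketched as follows (the feature-bit name and value here are assumptions; the related assert quoted in #12968 suggests bit 54): GMT names are used only when every replica advertises the bit, otherwise the PG falls back to localtime naming.

```cpp
// Sketch of the mixed-version rule: one legacy replica in the acting set
// forces localtime naming. Feature-bit name and value are assumed.
#include <cstdint>
#include <vector>

constexpr uint64_t CEPH_FEATURE_OSD_HITSET_GMT = 1ULL << 54;  // assumed value

// True only if the whole acting set can interpret GMT-named hit sets.
bool acting_set_supports_gmt(const std::vector<uint64_t>& peer_features) {
    for (uint64_t features : peer_features)
        if (!(features & CEPH_FEATURE_OSD_HITSET_GMT))
            return false;   // a legacy replica forces localtime naming
    return true;
}
```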

#12 Updated by Kefu Chai over 7 years ago

another occurrence: /a/sage-2015-08-12_14:04:07-rados-wip-sage-testing---basic-multi/1012342/remote/plana91/log/ceph-osd.3.log.gz

2015-08-12 20:41:57.367610 7f2d908a5700 10 osd.3 pg_epoch: 788 pg[128.6( v 788'4 (0'0,788'4] local-les=784 n=2 ec=783 les/c 784/784 783/783/783) [0,3] r=1 lpr=784 luod=0'0 lua=788'3 crt=788'1 lcod 788'2 active] add_log_entry 788'4 (0'0) modify   128/6:.ceph-internal/hit_set_128.6_archive_2015-08-12 23:41:54.252808_2015-08-12 23:41:57.366034/head by unknown.0.0:0 2015-08-12 20:41:57.366417
....
2015-08-12 20:42:08.435416 7f2d930aa700 10 osd.3 pg_epoch: 792 pg[128.6( v 792'16 (0'0,792'16] local-les=790 n=4 ec=783 les/c 790/790 789/789/789) [3,0] r=0 lpr=789 crt=790'14 lcod 791'15 mlcod 791'15 active+clean] get_object_context: no obc for soid 128/6:.ceph-internal/hit_set_128.6_archive_2015-08-12 20:41:54.252808_2015-08-12 20:41:57.366034/head and !can_create

#13 Updated by Kefu Chai over 7 years ago

Kefu Chai wrote:

sam reminded me:

  • we don't want to convert an existing OSD automatically, from localtime to GMT after upgrading to the version with the GMT fix
  • because other osds might come up with the old version
  • might flip a switch on the mon to enable the new behavior
  • or in pg_pool_t use_gmt_hitset = true
  • and setting that changes the min features the mon allows for booting osds

can we check whether an OSD will be serving a pool when it is trying to join the cluster, without using the crush rule? in other words, one would need to figure out whether this OSD will be involved when writing objects to a pool. is that the approach you are suggesting?

but we can always check the feature bits of the current osdmap against those of the new OSD, for sure. but that would make GMT a cluster-wide setting, and it would basically prevent the coexistence of new and old OSDs in the same cluster.

  • need to consider the mixed version case
  1. you'll have to introduce a GMT feature bit
  2. and check the actingset features and either use gmt or local time
  3. local time if any of the replicas do not support gmt

after the peering, the actingset might have changed. so a gmt PG could be turned into a localtime PG if a replica OSD does not support gmt. and what if the primary decides to trim/get a hit-set-archive? it will be using localtime instead of gmt, but previously this PG was using gmt. that will blow up the OSD again.

so the expected upgrade process is:

  1. assuming we have a pg [0,1,2], where 0,1,2 are all "old" OSDs using localtime
  2. add osd.3 which understands GMT
  3. mark 2 out, backfill to osd.3
  4. pg is now [0,1,3]
  5. repeat the steps, until we have all OSDs replaced with new ones
  6. flip a switch on the mon.
    1. the GMT feature bit is not set by default on an existing cluster (but enabled on a fresh cluster). i am not sure how to detect it, as osdmap always figures out the features by looking at its osds, erasure plugins, etc. but we can either add a use_gmt_hitset field in osdmap and bump the osdmap struct_v, making it false in lower versions; or we can make it a field in pg_pool_t as you suggested. but it is still a cluster-wide setting.
    2. monitor will set the GMT feature bit if all quorum and all up osds in osdmap support it
    3. monitor will not allow OSD without the GMT feature bit to join the cluster anymore, once the new osdmap with this feature bit is accepted

sam, what do you think?
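
Step 6 of the process above might reduce to two monitor-side checks, sketched here with assumed names (illustrative, not the actual OSDMonitor code): the flag may only be flipped when every up OSD supports GMT, and once it is set the monitor refuses to boot any OSD lacking the bit.

```cpp
// Sketch of monitor-side gating for the GMT switch. Names and the bit
// value are assumptions (#12968 quotes 1ULL<<54 for the osdmap assert).
#include <cstdint>
#include <vector>

constexpr uint64_t FEATURE_GMT = 1ULL << 54;  // assumed bit value

// May the GMT flag be enabled at all? Only if every up OSD supports it.
bool can_enable_gmt(const std::vector<uint64_t>& up_osd_features) {
    for (uint64_t f : up_osd_features)
        if (!(f & FEATURE_GMT))
            return false;
    return true;
}

// Boot gate once the flag is set in the osdmap: legacy OSDs may no
// longer join, which keeps the hit-set naming consistent cluster-wide.
bool allow_osd_boot(bool osdmap_gmt_enabled, uint64_t osd_features) {
    return !osdmap_gmt_enabled || (osd_features & FEATURE_GMT) != 0;
}
```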

#14 Updated by Samuel Just over 7 years ago

I think it needs to be cluster wide, and it just means that new and old can't coexist once you have flipped the switch. While the cluster is mixed, you just leave it (either the whole cluster, or all of the pools, depending on whether it's a cluster config or a per-pool config) in localtime mode.

#15 Updated by Kefu Chai over 7 years ago

  • Status changed from In Progress to Fix Under Review

#16 Updated by Kefu Chai over 7 years ago

sam, just want to make sure i understand your suggestion correctly, so the test would be something like:

  1. setup a hammer cluster
  2. set a cache tier over a pool
  3. start exercising rados op on the pool
  4. upgrade all mon
  5. upgrade some of the osds to infernalis
  6. ceph osd set sortbitmap // should fail
  7. ceph osd pool set $pool use_gmt_hitset true // should fail
  8. upgrade the remaining osds
  9. ceph osd set sortbitmap // should succeed
  10. ceph osd pool set $pool use_gmt_hitset true // should succeed

#17 Updated by Kefu Chai over 7 years ago

  • Status changed from Fix Under Review to Pending Backport

#18 Updated by Samuel Just almost 7 years ago

  • Duplicated by Bug #14399: "ReplicatedPG.cc: 10483: FAILED assert(obc)" in rados-hammer-distro-basic-mira added

#19 Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved
