Bug #55549

OSDs crashing

Added by Richard Bateman 9 months ago. Updated 9 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My apologies if this is the wrong project; I'm lost enough on this particular issue that I'm not even sure where to ask for help, but someone on the Rook Slack recommended filing it here.

I'm running a Ceph cluster using Rook on a bare-metal Kubernetes cluster. It had been working for a while, but some OSDs started crash-looping, which set off a chain of mistakes that has so far ended with a number of my OSDs hard-crashing with a trace that makes no sense to me at all -- I can't figure out enough about it to even know what is causing it. It's happening on several different nodes and 10 different OSDs; as far as I can tell they are all basically the same crash, but I can post multiple backtraces if that would be helpful.

I've tried to include only the tail of the log with the useful bits, but let me know if more of it is needed.

There are 12 OSDs that are all doing this. At this point I need to get my OSDs usable again, even if I lose some data, but I have no idea what to do. Any help would be appreciated!

debug   -207> 2022-05-04T15:10:20.508+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[16.12(unlocked)] enter Initial
debug   -206> 2022-05-04T15:10:20.640+0000 7ff9202cf700  4 rocksdb: [db/compaction/compaction_job.cc:1521] [O-2] [JOB 3] Generated table #2034: 29607 keys, 16325971 bytes
debug   -205> 2022-05-04T15:10:20.640+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020647020, "cf_name": "O-2", "job": 3, "event": "table_file_creation", "file_number": 2034, "file_size": 16325971, "file_checksum": "", "file_checksum_func_name": "Unknown", "table_properties": {"data_size": 15589578, "index_size": 661368, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 1, "index_value_is_delta_encoded": 1, "filter_size": 74053, "raw_key_size": 5526086, "raw_average_key_size": 186, "raw_value_size": 13626100, "raw_average_value_size": 460, "num_data_blocks": 3910, "num_entries": 29607, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "O-2", "column_family_id": 9, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1651462542, "oldest_key_time": 0, "file_creation_time": 1651677019, "db_id": "6bf878eb-18ed-41f9-ac09-2a646a68d357", "db_session_id": "GJOWAFWLXV22XUNT4OLS"}}
debug   -204> 2022-05-04T15:10:20.648+0000 7ff9202cf700  4 rocksdb: [db/compaction/compaction_job.cc:1598] [O-2] [JOB 3] Compacted 4@0 + 4@1 files to L1 => 222483995 bytes
debug   -203> 2022-05-04T15:10:20.656+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[16.12( v 110715'727549 (105813'724844,110715'727549] local-lis/les=110622/110623 n=6249 ec=200/200 lis/c=110622/110611 les/c/f=110623/110612/0 sis=110812) [17,19]/[19,13] r=-1 lpr=0 pi=[110611,110812)/1 crt=110715'727549 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.148305 0 0.000000
debug   -202> 2022-05-04T15:10:20.656+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[16.12( v 110715'727549 (105813'724844,110715'727549] local-lis/les=110622/110623 n=6249 ec=200/200 lis/c=110622/110611 les/c/f=110623/110612/0 sis=110812) [17,19]/[19,13] r=-1 lpr=0 pi=[110611,110812)/1 crt=110715'727549 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -201> 2022-05-04T15:10:20.660+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[29.49(unlocked)] enter Initial
debug   -200> 2022-05-04T15:10:20.660+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[29.49s0( empty local-lis/les=0/0 n=0 ec=107727/107727 lis/c=90055/88155 les/c/f=90056/88156/0 sis=110813) [NONE,NONE,38]p38(2) r=-1 lpr=0 pi=[88155,110813)/5 crt=0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.000231 0 0.000000
debug   -199> 2022-05-04T15:10:20.660+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[29.49s0( empty local-lis/les=0/0 n=0 ec=107727/107727 lis/c=90055/88155 les/c/f=90056/88156/0 sis=110813) [NONE,NONE,38]p38(2) r=-1 lpr=0 pi=[88155,110813)/5 crt=0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -198> 2022-05-04T15:10:20.660+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[20.13(unlocked)] enter Initial
debug   -197> 2022-05-04T15:10:20.692+0000 7ff9202cf700  4 rocksdb: (Original Log Time 2022/05/04-15:10:20.697055) [db/compaction/compaction_job.cc:830] [O-2] compacted to: files[0 4 0 0 0 0 0] max score 0.83, MB/sec: 16.3 rd, 16.3 wr, level 1, files in(4, 4) out(4) MB in(0.0, 212.2) out(212.2), read-write-amplify(39462.5) write-amplify(19730.8) OK, records in: 399262, records dropped: 100 output_compression: NoCompression

debug   -196> 2022-05-04T15:10:20.692+0000 7ff9202cf700  4 rocksdb: (Original Log Time 2022/05/04-15:10:20.697097) EVENT_LOG_v1 {"time_micros": 1651677020697079, "job": 3, "event": "compaction_finished", "compaction_time_micros": 13658240, "compaction_time_cpu_micros": 1435828, "output_level": 1, "num_output_files": 4, "total_output_size": 222483995, "num_input_records": 399262, "num_output_records": 399162, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 4, 0, 0, 0, 0, 0]}
debug   -195> 2022-05-04T15:10:20.732+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020738782, "job": 3, "event": "table_file_deletion", "file_number": 2025}
debug   -194> 2022-05-04T15:10:20.740+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020747059, "job": 3, "event": "table_file_deletion", "file_number": 2010}
debug   -193> 2022-05-04T15:10:20.752+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020755389, "job": 3, "event": "table_file_deletion", "file_number": 1995}
debug   -192> 2022-05-04T15:10:20.768+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020772100, "job": 3, "event": "table_file_deletion", "file_number": 1983}
debug   -191> 2022-05-04T15:10:20.776+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020780594, "job": 3, "event": "table_file_deletion", "file_number": 1977}
debug   -190> 2022-05-04T15:10:20.784+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020789468, "job": 3, "event": "table_file_deletion", "file_number": 1976}
debug   -189> 2022-05-04T15:10:20.788+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[20.13s1( v 100738'24297 (61106'19522,100738'24297] local-lis/les=110538/110539 n=14369 ec=561/561 lis/c=110538/110538 les/c/f=110539/110539/0 sis=110812) [NONE,NONE,42]p42(2) r=-1 lpr=0 pi=[110538,110812)/1 crt=100738'24297 lcod 0'0 mlcod 0'0 unknown mbc={} ps=228] exit Initial 0.130346 0 0.000000
debug   -188> 2022-05-04T15:10:20.788+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[20.13s1( v 100738'24297 (61106'19522,100738'24297] local-lis/les=110538/110539 n=14369 ec=561/561 lis/c=110538/110538 les/c/f=110539/110539/0 sis=110812) [NONE,NONE,42]p42(2) r=-1 lpr=0 pi=[110538,110812)/1 crt=100738'24297 lcod 0'0 mlcod 0'0 unknown mbc={} ps=228] enter Reset
debug   -187> 2022-05-04T15:10:20.792+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020797788, "job": 3, "event": "table_file_deletion", "file_number": 1975}
debug   -186> 2022-05-04T15:10:20.800+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.d(unlocked)] enter Initial
debug   -185> 2022-05-04T15:10:20.844+0000 7ff9202cf700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1651677020847207, "job": 3, "event": "table_file_deletion", "file_number": 1974}
debug   -184> 2022-05-04T15:10:20.864+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.d( v 108777'17979 (59522'15057,108777'17979] local-lis/les=110484/110485 n=4268 ec=196/196 lis/c=110484/110484 les/c/f=110485/110485/0 sis=110813) [41] r=-1 lpr=0 pi=[110484,110813)/1 crt=108777'17979 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.066434 0 0.000000
debug   -183> 2022-05-04T15:10:20.868+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.d( v 108777'17979 (59522'15057,108777'17979] local-lis/les=110484/110485 n=4268 ec=196/196 lis/c=110484/110484 les/c/f=110485/110485/0 sis=110813) [41] r=-1 lpr=0 pi=[110484,110813)/1 crt=108777'17979 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -182> 2022-05-04T15:10:20.896+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.2(unlocked)] enter Initial
debug   -181> 2022-05-04T15:10:20.944+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.2( v 102008'17665 (59522'13731,102008'17665] local-lis/les=110485/110486 n=4185 ec=196/196 lis/c=110485/110485 les/c/f=110486/110486/0 sis=110812) [0,41] r=-1 lpr=0 pi=[110485,110812)/1 crt=102008'17665 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.047588 0 0.000000
debug   -180> 2022-05-04T15:10:20.944+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.2( v 102008'17665 (59522'13731,102008'17665] local-lis/les=110485/110486 n=4185 ec=196/196 lis/c=110485/110485 les/c/f=110486/110486/0 sis=110812) [0,41] r=-1 lpr=0 pi=[110485,110812)/1 crt=102008'17665 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -179> 2022-05-04T15:10:20.944+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.1b(unlocked)] enter Initial
debug   -178> 2022-05-04T15:10:21.020+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.1b( v 109268'17293 (59522'13187,109268'17293] local-lis/les=110485/110486 n=4085 ec=196/196 lis/c=110485/110485 les/c/f=110486/110486/0 sis=110829) [19] r=-1 lpr=0 pi=[110485,110829)/1 crt=109268'17293 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.077105 0 0.000000
debug   -177> 2022-05-04T15:10:21.020+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.1b( v 109268'17293 (59522'13187,109268'17293] local-lis/les=110485/110486 n=4085 ec=196/196 lis/c=110485/110485 les/c/f=110486/110486/0 sis=110829) [19] r=-1 lpr=0 pi=[110485,110829)/1 crt=109268'17293 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -176> 2022-05-04T15:10:21.020+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.17(unlocked)] enter Initial
debug   -175> 2022-05-04T15:10:21.100+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.17( v 108625'17870 (59522'13628,108625'17870] local-lis/les=110537/110538 n=4216 ec=196/196 lis/c=110537/110537 les/c/f=110538/110538/0 sis=110812) [41] r=-1 lpr=0 pi=[110537,110812)/1 crt=108625'17870 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.078181 0 0.000000
debug   -174> 2022-05-04T15:10:21.100+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.17( v 108625'17870 (59522'13628,108625'17870] local-lis/les=110537/110538 n=4216 ec=196/196 lis/c=110537/110537 les/c/f=110538/110538/0 sis=110812) [41] r=-1 lpr=0 pi=[110537,110812)/1 crt=108625'17870 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -173> 2022-05-04T15:10:21.100+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[18.f(unlocked)] enter Initial
debug   -172> 2022-05-04T15:10:21.128+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[18.f( v 107633'51751 (81034'51083,107633'51751] local-lis/les=110485/110486 n=15061 ec=63589/233 lis/c=110485/110485 les/c/f=110486/110487/0 sis=110812) [56,42] r=-1 lpr=0 pi=[110485,110812)/1 crt=107633'51751 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.026631 0 0.000000
debug   -171> 2022-05-04T15:10:21.128+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[18.f( v 107633'51751 (81034'51083,107633'51751] local-lis/les=110485/110486 n=15061 ec=63589/233 lis/c=110485/110485 les/c/f=110486/110487/0 sis=110812) [56,42] r=-1 lpr=0 pi=[110485,110812)/1 crt=107633'51751 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -170> 2022-05-04T15:10:21.136+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.16(unlocked)] enter Initial
debug   -169> 2022-05-04T15:10:21.204+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.16( v 110039'17941 (59522'13725,110039'17941] local-lis/les=110537/110538 n=4258 ec=196/196 lis/c=110537/110537 les/c/f=110538/110538/0 sis=110812) [41] r=-1 lpr=0 pi=[110537,110812)/1 crt=110039'17941 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.067675 0 0.000000
debug   -168> 2022-05-04T15:10:21.204+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.16( v 110039'17941 (59522'13725,110039'17941] local-lis/les=110537/110538 n=4258 ec=196/196 lis/c=110537/110537 les/c/f=110538/110538/0 sis=110812) [41] r=-1 lpr=0 pi=[110537,110812)/1 crt=110039'17941 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -167> 2022-05-04T15:10:21.244+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.0(unlocked)] enter Initial
debug   -166> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.0( v 108537'17594 (60138'15460,108537'17594] local-lis/les=110485/110486 n=4110 ec=196/196 lis/c=110485/110485 les/c/f=110486/110487/0 sis=110812) [0,43] r=-1 lpr=0 pi=[110485,110812)/1 crt=108537'17594 lcod 0'0 mlcod 0'0 unknown mbc={}] exit Initial 0.046097 0 0.000000
debug   -165> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 osd.44 pg_epoch: 110830 pg[15.0( v 108537'17594 (60138'15460,108537'17594] local-lis/les=110485/110486 n=4110 ec=196/196 lis/c=110485/110485 les/c/f=110486/110487/0 sis=110812) [0,43] r=-1 lpr=0 pi=[110485,110812)/1 crt=108537'17594 lcod 0'0 mlcod 0'0 unknown mbc={}] enter Reset
debug   -164> 2022-05-04T15:10:21.288+0000 7ff931f393c0  0 osd.44 110830 load_pgs opened 67 pgs
debug   -163> 2022-05-04T15:10:21.288+0000 7ff931f393c0  2 osd.44 110830 superblock: I am osd.44
debug   -162> 2022-05-04T15:10:21.288+0000 7ff931f393c0 10 monclient: init
debug   -161> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding auth protocol: cephx
debug   -160> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding auth protocol: cephx
debug   -159> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding auth protocol: cephx
debug   -158> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding auth protocol: none
debug   -157> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: secure
debug   -156> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: crc
debug   -155> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: secure
debug   -154> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: crc
debug   -153> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: secure
debug   -152> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: crc
debug   -151> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: crc
debug   -150> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: secure
debug   -149> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: crc
debug   -148> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: secure
debug   -147> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: crc
debug   -146> 2022-05-04T15:10:21.288+0000 7ff931f393c0  5 AuthRegistry(0x7ffc854d5eb0) adding con mode: secure
debug   -145> 2022-05-04T15:10:21.288+0000 7ff931f393c0  2 auth: KeyRing::load: loaded key file /var/lib/ceph/osd/ceph-44/keyring
debug   -144> 2022-05-04T15:10:21.288+0000 7ff931f393c0  2 auth: KeyRing::load: loaded key file /var/lib/ceph/osd/ceph-44/keyring
debug   -143> 2022-05-04T15:10:21.288+0000 7ff931f393c0 10 log_channel(cluster) update_config to_monitors: true to_syslog: false syslog_facility:  prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
debug   -142> 2022-05-04T15:10:21.288+0000 7ff931f393c0 -1 osd.44 110830 log_to_monitors true
debug   -141> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient: _reopen_session rank -1
debug   -140> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient: _add_conns ranks=[3,0,1,4,2]
debug   -139> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): picked mon.noname-d con 0x56466cf2e800 addr [v2:172.21.32.26:3300/0,v1:172.21.32.26:6789/0]
debug   -138> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): picked mon.noname-a con 0x5646540d5800 addr [v2:172.21.32.19:3300/0,v1:172.21.32.19:6789/0]
debug   -137> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): picked mon.noname-c con 0x5646540d5000 addr [v2:172.21.32.21:3300/0,v1:172.21.32.21:6789/0]
debug   -136> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): start opening mon connection
debug   -135> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): start opening mon connection
debug   -134> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): start opening mon connection
debug   -133> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient(hunting): _renew_subs
debug   -132> 2022-05-04T15:10:21.292+0000 7ff90b898700  5 osd.44 110830 heartbeat osd_stat(store_statfs(0x35aa93fc000/0x0/0x3a3817d6000, data 0x4892e2f04b/0x48ab898000, compress 0x0/0x0/0x0, omap 0x9d50e, meta 0x2caa2af2), peers [] op hist [])
debug   -131> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): get_auth_request con 0x5646540d5800 auth_method 0
debug   -130> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): get_auth_request method 2 preferred_modes [2,1]
debug   -129> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): _init_auth method 2
debug   -128> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): _init_auth creating new auth
debug   -127> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): get_auth_request con 0x5646540d5000 auth_method 0
debug   -126> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): get_auth_request method 2 preferred_modes [2,1]
debug   -125> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): _init_auth method 2
debug   -124> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): _init_auth creating new auth
debug   -123> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): get_auth_request con 0x56466cf2e800 auth_method 0
debug   -122> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): get_auth_request method 2 preferred_modes [2,1]
debug   -121> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): _init_auth method 2
debug   -120> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): _init_auth creating new auth
debug   -119> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): handle_auth_reply_more payload 9
debug   -118> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): handle_auth_reply_more payload_len 9
debug   -117> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): handle_auth_reply_more responding with 36 bytes
debug   -116> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): handle_auth_reply_more payload 9
debug   -115> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): handle_auth_reply_more payload_len 9
debug   -114> 2022-05-04T15:10:21.292+0000 7ff92c145700 10 monclient(hunting): handle_auth_reply_more responding with 36 bytes
debug   -113> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): handle_auth_reply_more payload 9
debug   -112> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): handle_auth_reply_more payload_len 9
debug   -111> 2022-05-04T15:10:21.292+0000 7ff92b944700 10 monclient(hunting): handle_auth_reply_more responding with 36 bytes
debug   -110> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient(hunting): handle_auth_done global_id 34650406 payload 840
debug   -109> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient: _finish_hunting 0
debug   -108> 2022-05-04T15:10:21.292+0000 7ff92b143700  1 monclient: found mon.noname-a
debug   -107> 2022-05-04T15:10:21.292+0000 7ff92b143700 10 monclient: _send_mon_message to mon.noname-a at v2:172.21.32.19:3300/0
debug   -106> 2022-05-04T15:10:21.292+0000 7ff91e8be700  5 prioritycache tune_memory target: 119141256396 mapped: 490504192 unmapped: 394502144 heap: 885006336 old mem: 100464761567 new mem: 100464761567
debug   -105> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: handle_monmap mon_map magic: 0 v1
debug   -104> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient:  got monmap 54 from mon.noname-a (according to old e54)
debug   -103> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: dump:
epoch 54
fsid 0e85e9cf-632e-4737-9f94-6c0429444fe2
last_changed 2022-05-02T16:37:28.174070+0000
created 2021-09-06T19:28:27.939795+0000
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:172.21.32.27:3300/0,v1:172.21.32.27:6789/0] mon.y
1: [v2:172.21.32.19:3300/0,v1:172.21.32.19:6789/0] mon.ab
2: [v2:172.21.32.22:3300/0,v1:172.21.32.22:6789/0] mon.ac
3: [v2:172.21.32.21:3300/0,v1:172.21.32.21:6789/0] mon.ad
4: [v2:172.21.32.26:3300/0,v1:172.21.32.26:6789/0] mon.af

debug   -102> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: _finish_auth 0
debug   -101> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-05-04T15:09:51.298655+0000)
debug   -100> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: _send_mon_message to mon.ab at v2:172.21.32.19:3300/0
debug    -99> 2022-05-04T15:10:21.292+0000 7ff931f393c0  5 monclient: authenticate success, global_id 34650406
debug    -98> 2022-05-04T15:10:21.292+0000 7ff931f393c0 10 monclient: wait_auth_rotating waiting for 30
debug    -97> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: handle_config config(14 keys) v1
debug    -96> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: handle_monmap mon_map magic: 0 v1
debug    -95> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient:  got monmap 54 from mon.ab (according to old e54)
debug    -94> 2022-05-04T15:10:21.292+0000 7ff9190b3700 10 monclient: dump:
epoch 54
fsid 0e85e9cf-632e-4737-9f94-6c0429444fe2
last_changed 2022-05-02T16:37:28.174070+0000
created 2021-09-06T19:28:27.939795+0000
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:172.21.32.27:3300/0,v1:172.21.32.27:6789/0] mon.y
1: [v2:172.21.32.19:3300/0,v1:172.21.32.19:6789/0] mon.ab
2: [v2:172.21.32.22:3300/0,v1:172.21.32.22:6789/0] mon.ac
3: [v2:172.21.32.21:3300/0,v1:172.21.32.21:6789/0] mon.ad
4: [v2:172.21.32.26:3300/0,v1:172.21.32.26:6789/0] mon.af

debug    -93> 2022-05-04T15:10:21.292+0000 7ff92a141700  4 set_mon_vals no callback set
debug    -92> 2022-05-04T15:10:21.296+0000 7ff9190b3700 10 monclient: _finish_auth 0
debug    -91> 2022-05-04T15:10:21.296+0000 7ff9190b3700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-05-04T15:09:51.299188+0000)
debug    -90> 2022-05-04T15:10:21.296+0000 7ff931f393c0 10 monclient: wait_auth_rotating done
debug    -89> 2022-05-04T15:10:21.296+0000 7ff931f393c0 10 start_mon_command cmd=[{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["44"]}]
debug    -88> 2022-05-04T15:10:21.296+0000 7ff931f393c0 10 monclient: _send_command 1 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["44"]}]
debug    -87> 2022-05-04T15:10:21.296+0000 7ff931f393c0 10 monclient: _send_mon_message to mon.ab at v2:172.21.32.19:3300/0
debug    -86> 2022-05-04T15:10:22.032+0000 7ff9190b3700 10 monclient: handle_mon_command_ack 1 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["44"]}]
debug    -85> 2022-05-04T15:10:22.032+0000 7ff9190b3700 10 monclient: _finish_command 1 = system:0 osd.44 already set to class hdd. set-device-class item id 44 name 'osd.44' device_class 'hdd': no change. set osd(s)  to class 'hdd'
debug    -84> 2022-05-04T15:10:22.032+0000 7ff931f393c0 10 start_mon_command cmd=[{"prefix": "osd crush create-or-move", "id": 44, "weight":3.6387, "args": ["host=nebrask", "root=default"]}]
debug    -83> 2022-05-04T15:10:22.032+0000 7ff931f393c0 10 monclient: _send_command 2 [{"prefix": "osd crush create-or-move", "id": 44, "weight":3.6387, "args": ["host=nebrask", "root=default"]}]
debug    -82> 2022-05-04T15:10:22.032+0000 7ff931f393c0 10 monclient: _send_mon_message to mon.ab at v2:172.21.32.19:3300/0
debug    -81> 2022-05-04T15:10:22.036+0000 7ff9190b3700 10 monclient: handle_mon_command_ack 2 [{"prefix": "osd crush create-or-move", "id": 44, "weight":3.6387, "args": ["host=nebrask", "root=default"]}]
debug    -80> 2022-05-04T15:10:22.036+0000 7ff9190b3700 10 monclient: _finish_command 2 = system:0 create-or-move updated item name 'osd.44' weight 3.6387 at location {host=nebrask,root=default} to crush map
debug    -79> 2022-05-04T15:10:22.036+0000 7ff90d89c700 -1 log_channel(cluster) log [ERR] : 29.29s0 past_intervals [107732,110810) start interval does not contain the required bound [89131,110810) start
debug    -78> 2022-05-04T15:10:22.036+0000 7ff90d89c700 -1 osd.44 pg_epoch: 110811 pg[29.29s0( empty local-lis/les=0/0 n=0 ec=107732/107732 lis/c=90055/88146 les/c/f=90056/88147/0 sis=110810) [44,41,1]p44(0) r=0 lpr=110810 pi=[107732,110810)/4 crt=0'0 mlcod 0'0 unknown mbc={}] 29.29s0 past_intervals [107732,110810) start interval does not contain the required bound [89131,110810) start
debug    -77> 2022-05-04T15:10:22.036+0000 7ff931f393c0  0 osd.44 110830 done with init, starting boot process
debug    -76> 2022-05-04T15:10:22.036+0000 7ff931f393c0 10 monclient: _renew_subs
debug    -75> 2022-05-04T15:10:22.036+0000 7ff931f393c0 10 monclient: _send_mon_message to mon.ab at v2:172.21.32.19:3300/0
debug    -74> 2022-05-04T15:10:22.036+0000 7ff931f393c0  1 osd.44 110830 start_boot
debug    -73> 2022-05-04T15:10:22.036+0000 7ff931f393c0 10 monclient: _send_mon_message to mon.ab at v2:172.21.32.19:3300/0
debug    -72> 2022-05-04T15:10:22.036+0000 7ff931f393c0  1 osd.44 110830 maybe_override_max_osd_capacity_for_qos default_iops: 315.00 cur_iops: 242.26. Skip OSD benchmark test.
debug    -71> 2022-05-04T15:10:22.036+0000 7ff931f393c0  1 osd.44 110830 maybe_override_options_for_qos: Changing recovery/backfill/sleep settings for QoS
debug    -70> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command status hook 0x56463f245450
debug    -69> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command flush_journal hook 0x56463f245450
debug    -68> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_ops_in_flight hook 0x56463f245450
debug    -67> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command ops hook 0x56463f245450
debug    -66> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_blocked_ops hook 0x56463f245450
debug    -65> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_historic_ops hook 0x56463f245450
debug    -64> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_historic_slow_ops hook 0x56463f245450
debug    -63> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_historic_ops_by_duration hook 0x56463f245450
debug    -62> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_op_pq_state hook 0x56463f245450
debug    -61> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_blocklist hook 0x56463f245450
debug    -60> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_watchers hook 0x56463f245450
debug    -59> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_recovery_reservations hook 0x56463f245450
debug    -58> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_scrub_reservations hook 0x56463f245450
debug    -57> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command get_latest_osdmap hook 0x56463f245450
debug    -56> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command set_heap_property hook 0x56463f245450
debug    -55> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command get_heap_property hook 0x56463f245450
debug    -54> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_objectstore_kv_stats hook 0x56463f245450
debug    -53> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_scrubs hook 0x56463f245450
debug    -52> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command calc_objectstore_db_histogram hook 0x56463f245450
debug    -51> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command flush_store_cache hook 0x56463f245450
debug    -50> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_pgstate_history hook 0x56463f245450
debug    -49> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command compact hook 0x56463f245450
debug    -48> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command get_mapped_pools hook 0x56463f245450
debug    -47> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command smart hook 0x56463f245450
debug    -46> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command list_devices hook 0x56463f245450
debug    -45> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command send_beacon hook 0x56463f245450
debug    -44> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_osd_network hook 0x56463f245450
debug    -43> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_pool_statfs hook 0x56463f245450
debug    -42> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command setomapval hook 0x56466112cd00
debug    -41> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command rmomapkey hook 0x56466112cd00
debug    -40> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command setomapheader hook 0x56466112cd00
debug    -39> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command getomap hook 0x56466112cd00
debug    -38> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command truncobj hook 0x56466112cd00
debug    -37> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command injectdataerr hook 0x56466112cd00
debug    -36> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command injectmdataerr hook 0x56466112cd00
debug    -35> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command set_recovery_delay hook 0x56466112cd00
debug    -34> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command injectfull hook 0x56466112cd00
debug    -33> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command bench hook 0x56463f245450
debug    -32> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command cluster_log hook 0x56463f245450
debug    -31> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command flush_pg_stats hook 0x56463f245450
debug    -30> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command heap hook 0x56463f245450
debug    -29> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command debug dump_missing hook 0x56463f245450
debug    -28> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command debug kick_recovery_wq hook 0x56463f245450
debug    -27> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command cpu_profiler hook 0x56463f245450
debug    -26> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command dump_pg_recovery_stats hook 0x56463f245450
debug    -25> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command reset_pg_recovery_stats hook 0x56463f245450
debug    -24> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command cache drop hook 0x56463f245450
debug    -23> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command cache status hook 0x56463f245450
debug    -22> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command scrub_purged_snaps hook 0x56463f245450
debug    -21> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command scrubdebug hook 0x56463f245450
debug    -20> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command pg hook 0x56463f245450
debug    -19> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command pg hook 0x56463f245450
debug    -18> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command pg hook 0x56463f245450
debug    -17> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command pg hook 0x56463f245450
debug    -16> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command pg hook 0x56463f245450
debug    -15> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command query hook 0x56463f245450
debug    -14> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command mark_unfound_lost hook 0x56463f245450
debug    -13> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command list_unfound hook 0x56463f245450
debug    -12> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command scrub hook 0x56463f245450
debug    -11> 2022-05-04T15:10:22.036+0000 7ff931f393c0  5 asok(0x56463f532000) register_command deep_scrub hook 0x56463f245450
debug    -10> 2022-05-04T15:10:22.040+0000 7ff9190b3700  4 mgrc handle_mgr_map Got map version 7237
debug     -9> 2022-05-04T15:10:22.040+0000 7ff9190b3700  4 mgrc handle_mgr_map Active mgr is now [v2:172.21.32.21:6800/14,v1:172.21.32.21:6801/14]
debug     -8> 2022-05-04T15:10:22.040+0000 7ff9190b3700  4 mgrc reconnect Starting new session with [v2:172.21.32.21:6800/14,v1:172.21.32.21:6801/14]
debug     -7> 2022-05-04T15:10:22.040+0000 7ff92b944700 10 monclient: get_auth_request con 0x56466d1aec00 auth_method 0
debug     -6> 2022-05-04T15:10:22.048+0000 7ff90d89c700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0/rpm/el8/BUILD/ceph-17.2.0/src/osd/PeeringState.cc: In function 'void PeeringState::check_past_interval_bounds() const' thread 7ff90d89c700 time 2022-05-04T15:10:22.041486+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0/rpm/el8/BUILD/ceph-17.2.0/src/osd/PeeringState.cc: 968: ceph_abort_msg("past_interval start interval mismatch")

 ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x56463c23fb99]
 2: (PeeringState::check_past_interval_bounds() const+0x6d5) [0x56463c6357c5]
 3: (PeeringState::Reset::react(PeeringState::AdvMap const&)+0x3ec) [0x56463c64ecdc]
 4: (boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1f8) [0x56463c690ef8]
 5: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x53) [0x56463c4ed793]
 6: (PeeringState::advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x1b8) [0x56463c630e68]
 7: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x1e6) [0x56463c402da6]
 8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x322) [0x56463c3753f2]
 9: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xd4) [0x56463c377314]
 10: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x56463c60e526]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xaf8) [0x56463c3690d8]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x56463ca6e9c4]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x56463ca6fd64]
 14: /lib64/libpthread.so.0(+0x81cf) [0x7ff9301341cf]
 15: clone()

debug     -5> 2022-05-04T15:10:22.052+0000 7ff9190b3700 10 monclient: handle_get_version_reply finishing 1 version 116584
debug     -4> 2022-05-04T15:10:22.052+0000 7ff92893e700  5 osd.44 110830 heartbeat osd_stat(store_statfs(0x35aa93fc000/0x0/0x3a3817d6000, data 0x4892e2f04b/0x48ab898000, compress 0x0/0x0/0x0, omap 0x9d50e, meta 0x2caa2af2), peers [] op hist [])
debug     -3> 2022-05-04T15:10:22.056+0000 7ff92893e700 10 monclient: _renew_subs
debug     -2> 2022-05-04T15:10:22.056+0000 7ff92893e700 10 monclient: _send_mon_message to mon.ab at v2:172.21.32.19:3300/0
debug     -1> 2022-05-04T15:10:22.056+0000 7ff9190b3700  3 osd.44 110830 handle_osd_map epochs [110831,110870], i have 110830, src has [89131,116584]
debug      0> 2022-05-04T15:10:22.060+0000 7ff90d89c700 -1 *** Caught signal (Aborted) **
 in thread 7ff90d89c700 thread_name:tp_osd_tp

 ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7ff93013ece0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b8) [0x56463c23fc6c]
 5: (PeeringState::check_past_interval_bounds() const+0x6d5) [0x56463c6357c5]
 6: (PeeringState::Reset::react(PeeringState::AdvMap const&)+0x3ec) [0x56463c64ecdc]
 7: (boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1f8) [0x56463c690ef8]
 8: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x53) [0x56463c4ed793]
 9: (PeeringState::advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x1b8) [0x56463c630e68]
 10: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x1e6) [0x56463c402da6]
 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x322) [0x56463c3753f2]
 12: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xd4) [0x56463c377314]
 13: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x56463c60e526]
 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xaf8) [0x56463c3690d8]
 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x56463ca6e9c4]
 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x56463ca6fd64]
 17: /lib64/libpthread.so.0(+0x81cf) [0x7ff9301341cf]
 18: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_cleaner
   0/ 5 seastore_lba
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 alienstore
   1/ 5 mclock
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7ff90b898700 / osd_srv_heartbt
  7ff90d89c700 / tp_osd_tp
  7ff9190b3700 / ms_dispatch
  7ff9198b4700 / ceph-osd
  7ff91a8b6700 / bstore_kv_sync
  7ff91e8be700 / bstore_mempool
  7ff9202cf700 / rocksdb:low0
  7ff92893e700 / io_context_pool
  7ff92a141700 / io_context_pool
  7ff92a942700 / admin_socket
  7ff92b143700 / msgr-worker-2
  7ff92b944700 / msgr-worker-1
  7ff92c145700 / msgr-worker-0
  7ff931f393c0 / ceph-osd
  max_recent     10000
  max_new        10000
  log_file /var/lib/ceph/crash/2022-05-04T15:10:22.061653Z_99627a0b-da34-417b-8547-d49f68e7e51e/log
--- end dump of recent events ---

Related issues

Duplicates RADOS - Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start Fix Under Review

History

#1 Updated by Richard Bateman 9 months ago

After days of fighting this (it's on a production cluster) I finally gave up on the least important of the pools -- the one holding backups, most of which I have elsewhere. After deleting that pool, some of the OSDs came back up, but 7 are still crashlooping -- each complaining about one of the PGs that was on the deleted pool. Is there any kind of trick to tell an OSD "ignore pg 29.63s2" so that it won't crash on startup? If it got far enough to look, it would see that the PG no longer matters and could get back to serving what does, but since it won't even start up I'm stuck, along with everything that uses those OSDs.
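(Editorial note, not an official fix: one workaround sometimes used for an OSD crashlooping on a leftover PG from a deleted pool is to remove that PG from the OSD's local store with `ceph-objectstore-tool` while the OSD process is stopped. The data path and PG id below are illustrative placeholders, and `--op remove --force` permanently discards that PG's local data, so treat this as a sketch to adapt, not a recommendation for this specific cluster.)

```shell
# Stop the OSD first (under Rook, scale the corresponding
# rook-ceph-osd deployment to 0 replicas so the pod stays down).
#
# Then, on the node hosting the OSD, remove the leftover PG from the
# OSD's object store. Substitute your own data path and PG id; the
# values below are examples only.
ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-44 \
    --pgid 29.63s2 \
    --op remove \
    --force

# Afterwards, restart the OSD (scale the deployment back to 1) and
# watch its log to confirm it gets past peering.
```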

#2 Updated by Laura Flores 9 months ago

  • Related to Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start added

#3 Updated by Radoslaw Zarzynski 9 months ago

  • Related to deleted (Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start)

#4 Updated by Radoslaw Zarzynski 9 months ago

  • Duplicates Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start added
