Bug #55141

thrashers/fastread: assertion failure: rollback_info_trimmed_to == head

Added by Radoslaw Zarzynski almost 2 years ago. Updated about 1 month ago.

Status: In Progress
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From the /home/teuthworker/archive/yuriw-2022-03-29_21:35:32-rados-wip-yuri5-testing-2022-03-29-1152-quincy-distro-default-smithi/6767850/teuthology.log:

2022-03-30T01:34:59.499 INFO:tasks.ceph.osd.2.smithi080.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-125-g9053ed98/rpm/el8/BUILD/ceph-17.1.0-125-g9053ed98/src/osd/PGLog.h: In function 'void PGLog::IndexedLog::claim_log_and_clear_rollback_info(const pg_log_t&)' thread 7fa658b53700 time 2022-03-30T01:34:59.459273+0000
2022-03-30T01:34:59.499 INFO:tasks.ceph.osd.2.smithi080.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-125-g9053ed98/rpm/el8/BUILD/ceph-17.1.0-125-g9053ed98/src/osd/PGLog.h: 286: FAILED ceph_assert(rollback_info_trimmed_to == head)
2022-03-30T01:34:59.499 INFO:tasks.ceph.osd.2.smithi080.stderr: ceph version 17.1.0-125-g9053ed98 (9053ed984698b7140d91d3195fcba61aa554fe69) quincy (stable)
2022-03-30T01:34:59.499 INFO:tasks.ceph.osd.2.smithi080.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55d6569c2464]
2022-03-30T01:34:59.500 INFO:tasks.ceph.osd.2.smithi080.stderr: 2: ceph-osd(+0x5d7685) [0x55d6569c2685]
2022-03-30T01:34:59.500 INFO:tasks.ceph.osd.2.smithi080.stderr: 3: (PeeringState::Stray::react(MLogRec const&)+0x3d0) [0x55d656de5390]
2022-03-30T01:34:59.500 INFO:tasks.ceph.osd.2.smithi080.stderr: 4: (boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x280) [0x55d656e1a6c0]
2022-03-30T01:34:59.500 INFO:tasks.ceph.osd.2.smithi080.stderr: 5: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x74) [0x55d656b90c54]
2022-03-30T01:34:59.500 INFO:tasks.ceph.osd.2.smithi080.stderr: 6: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PeeringCtx&)+0x2d6) [0x55d656b84ea6]
2022-03-30T01:34:59.501 INFO:tasks.ceph.osd.2.smithi080.stderr: 7: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x175) [0x55d656afa3c5]
2022-03-30T01:34:59.501 INFO:tasks.ceph.osd.2.smithi080.stderr: 8: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x55d656d915b6]
2022-03-30T01:34:59.501 INFO:tasks.ceph.osd.2.smithi080.stderr: 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xaf8) [0x55d656aec0e8]
2022-03-30T01:34:59.501 INFO:tasks.ceph.osd.2.smithi080.stderr: 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55d6571f1a64]
2022-03-30T01:34:59.502 INFO:tasks.ceph.osd.2.smithi080.stderr: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55d6571f2e04]
2022-03-30T01:34:59.502 INFO:tasks.ceph.osd.2.smithi080.stderr: 12: /lib64/libpthread.so.0(+0x817f) [0x7fa684d7017f]
2022-03-30T01:34:59.502 INFO:tasks.ceph.osd.2.smithi080.stderr: 13: clone()
2022-03-30T01:34:59.502 INFO:tasks.ceph.osd.2.smithi080.stderr:*** Caught signal (Aborted) **
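
When checking other runs for the same failure, the `FAILED ceph_assert(...)` line is the easiest thing to match on. Below is a minimal sketch of such a matcher, assuming the `<path>: <line>: FAILED ceph_assert(<expr>)` layout seen in the log above. The helper name `find_failed_assert` is hypothetical (it is not part of teuthology), and the sample line uses a shortened path for readability.

```python
import re

# Assumes the "<path>: <line>: FAILED ceph_assert(<expr>)" layout seen in the log above.
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED ceph_assert\((?P<expr>.*)\)\s*$"
)


def find_failed_assert(log_line):
    """Return file name, line number and asserted expression, or None if no match."""
    m = ASSERT_RE.search(log_line)
    if m is None:
        return None
    return {
        # Keep only the last path component; any teuthology prefix fused to
        # the path (e.g. "...stderr:/home/...") is dropped along with it.
        "file": m.group("file").rsplit("/", 1)[-1],
        "line": int(m.group("line")),
        "expr": m.group("expr"),
    }


# Sample line modelled on the log above; the build path is shortened here.
sample = (
    "2022-03-30T01:34:59.499 INFO:tasks.ceph.osd.2.smithi080.stderr:"
    "/build/ceph-17.1.0-125-g9053ed98/src/osd/PGLog.h: 286: "
    "FAILED ceph_assert(rollback_info_trimmed_to == head)"
)

result = find_failed_assert(sample)
print(result)
```

Matching on the file name and expression rather than the full path keeps the check independent of the build directory, which differs between the pacific and quincy runs.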

Related issues

Related to RADOS - Bug #60084: crash: void std::list<pg_log_entry_t, mempool::pool_allocator<(mempool::pool_index_t), pg_log_entry_t> >::_M_insert<pg_log_entry_t const&>(std::_List_iterator<pg_log_entry_t>, pg_log_entry_t const&) New
Duplicated by RADOS - Bug #57913: Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 unique_pool_2 --yes-i-really-really-mean-it Duplicate

History

#1 Updated by Laura Flores almost 2 years ago

Leading up to the ceph_assert failure in osd.2:

/a/yuriw-2022-03-29_21:35:32-rados-wip-yuri5-testing-2022-03-29-1152-quincy-distro-default-smithi/6767850/remote/smithi080/log/ceph-osd.2.log.gz

2022-03-30T01:34:59.458+0000 7fa65e35e700 10 osd.2 pg_epoch: 844 pg[6.13s1( v 824'2298 lc 771'314 (0'0,824'2298] local-lis/les=840/841 n=1265 ec=782/768 lis/c=809/782 les/c/f=810/783/0 sis=840) [3,2,0]/[NONE,2,0]p2(1) async=[3(0)] r=1 lpr=840 pi=[782,840)/2 crt=824'2298 lcod 0'0 mlcod 0'0 activating+undersized+degraded+remapped m=898 mbc={0={(0+0)=838,(0+1)=350,(0+2)=10,(1+0)=1},1={(0+1)=898,(1+0)=177,(1+1)=124},2={(0+1)=897,(1+0)=177,(1+1)=125}}] search_for_missing 6:c87ee54b:::benchmark_data_smithi080_195491_object1379:head 770'88 is on osd.7(0)
2022-03-30T01:34:59.459+0000 7fa65e35e700 10 osd.2 pg_epoch: 844 pg[6.13s1( v 824'2298 lc 771'314 (0'0,824'2298] local-lis/les=840/841 n=1265 ec=782/768 lis/c=809/782 les/c/f=810/783/0 sis=840) [3,2,0]/[NONE,2,0]p2(1) async=[3(0)] r=1 lpr=840 pi=[782,840)/2 crt=824'2298 lcod 0'0 mlcod 0'0 activating+undersized+degraded+remapped m=898 mbc={0={(0+0)=838,(0+1)=349,(0+2)=11,(1+0)=1},1={(0+1)=898,(1+0)=177,(1+1)=124},2={(0+1)=897,(1+0)=177,(1+1)=125}}] search_for_missing 6:c883b7b1:::benchmark_data_smithi080_195491_object38312:head 804'2230 also missing on osd.7(0) (last_update 783'2113 < needed 804'2230)
2022-03-30T01:34:59.459+0000 7fa65e35e700 10 osd.2 pg_epoch: 844 pg[6.13s1( v 824'2298 lc 771'314 (0'0,824'2298] local-lis/les=840/841 n=1265 ec=782/768 lis/c=809/782 les/c/f=810/783/0 sis=840) [3,2,0]/[NONE,2,0]p2(1) async=[3(0)] r=1 lpr=840 pi=[782,840)/2 crt=824'2298 lcod 0'0 mlcod 0'0 activating+undersized+degraded+remapped m=898 mbc={0={(0+0)=838,(0+1)=349,(0+2)=11,(1+0)=1},1={(0+1)=898,(1+0)=177,(1+1)=124},2={(0+1)=897,(1+0)=177,(1+1)=125}}] search_for_missing 6:c88b487d:::benchmark_data_smithi080_195491_object187:head 770'12 is on osd.7(0)
2022-03-30T01:34:59.462+0000 7fa680793700  1 -- [v2:172.21.15.80:6826/205141,v1:172.21.15.80:6827/205141] <== osd.0 v2:172.21.15.80:6818/33883 46 ==== PGlog(6.0s2 log log((792'2126,828'2271], crt=828'2271) pi ([0,0] all_participants= intervals=) pg_lease(ru 0.000000000s ub 1576.709106445s int 16.000000000s) e845/844) v6 ==== 51420+0+0 (crc 0 0 0) 0x55d65e75a000 con 0x55d65dc4e000
2022-03-30T01:34:59.462+0000 7fa680793700 15 osd.2 845 enqueue_peering_evt 6.0s2 epoch_sent: 845 epoch_requested: 844 MLogRec from 0(0) log log((792'2126,828'2271], crt=828'2271) pi ([0,0] all_participants= intervals=) pg_lease(ru 0.000000000s ub 1576.709106445s int 16.000000000s) +create_info
2022-03-30T01:34:59.462+0000 7fa680793700 20 osd.2 op_wq(0) _enqueue OpSchedulerItem(6.0s2 PGPeeringEvent(epoch_sent: 845 epoch_requested: 844 MLogRec from 0(0) log log((792'2126,828'2271], crt=828'2271) pi ([0,0] all_participants= intervals=) pg_lease(ru 0.000000000s ub 1576.709106445s int 16.000000000s) +create_info) prio 255 cost 10 e845)
2022-03-30T01:34:59.463+0000 7fa658b53700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-125-g9053ed98/rpm/el8/BUILD/ceph-17.1.0-125-g9053ed98/src/osd/PGLog.h: In function 'void PGLog::IndexedLog::claim_log_and_clear_rollback_info(const pg_log_t&)' thread 7fa658b53700 time 2022-03-30T01:34:59.459273+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-125-g9053ed98/rpm/el8/BUILD/ceph-17.1.0-125-g9053ed98/src/osd/PGLog.h: 286: FAILED ceph_assert(rollback_info_trimmed_to == head)

#2 Updated by Neha Ojha almost 2 years ago

  • Assignee set to Nitzan Mordechai

#3 Updated by Sridhar Seshasayee over 1 year ago

Observed this in a pacific run:
/a/yuriw-2022-06-15_18:29:33-rados-wip-yuri4-testing-2022-06-15-1000-pacific-distro-default-smithi/6881247

Although the job was marked dead, the crash information is available from smithi093 logs.

Test Description:
rados/thrash-erasure-code-overwrites/{bluestore-bitmap ceph clusters/{fixed-2 openstack} fast/normal mon_election/connectivity msgr-failures/osd-dispatch-delay rados recovery-overrides/{more-async-partial-recovery} supported-random-distro$/{centos_8} thrashers/pggrow thrashosds-health workloads/ec-small-objects-overwrites}

Backtrace:


  -424> 2022-06-15T20:31:19.951+0000 7f3fddfd2700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-239-g224fc22e/rpm/el8/BUILD/ceph-16.2.9-239-g224fc22e/src/osd/PGLog.h: In function 'void PGLog::IndexedLog::claim_log_and_clear_rollback_info(const pg_log_t&)' thread 7f3fddfd2700 time 2022-06-15T20:31:19.949134+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-239-g224fc22e/rpm/el8/BUILD/ceph-16.2.9-239-g224fc22e/src/osd/PGLog.h: 286: FAILED ceph_assert(rollback_info_trimmed_to == head)

 ceph version 16.2.9-239-g224fc22e (224fc22e07cebeecc3e08055cfd6105b1a30f173) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x557bf1bb69f4]
 2: ceph-osd(+0x580c0e) [0x557bf1bb6c0e]
 3: (PeeringState::Stray::react(MLogRec const&)+0x230) [0x557bf1f6d320]
 4: (boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xd5) [0x557bf1f99855]
 5: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5b) [0x557bf1d80dab]
 6: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PeeringCtx&)+0x2d1) [0x557bf1d758f1]
 7: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x29c) [0x557bf1cebedc]
 8: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x557bf1f1f1f6]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x557bf1cddc88]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x557bf235b8b4]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x557bf235e794]
 12: /lib64/libpthread.so.0(+0x81ca) [0x7f4005e491ca]
 13: clone()

#4 Updated by Radoslaw Zarzynski over 1 year ago

  • Backport changed from quincy to pacific,quincy

#5 Updated by Radoslaw Zarzynski over 1 year ago

  • Priority changed from High to Normal

Lowering the priority as we haven't seen a reoccurrence last time.

#6 Updated by Radoslaw Zarzynski over 1 year ago

  • Duplicated by Bug #57913: Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 unique_pool_2 --yes-i-really-really-mean-it added

#7 Updated by Radoslaw Zarzynski over 1 year ago

Well, just found a new occurrence.

#8 Updated by Nitzan Mordechai over 1 year ago

Radoslaw Zarzynski wrote:

Well, just found a new occurrence.

Where can I find it?

#9 Updated by Radoslaw Zarzynski over 1 year ago

Nitzan Mordechai wrote:

Radoslaw Zarzynski wrote:

Well, just found a new occurrence.

Where can I find it?

https://tracker.ceph.com/issues/57913#note-2

#10 Updated by Kamoltat (Junior) Sirivadhna 12 months ago

/a/yuriw-2023-03-02_00:09:05-rados-wip-yuri11-testing-2023-03-01-1424-distro-default-smithi/7191380

#11 Updated by Nitzan Mordechai 12 months ago

I'm probably missing something here, but I'll try to summarize my findings:

/a/yuriw-2023-03-02_00:09:05-rados-wip-yuri11-testing-2023-03-01-1424-distro-default-smithi/7191380

osd.2 is hitting the assert rollback_info_trimmed_to == head.
From the coredump:
head = {version = 1834, epoch = 682 ..
rollback_info_trimmed_to = {version = 0, epoch = 0

but info.pgid:

(gdb) 
$10 = {pgid = {pgid = {m_pool = 5, m_seed = 36, static calc_name_buf_size = 36 '$'}, shard = {id = 1 '\001', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}, static calc_name_buf_size = 40 '('}, last_update = {version = 1834, epoch = 682, __pad = 0}, 
  last_complete = {version = 1834, epoch = 682, __pad = 0}, last_epoch_started = 686, last_interval_started = 685, last_user_version = 1253, log_tail = {version = 0, epoch = 0, __pad = 0}, last_backfill = {static POOL_META = -1, static POOL_TEMP_START = -2, oid = {name = ""}, snap = {val = 0}, hash = 0, 
    max = false, nibblewise_key_cache = 0, hash_reverse_bits = 0, pool = -9223372036854775808, nspace = "", key = ""}, purged_snaps = {_size = 0, m = std::map with 0 elements}, stats = {version = {version = 1253, epoch = 611, __pad = 0}, reported_seq = 1288, reported_epoch = 611, state = 12846082, 
    last_fresh = {tv = {tv_sec = 1677735243, tv_nsec = 167430097}}, last_change = {tv = {tv_sec = 1677735237, tv_nsec = 341281420}}, last_active = {tv = {tv_sec = 1677735243, tv_nsec = 167430097}}, last_peered = {tv = {tv_sec = 1677735243, tv_nsec = 167430097}}, last_clean = {tv = {tv_sec = 1677735235, 
        tv_nsec = 101799085}}, last_unstale = {tv = {tv_sec = 1677735243, tv_nsec = 167430097}}, last_undegraded = {tv = {tv_sec = 1677735237, tv_nsec = 130030467}}, last_fullsized = {tv = {tv_sec = 1677735237, tv_nsec = 124288530}}, log_start = {version = 0, epoch = 0, __pad = 0}, ondisk_log_start = {
      version = 0, epoch = 0, __pad = 0}, created = 598, last_epoch_clean = 599, parent = {m_pool = 0, m_seed = 0, static calc_name_buf_size = 36 '$'}, parent_split_bits = 6, last_scrub = {version = 0, epoch = 0, __pad = 0}, last_deep_scrub = {version = 0, epoch = 0, __pad = 0}, last_scrub_stamp = {tv = {
        tv_sec = 1677735223, tv_nsec = 1701357}}, last_deep_scrub_stamp = {tv = {tv_sec = 1677735223, tv_nsec = 1701357}}, last_clean_scrub_stamp = {tv = {tv_sec = 1677735223, tv_nsec = 1701357}}, last_scrub_duration = 0, stats = {sum = {num_bytes = 20529152, num_objects = 0, num_object_clones = 0, 
        num_object_copies = 0, num_objects_missing_on_primary = 0, num_objects_degraded = 0, num_objects_unfound = 0, num_rd = 0, num_rd_kb = 0, num_wr = 0, num_wr_kb = 0, num_scrub_errors = 0, num_objects_recovered = 0, num_bytes_recovered = 0, num_keys_recovered = 0, num_shallow_scrub_errors = 0, 
        num_deep_scrub_errors = 0, num_objects_dirty = 0, num_whiteouts = 0, num_objects_omap = 0, num_objects_hit_set_archive = 0, num_objects_misplaced = 0, num_bytes_hit_set_archive = 0, num_flush = 0, num_flush_kb = 0, num_evict = 0, num_evict_kb = 0, num_promote = 0, num_flush_mode_high = 0, 
        num_flush_mode_low = 0, num_evict_mode_some = 0, num_evict_mode_full = 0, num_objects_pinned = 0, num_objects_missing = 0, num_legacy_snapsets = 0, num_large_omap_objects = 0, num_objects_manifest = 0, num_omap_bytes = 0, num_omap_keys = 0, num_objects_repaired = 0}}, log_size = 1253, 
    log_dups_size = 0, ondisk_log_size = 1253, objects_scrubbed = 0, scrub_duration = 0, up = std::vector of length 3, capacity 4 = {7, 2, 5}, acting = std::vector of length 3, capacity 4 = {2147483647, 3, 5}, avail_no_missing = std::vector of length 3, capacity 3 = {{static NO_OSD = 2147483647, osd = 7, 
        shard = {id = 0 '\000', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}, {static NO_OSD = 2147483647, osd = 0, shard = {id = 2 '\002', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}, {
        static NO_OSD = 2147483647, osd = 2, shard = {id = 1 '\001', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}}, object_location_counts = std::map with 2 elements = {[std::set with 3 elements = {[0] = {static NO_OSD = 2147483647, osd = 0, shard = {
            id = 2 '\002', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}, [1] = {static NO_OSD = 2147483647, osd = 2, shard = {id = 1 '\001', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}, 
        [2] = {static NO_OSD = 2147483647, osd = 7, shard = {id = 0 '\000', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}}] = 827, [std::set with 2 elements = {[0] = {static NO_OSD = 2147483647, osd = 2, shard = {id = 1 '\001', static NO_SHARD = {
              id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}, [1] = {static NO_OSD = 2147483647, osd = 7, shard = {id = 0 '\000', static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}}}] = 426}, mapping_epoch = 685, 
    blocked_by = std::vector of length 0, capacity 0, purged_snaps = {_size = 0, m = std::map with 0 elements}, last_became_active = {tv = {tv_sec = 1677735237, tv_nsec = 337804713}}, last_became_peered = {tv = {tv_sec = 1677735237, tv_nsec = 337804713}}, up_primary = 7, acting_primary = 3, 
    snaptrimq_len = 0, objects_trimmed = 0, snaptrim_duration = 0, scrub_sched_status = {m_scheduled_at = {tv = {tv_sec = 1677735301, tv_nsec = 828622384}}, m_duration_seconds = 0, m_sched_status = pg_scrub_sched_status_t::scheduled, m_is_active = false, m_is_deep = scrub_level_t::shallow, 
      m_is_periodic = true}, stats_invalid = true, dirty_stats_invalid = false, omap_stats_invalid = false, hitset_stats_invalid = false, hitset_bytes_stats_invalid = false, pin_stats_invalid = false, manifest_stats_invalid = false}, history = {epoch_created = 683, epoch_pool_created = 598, 
    last_epoch_started = 665, last_interval_started = 664, last_epoch_clean = 665, last_interval_clean = 664, last_epoch_split = 683, last_epoch_marked_full = 0, same_up_since = 612, same_interval_since = 685, same_primary_since = 685, last_scrub = {version = 1796, epoch = 656, __pad = 0}, last_deep_scrub = {
      version = 0, epoch = 0, __pad = 0}, last_scrub_stamp = {tv = {tv_sec = 1677735336, tv_nsec = 306800488}}, last_deep_scrub_stamp = {tv = {tv_sec = 1677735223, tv_nsec = 1701357}}, last_clean_scrub_stamp = {tv = {tv_sec = 1677735336, tv_nsec = 306800488}}, prior_readable_until_ub = {__r = 0}}, hit_set = {
    current_last_update = {version = 0, epoch = 0, __pad = 0}, history = empty std::__cxx11::list}}
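
To make the mismatch explicit, the values from the gdb output above can be compared directly. This is only a toy sketch: `EVersion` is a hypothetical stand-in for the (version, epoch) pairs printed by gdb, not the real Ceph type, and the check mirrors only the asserted condition, not the surrounding peering logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EVersion:
    """Toy stand-in for the (version, epoch) pairs printed by gdb."""
    version: int
    epoch: int


# Values copied from the coredump excerpt in this note.
head = EVersion(version=1834, epoch=682)
rollback_info_trimmed_to = EVersion(version=0, epoch=0)
last_update = EVersion(version=1834, epoch=682)  # info.last_update
log_tail = EVersion(version=0, epoch=0)          # info.log_tail


def invariant_holds(trimmed_to, log_head):
    """Mirror of the asserted condition: rollback_info_trimmed_to == head."""
    return trimmed_to == log_head


assert_ok = invariant_holds(rollback_info_trimmed_to, head)
print("invariant holds:", assert_ok)
print("head equals info.last_update:", head == last_update)
print("trimmed_to equals info.log_tail:", rollback_info_trimmed_to == log_tail)
```

With these values, head agrees with info.last_update while rollback_info_trimmed_to is still at 0'0, the same value as info.log_tail. That is an observation about the dumped values only, not a diagnosis.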

#12 Updated by Nitzan Mordechai 12 months ago

Since this is an EC pool, the NO_SHARD is confusing; we are not maintaining rollback_info_trimmed_to on replicas. I'm looking into why we have NO_SHARD.

#13 Updated by Radoslaw Zarzynski 12 months ago

  • Status changed from New to In Progress

#14 Updated by Laura Flores 5 months ago

  • Related to Bug #60084: crash: void std::list<pg_log_entry_t, mempool::pool_allocator<(mempool::pool_index_t), pg_log_entry_t> >::_M_insert<pg_log_entry_t const&>(std::_List_iterator<pg_log_entry_t>, pg_log_entry_t const&) added

#15 Updated by Laura Flores 5 months ago

/a/yuriw-2023-10-06_20:29:18-rados-wip-yuri6-testing-2023-10-06-0904-quincy-distro-default-smithi/7415444

2023-10-07T01:48:49.228 INFO:tasks.ceph.osd.2.smithi088.stderr:2023-10-07T01:48:49.220+0000 7fe693607700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.6-1287-gfedcea84/rpm/el8/BUILD/ceph-17.2.6-1287-gfedcea84/src/osd/PGLog.h: In function 'void PGLog::IndexedLog::claim_log_and_clear_rollback_info(const pg_log_t&)' thread 7fe693607700 time 2023-10-07T01:48:49.217145+0000
2023-10-07T01:48:49.228 INFO:tasks.ceph.osd.2.smithi088.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.6-1287-gfedcea84/rpm/el8/BUILD/ceph-17.2.6-1287-gfedcea84/src/osd/PGLog.h: 286: FAILED ceph_assert(rollback_info_trimmed_to == head)
2023-10-07T01:48:49.228 INFO:tasks.ceph.osd.2.smithi088.stderr:
2023-10-07T01:48:49.228 INFO:tasks.ceph.osd.2.smithi088.stderr: ceph version 17.2.6-1287-gfedcea84 (fedcea84a4bd31f0708715b39e04a135187af2ea) quincy (stable)
2023-10-07T01:48:49.228 INFO:tasks.ceph.osd.2.smithi088.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x562d684d4f25]
2023-10-07T01:48:49.228 INFO:tasks.ceph.osd.2.smithi088.stderr: 2: ceph-osd(+0x59a0eb) [0x562d684d50eb]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 3: (PGLog::IndexedLog::claim_log_and_clear_rollback_info(pg_log_t const&)+0x36) [0x562d68964266]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 4: (PGLog::reset_backfill_claim_log(pg_log_t const&, PGLog::LogEntryHandler*)+0x105) [0x562d68964965]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 5: (PeeringState::Stray::react(MLogRec const&)+0x2aa) [0x562d6893a84a]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 6: (boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x109) [0x562d68969e19]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 7: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x73) [0x562d686b9e63]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 8: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PeeringCtx&)+0x129) [0x562d6869ca69]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 9: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x2e5) [0x562d685f43e5]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 10: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x59) [0x562d688e0869]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x112f) [0x562d68612d1f]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x435) [0x562d68d56305]
2023-10-07T01:48:49.229 INFO:tasks.ceph.osd.2.smithi088.stderr: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x562d68d58a24]
2023-10-07T01:48:49.230 INFO:tasks.ceph.osd.2.smithi088.stderr: 14: /lib64/libpthread.so.0(+0x814a) [0x7fe6bd62914a]
2023-10-07T01:48:49.230 INFO:tasks.ceph.osd.2.smithi088.stderr: 15: clone()

#16 Updated by Laura Flores about 1 month ago

/a/yuriw-2024-01-23_19:22:22-rados-wip-yuri5-testing-2024-01-11-1300-pacific-distro-default-smithi/7529622
