Bug #65200

closed

PeeringState::get_peer_info(pg_shard_t) const: Assertion `it != peer_info.end()' failed.

Added by Matan Breizman about 1 month ago. Updated 6 days ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

osd.1: https://pulpito.ceph.com/matan-2024-03-27_13:02:57-crimson-rados-main-distro-crimson-smithi/7626293

Seen after adding OSD restarts to the thrash tests: https://github.com/ceph/ceph/pull/56511

INFO  2024-03-27 13:27:01,801 [shard 0:main] osd - start_primary_recovery_ops recovering 0 in pg  pg_epoch 45 pg[3.2( v 40'55 lc 36'54 (0'0,40'55] local-lis/les=44/45 n=0 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [1,0,3] r=0 lpr=44 pi=[14,44)/2 crt=40'55 mlcod 0'0 active+recovering , missing missing(1 may_include_deletes = 1)
INFO  2024-03-27 13:27:01,801 [shard 0:main] osd - start_primary_recovery_ops 3:48a442ac:::smithi01231316-12:head item.need 40'55  (missing)  (missing head)  
INFO  2024-03-27 13:27:01,801 [shard 0:main] osd - recover_missing 3:48a442ac:::smithi01231316-12:head v 40'55
INFO  2024-03-27 13:27:01,801 [shard 0:main] osd - recover_missing 3:48a442ac:::smithi01231316-12:head v 40'55, new recovery
DEBUG 2024-03-27 13:27:01,801 [shard 0:main] osd - recover_object: 3:48a442ac:::smithi01231316-12:head, 40'55
DEBUG 2024-03-27 13:27:01,801 [shard 0:main] osd - maybe_pull_missing_obj: 3:48a442ac:::smithi01231316-12:head, 40'55
DEBUG 2024-03-27 13:27:01,802 [shard 0:main] osd -  pg_epoch 45 pg[3.2( v 40'55 lc 36'54 (0'0,40'55] local-lis/les=44/45 n=0 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [1,0,3] r=0 lpr=44 pi=[14,44)/2 crt=40'55 mlcod 0'0 active+recovering  ObjectContextLoader::with_head_obc: object 3:48a442ac:::smithi01231316-12:head
INFO  2024-03-27 13:27:01,802 [shard 0:main] osd - start_primary_recovery_ops started 1 skipped 1
DEBUG 2024-03-27 13:27:01,802 [shard 0:main] osd -  pg_epoch 45 pg[3.2( v 40'55 lc 36'54 (0'0,40'55] local-lis/les=44/45 n=0 ec=14/14 lis/c=44/14 les/c/f=45/15/0 sis=44) [1,0,3] r=0 lpr=44 pi=[14,44)/2 crt=40'55 mlcod 0'0 active+recovering  ObjectContextLoader::get_or_load_obc: cache hit on 3:48a442ac:::smithi01231316-12:head
DEBUG 2024-03-27 13:27:01,802 [shard 0:main] osd - prepare_pull: 3:48a442ac:::smithi01231316-12:head, 40'55
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2476-g56e21662/rpm/el9/BUILD/ceph-19.0.0-2476-g56e21662/src/osd/PeeringState.h:2349: const pg_info_t& PeeringState::get_peer_info(pg_shard_t) const: Assertion `it != peer_info.end()' failed.
Aborting on shard 0.
Backtrace:
Reactor stalled for 159 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x118a06 0x118829 0x6efa70b 0x6efca58 0x6efd993 0x6efde84 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack: 0xffffffffffffff80 0xffffffff8e781dc1 0xffffffff8e782126 0xffffffff8e505d94 0xffffffff8e505f31 0xffffffff8e50733f 0xffffffff8e50801b 0xffffffff8e5084d0 0xffffffff8f07e45c 0xffffffff8f2000ea
Reactor stalled for 303 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x195b59 0x6efa069 0x6efc6cb 0x6efd993 0x6efde84 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack:
Reactor stalled for 539 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x195b53 0x6efa069 0x6efe1dd 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack:
Reactor stalled for 975 ms on shard 0. Backtrace: 0x6bddd 0xb99f089 0xb871b50 0xb8730cc 0xb8732e2 0xb873438 0xb873901 0x54daf 0x195bc1 0x6efa069 0x6efc6cb 0x6efd006 0x6efd5f7 0x6efd7b2 0x6efdcdf 0x6efe723 0x6efedaf 0x6efef6b 0x6ef9294 0x6ef9685 0x6ef994c 0x54daf 0xa154b 0x54d05 0x287f2 0x2871a 0x4dca5 0x3f96f7c 0x4f4503f 0x4f5601c 0x4f5771b 0x4f578fb 0x4f57a79 0x3f5e56c 0x460952c 0x4609732 0x46098e6 0x461333a 0x4613515 0x4613839 0x4613b44 0x4613caf 0x4613d38 0x46175d6 0x461789a 0x461792e 0x46179e6 0x462b282 0x462b4fa 0x462b76d 0x4688845 0xb8847d5 0xb89ea6f 0xb93fa6d 0xb9410bb 0xb61d823 0xb61e19f 0x368057a 0x3feaf 0x3ff5f 0x346c434
kernel callstack:
 0# 0x00007F0AE5EA154C in /lib64/libc.so.6
 1# raise in /lib64/libc.so.6
 2# abort in /lib64/libc.so.6
 3# 0x00007F0AE5E2871B in /lib64/libc.so.6
 4# 0x00007F0AE5E4DCA6 in /lib64/libc.so.6
 5# PeeringState::get_peer_info(pg_shard_t) const in ceph-osd
 6# ReplicatedRecoveryBackend::prepare_pull(boost::intrusive_ptr<crimson::osd::ObjectContext> const&, PullOp&, RecoveryBackend::pull_info_t&, hobject_t const&, eversion_t) in ceph-
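The assertion text shows that get_peer_info() looked up a shard in peer_info, found no entry, and aborted. This happened while prepare_pull was preparing to pull 3:48a442ac:::smithi01231316-12:head. The sketch below illustrates that failure mode in isolation. It is not Ceph code: PgShard, PgInfo, PeerTable, forget() and try_get_peer_info() are hypothetical simplifications. It contrasts an assert-on-miss lookup, matching the `it != peer_info.end()` check in the message, with a guarded lookup that returns std::optional. It does not describe the fix in pull request 56611.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical, heavily simplified stand-ins for Ceph's pg_shard_t and
// pg_info_t. Only what the illustration needs is kept.
struct PgShard {
  int osd;
  bool operator<(const PgShard& o) const { return osd < o.osd; }
};

struct PgInfo {
  unsigned last_update;  // placeholder for an eversion_t
};

// Hypothetical peer-info table built around a std::map keyed by shard.
class PeerTable {
 public:
  void add(PgShard s, PgInfo info) { peer_info_[s] = info; }

  // Hypothetical: drops a peer's entry (e.g. a peer that is no longer known).
  void forget(PgShard s) { peer_info_.erase(s); }

  // Assert-on-miss lookup: find() followed by assert(it != end()).
  // If the shard is absent, the process aborts, as in the crash above.
  const PgInfo& get_peer_info(PgShard s) const {
    auto it = peer_info_.find(s);
    assert(it != peer_info_.end());
    return it->second;
  }

  // Guarded alternative: reports a miss to the caller instead of aborting,
  // so the caller can pick another source or skip the object.
  std::optional<PgInfo> try_get_peer_info(PgShard s) const {
    auto it = peer_info_.find(s);
    if (it == peer_info_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<PgShard, PgInfo> peer_info_;
};
```

A caller that might ask about a shard missing from peer_info would use the guarded form. Calling get_peer_info() for that shard would trip the assertion.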

#2

Updated by Matan Breizman 26 days ago

  • Status changed from New to Fix Under Review
  • Assignee set to Matan Breizman
  • Pull request ID set to 56611

#3

Updated by Matan Breizman 6 days ago

  • Status changed from Fix Under Review to Resolved