Bug #64282

closed

osd crashes due to unexpected pg creation

Added by Xuehan Xu 3 months ago. Updated 13 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

DEBUG 2024-01-30 05:30:06,943 [shard 2] osd - ShardServices::dispatch_context_transaction: empty transaction
DEBUG 2024-01-30 05:30:06,943 [shard 2] osd - peering_event(id=33554908, detail=PeeringEvent(from=0 pgid=2.2 sent=16 requested=16 evt=epoch_sent: 16 epoch_requested: 16 RenewLease)): exit
DEBUG 2024-01-30 05:30:06,943 [shard 2] osd - 0x0 LocalPeeringEvent::start: peering_event(id=33554908, detail=PeeringEvent(from=0 pgid=2.2 sent=16 requested=16 evt=epoch_sent: 16 epoch_requested: 16 RenewLease)): complete
INFO  2024-01-30 05:30:07,428 [shard 0] prioritycache - prioritycache tune_memory target: 4294967296 mapped: 15130624 unmapped: 729088 heap: 15859712 old mem: 2845415832 new mem: 2845415832
INFO  2024-01-30 05:30:07,651 [shard 0] alienstore - stat
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd - pg_advance_map(id=33554905, detail=PGAdvanceMap(pg=6.7 from=100 to=112)): advancing map to 102
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 101 pg[6.7( empty local-lis/les=0/0 n=0 ec=73/73 lis/c=0/0 les/c/f=0/0/0 sis=100) [] r=-1 lpr=100 pi=[73,100)/1 crt=0'0 mlcod 0'0 unknown NOTIFY PeeringState::advance_map handle_advance_map {}/{} -- -1/-1
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 102 pg[6.7( empty local-lis/les=0/0 n=0 ec=73/73 lis/c=0/0 les/c/f=0/0/0 sis=100) [] r=-1 lpr=100 pi=[73,100)/1 crt=0'0 mlcod 0'0 unknown NOTIFY state<Started>: Started advmap
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 102 pg[6.7( empty local-lis/les=0/0 n=0 ec=73/73 lis/c=0/0 les/c/f=0/0/0 sis=100) [] r=-1 lpr=100 pi=[73,100)/1 crt=0'0 mlcod 0'0 unknown NOTIFY check_recovery_sources no source osds () went down
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd - pg_advance_map(id=33554905, detail=PGAdvanceMap(pg=6.7 from=100 to=112)): start: getting map 103
DEBUG 2024-01-30 05:30:07,695 [shard 0] osd - get_local_map loading osdmap.103 from disk
INFO  2024-01-30 05:30:07,695 [shard 0] osd - load_map osdmap.103
INFO  2024-01-30 05:30:07,695 [shard 0] osd - load_map osdmap.103
INFO  2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 88 pg[5.d( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
DEBUG 2024-01-30 05:30:07,695 [shard 0] osd - load_map_bl loading osdmap.103 from disk
INFO  2024-01-30 05:30:07,695 [shard 2] osd - Entering state: Initial
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd - snap_mapper.reset_prefix_itr::from <0> to <CEPH_NOSNAP> ::update_bits
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 88 pg[5.d( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown  ScrubState::ScrubState: entering state ScrubMachine/Inactive
DEBUG 2024-01-30 05:30:07,696 [shard 0] osd -  pg_epoch 112 pg[1.0( v 111'92 (0'0,111'92] local-lis/les=13/15 n=0 ec=13/13 lis/c=13/13 les/c/f=15/18/0 sis=13) [3,0] r=1 lpr=13 luod=0'0 lua=0'0 crt=111'93 lcod 107'91 mlcod 111'93 active PeeringState::update_last_complete_ondisk updating last_complete_ondisk to: 111'92
DEBUG 2024-01-30 05:30:07,696 [shard 0] osd - replicated_request(id=1358, detail=RepRequest(from=3 req=osd_repop(client.4112.0:128 1.0 e111/13 1:30306672:devicehealth::main.db.0000000000000000:head v 111'93, mlcod=111'93) v3)): complete
DEBUG 2024-01-30 05:30:07,696 [shard 0] osd - replicated_request(id=1358, detail=RepRequest(from=3 req=osd_repop(client.4112.0:128 1.0 e111/13 1:30306672:devicehealth::main.db.0000000000000000:head v 111'93, mlcod=111'93) v3)): exit
Segmentation fault on shard 2.
Backtrace:
 0# 0x00005593591ABE41 in ceph-osd
 1# 0x00005593591AC2F5 in ceph-osd
 2# 0x000055935A8DFA68 in ceph-osd
 3# 0x000055935A8DFDD1 in ceph-osd
 4# 0x000055935A916C5E in ceph-osd
 5# 0x000055935A917A56 in ceph-osd
 6# 0x000055935A8B5832 in ceph-osd
 7# 0x00002BA18CE8C1CA in /lib64/libpthread.so.0
 8# clone in /lib64/libc.so.6

(Note: the three shard-0 DEBUG lines above were interleaved with the backtrace in the original log; they are grouped here for readability.)
Dump of siginfo:
  si_signo: 11
  si_errno: 0
  si_code: 1
  si_pid: 24
  si_uid: 0
  si_status: 0
  si_utime: 0
  si_stime: 0
  si_int: 0
  si_ptr: 0
  si_overrun: 0
  si_timerid: 24
  si_addr: 0x18
  si_band: 24
  si_fd: 0
  si_addr_lsb: 0
  si_lower: 0
  si_upper: 0
  si_pkey: 0
  si_call_addr: 0x18
  si_syscall: 0
  si_arch: 0

The crash was caused by an attempted PG creation (pg 5.d above, which is marked DNE) after the corresponding pool had already been removed. The fault address (si_addr 0x18) is consistent with dereferencing a member at a small offset from a null pointer, as would happen when the pool's metadata can no longer be found in the map.


Files

ceph-osd.0.log.gz (718 KB) — Xuehan Xu, 02/01/2024 05:23 AM
Actions #1

Updated by Matan Breizman 2 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 55407
Actions #2

Updated by Matan Breizman 13 days ago

  • Status changed from Fix Under Review to Resolved