Bug #64282

closed

osd crashes due to unexpected pg creation

Added by Xuehan Xu 3 months ago. Updated 13 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

DEBUG 2024-01-30 05:30:06,943 [shard 2] osd - ShardServices::dispatch_context_transaction: empty transaction
DEBUG 2024-01-30 05:30:06,943 [shard 2] osd - peering_event(id=33554908, detail=PeeringEvent(from=0 pgid=2.2 sent=16 requested=16 evt=epoch_sent: 16 epoch_requested: 16 RenewLease)): exit
DEBUG 2024-01-30 05:30:06,943 [shard 2] osd - 0x0 LocalPeeringEvent::start: peering_event(id=33554908, detail=PeeringEvent(from=0 pgid=2.2 sent=16 requested=16 evt=epoch_sent: 16 epoch_requested: 16 RenewLease)): complete
INFO  2024-01-30 05:30:07,428 [shard 0] prioritycache - prioritycache tune_memory target: 4294967296 mapped: 15130624 unmapped: 729088 heap: 15859712 old mem: 2845415832 new mem: 2845415832
INFO  2024-01-30 05:30:07,651 [shard 0] alienstore - stat
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd - pg_advance_map(id=33554905, detail=PGAdvanceMap(pg=6.7 from=100 to=112)): advancing map to 102
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 101 pg[6.7( empty local-lis/les=0/0 n=0 ec=73/73 lis/c=0/0 les/c/f=0/0/0 sis=100) [] r=-1 lpr=100 pi=[73,100)/1 crt=0'0 mlcod 0'0 unknown NOTIFY PeeringState::advance_map handle_advance_map {}/{} -- -1/-1
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 102 pg[6.7( empty local-lis/les=0/0 n=0 ec=73/73 lis/c=0/0 les/c/f=0/0/0 sis=100) [] r=-1 lpr=100 pi=[73,100)/1 crt=0'0 mlcod 0'0 unknown NOTIFY state<Started>: Started advmap
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 102 pg[6.7( empty local-lis/les=0/0 n=0 ec=73/73 lis/c=0/0 les/c/f=0/0/0 sis=100) [] r=-1 lpr=100 pi=[73,100)/1 crt=0'0 mlcod 0'0 unknown NOTIFY check_recovery_sources no source osds () went down
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd - pg_advance_map(id=33554905, detail=PGAdvanceMap(pg=6.7 from=100 to=112)): start: getting map 103
DEBUG 2024-01-30 05:30:07,695 [shard 0] osd - get_local_map loading osdmap.103 from disk
INFO  2024-01-30 05:30:07,695 [shard 0] osd - load_map osdmap.103
INFO  2024-01-30 05:30:07,695 [shard 0] osd - load_map osdmap.103
INFO  2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 88 pg[5.d( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
DEBUG 2024-01-30 05:30:07,695 [shard 0] osd - load_map_bl loading osdmap.103 from disk
INFO  2024-01-30 05:30:07,695 [shard 2] osd - Entering state: Initial
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd - snap_mapper.reset_prefix_itr::from <0> to <CEPH_NOSNAP> ::update_bits
DEBUG 2024-01-30 05:30:07,695 [shard 2] osd -  pg_epoch 88 pg[5.d( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown  ScrubState::ScrubState: entering state ScrubMachine/Inactive
DEBUG 2024-01-30 05:30:07,696 [shard 0] osd -  pg_epoch 112 pg[1.0( v 111'92 (0'0,111'92] local-lis/les=13/15 n=0 ec=13/13 lis/c=13/13 les/c/f=15/18/0 sis=13) [3,0] r=1 lpr=13 luod=0'0 lua=0'0 crt=111'93 lcod 107'91 mlcod 111'93 active PeeringState::update_last_complete_ondisk updating last_complete_ondisk to: 111'92
DEBUG 2024-01-30 05:30:07,696 [shard 0] osd - replicated_request(id=1358, detail=RepRequest(from=3 req=osd_repop(client.4112.0:128 1.0 e111/13 1:30306672:devicehealth::main.db.0000000000000000:head v 111'93, mlcod=111'93) v3)): complete
DEBUG 2024-01-30 05:30:07,696 [shard 0] osd - replicated_request(id=1358, detail=RepRequest(from=3 req=osd_repop(client.4112.0:128 1.0 e111/13 1:30306672:devicehealth::main.db.0000000000000000:head v 111'93, mlcod=111'93) v3)): exit
Segmentation fault on shard 2.
Backtrace:
 0# 0x00005593591ABE41 in ceph-osd
 1# 0x00005593591AC2F5 in ceph-osd
 2# 0x000055935A8DFA68 in ceph-osd
 3# 0x000055935A8DFDD1 in ceph-osd
 4# 0x000055935A916C5E in ceph-osd
 5# 0x000055935A917A56 in ceph-osd
 6# 0x000055935A8B5832 in ceph-osd
 7# 0x00002BA18CE8C1CA in /lib64/libpthread.so.0
 8# clone in /lib64/libc.so.6

(Note: the three shard-0 DEBUG lines above were interleaved with the backtrace in the original log; they are grouped here for readability.)
Dump of siginfo:
  si_signo: 11
  si_errno: 0
  si_code: 1
  si_pid: 24
  si_uid: 0
  si_status: 0
  si_utime: 0
  si_stime: 0
  si_int: 0
  si_ptr: 0
  si_overrun: 0
  si_timerid: 24
  si_addr: 0x18
  si_band: 24
  si_fd: 0
  si_addr_lsb: 0
  si_lower: 0
  si_upper: 0
  si_pkey: 0
  si_call_addr: 0x18
  si_syscall: 0
  si_arch: 0

The crash was caused by an attempted PG creation (pg 5.d above, which is marked DNE) after the corresponding pool had already been removed. The fault address (si_addr 0x18) is consistent with dereferencing a member at a small offset from a null pointer, as would happen when the pool's metadata can no longer be found in the map.


Files

ceph-osd.0.log.gz (718 KB) — Xuehan Xu, 02/01/2024 05:23 AM
Actions #1

Updated by Matan Breizman 2 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 55407
Actions #2

Updated by Matan Breizman 13 days ago

  • Status changed from Fix Under Review to Resolved