Bug #52614 (closed): Getting retired extent before the corresponding pin is retired

Added by Xuehan Xu over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
% Done: 0%
Regression: No
Severity: 3 - minor

Description

DEBUG 2021-09-15 14:20:50,824 [shard 0] seastore - Cache::get_root(0x9a8c010): got root read CachedExtent(addr=0x7d457b0, type=ROOT, version=15, dirty_from_or_retired_at=journal_seq_t(segment_seq=20, offset=paddr_t<[0,34], 61440>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=INVALID, last_committed_crc=0, refcount=2)
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - write_record, mdlength 8192, dlength 933888, target 24576
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - segment_write to segment [0,35] at offset 24576, physical offset 2348843008, len 942080
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - block: do_writev offset 2348843008 len 942080
DEBUG 2021-09-15 14:20:50,825 [shard 0] test - waiting for command
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - BtreeLBAManager::get_mapping(0x9a8c010): 749505347584
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - Cache::get_root(0x9a8c010): root already on transaction CachedExtent(addr=0x7d457b0, type=ROOT, version=15, dirty_from_or_retired_at=journal_seq_t(segment_seq=20, offset=paddr_t<[0,34], 61440>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=INVALID, last_committed_crc=0, refcount=3)
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBATree::get_internal_node(0x9a8c010): reading internal at offset paddr_t<[0,26], 77824>, depth 3
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - Cache::add_extent: extent CachedExtent(addr=0x4aaaa70, type=LADDR_INTERNAL, version=0, dirty_from_or_retired_at=journal_seq_t(segment_seq=0, offset=paddr_t<NULL_SEG, NULL_OFF>), paddr=paddr_t<[0,26], 77824>, state=CLEAN, last_committed_crc=0, refcount=2, size=0, meta=btree_node_meta_t(begin=0, end=0, depth=0))
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - block: do_read offset 1744916480 len 4096
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 53248> -> paddr_t<[0,26], 131072>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 45056> -> paddr_t<[0,26], 122880>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 36864> -> paddr_t<[0,26], 114688>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 49152> -> paddr_t<[0,26], 126976>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 126976> -> paddr_t<[0,26], 204800>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 131072> -> paddr_t<[0,26], 208896>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 69632> -> paddr_t<[0,26], 147456>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBAInternalNode::resolve_relative_addrs paddr_t<BLOCK_REL_SEG, 241664> -> paddr_t<[0,26], 319488>
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - Cache::get_extent(0x9a8c010): Found extent at offset paddr_t<[0,26], 77824> in cache: CachedExtent(addr=0x4aaaa70, type=LADDR_INTERNAL, version=0, dirty_from_or_retired_at=journal_seq_t(segment_seq=0, offset=paddr_t<NULL_SEG, NULL_OFF>), paddr=paddr_t<[0,26], 77824>, state=CLEAN, last_committed_crc=1623800799, refcount=1, size=16, meta=btree_node_meta_t(begin=0, end=18446744073709551615, depth=3))
DEBUG 2021-09-15 14:20:50,825 [shard 0] seastore - LBATree::get_internal_node(0x9a8c010): read internal at offset paddr_t<[0,26], 77824> CachedExtent(addr=0x4aaaa70, type=LADDR_INTERNAL, version=0, dirty_from_or_retired_at=journal_seq_t(segment_seq=0, offset=paddr_t<NULL_SEG, NULL_OFF>), paddr=paddr_t<[0,26], 77824>, state=CLEAN, last_committed_crc=1623800799, refcount=2, size=16, meta=btree_node_meta_t(begin=0, end=18446744073709551615, depth=3))
ERROR 2021-09-15 14:20:50,825 [shard 0] seastore - add_pin: unable to add btree_range_pin_t(begin=0, end=18446744073709551615, depth=3, extent=0x4aaaa70) (CachedExtent(addr=0x4aaaa70, type=LADDR_INTERNAL, version=0, dirty_from_or_retired_at=journal_seq_t(segment_seq=0, offset=paddr_t<NULL_SEG, NULL_OFF>), paddr=paddr_t<[0,26], 77824>, state=CLEAN, last_committed_crc=1623800799, refcount=2, size=16, meta=btree_node_meta_t(begin=0, end=18446744073709551615, depth=3))), found btree_range_pin_t(begin=0, end=18446744073709551615, depth=3, extent=0x771dc10) (CachedExtent(addr=0x771dc10, type=LADDR_INTERNAL, version=2, dirty_from_or_retired_at=journal_seq_t(segment_seq=4294967295, offset=paddr_t<[15,268435455], NULL_OFF>), paddr=paddr_t<[0,26], 77824>, state=INVALID, last_committed_crc=531765597, refcount=2, size=16, meta=btree_node_meta_t(begin=0, end=18446744073709551615, depth=3)))
ERROR 2021-09-15 14:20:50,825 [shard 0] none - ../src/crimson/os/seastore/lba_manager/btree/btree_range_pin.cc:122 : In function 'void crimson::os::seastore::lba_manager::btree::btree_pin_set_t::add_pin(crimson::os::seastore::lba_manager::btree::btree_range_pin_t&)', ceph_assert(%s)
0 == "impossible"
Aborting on shard 0.
Backtrace:
0x237c146
0x233f16b
0x233f422
0x233f52d
0x233f57a
/lib64/libpthread.so.0+0x12dcf
/lib64/libc.so.6+0x3770e
/lib64/libc.so.6+0x21b24
0x21bc799
0x21bc827
0x1b2f619
0x1b493a5
0x1b4940a
0x1b4998e
0x1b49afb
0x1b49bf8
0x1b49d4e
0x1b60c8e
0x233bc54
0x234243f
0x2368768
0x2368df1
0x2310efc
0x23112d3
0x1229bfd
/lib64/libc.so.6+0x236a2
0x122060d

#1

Updated by Xuehan Xu over 2 years ago

  • Pull request ID set to 43169
#2

Updated by Xuehan Xu over 2 years ago

  • File tmp.log added
#3

Updated by Samuel Just over 2 years ago

  • Assignee changed from Xuehan Xu to Samuel Just

I think the relevant lines are:

DEBUG 2021-09-15 20:24:38,501 [shard 0] seastore - Cache::get_root(0x488d010): waiting root CachedExtent(addr=0x44712d0, type=ROOT, version=114, dirty_from_or_retired_at=journal_seq_t(segment_seq=168, offset=paddr_t<[0,230], 114688>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=DIRTY, last_committed_crc=0, refcount=5)
...
DEBUG 2021-09-15 20:24:38,509 [shard 0] seastore - Cache::get_root(0x488d010): got root read CachedExtent(addr=0x44712d0, type=ROOT, version=114, dirty_from_or_retired_at=journal_seq_t(segment_seq=168, offset=paddr_t<[0,230], 114688>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=INVALID, last_committed_crc=0, refcount=2)

The problem is that the root extent 0x44712d0 got updated between the original call to wait_io() and the resumption of the task processing transaction 0x488d010, not in the subsequent call where it was already on the transaction:

DEBUG 2021-09-15 20:24:38,509 [shard 0] seastore - Cache::get_root(0x488d010): root already on transaction CachedExtent(addr=0x44712d0, type=ROOT, version=114, dirty_from_or_retired_at=journal_seq_t(segment_seq=168, offset=paddr_t<[0,230], 114688>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=INVALID, last_committed_crc=0, refcount=3)

I think the bug is actually that we need to check that the extent is valid after wait_io() as in get_extent(Transaction &t...).
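
Below is a minimal, self-contained sketch (plain C++, not the actual crimson/seastore code; the CachedExtent fields, wait_io() helper, and retry loop are simplified assumptions) of the kind of re-check being suggested here: after waiting for in-flight IO on the cached root, verify the extent is still valid before handing it to the transaction, and retry with the current root if it was invalidated while waiting.

#include <memory>

// Simplified stand-ins for the real types; illustrative only.
struct CachedExtent {
  bool io_in_progress = false;  // stands in for the extent's pending-IO state
  bool valid = true;            // cleared when a newer version replaces this extent
};

struct Cache {
  std::shared_ptr<CachedExtent> root;

  // Hypothetical helper: conceptually blocks until the extent's IO completes.
  // While "waiting", another transaction may commit and invalidate the extent.
  void wait_io(CachedExtent &e) { e.io_in_progress = false; }

  std::shared_ptr<CachedExtent> get_root() {
    for (;;) {
      auto ret = root;          // snapshot the current root
      wait_io(*ret);
      if (!ret->valid) {
        // The root was updated while we waited; don't hand back the stale
        // extent -- pick up the current root (or, in the real code, surface
        // a transaction conflict) and try again.
        continue;
      }
      return ret;
    }
  }
};

In the real code the wait is asynchronous and an invalidated extent would typically force the transaction to restart rather than silently retrying, but the shape of the post-wait validity check is the same.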

#4

Updated by Xuehan Xu over 2 years ago

  • File deleted (tmp.log)
#5

Updated by Xuehan Xu over 2 years ago

  • Assignee changed from Samuel Just to Xuehan Xu
#6

Updated by Xuehan Xu over 2 years ago

Samuel Just wrote:

I think the relevant lines are:

DEBUG 2021-09-15 20:24:38,501 [shard 0] seastore - Cache::get_root(0x488d010): waiting root CachedExtent(addr=0x44712d0, type=ROOT, version=114, dirty_from_or_retired_at=journal_seq_t(segment_seq=168, offset=paddr_t<[0,230], 114688>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=DIRTY, last_committed_crc=0, refcount=5)
...
DEBUG 2021-09-15 20:24:38,509 [shard 0] seastore - Cache::get_root(0x488d010): got root read CachedExtent(addr=0x44712d0, type=ROOT, version=114, dirty_from_or_retired_at=journal_seq_t(segment_seq=168, offset=paddr_t<[0,230], 114688>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=INVALID, last_committed_crc=0, refcount=2)

The problem is that the root extent 0x44712d0 got updated between the original call to wait_io() and the resumption of the task processing transaction 0x488d010, not in the subsequent call where it was already on the transaction:

DEBUG 2021-09-15 20:24:38,509 [shard 0] seastore - Cache::get_root(0x488d010): root already on transaction CachedExtent(addr=0x44712d0, type=ROOT, version=114, dirty_from_or_retired_at=journal_seq_t(segment_seq=168, offset=paddr_t<[0,230], 114688>), paddr=paddr_t<NULL_SEG, NULL_OFF>, state=INVALID, last_committed_crc=0, refcount=3)

I think the bug is actually that we need to check that the extent is valid after wait_io() as in get_extent(Transaction &t...).

Sorry for the confusion; the log in the attached file is actually for a different issue than the one this ticket is about. I've created a separate ticket for it, please see: https://tracker.ceph.com/issues/52623

#7

Updated by Xuehan Xu over 2 years ago

  • Status changed from New to Closed