
Bug #59165

closed

OSD crash due to "read gave enoent" on osdmap

Added by Matan Breizman about 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%
Tags: backport_processed
Backport: reef
Regression: No
Severity: 3 - minor

Description

In `rados_api_tests`, we run `workunits/rados/test.sh`.
Intermittently, one of the OSDs crashes with the following backtrace:

DEBUG 2023-03-26 09:08:44,237 [shard 0] osd - pg_advance_map(id=729864, detail=PGAdvanceMap(pg=31.1a from=1101 to=1102)): complete
DEBUG 2023-03-26 09:08:44,237 [shard 0] osd - pg_advance_map(id=730380, detail=PGAdvanceMap(pg=73.2 from=1101 to=1102)): complete
terminate called after throwing an instance of 'std::runtime_error'
what(): read gave enoent on #-1:16a18850:::osdmap.1103:0#
Aborting on shard 0.
Backtrace:
Reactor stalled for 38 ms on shard 0.
0# gsignal in /lib64/libc.so.6
1# abort in /lib64/libc.so.6
2# 0x00007F034A4AA09B in /lib64/libstdc++.so.6
3# 0x00007F034A4B053C in /lib64/libstdc++.so.6
4# 0x00007F034A4AF559 in /lib64/libstdc++.so.6
5# __gxx_personality_v0 in /lib64/libstdc++.so.6
6# 0x00007F034916EB03 in /lib64/libgcc_s.so.1
7# _Unwind_Resume in /lib64/libgcc_s.so.1
8# 0x0000560B05296C3C in ceph-osd
9# 0x0000560B05298DA9 in ceph-osd
10# void seastar::futurize<seastar::future<ceph::buffer::v15_2_0::list> >::satisfy_with_result_of<seastar::future<ceph::buffer::v15_2_0::list>::then_wrapped_nrvo<seastar::future<ceph::buffer::v15_2_0::list>, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)> >(seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>&&, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&, seastar::future_state<ceph::buffer::v15_2_0::list>&&)#1}::operator()(seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>&&, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&, seastar::future_state<ceph::buffer::v15_2_0::list>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>&&, seastar::future<ceph::buffer::v15_2_0::list>::then_wrapped_nrvo<seastar::future<ceph::buffer::v15_2_0::list>, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)> >(seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>&&, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&, seastar::future_state<ceph::buffer::v15_2_0::list>&&)#1}::operator()(seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>&&, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&, seastar::future_state<ceph::buffer::v15_2_0::list>&&) const::{lambda()#1}&&) in ceph-osd
11# seastar::continuation<seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>, seastar::future<ceph::buffer::v15_2_0::list>::then_wrapped_nrvo<seastar::future<ceph::buffer::v15_2_0::list>, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)> >(seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<ceph::buffer::v15_2_0::list>&&, seastar::noncopyable_function<seastar::future<ceph::buffer::v15_2_0::list> (seastar::future<ceph::buffer::v15_2_0::list>&&)>&, seastar::future_state<ceph::buffer::v15_2_0::list>&&)#1}, ceph::buffer::v15_2_0::list>::run_and_dispose() in ceph-osd
12# 0x0000560B1315EDDF in ceph-osd

The first known instance of this issue is reported in a review comment [1], on the main branch at head SHA-1 fa8a9c73ae3a5cc905789e96e76c4f9d3a0b0573.
The crash still recurs occasionally; see osd.1.log in [2] or osd.3.log in [3].

[1] https://github.com/ceph/ceph/pull/49286#discussion_r1095950772
[2] https://pulpito.ceph.com/matan-2023-03-26_08:06:20-crimson-rados-wip-matanb-crimson-only-testing-no_trim-23.3-v2-distro-crimson-smithi/7220607/
[3] https://pulpito.ceph.com/matan-2023-03-26_07:59:51-crimson-rados-wip-matanb-crimson-only-testing-no_trim-23.3-v2-distro-crimson-smithi/7220586/

#1

Updated by Matan Breizman 11 months ago

  • Status changed from New to In Progress
  • Assignee set to Matan Breizman
  • Pull request ID set to 51425

#2

Updated by Matan Breizman 11 months ago

Detailed explanation: https://gist.github.com/Matan-B/67a6c2a9b9c581600799c2c37704e913

Fixes the following case:

DEBUG 2023-05-09 13:27:22,883 [shard 0] osd - pg_advance_map(id=13033, detail=PGAdvanceMap(pg=10.16 from=76 to=77)): complete
DEBUG 2023-05-09 13:27:22,883 [shard 0] osd - pg_advance_map(id=13152, detail=PGAdvanceMap(pg=15.11 from=76 to=77)): sending pg temp
DEBUG 2023-05-09 13:27:22,883 [shard 0] osd - pg_advance_map(id=12506, detail=PGAdvanceMap(pg=3.1b from=77 to=76)): start: getting map 78
DEBUG 2023-05-09 13:27:22,883 [shard 0] osd - get_local_map loading osdmap.78 from disk
INFO  2023-05-09 13:27:22,883 [shard 0] osd - load_map osdmap.78
DEBUG 2023-05-09 13:27:22,883 [shard 0] osd - load_map_bl loading osdmap.78 from disk
DEBUG 2023-05-09 13:27:22,883 [shard 0] alienstore - read
...
DEBUG 2023-05-09 13:27:22,957 [shard 0] osd - pg_advance_map(id=13074, detail=PGAdvanceMap(pg=12.d from=76 to=77)): complete
DEBUG 2023-05-09 13:27:22,957 [shard 0] osd - pg_advance_map(id=12975, detail=PGAdvanceMap(pg=8.b from=76 to=77)): sending pg temp
DEBUG 2023-05-09 13:27:22,957 [shard 0] osd - pg_advance_map(id=13144, detail=PGAdvanceMap(pg=15.7 from=76 to=77)): complete
DEBUG 2023-05-09 13:27:22,957 [shard 0] osd - pg_advance_map(id=12833, detail=PGAdvanceMap(pg=2.2 from=76 to=77)): sending pg temp
DEBUG 2023-05-09 13:27:22,957 [shard 0] osd - OSDMeta::load_map: start read gave enoent on 78
ERROR 2023-05-09 13:27:22,957 [shard 0] none - ../src/crimson/osd/osd_meta.cc:40 : In function 'OSDMeta::load_map(epoch_t)::<lambda()>', abort(%s)
abort() called
// Note PGAdvanceMap(pg=3.1b from=77 to=76):
// 'from' is set via `pg->get_osdmap_epoch()`, and this PG is already at epoch 77,
// so 'from' is newer than 'to'.
// Iteration therefore starts at 77+1, but osdmap.78 does not exist.

#3

Updated by Matan Breizman 11 months ago

  • Priority changed from Normal to High

This crash is now appearing quite frequently in the latest test runs.

#4

Updated by Matan Breizman 10 months ago

  • Status changed from In Progress to Resolved
  • Pull request ID changed from 51425 to 51961

#5

Updated by Matan Breizman 10 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to reef

#6

Updated by Ernesto Puerta 6 months ago

  • Tags set to backport_processed

#7

Updated by Matan Breizman 3 months ago

  • Status changed from Pending Backport to Resolved