
Bug #47299

Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()

Added by Denis Krienbühl 16 days ago. Updated 14 days ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature:

Description

Some of our OSDs will sometimes crash with the following message:

{
    "crash_id": "2020-09-04T04:16:06.718363Z_78670637-4fad-494f-abea-2afd6e64a970",
    "timestamp": "2020-09-04T04:16:06.718363Z",
    "process_name": "ceph-osd",
    "entity_name": "osd.166",
    "ceph_version": "15.2.4",
    "utsname_hostname": "prod-nvme1-c-rma1",
    "utsname_sysname": "Linux",
    "utsname_release": "4.15.0-72-generic",
    "utsname_version": "#81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019",
    "utsname_machine": "x86_64",
    "os_name": "Ubuntu",
    "os_id": "ubuntu",
    "os_version_id": "18.04",
    "os_version": "18.04.3 LTS (Bionic Beaver)",
    "assert_condition": "p->second.need <= v || p->second.is_delete()",
    "assert_func": "void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [with bool TrackChanges = false]",
    "assert_file": "/build/ceph-15.2.4/src/osd/osd_types.h",
    "assert_line": 4774,
    "assert_thread_name": "tp_osd_tp",
    "assert_msg": "/build/ceph-15.2.4/src/osd/osd_types.h: In function 'void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [with bool TrackChanges = false]' thread 7fd8cfcd8700 time 2020-09-04T06:16:06.699081+0200\n/build/ceph-15.2.4/src/osd/osd_types.h: 4774: FAILED ceph_assert(p->second.need <= v || p->second.is_delete())\n",
    "backtrace": [
        "(()+0x12890) [0x7fd8f7255890]",
        "(gsignal()+0xc7) [0x7fd8f5f07e97]",
        "(abort()+0x141) [0x7fd8f5f09801]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a5) [0x560dc2d9a7b5]",
        "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x560dc2d9a93f]",
        "(()+0xc3d9c8) [0x560dc307b9c8]",
        "(ReplicatedBackend::handle_push_reply(pg_shard_t, PushReplyOp const&, PushOp*)+0x591) [0x560dc3159f31]",
        "(ReplicatedBackend::do_push_reply(boost::intrusive_ptr<OpRequest>)+0xfa) [0x560dc315a67a]",
        "(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x177) [0x560dc315c737]",
        "(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x560dc2fe2867]",
        "(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6fd) [0x560dc2f853fd]",
        "(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17b) [0x560dc2e09aab]",
        "(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x67) [0x560dc3066d07]",
        "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90c) [0x560dc2e2751c]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x560dc34713ec]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560dc3474640]",
        "(()+0x76db) [0x7fd8f724a6db]",
        "(clone()+0x3f) [0x7fd8f5fea88f]" 
    ]
}

Before and after the crash, we'll see a stream of errors like this being spammed into the logs:

1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd8d6ce6700' had timed out after 15

At this point we recreate the OSD, which solves the problem.
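For context, the failed assertion enforces that when recovery reports an object as "got" at version v, the version the missing set still records as needed must not exceed v (unless the entry is a delete). The following is a minimal, hypothetical model of that invariant — simplified names and types, not the actual Ceph `pg_missing_set` source:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the invariant behind the failed assertion:
// p->second.need <= v || p->second.is_delete()
struct item {
    int need = 0;          // version still needed for this object
    bool deleted = false;  // whether the missing entry is a delete
    bool is_delete() const { return deleted; }
};

struct missing_set {
    std::map<std::string, item> missing;

    // Marks an object as recovered at version v; mirrors the shape of
    // pg_missing_set::got(), where the crash's assertion fires. Here we
    // return false instead of aborting when the invariant is violated.
    bool got(const std::string& oid, int v) {
        auto p = missing.find(oid);
        if (p == missing.end())
            return false;                  // object was not missing
        if (!(p->second.need <= v || p->second.is_delete()))
            return false;                  // would trip the real assert
        missing.erase(p);                  // object is fully recovered
        return true;
    }
};
```

In the crash above, `ReplicatedBackend::handle_push_reply` apparently reached this check with a recovered version older than the recorded `need`, on a non-delete entry, which aborts the OSD.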

crash.log.gz (169 KB) Denis Krienbühl, 09/04/2020 06:57 AM

History

#1 Updated by Neha Ojha 15 days ago

  • Status changed from New to Need More Info

Is it possible for you to capture osd logs with debug_osd=30? We'll also try to reproduce this at our end.
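For anyone hitting the same crash: one way to raise that log level at runtime is via the standard Ceph CLI (using osd.166 from the crash report above as an example target):

```shell
# Raise the OSD debug level via the cluster config database:
ceph config set osd.166 debug_osd 30

# Or tell the running daemon directly:
ceph tell osd.166 config set debug_osd 30

# Revert once the logs have been captured:
ceph config rm osd.166 debug_osd
```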

#2 Updated by Denis Krienbühl 14 days ago

Neha Ojha wrote:

Is it possible for you to capture osd logs with debug_osd=30? We'll also try to reproduce this at our end.

Unfortunately we have already reset the OSD, as we've been suffering from a number of problems with our cluster and didn't have a chance to keep it in a broken state for investigation. We expect that we may run into further issues like this, though, and if circumstances permit we'll happily provide more output.
