Bug #52173

open

crash in ProtocolV2::send_message()

Added by Telemetry Bot over 2 years ago. Updated about 2 years ago.

Status:
Need More Info
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Telemetry
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

275bfebdff86cb8d90c56459c72a2b625eb9cba8e48a334deeaa708ac8c3e921
6d94986fed58ed0b64c4deac7007641e663ad4efe8f8763e7d6b392a60e77674


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=275bfebdff86cb8d90c56459c72a2b625eb9cba8e48a334deeaa708ac8c3e921

Sanitized backtrace:

    __pthread_mutex_lock()
    ProtocolV2::send_message(Message*)
    AsyncConnection::send_message(Message*)
    PG::send_cluster_message(int, Message*, unsigned int, bool)
    PeeringState::send_lease()
    PeeringState::proc_renew_lease()
    PeeringState::Active::react(RenewLease const&)
    boost::statechart::detail::reaction_result boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list12<boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<PeeringState::AllReplicasActivated>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<RemoteReservationRevokedTooFull>, boost::statechart::custom_reaction<RemoteReservationRevoked>, boost::statechart::custom_reaction<PeeringState::DoRecovery>, boost::statechart::custom_reaction<RenewLease>, boost::statechart::custom_reaction<MLeaseAck>, boost::statechart::custom_reaction<PeeringState::CheckReadable> >, boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)
    boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)
    boost::statechart::simple_state<PeeringState::Clean, PeeringState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)
    boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)
    PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PeeringCtx&)
    OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)
    ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)
    OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)
    ShardedThreadPool::shardedthreadpool_worker(unsigned int)
    ShardedThreadPool::WorkThreadSharded::entry()
    clone()

Crash dump sample:
{
    "backtrace": [
        "(()+0x12730) [0x7f3278410730]",
        "(()+0x1129c) [0x7f327840f29c]",
        "(__pthread_mutex_lock()+0x54) [0x7f3278408714]",
        "(ProtocolV2::send_message(Message*)+0x3c3) [0x564f581aa333]",
        "(AsyncConnection::send_message(Message*)+0x596) [0x564f5818ae66]",
        "(PG::send_cluster_message(int, Message*, unsigned int, bool)+0x60) [0x564f578e78f0]",
        "(PeeringState::send_lease()+0x11e) [0x564f57aa81ee]",
        "(PeeringState::proc_renew_lease()+0x4d) [0x564f57ab8f6d]",
        "(PeeringState::Active::react(RenewLease const&)+0x18) [0x564f57ab8fb8]",
        "(boost::statechart::detail::reaction_result boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list12<boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<PeeringState::AllReplicasActivated>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<RemoteReservationRevokedTooFull>, boost::statechart::custom_reaction<RemoteReservationRevoked>, boost::statechart::custom_reaction<PeeringState::DoRecovery>, boost::statechart::custom_reaction<RenewLease>, boost::statechart::custom_reaction<MLeaseAck>, boost::statechart::custom_reaction<PeeringState::CheckReadable> >, boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x321) [0x564f57b11911]",
        "(boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x27e) [0x564f57b11d1e]",
        "(boost::statechart::simple_state<PeeringState::Clean, PeeringState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x7d) [0x564f57b08fed]",
        "(boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6c) [0x564f57913dac]",
        "(PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PeeringCtx&)+0x2c6) [0x564f57906436]",
        "(OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x17c) [0x564f5788342c]",
        "(ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x52) [0x564f57a9da82]",
        "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12fa) [0x564f5787600a]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x564f57e7c8b4]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x564f57e7f330]",
        "(()+0x7fa3) [0x7f3278405fa3]",
        "(clone()+0x3f) [0x7f3277fb44cf]"
    ],
    "ceph_version": "15.2.13",
    "crash_id": "2021-06-25T12:18:12.331799Z_742f9350-d4fd-4f55-a187-19bacc8a333c",
    "entity_name": "osd.a95ffd7d537df64d721779075dc86e3bcb4d181d",
    "os_id": "10",
    "os_name": "Debian GNU/Linux 10 (buster)",
    "os_version": "10 (buster)",
    "os_version_id": "10",
    "process_name": "ceph-osd",
    "stack_sig": "6d94986fed58ed0b64c4deac7007641e663ad4efe8f8763e7d6b392a60e77674",
    "timestamp": "2021-06-25T12:18:12.331799Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.119-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.4.119-1 (Tue, 01 Jun 2021 15:32:00 +0200)"
}
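
The sanitized backtrace above appears to be the raw "backtrace" array with the per-frame offsets and addresses stripped and the anonymous "()" frames dropped. A minimal Python sketch of that relationship follows. It is an illustration only: the names FRAME_RE, sanitize_frame and sanitize_backtrace are hypothetical, the frame format is assumed from the sample in this report, and this is not the telemetry backend's code. It does not attempt to reproduce how the crash signatures are computed.

```python
import re

# Assumed frame shape, taken from the sample above:
#   "(<symbol>+0x<offset>) [0x<address>]"
# The symbol may itself contain parentheses and template arguments, so the
# pattern is anchored at both ends and the symbol is matched greedily.
FRAME_RE = re.compile(
    r"^\((?P<symbol>.*)\+0x[0-9a-fA-F]+\) \[0x[0-9a-fA-F]+\]$"
)


def sanitize_frame(frame):
    """Return the symbol of one raw frame, or None if it has no usable symbol."""
    match = FRAME_RE.match(frame.strip())
    if match is None:
        return None  # frame does not follow the assumed format
    symbol = match.group("symbol")
    if symbol == "()":
        return None  # anonymous frame such as "(()+0x12730) [0x...]"
    return symbol


def sanitize_backtrace(frames):
    """Strip offsets and addresses from every frame, dropping anonymous frames."""
    symbols = (sanitize_frame(frame) for frame in frames)
    return [symbol for symbol in symbols if symbol is not None]


if __name__ == "__main__":
    sample = [
        "(()+0x12730) [0x7f3278410730]",
        "(__pthread_mutex_lock()+0x54) [0x7f3278408714]",
        "(ProtocolV2::send_message(Message*)+0x3c3) [0x564f581aa333]",
        "(clone()+0x3f) [0x7f3277fb44cf]",
    ]
    for symbol in sanitize_backtrace(sample):
        print(symbol)
```

Run against the full "backtrace" array of the crash dump, this should give the same frame list as the sanitized backtrace shown in the description, which makes it easier to compare dumps from different clusters by eye.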

Actions #1

Updated by Telemetry Bot over 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v15.2.10, v15.2.11, v15.2.13, v15.2.8, v15.2.9 added

Actions #2

Updated by Neha Ojha over 2 years ago

  • Subject changed from crash: __pthread_mutex_lock() to crash in ProtocolV2::send_message()
  • Status changed from New to Need More Info
  • Priority changed from Normal to Low

Seen on 2 octopus clusters.

Actions #3

Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)