Bug #43404

closed

mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs

Added by Sage Weil over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -25> 2019-12-21T10:49:03.977+0000 7fd356ad0700  5 mon.a@0(leader).paxos(paxos active c 1..6) is_readable = 1 - now=2019-12-21T10:49:03.979190+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 6
   -24> 2019-12-21T10:49:03.977+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 preprocess_query mon_command({"prefix": "osd pool create", "pool": "rbd", "pg_num": 8} v 0) v1 from client.? 127.0.0.1:0/2475729131
   -23> 2019-12-21T10:49:03.977+0000 7fd356ad0700  7 mon.a@0(leader).osd e1 prepare_update mon_command({"prefix": "osd pool create", "pool": "rbd", "pg_num": 8} v 0) v1 from client.? 127.0.0.1:0/2475729131
   -22> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 prepare_new_pool crush smoke test duration: 0.002999928s
   -21> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 should_propose
   -20> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).paxosservice(osdmap 1..1) propose_pending
   -19> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 encode_pending e 2
   -18> 2019-12-21T10:49:03.980+0000 7fd356ad0700  1 mon.a@0(leader).osd e1 do_prune osdmap full prune enabled
   -17> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 should_prune currently holding only 0 epochs (min osdmap epochs: 500); do not prune.
   -16> 2019-12-21T10:49:03.980+0000 7fd356ad0700  1 mon.a@0(leader).osd e1 encode_pending skipping prime_pg_temp; mapping job did not start
   -15> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs
   -14> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 scan_for_creating_pgs queueing pool create for 1 replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode off last_change 2 flags hashpspool,creating stripe_width 0
   -13> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs 1 pools queued
   -12> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs 0 pgs removed because they're created
   -11> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs pool 1 created 2 modified 2019-12-21T10:49:03.981713+0000 [0-8)
   -10> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.0
    -9> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.1
    -8> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.2
    -7> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.3
    -6> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.4
    -5> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.5
    -4> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.6
    -3> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.7
    -2> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs done with queue for 1
    -1> 2019-12-21T10:49:03.980+0000 7fd356ad0700 10 mon.a@0(leader).osd e1 update_pending_pgs queue remaining: 0 pools
     0> 2019-12-21T10:49:03.982+0000 7fd356ad0700 -1 *** Caught signal (Aborted) **
 in thread 7fd356ad0700 thread_name:ms_dispatch

 ceph version 15.0.0-8774-g97c6c2d (97c6c2d59fd21e35c97001dcd6afbf0f747c0784) octopus (dev)
 1: (()+0x12d80) [0x7fd361eb6d80]
 2: (gsignal()+0x10f) [0x7fd360b9193f]
 3: (abort()+0x127) [0x7fd360b7bc95]
 4: (()+0x2df7d8) [0x7fd3641ca7d8]
 5: (OSDMap::_pg_to_raw_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*, unsigned int*) const+0x3df) [0x7fd3645697af]
 6: (OSDMap::_pg_to_up_acting_osds(pg_t const&, std::vector<int, std::allocator<int> >*, int*, std::vector<int, std::allocator<int> >*, int*, bool) const+0x19c) [0x7fd36456a09c]
 7: (OSDMonitor::update_pending_pgs(OSDMap::Incremental const&, OSDMap const&)+0xf76) [0x562aa96a1366]
 8: (OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>)+0x441) [0x562aa96b3041]
 9: (PaxosService::propose_pending()+0x223) [0x562aa96707f3]
 10: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x114f) [0x562aa9671dbf]
 11: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x2627) [0x562aa9575707]
 12: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x888) [0x562aa9579c98]
 13: (Monitor::_ms_dispatch(Message*)+0xe9e) [0x562aa957b80e]

/a/sage-2019-12-21_02:45:13-rados-wip-sage-testing-2019-12-20-1617-distro-basic-smithi/4621807

Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #43731: nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs (Resolved, Nathan Cutler)

#1

Updated by Sage Weil over 4 years ago

/a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660728

   -15> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 encode_pending e 2
   -14> 2020-01-12T22:45:26.670+0000 7fdeae24c700  1 mon.a@0(leader).osd e1 do_prune osdmap full prune enabled
   -13> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 should_prune currently holding only 0 epochs (min osdmap epochs: 500); do not prune.
   -12> 2020-01-12T22:45:26.670+0000 7fdeae24c700  1 mon.a@0(leader).osd e1 encode_pending skipping prime_pg_temp; mapping job did not start
   -11> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs
   -10> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 scan_for_creating_pgs queueing pool create for 1 replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 autoscale_mode off last_change 2 flags hashpspool,creating stripe_width 0
    -9> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs 1 pools queued
    -8> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs 0 pgs removed because they're created
    -7> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs pool 1 created 2 modified 2020-01-12T22:45:26.671798+0000 [0-4)
    -6> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.0
    -5> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.1
    -4> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.2
    -3> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs adding 1.3
    -2> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs done with queue for 1
    -1> 2020-01-12T22:45:26.670+0000 7fdeae24c700 10 mon.a@0(leader).osd e1 update_pending_pgs queue remaining: 0 pools
     0> 2020-01-12T22:45:26.672+0000 7fdeae24c700 -1 *** Caught signal (Aborted) **
 in thread 7fdeae24c700 thread_name:ms_dispatch

 ceph version 15.0.0-9201-ge743ac1 (e743ac1e7f8462d3b8556e5d2d5e32c08b1c1281) octopus (dev)
 1: (()+0x12d80) [0x7fdeb961cd80]
 2: (gsignal()+0x10f) [0x7fdeb82f793f]
 3: (abort()+0x127) [0x7fdeb82e1c95]
 4: (()+0x2cb298) [0x7fdebb91c298]
 5: (OSDMap::_pg_to_raw_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*, unsigned int*) const+0x3df) [0x7fdebbcbe9cf]
 6: (OSDMap::_pg_to_up_acting_osds(pg_t const&, std::vector<int, std::allocator<int> >*, int*, std::vector<int, std::allocator<int> >*, int*, bool) const+0x19c) [0x7fdebbcbf2bc]
 7: (OSDMonitor::update_pending_pgs(OSDMap::Incremental const&, OSDMap const&)+0xf56) [0x564c6f00caa6]
 8: (OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>)+0x441) [0x564c6f01e8b1]
 9: (PaxosService::propose_pending()+0x223) [0x564c6efdbe83]
 10: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x114f) [0x564c6efdd44f]
 11: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x547f) [0x564c6eee414f]
 12: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x888) [0x564c6eee55d8]
 13: (Monitor::_ms_dispatch(Message*)+0xe9e) [0x564c6eee714e]
 14: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x564c6ef14adc]
 15: (DispatchQueue::entry()+0x12b2) [0x7fdebbae0ca2]
 16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fdebbb836f1]
 17: (()+0x82de) [0x7fdeb96122de]
 18: (clone()+0x43) [0x7fdeb83bca63]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#2

Updated by Sage Weil over 4 years ago

  • Status changed from New to Fix Under Review
  • Backport set to nautilus
  • Pull request ID set to 32661

#3

Updated by Sage Weil over 4 years ago

  • Pull request ID changed from 32661 to 32673

#4

Updated by Sage Weil over 4 years ago

  • Status changed from Fix Under Review to Pending Backport

#5

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #43731: nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs added

#6

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
