Bug #6982

closed

osd crashed when running mixed versions of dumpling and master

Added by Tamilarasi muthamizhan over 10 years ago. Updated over 10 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With a cluster up and running on dumpling, upgrading only the OSDs on one node to the master branch and then thrashing OSDs causes the OSD that is on master to crash with a core dump.

The config file used:

overrides:
  ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.c
  - osd.3
  - osd.4
  - osd.5
  - client.0
targets:
  ubuntu@mira057.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC+IR8k7Hrnnl7zZUj9Kyb5GrlCodfuyvxkpokRLWZGLbjPOzd3gdszhLCWa0F2FVzl/2upKr9VfMzoVYF5Q3eKn7sQJ1AmDdvHINKM6hYnm2ruKxzLCjK11wdr5Gt/WFQ3g6U5YFjIX19cLVLhrPwj0aM+27cTv+6KZrl56dPwRj7vVnyB7CIUmc1NpbD/LN+Oan+DISnWNvSUrdq0e70owvuZv2uHgWOJstErLD/arxQ97A1AdxLcfi8sAA12Gu3if4t+Aq+6KmZorrxQimni06b7vWr9EC5NDuxcOm5ReGkyy45ED4QK7yVCmzmQFpRX+X1ZLbQq71zcQf7E3EW7
  ubuntu@mira074.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuzTGrQ9CaYud6lfhAXosbRh8/P1xCeTQfxuj5QYWYJf079r2b4IPlhW+rOc2ZfK5HkOatZH0+eV6eMZREYMLZNn8n+S3jQclWpyyoI6U0B0TP65ByYRtI2f+wvab5TGBWXHasGLNQh7zzxadhLWMVQ9AT/7c5oJTEHe1+BRIfvR0dBpK/cCrOlVjwcGUYkZn6s/My216zbVVuENHXa62NJBAlmNEWJsJHRh9IEDB+Cl+PmD+qD5zAWgJr2e2OtOWh+9v8v6YWOyO3KEhg/BKKxmBevkdcKZTcybjARDjU2IMu9nyeOhH1F+8xQQJ7dDRQ5TA7DYH6lKO9iEHD8YFr
tasks:
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: master
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    clients:
      client.0:
      - rados/test.sh
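
To make the mixed-version layout in this config explicit, here is a minimal Python sketch that hard-codes the roles and the first upgrade-related tasks from the config above and reports which daemons are restarted after the upgrade step. It rests on an assumption not stated in this report: that install.upgrade affects the whole node hosting the named role, and that a daemon only picks up the new branch once it is restarted. The names `node_of` and `daemons_restarted_after_upgrade` and the dict layout are hypothetical, chosen for illustration only.

```python
# Roles copied from the config above: one inner list per node.
ROLES = [
    ["mon.a", "mon.b", "mds.a", "osd.0", "osd.1", "osd.2"],
    ["mon.c", "osd.3", "osd.4", "osd.5", "client.0"],
]

# Only the tasks up to the first ceph.restart, in the order they appear.
TASKS = [
    {"install": {"branch": "dumpling"}},
    {"install.upgrade": {"osd.0": {"branch": "master"}}},
    {"ceph.restart": {"daemons": ["osd.0", "osd.1", "osd.2"]}},
]


def node_of(role, roles=ROLES):
    """Return the index of the node that hosts the given role."""
    for index, node_roles in enumerate(roles):
        if role in node_roles:
            return index
    raise KeyError(role)


def daemons_restarted_after_upgrade(tasks=TASKS, roles=ROLES):
    """Map each daemon restarted after an upgrade to the upgraded branch.

    Assumption (not from the report): a daemon runs the upgraded packages
    only after a restart, and only if it lives on an upgraded node.
    """
    upgraded_nodes = {}  # node index -> branch
    result = {}
    for task in tasks:
        name, config = next(iter(task.items()))
        if name == "install.upgrade":
            for role, opts in config.items():
                upgraded_nodes[node_of(role, roles)] = opts["branch"]
        elif name == "ceph.restart":
            for daemon in config["daemons"]:
                node = node_of(daemon, roles)
                if node in upgraded_nodes:
                    result[daemon] = upgraded_nodes[node]
    return result


if __name__ == "__main__":
    print(daemons_restarted_after_upgrade())
```

Under that assumption, osd.0, osd.1 and osd.2 would be the daemons running master while osd.3 through osd.5 stay on dumpling, which matches the scenario in the description.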

    -1> 2013-12-11 14:01:50.054564 7fc76b7a1700  5 --OSD::tracker-- reqid: unknown.0.0:0, seq: 30236, time: 2013-12-11 14:01:50.054564, event: done, request: pg_query(281.1 epoch 902) v2
     0> 2013-12-11 14:01:50.055138 7fc766797700 -1 osd/PG.cc: In function 'void PG::start_flush(ObjectStore::Transaction*, std::list<Context*>*, std::list<Context*>*)' thread 7fc766797700 time 2013-12-11 14:01:50.049548
osd/PG.cc: 4517: FAILED assert(!flushed)

 ceph version 0.67.4-36-g9875c8b (9875c8b1992c59cc0c40901a44573676cdff2669)
 1: (PG::start_flush(ObjectStore::Transaction*, std::list<Context*, std::allocator<Context*> >*, std::list<Context*, std::allocator<Context*> >*)+0x1b4) [0x6f3e64]
 2: (PG::RecoveryState::ReplicaActive::ReplicaActive(boost::statechart::state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::my_context)+0x11c) [0x70a6dc]
 3: (boost::statechart::state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr<PG::RecoveryState::Started> const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&)+0x5c) [0x743c0c]
 4: (boost::statechart::detail::safe_reaction_result boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::transit_impl<PG::RecoveryState::ReplicaActive, PG::RecoveryState::RecoveryMachine, boost::statechart::detail::no_transition_function>(boost::statechart::detail::no_transition_function const&)+0x91) [0x7458f1]
 5: (PG::RecoveryState::Stray::react(PG::MLogRec const&)+0x6df) [0x72d05f]
 6: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x166) [0x74f3f6]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7387cb]
 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x738ae1]
 9: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x313) [0x6fa143]
 10: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2cb) [0x68fc1b]
 11: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cd7a2]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b4f06]
 13: (ThreadPool::WorkThread::entry()+0x10) [0x8b6d10]
 14: (()+0x7e9a) [0x7fc77a09be9a]
 15: (clone()+0x6d) [0x7fc7781e7ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
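
Because this report is tracked as a duplicate of another bug with the same failed assertion, it can help to reduce a log excerpt like the one above to a short signature before comparing reports. The following is a minimal sketch in Python, not a Ceph tool: the helper name `assert_signature` and both regular expressions are assumptions written against the line shapes seen in this excerpt only.

```python
import re

# Matches a line such as "osd/PG.cc: 4517: FAILED assert(!flushed)".
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)\s*$"
)
# Matches the start of a backtrace frame such as " 1: (PG::start_flush(...".
FRAME_RE = re.compile(r"^\s*(?P<num>\d+): \((?P<symbol>[^(]*)")


def assert_signature(log_text):
    """Return (file, line, condition, top_frame_symbol), or None if no assert line is found.

    top_frame_symbol is None when no frame numbered 1 follows the assert line.
    """
    found = None
    top_frame = None
    for line in log_text.splitlines():
        if found is None:
            # Still looking for the "FAILED assert" line.
            found = ASSERT_RE.match(line.strip())
            continue
        frame = FRAME_RE.match(line)
        if frame and frame.group("num") == "1":
            top_frame = frame.group("symbol")
            break
    if found is None:
        return None
    return (found.group("file"), int(found.group("line")),
            found.group("cond"), top_frame)


if __name__ == "__main__":
    sample = (
        "osd/PG.cc: 4517: FAILED assert(!flushed)\n"
        "\n"
        " ceph version 0.67.4-36-g9875c8b (9875c8b1992c59cc0c40901a44573676cdff2669)\n"
        " 1: (PG::start_flush(ObjectStore::Transaction*)+0x1b4) [0x6f3e64]\n"
    )
    print(assert_signature(sample))
```

Two reports whose tuples match on file, line and condition are candidates for the same underlying bug, though the rest of the backtrace still needs to be compared by hand.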

Logs are copied to :/home/ubuntu/bug


Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #7019: osd/PG.cc: 4517: FAILED assert(!flushed) | Duplicate | 12/16/2013

#1

Updated by Sage Weil over 10 years ago

  • Status changed from New to Duplicate