Bug #37671

closed

race between split and pg create

Added by Sage Weil over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-12-15 16:58:54.928 7fbb544ee700 10 bluestore(/var/lib/ceph/osd/ceph-4) queue_transactions ch 0x5646e2f30f00 1.3_head
2018-12-15 16:58:54.928 7fbb544ee700 20 bluestore(/var/lib/ceph/osd/ceph-4) _txc_create osr 0x5646e2f34360 = 0x5646e2e86300 seq 2
2018-12-15 16:58:54.928 7fbb544ee700 15 bluestore(/var/lib/ceph/osd/ceph-4) _create_collection 1.b_head bits 4
2018-12-15 16:58:54.928 7fbb544ee700 10 bluestore(/var/lib/ceph/osd/ceph-4) _create_collection 1.b_head bits 4 = 0
2018-12-15 16:58:54.928 7fbb544ee700 15 bluestore(/var/lib/ceph/osd/ceph-4) _split_collection 1.3_head to 1.b_head  bits 4
...
  -111> 2018-12-15 16:58:54.930 7fbb504e6700 10 bluestore(/var/lib/ceph/osd/ceph-4) queue_transactions ch 0x5646e32401e0 1.b_head
  -110> 2018-12-15 16:58:54.930 7fbb504e6700 20 bluestore(/var/lib/ceph/osd/ceph-4) _txc_create osr 0x5646e3053b00 = 0x5646e2e86900 seq 1
   -61> 2018-12-15 16:58:54.933 7fbb504e6700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-1726-g8dd0576/rpm/el7/BUILD/ceph-14.0.1-1726-g8dd0576/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fbb504e6700 time 2018-12-15 16:58:54.931284
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-1726-g8dd0576/rpm/el7/BUILD/ceph-14.0.1-1726-g8dd0576/src/os/bluestore/BlueStore.cc: 10700: FAILED ceph_assert(!c)

 ceph version 14.0.1-1726-g8dd0576 (8dd05761256103fb07e8d432bb1cdcf92be3c854) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x5646d69f8dbe]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x5646d69f8f8c]
 3: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2ad0) [0x5646d6fccaa0]
 4: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x54c) [0x5646d6fd0a7c]
 5: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x82) [0x5646d6ba7a42]
 6: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x5646d6afc2d8]
 7: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x202) [0x5646d6b55662]
 8: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x5646d6dbd3e0]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa0c) [0x5646d6b4a09c]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x5646d716e6f3]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5646d71792b0]
 12: (()+0x7e25) [0x7fbb7d08ae25]

/a/sage-2018-12-15_15:32:35-rados-wip-sage2-testing-2018-12-14-0711-distro-basic-smithi/3366560
Actions #1

Updated by Sage Weil over 5 years ago

1.b and 1.1b already existed on the OSD.
1.3 was imported at an old epoch, prior to its splitting into 1.b and 1.1b.

Actions #2

Updated by Sage Weil over 5 years ago

ah, it's a double-split, 1.3 -> 1.b -> 1.1b

2018-12-15T16:57:24.537 INFO:teuthology.orchestra.run.smithi007.stdout:Importing pgid 1.3
2018-12-15T16:57:24.537 INFO:teuthology.orchestra.run.smithi007.stdout:pool 1 pg_num 8 -> 18
2018-12-15T16:57:24.537 INFO:teuthology.orchestra.run.smithi007.stdout:pool 1 pg_num 18 -> 28

But the import check was checking children separately for each step, which means it was looking at whether 1.3's split children in the 18 -> 28 transition included 1.b.
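To make the double split concrete, here is a minimal sketch of how split children can be enumerated across several pg_num changes. It is not the actual import-check code: `stable_mod` is modeled on Ceph's `ceph_stable_mod`, and `split_children`, `per_step_from_root`, and `all_descendants` are hypothetical helper names used only for illustration. PG seeds are plain integers here and are printed in hex as in the logs.

```python
def stable_mod(x, b, bmask):
    """Map seed x onto [0, b), modeled on ceph_stable_mod."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def split_children(seed, old_pg_num, new_pg_num):
    """Seeds created by old_pg_num -> new_pg_num whose parent is `seed`."""
    bmask = (1 << (old_pg_num - 1).bit_length()) - 1
    return {
        x
        for x in range(old_pg_num, new_pg_num)
        if stable_mod(x, old_pg_num, bmask) == seed
    }


def per_step_from_root(seed, pg_nums):
    """Check each step using only the original seed as the parent.

    This misses grandchildren, i.e. children of children.
    """
    found = set()
    for old, new in zip(pg_nums, pg_nums[1:]):
        found |= split_children(seed, old, new)
    return found


def all_descendants(seed, pg_nums):
    """Carry every descendant forward so later steps split them too."""
    family = {seed}
    for old, new in zip(pg_nums, pg_nums[1:]):
        for parent in list(family):
            family |= split_children(parent, old, new)
    return family - {seed}


def fmt(pool, seeds):
    """Render seeds as pool.hexseed strings, sorted by seed."""
    return [f"{pool}.{s:x}" for s in sorted(seeds)]


steps = [8, 18, 28]  # pg_num 8 -> 18 -> 28, as in the import log
print(fmt(1, per_step_from_root(3, steps)))  # ['1.b', '1.13']
print(fmt(1, all_descendants(3, steps)))     # ['1.b', '1.13', '1.1b']
```

Under these assumptions, following only 1.3 through each step never reaches 1.1b, because 1.1b is a child of 1.b in the 18 -> 28 transition rather than a child of 1.3.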

Actions #3

Updated by Sage Weil over 5 years ago

  • Status changed from 12 to Fix Under Review
Actions #4

Updated by Sage Weil over 5 years ago

  • Status changed from Fix Under Review to Resolved