Bug #36108

Assertion due to ENOENT result on clonerange2

Added by Vladimír Kincl about 1 year ago. Updated 11 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
Start date:
09/21/2018
Due date:
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

All my OSDs started crashing after adding a couple of new OSDs to the cluster. They keep going down and coming back up. I currently run a large EC pool (k=28, m=4) for CephFS usage.

Stacktrace:
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5638405dda72]
2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15fa) [0x56384048b57a]
3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x56384048cbe6]
4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x56384001e10f]
5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x6c) [0x56383ffa001c]
6: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x56383ffce552]
7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x2c) [0x56384003e41c]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x5638405e4c18]
9: (ThreadPool::WorkThread::entry()+0x10) [0x5638405e5db0]
10: (()+0x7494) [0x7fd825f06494]
11: (clone()+0x3f) [0x7fd824f8dacf]

Possibly relevant error messages from the log:
-11> 2018-09-21 13:23:53.566927 7fd80eb1d700 -1 bluestore(/var/lib/ceph/osd/ceph-27) _txc_add_transaction error (2) No such file or directory not handled on operation 30 (op 1, counting from 0)
-10> 2018-09-21 13:23:53.566942 7fd80eb1d700 -1 bluestore(/var/lib/ceph/osd/ceph-27) ENOENT on clone suggests osd bug
-9> 2018-09-21 13:23:53.566944 7fd80eb1d700 0 bluestore(/var/lib/ceph/osd/ceph-27) transaction dump: {
"ops": [ {
"op_num": 0,
"op_name": "truncate",
"collection": "2.26s20_head",
"oid": "14#2:641e1aca:::10000762212.00000000:head#",
"offset": 4096
}, {
"op_num": 1,
"op_name": "clonerange2",
"collection": "2.26s20_head",
"src_oid": "14#2:641e1aca:::10000762212.00000000:head#212c4",
"dst_oid": "14#2:641e1aca:::10000762212.00000000:head#",
"src_offset": 0,
"len": 4096,
"dst_offset": 0
}, {
"op_num": 2,
"op_name": "remove",
"collection": "2.26s20_head",
"oid": "14#2:641e1aca:::10000762212.00000000:head#212c4"
}, {
"op_num": 3,
"op_name": "setattrs",
"collection": "2.26s20_head",
"oid": "14#2:641e1aca:::10000762212.00000000:head#",
"attr_lens": {
"_": 275,
"hinfo_key": 146,
"snapset": 35
}
}, {
"op_num": 4,
"op_name": "nop"
}, {
"op_num": 5,
"op_name": "op_omap_rmkeyrange",
"collection": "2.26s20_head",
"oid": "14#2:64000000::::head#",
"first": "0000000546.00000000000000135875",
"last": "4294967295.18446744073709551615"
}, {
"op_num": 6,
"op_name": "omap_setkeys",
"collection": "2.26s20_head",
"oid": "14#2:64000000::::head#",
"attr_lens": {
"_biginfo": 4107,
"_epoch": 4,
"_info": 1151,
"can_rollback_to": 12,
"rollback_info_trimmed_to": 12
}
}
]
}

2018-09-21 13:23:53.571181 7fd80eb1d700 -1 /build/ceph-12.2.8/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fd80eb1d700 time 2018-09-21 13:23:53.567044
/build/ceph-12.2.8/src/os/bluestore/BlueStore.cc: 9394: FAILED assert(0 == "unexpected error")
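
For anyone triaging a similar crash, the failing operation can be picked out of the transaction dump by matching the "(op N, counting from 0)" index in the error line against "op_num". The following is a minimal sketch of that lookup, not a Ceph tool: the helper names are hypothetical, and the embedded dump is an abbreviated copy of ops 0-2 from the log above.

```python
import json
import re

# Abbreviated copy of the transaction dump from the log above (ops 0-2 only).
DUMP = """
{"ops": [
  {"op_num": 0, "op_name": "truncate", "collection": "2.26s20_head",
   "oid": "14#2:641e1aca:::10000762212.00000000:head#", "offset": 4096},
  {"op_num": 1, "op_name": "clonerange2", "collection": "2.26s20_head",
   "src_oid": "14#2:641e1aca:::10000762212.00000000:head#212c4",
   "dst_oid": "14#2:641e1aca:::10000762212.00000000:head#",
   "src_offset": 0, "len": 4096, "dst_offset": 0},
  {"op_num": 2, "op_name": "remove", "collection": "2.26s20_head",
   "oid": "14#2:641e1aca:::10000762212.00000000:head#212c4"}
]}
"""

# Message text copied from the log above (timestamp and thread id dropped).
ERROR_LINE = ("_txc_add_transaction error (2) No such file or directory "
              "not handled on operation 30 (op 1, counting from 0)")


def failing_op_index(line):
    """Extract the zero-based op index from the BlueStore error line."""
    match = re.search(r"\(op (\d+), counting from 0\)", line)
    if match is None:
        raise ValueError("no op index found in error line")
    return int(match.group(1))


def find_failing_op(dump_text, line):
    """Return the op from the dump whose op_num matches the error line."""
    index = failing_op_index(line)
    for op in json.loads(dump_text)["ops"]:
        if op["op_num"] == index:
            return op
    raise LookupError("op %d not present in dump" % index)


if __name__ == "__main__":
    op = find_failing_op(DUMP, ERROR_LINE)
    print(op["op_name"], "source:", op["src_oid"])
```

For this report the lookup lands on the clonerange2 op, whose source is the object with the 212c4 suffix, the same object that op 2 removes afterwards.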

Related issues

Related to RADOS - Bug #36598: osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests osd bug" Can't reproduce
Related to RADOS - Bug #36739: ENOENT in collection_move_rename on EC backfill target Resolved 11/08/2018

History

#1 Updated by Igor Fedotov about 1 year ago

  • Subject changed from OSDs keep crashing to Assertion due to ENOENT result on clonerange2

#2 Updated by Neha Ojha about 1 year ago

  • Related to Bug #36598: osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests osd bug" added

#3 Updated by Sage Weil 11 months ago

  • Related to Bug #36739: ENOENT in collection_move_rename on EC backfill target added

#4 Updated by Sage Weil 11 months ago

  • Status changed from New to Duplicate
