Bug #48540

open

_txc_add_transaction error (17) File exists not handled on operation

Added by Arthur S over 3 years ago. Updated over 3 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

All three OSDs hosting the same PG (19.3) went down and could not be restarted right away, although restarting them succeeded after a while. Because the PG was down in the meantime, the cluster was affected. All three OSDs logged the same error (error 17), followed by this transaction dump:
{
  "ops": [
    {
      "op_num": 0,
      "op_name": "op_coll_move_rename",
      "old_collection": "19.3s2_head",
      "old_oid": "2#19:c29f9b10:::100008335f5.00000000:head#",
      "new_collection": "19.3s2_head",
      "new_oid": "2#19:c29f9b10:::100008335f5.00000000:head#605e"
    },
    {
      "op_num": 1,
      "op_name": "create",
      "collection": "19.3s2_head",
      "oid": "2#19:c29f9b10:::100008335f5.00000000:head#"
    },
    {
      "op_num": 2,
      "op_name": "setattrs",
      "collection": "19.3s2_head",
      "oid": "2#19:c29f9b10:::100008335f5.00000000:head#",
      "attr_lens": {
        "_": 275,
        "_layout": 30,
        "_parent": 184,
        "snapset": 35
      }
    },
    {
      "op_num": 3,
      "op_name": "setattr",
      "collection": "19.3s2_head",
      "oid": "2#19:c29f9b10:::100008335f5.00000000:head#",
      "name": "hinfo_key",
      "length": 30
    }
  ]
}
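The transaction dump is easier to follow once each object ID is split into its fields. The minimal Python sketch below does this under one stated assumption: the layout shard#pool:hash:namespace:key:name:snap#generation, with a hexadecimal generation suffix. That layout was inferred from the dump itself and was not taken from the Ceph sources. The names `ObjectId`, `parse_oid` and `summarize` are hypothetical helpers. Read this way, op 0 moves the shard-2 head object to a copy tagged with generation 0x605e, and op 1 then creates the original name again. Error 17 in the title is errno EEXIST ("File exists") on Linux.

```python
import errno
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ObjectId:
    """Fields of an object ID as printed in the transaction dump.

    Hypothetical helper: the layout shard#pool:hash:namespace:key:name:snap#generation
    is inferred from this report, not taken from the Ceph sources.
    """
    shard: int
    pool: int
    hash_hex: str
    namespace: str
    key: str
    name: str
    snap: str
    generation: Optional[int]  # None when nothing follows the trailing '#'


def parse_oid(text: str) -> ObjectId:
    shard_str, rest = text.split("#", 1)    # leading shard id, e.g. "2"
    body, gen_str = rest.rsplit("#", 1)     # trailing generation, may be empty
    pool, hash_hex, namespace, key, name, snap = body.split(":", 5)
    return ObjectId(
        shard=int(shard_str),
        pool=int(pool),
        hash_hex=hash_hex,
        namespace=namespace,
        key=key,
        name=name,
        snap=snap,
        # Assumption: the generation suffix is hexadecimal.
        generation=int(gen_str, 16) if gen_str else None,
    )


def summarize(ops: List[Dict[str, object]]) -> List[str]:
    """Return one human-readable line per op in the dump."""
    lines = []
    for op in ops:
        num, name = op["op_num"], op["op_name"]
        if name == "op_coll_move_rename":
            src = parse_oid(str(op["old_oid"]))
            dst = parse_oid(str(op["new_oid"]))
            gen = hex(dst.generation) if dst.generation is not None else "none"
            lines.append(f"{num}: {name} {src.name} -> {dst.name} (generation {gen})")
        else:
            oid = parse_oid(str(op["oid"]))
            lines.append(f"{num}: {name} {oid.name} (shard {oid.shard})")
    return lines


# Object IDs and op names copied from the dump above (other fields omitted).
HEAD = "2#19:c29f9b10:::100008335f5.00000000:head#"
OPS = [
    {"op_num": 0, "op_name": "op_coll_move_rename", "old_oid": HEAD, "new_oid": HEAD + "605e"},
    {"op_num": 1, "op_name": "create", "oid": HEAD},
    {"op_num": 2, "op_name": "setattrs", "oid": HEAD},
    {"op_num": 3, "op_name": "setattr", "oid": HEAD},
]

if __name__ == "__main__":
    for line in summarize(OPS):
        print(line)
    # Error 17 in the crash message is errno EEXIST ("File exists") on Linux.
    print(17, errno.errorcode[17])
```

The sketch only restates the dump in a readable form. It does not determine which of the four ops actually returned error 17.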

-1> 2020-12-09T20:48:39.965+0000 7fcb3a889700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.5/rpm/el8/BUILD/ceph-15.2.5/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fcb3a889700 time 2020-12-09T20:48:39.961244+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.5/rpm/el8/BUILD/ceph-15.2.5/src/os/bluestore/BlueStore.cc: 12874: ceph_abort_msg("unexpected error")
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)
1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x559cd31514fe]
2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1507) [0x559cd3720d87]
3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x407) [0x559cd3722f47]
4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x559cd33d7308]
5: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0xd1d) [0x559cd3593a7d]
6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x306) [0x559cd35a9d76]
7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x559cd3403db2]
8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x559cd33acd9e]
9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x559cd3234109]
10: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x559cd348fbd8]
11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x559cd32515df]
12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x559cd388a324]
13: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x559cd388cf84]
14: (()+0x82de) [0x7fcb601222de]
15: (clone()+0x43) [0x7fcb5ee59e83]
0> 2020-12-09T20:48:39.971+0000 7fcb3a889700 -1 *** Caught signal (Aborted) **
in thread 7fcb3a889700 thread_name:tp_osd_tp
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)
1: (()+0x12dd0) [0x7fcb6012cdd0]
2: (gsignal()+0x10f) [0x7fcb5ed9570f]
3: (abort()+0x127) [0x7fcb5ed7fb25]
4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x559cd31515cf]
5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1507) [0x559cd3720d87]
6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x407) [0x559cd3722f47]
7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x559cd33d7308]
8: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0xd1d) [0x559cd3593a7d]
9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x306) [0x559cd35a9d76]
10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x559cd3403db2]
11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x559cd33acd9e]
12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x559cd3234109]
13: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x559cd348fbd8]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x559cd32515df]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x559cd388a324]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x559cd388cf84]
17: (()+0x82de) [0x7fcb601222de]
18: (clone()+0x43) [0x7fcb5ee59e83]

{
"crash_id": "2020-12-09T20:48:39.971606Z_c7487445-d5b3-4cef-973e-83d4b481dd6a",
"timestamp": "2020-12-09T20:48:39.971606Z",
"process_name": "ceph-osd",
"entity_name": "osd.68",
"ceph_version": "15.2.5",
"utsname_hostname": "ceph-node-3",
"utsname_sysname": "Linux",
"utsname_release": "4.18.0-193.14.2.el8_2.x86_64",
"utsname_version": "#1 SMP Sun Jul 26 03:54:29 UTC 2020",
"utsname_machine": "x86_64",
"os_name": "CentOS Linux",
"os_id": "centos",
"os_version_id": "8",
"os_version": "8 (Core)",
"assert_condition": "abort",
"assert_func": "void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)",
"assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.5/rpm/el8/BUILD/ceph-15.2.5/src/os/bluestore/BlueStore.cc",
"assert_line": 12874,
"assert_thread_name": "tp_osd_tp",
"assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.5/rpm/el8/BUILD/ceph-15.2.5/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fcb3a889700 time 2020-12-09T20:48:39.961244+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.5/rpm/el8/BUILD/ceph-15.2.5/src/os/bluestore/BlueStore.cc: 12874: ceph_abort_msg(\"unexpected error\")\n",
"backtrace": [
"(()+0x12dd0) [0x7fcb6012cdd0]",
"(gsignal()+0x10f) [0x7fcb5ed9570f]",
"(abort()+0x127) [0x7fcb5ed7fb25]",
"(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x559cd31515cf]",
"(BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1507) [0x559cd3720d87]",
"(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x407) [0x559cd3722f47]",
"(non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x58) [0x559cd33d7308]",
"(ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0xd1d) [0x559cd3593a7d]",
"(ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x306) [0x559cd35a9d76]",
"(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x52) [0x559cd3403db2]",
"(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5de) [0x559cd33acd9e]",
"(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x559cd3234109]",
"(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x559cd348fbd8]",
"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x559cd32515df]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x559cd388a324]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x559cd388cf84]",
"(()+0x82de) [0x7fcb601222de]",
"(clone()+0x43) [0x7fcb5ee59e83]"
]
}

ceph pg map 19.3:
osdmap e36104 pg 19.3 (19.3) -> up [12,27,68] acting [12,27,68]


Files

log.zip (146 KB), Arthur S, 12/10/2020 12:46 PM
#1

Updated by Neha Ojha over 3 years ago

  • Subject changed from "3 OSD's crashed (hosting same PG 19.3). _txc_add_transaction error (17) File exists not handled on operation" to "_txc_add_transaction error (17) File exists not handled on operation"
  • Status changed from New to Need More Info

Can you provide OSD logs with debug_osd=20? Without sufficiently detailed logs, it is very hard to tell what is causing the crash.
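The minimal Python sketch below shows one way to collect such logs for this PG. It reads the acting set from the `ceph pg map 19.3` output quoted in the report and builds one `ceph config set osd.<id> debug_osd 20` command per OSD. The names `acting_osds`, `debug_commands` and `apply` are hypothetical helpers. The sketch assumes that the `ceph` CLI and an admin keyring are available on the host. `ceph config set` stores the value in the monitors' configuration database, so the setting should also apply when a crashed OSD is restarted.

```python
import re
import subprocess
from typing import List

# Output line copied from "ceph pg map 19.3" in this report.
PG_MAP_LINE = "osdmap e36104 pg 19.3 (19.3) -> up [12,27,68] acting [12,27,68]"


def acting_osds(pg_map_line: str) -> List[int]:
    """Extract the acting set from one line of `ceph pg map` output."""
    match = re.search(r"acting \[([\d,\s]*)\]", pg_map_line)
    if match is None:
        raise ValueError(f"no acting set in: {pg_map_line!r}")
    return [int(part) for part in match.group(1).split(",") if part.strip()]


def debug_commands(osd_ids: List[int], level: int = 20) -> List[List[str]]:
    """Build, but do not run, one `ceph config set` command per OSD."""
    return [
        ["ceph", "config", "set", f"osd.{osd_id}", "debug_osd", str(level)]
        for osd_id in osd_ids
    ]


def apply(commands: List[List[str]]) -> None:
    """Run the commands; requires the ceph CLI and an admin keyring."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands only; call apply(...) to execute them.
    for argv in debug_commands(acting_osds(PG_MAP_LINE)):
        print(" ".join(argv))
```

After the logs are collected, you can reset the level for each OSD with `ceph config rm osd.<id> debug_osd`.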

#2

Updated by Arthur S over 3 years ago

I have not seen this behavior again, so I suggest archiving this issue for reference until it recurs (and then raising the logging levels).

