Bug #42175

_txc_add_transaction error (2) No such file or directory not handled on operation 15

Added by Neha Ojha 6 months ago. Updated 2 months ago.

Status: New
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS): OSD
Pull request ID:
Crash signature:

Description

2019-10-02T22:47:02.700 INFO:tasks.rados.rados.0.smithi191.stdout:1328: snap_create
2019-10-02T22:47:02.708 INFO:tasks.ceph.osd.2.smithi191.stderr:2019-10-02T22:47:02.674+0000 7ffbdadf7700 -1 received  signal: Hangup from /usr/bin/python /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 2  (PID: 14830) UID: 0
2019-10-02T22:47:02.719 INFO:tasks.rados.rados.0.smithi191.stdout:1326:  expect (ObjNum 475 snap 107 seq_num 475)
2019-10-02T22:47:02.735 INFO:tasks.ceph.osd.1.smithi191.stderr:2019-10-02T22:47:02.702+0000 7f85d5e98700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _txc_add_transaction error (2) No such file or directory not handled on operation 15 (op 0, counting from 0)
2019-10-02T22:47:02.735 INFO:tasks.ceph.osd.1.smithi191.stderr:2019-10-02T22:47:02.702+0000 7f85d5e98700 -1 bluestore(/var/lib/ceph/osd/ceph-1) unexpected error code
2019-10-02T22:47:02.739 INFO:tasks.ceph.osd.1.smithi191.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-5729-g804458b/rpm/el7/BUILD/ceph-15.0.0-5729-g804458b/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f85d5e98700 time 2019-10-02T22:47:02.703425+0000
2019-10-02T22:47:02.739 INFO:tasks.ceph.osd.1.smithi191.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-5729-g804458b/rpm/el7/BUILD/ceph-15.0.0-5729-g804458b/src/os/bluestore/BlueStore.cc: 12589: ceph_abort_msg("unexpected error")
2019-10-02T22:47:02.739 INFO:tasks.ceph.osd.1.smithi191.stderr: ceph version 15.0.0-5729-g804458b (804458bf51d672c58fe3cb0cb33d91c185ac73e9) octopus (dev)
2019-10-02T22:47:02.739 INFO:tasks.ceph.osd.1.smithi191.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xdd) [0x55b8e8529d9a]
2019-10-02T22:47:02.740 INFO:tasks.ceph.osd.1.smithi191.stderr: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0xcb0) [0x55b8e8ab0cb0]
2019-10-02T22:47:02.740 INFO:tasks.ceph.osd.1.smithi191.stderr: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x55b8e8accfe0]
2019-10-02T22:47:02.740 INFO:tasks.ceph.osd.1.smithi191.stderr: 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x55b8e8794614]
2019-10-02T22:47:02.740 INFO:tasks.ceph.osd.1.smithi191.stderr: 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xc38) [0x55b8e891cb48]
2019-10-02T22:47:02.740 INFO:tasks.ceph.osd.1.smithi191.stderr: 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x257) [0x55b8e892c077]
2019-10-02T22:47:02.740 INFO:tasks.ceph.osd.1.smithi191.stderr: 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x55b8e87c92ba]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5cb) [0x55b8e876e63b]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2f9) [0x55b8e860ae09]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 10: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x69) [0x55b8e8851cd9]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x143a) [0x55b8e862741a]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55b8e8bfb566]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b8e8bfe0b0]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 14: (()+0x7ea5) [0x7f85fa919ea5]
2019-10-02T22:47:02.741 INFO:tasks.ceph.osd.1.smithi191.stderr: 15: (clone()+0x6d) [0x7f85f97dd8cd]

Both jobs went dead, so no logs.

rados:thrash/{0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/fastclose.yaml msgr/async-v1only.yaml objectstore/bluestore-low-osd-mem-target.yaml rados.yaml supported-random-distro$/{rhel_7.yaml} thrashers/morepggrow.yaml thrashosds-health.yaml workloads/snaps-few-objects.yaml}

/a/nojha-2019-10-02_21:21:43-rados:thrash-master-distro-basic-smithi/4352410/

rados:thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-async-partial-recovery.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/upmap.yaml msgr-failures/osd-delay.yaml msgr/async.yaml objectstore/bluestore-low-osd-mem-target.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/mapgap.yaml thrashosds-health.yaml workloads/cache-pool-snaps-readproxy.yaml}

/a/nojha-2019-10-02_21:21:43-rados:thrash-master-distro-basic-smithi/4352415/

Note that it appears with both more-active-recovery.yaml and more-async-partial-recovery.yaml (the latter newly introduced).

We need logs to verify whether this is a duplicate of https://tracker.ceph.com/issues/40522 or https://tracker.ceph.com/issues/37969.


Related issues

Related to RADOS - Bug #37969: ENOENT on setattrs (Can't reproduce, 01/18/2019)
Related to RADOS - Bug #40522: on_local_recover doesn't touch? (New)

History

#1 Updated by David Zafman 5 months ago

  • Assignee set to Neha Ojha
  • Priority changed from High to Urgent
  • Target version set to v12.2.13
  • Affected Versions v12.2.13, v14.2.4 added
  • ceph-qa-suite rados added
  • Component(RADOS) OSD added

Seen in luminous for final point release:

http://pulpito.ceph.com/yuriw-2019-11-08_02:53:57-rados-wip-yuri8-testing-2019-11-07-2054-luminous-distro-basic-smithi/4481648/

The only pull request being tested contains test-only code related to fixing a "mon addr" Python error: https://github.com/ceph/ceph/pull/31433

Possibly related to the rollforward backport https://github.com/ceph/ceph/pull/31036?

   -19> 2019-11-08 10:44:35.869011 7f05ea7d0700 20 osd.2 31 should_share_map osd.0 172.21.15.134:6802/9784 31
   -18> 2019-11-08 10:44:35.869021 7f05ea7d0700 10 osd.2 pg_epoch: 31 pg[6.3( v 31'681 (0'0,31'681] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 crt=31'681 lcod 31'680 active mbc={}] _handle_message: 0x55687bd1b480
   -17> 2019-11-08 10:44:35.869045 7f05ea7d0700 10 osd.2 pg_epoch: 31 pg[6.3( v 31'681 (0'0,31'681] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 crt=31'681 lcod 31'680 active mbc={}] do_repop 6:d023cdf0:::benchmark_data_smithi134_18834_object40:head v 31'682 (transaction) 179
   -16> 2019-11-08 10:44:35.869071 7f05ea7d0700 10 osd.2 pg_epoch: 31 pg[6.3( v 31'681 (0'0,31'681] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 crt=31'681 lcod 31'680 active mbc={}] append_log log((0'0,31'681], crt=31'681) [31'682 (31'527) modify   6:d023cdf0:::benchmark_data_smithi134_18834_object40:head by client.4887.0:1 2019-11-08 10:44:35.862706 0]
   -15> 2019-11-08 10:44:35.869092 7f05ea7d0700 10 osd.2 pg_epoch: 31 pg[6.3( v 31'682 (0'0,31'682] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 lua=31'681 crt=31'681 lcod 31'680 active mbc={}] add_log_entry 31'682 (31'527) modify   6:d023cdf0:::benchmark_data_smithi134_18834_object40:head by client.4887.0:1 2019-11-08 10:44:35.862706 0
   -14> 2019-11-08 10:44:35.869106 7f05ea7d0700 20 osd.2 pg_epoch: 31 pg[6.3( v 31'682 (0'0,31'682] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 lua=31'681 crt=31'682 lcod 31'680 active mbc={}] rollforward: entry=31'682 (31'527) modify   6:d023cdf0:::benchmark_data_smithi134_18834_object40:head by client.4887.0:1 2019-11-08 10:44:35.862706 0
   -13> 2019-11-08 10:44:35.869119 7f05ea7d0700 10 osd.2 pg_epoch: 31 pg[6.3( v 31'682 (0'0,31'682] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 lua=31'681 crt=31'682 lcod 31'680 active mbc={}] append_log approx pg log length =  682
   -12> 2019-11-08 10:44:35.869128 7f05ea7d0700 10 osd.2 pg_epoch: 31 pg[6.3( v 31'682 (0'0,31'682] local-lis/les=23/24 n=86 ec=23/23 lis/c 23/23 les/c/f 24/24/0 23/23/23) [0,2] r=1 lpr=24 luod=0'0 lua=31'681 crt=31'682 lcod 31'680 active mbc={}] append_log transaction_applied = 1
   -11> 2019-11-08 10:44:35.869137 7f05ea7d0700 10 trim proposed trim_to = 0'0
   -10> 2019-11-08 10:44:35.869146 7f05ea7d0700  5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 31'682, trimmed: , trimmed_dups: , clear_divergent_priors: 0
    -9> 2019-11-08 10:44:35.869596 7f05ea7d0700 10 bluestore(/var/lib/ceph/osd/ceph-2) queue_transactions existing 0x55687bd47d40 osr(6.3 0x55687bc511c0)
    -8> 2019-11-08 10:44:35.869607 7f05ea7d0700 20 bluestore(/var/lib/ceph/osd/ceph-2) _txc_create osr 0x55687bd47d40 = 0x55687d253500 seq 685
    -7> 2019-11-08 10:44:35.869620 7f05ea7d0700 15 bluestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys 6.3_head #6:c0000000::::head#
    -6> 2019-11-08 10:44:35.869629 7f05ea7d0700 10 bluestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys 6.3_head #6:c0000000::::head# = 0
    -5> 2019-11-08 10:44:35.869635 7f05ea7d0700 10 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction op 12 got ENOENT on #6:d023cdf0:::benchmark_data_smithi134_18834_object40:head#
    -4> 2019-11-08 10:44:35.869638 7f05ea7d0700 10 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction op 15 got ENOENT on #6:d023cdf0:::benchmark_data_smithi134_18834_object40:head#
    -3> 2019-11-08 10:44:35.869640 7f05ea7d0700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction error (2) No such file or directory not handled on operation 15 (op 1, counting from 0)
    -2> 2019-11-08 10:44:35.869648 7f05ea7d0700 -1 bluestore(/var/lib/ceph/osd/ceph-2) unexpected error code
    -1> 2019-11-08 10:44:35.869649 7f05ea7d0700  0  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "truncate",
            "collection": "6.3_head",
            "oid": "#6:d023cdf0:::benchmark_data_smithi134_18834_object40:head#",
            "offset": 0
        },
        {
            "op_num": 1,
            "op_name": "setattrs",
            "collection": "6.3_head",
            "oid": "#6:d023cdf0:::benchmark_data_smithi134_18834_object40:head#",
            "attr_lens": {
                "_": 294,
                "snapset": 43
            }
        }
    ]
}

     0> 2019-11-08 10:44:35.876259 7f05ea7d0700 -1 /build/ceph-12.2.12-698-g5eb0e61/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f05ea7d0700 time 2019-11-08 10:44:35.869678
/build/ceph-12.2.12-698-g5eb0e61/src/os/bluestore/BlueStore.cc: 9931: FAILED assert(0 == "unexpected error")

 ceph version 12.2.12-698-g5eb0e61 (5eb0e61eb7de5279d8fd33ca6910149ed3cb4e68) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x55686f7b51f5]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0xefd) [0x55686f65f70d]
 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x4f8) [0x55686f6619f8]
 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x68) [0x55686f3860f8]
 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa4b) [0x55686f4baaab]
 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x297) [0x55686f4bee97]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x55686f3c44c7]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66b) [0x55686f32a09b]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x39c) [0x55686f19a2ac]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55686f44d64a]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1c16) [0x55686f1cc9e6]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5ca) [0x55686f7b9c7a]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55686f7bd2c0]

#2 Updated by David Zafman 5 months ago

#3 Updated by David Zafman 5 months ago

  • Related to Bug #40522: on_local_recover doesn't touch? added

#4 Updated by Sage Weil 2 months ago

  • Priority changed from Urgent to High
