Bug #22330

ec: src/common/interval_map.h: 161: FAILED assert(len > 0)

Added by Patrick Donnelly about 1 year ago. Updated about 2 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: EC Pools
Target version:
Start date: 12/06/2017
Due date:
% Done: 0%
Source: Development
Tags:
Backport: luminous,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: multimds
Component(RADOS): EC plugins
Pull request ID:

Description

2017-12-05T11:23:19.900 INFO:tasks.ceph.osd.3.smithi086.stderr:/build/ceph-13.0.0-3732-g70d0667/src/common/interval_map.h: In function 'void interval_map<K, V, S>::insert(K, K, V&&) [with K = long unsigned int; V = ceph::buffer::list; S = bl_split_merge]' thread 7f8bd8140700 time 2017-12-05 11:23:19.922589
2017-12-05T11:23:19.900 INFO:tasks.ceph.osd.3.smithi086.stderr:/build/ceph-13.0.0-3732-g70d0667/src/common/interval_map.h: 161: FAILED assert(len > 0)
2017-12-05T11:23:19.903 INFO:tasks.ceph.osd.3.smithi086.stderr: ceph version 13.0.0-3732-g70d0667 (70d06678b3571264de00c11f4a08eae8375ff04c) mimic (dev)
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5574c549bbb2]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 2: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x10b4) [0x5574c52207e4]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 3: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x82) [0x5574c51f1262]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 4: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0xf39) [0x5574c51fb269]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 5: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x178) [0x5574c520a858]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5574c50f2bb0]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x531) [0x5574c50aad11]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x367) [0x5574c4efb017]
2017-12-05T11:23:19.904 INFO:tasks.ceph.osd.3.smithi086.stderr: 9: (PGOpItem::run(OSD*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x5a) [0x5574c5172a6a]
2017-12-05T11:23:19.905 INFO:tasks.ceph.osd.3.smithi086.stderr: 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xe7b) [0x5574c4f01aab]
2017-12-05T11:23:19.905 INFO:tasks.ceph.osd.3.smithi086.stderr: 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x7bb) [0x5574c549f43b]
2017-12-05T11:23:19.905 INFO:tasks.ceph.osd.3.smithi086.stderr: 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5574c54a1910]
2017-12-05T11:23:19.905 INFO:tasks.ceph.osd.3.smithi086.stderr: 13: (()+0x76ba) [0x7f8bf4a886ba]
2017-12-05T11:23:19.905 INFO:tasks.ceph.osd.3.smithi086.stderr: 14: (clone()+0x6d) [0x7f8bf3aff3dd]

From: /ceph/teuthology-archive/pdonnell-2017-12-05_06:50:02-multimds-wip-pdonnell-testing-20171205.044504-testing-basic-smithi/1931903/teuthology.log
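
For readers unfamiliar with the assertion, here is a minimal sketch of the precondition that the backtrace shows failing: interval_map<K, V, S>::insert(K, K, V&&) requires a non-zero length. This is not the Ceph implementation, and it is not the fix that resolved this issue. SimpleIntervalMap and insert_if_nonempty are hypothetical names, std::string stands in for ceph::buffer::list, and the sketch throws instead of aborting so the failure path can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an offset -> buffer interval map.
// It only models the "len > 0" precondition, not splitting or merging.
class SimpleIntervalMap {
  std::map<uint64_t, std::string> extents_;

public:
  // Mirrors the shape insert(K off, K len, V&& val) seen in the backtrace.
  // Throws where the real code asserts (assumption made for testability).
  void insert(uint64_t off, uint64_t len, std::string&& val) {
    if (len == 0) {
      throw std::invalid_argument("insert: len > 0 violated");
    }
    if (val.size() != len) {
      throw std::invalid_argument("insert: len does not match value size");
    }
    extents_[off] = std::move(val);
  }

  std::size_t size() const { return extents_.size(); }
};

// Hypothetical caller-side guard: an empty read result is skipped
// rather than handed to insert() with len == 0.
bool insert_if_nonempty(SimpleIntervalMap& im, uint64_t off, std::string data) {
  if (data.empty()) {
    return false;  // nothing to record for a zero-length extent
  }
  const uint64_t len = data.size();
  assert(len > 0);  // the invariant insert() relies on
  im.insert(off, len, std::move(data));
  return true;
}
```

In this sketch, a caller that ends up with a zero-length buffer (for example, a read that returned no data) hits the precondition unless it checks first. Whether the actual crash came from such a path is not established by the log above.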


Related issues

Related to RADOS - Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_off() + range.first.get_len())) Resolved 10/25/2017
Duplicated by RADOS - Bug #36271: src/common/interval_map.h: 161: FAILED ceph_assert(len > 0) Duplicate
Copied to RADOS - Backport #36437: mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0) Resolved
Copied to RADOS - Backport #36438: luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0) Resolved

History

#1 Updated by Patrick Donnelly about 1 year ago

Another: /ceph/teuthology-archive/pdonnell-2017-12-05_06:54:06-kcephfs-wip-pdonnell-testing-20171205.044504-testing-basic-smithi/1932525/teuthology.log

#2 Updated by Patrick Donnelly 7 months ago

  • Priority changed from Normal to Urgent
  • Target version set to v13.0.0

http://pulpito.ceph.com/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smithi/2464211

/ceph/teuthology-archive/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smithi/2464211/teuthology.log

and

http://pulpito.ceph.com/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smithi/2464361

/ceph/teuthology-archive/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smithi/2464361/teuthology.log

#3 Updated by Sage Weil 7 months ago

  • Status changed from New to Verified
  • Assignee set to Sage Weil

#4 Updated by Sage Weil 7 months ago

  • Status changed from Verified to Need More Info

Need to capture some logs.

#5 Updated by Sage Weil 6 months ago

  • Assignee deleted (Sage Weil)

#6 Updated by Patrick Donnelly 4 months ago

/ceph/teuthology-archive/pdonnell-2018-08-02_13:06:29-multimds-wip-pdonnell-testing-20180802.044402-testing-basic-smithi/2852847/remote/smithi141/coredump/1533233406.10526.core

The path above is the coredump.

Strangely, logs aren't being collected.

#7 Updated by Neha Ojha 3 months ago

Running the multimds:basic suite N=20 times with the filter below reproduces this failure reliably:

--filter 'clusters/9-mds.yaml conf/{client.yaml mds.yaml mon.yaml osd.yaml} inline/yes.yaml mount/kclient/{mount.yaml overrides/{distro/random/{k-testing.yaml supported$/{rhel_latest.yaml}} ms-die-on-skipped.yaml}} objectstore-ec/bluestore-ec-root.yaml overrides/{basic/{frag_enable.yaml whitelist_health.yaml whitelist_wrongly_marked_down.yaml} fuse-default-perm-no.yaml} q_check_counter/check_counter.yaml tasks/cfuse_workunit_suites_fsx.yaml'

http://pulpito.ceph.com/nojha-2018-09-17_15:49:25-multimds:basic-master-distro-basic-smithi/

#8 Updated by Neha Ojha 3 months ago

  • Status changed from Need More Info to Verified

#9 Updated by Patrick Donnelly 2 months ago

  • Duplicated by Bug #36271: src/common/interval_map.h: 161: FAILED ceph_assert(len > 0) added

#10 Updated by Patrick Donnelly 2 months ago

Latest instance with logs/cores: /ceph/teuthology-archive/pdonnell-2018-10-01_03:19:12-multimds-wip-pdonnell-testing-20181001.011252-distro-basic-smithi/3090388/teuthology.log

#11 Updated by Neha Ojha 2 months ago

  • Assignee set to Neha Ojha

#12 Updated by Neha Ojha 2 months ago

  • Related to Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_off() + range.first.get_len())) added

#13 Updated by Neha Ojha 2 months ago

  • Status changed from Verified to Need Review

#14 Updated by Neha Ojha 2 months ago

  • Status changed from Need Review to Pending Backport
  • Backport set to luminous,mimic

#15 Updated by Neha Ojha 2 months ago

Note that a single PR, common to this issue and https://tracker.ceph.com/issues/21931, is to be backported.

#16 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #36437: mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0) added

#17 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #36438: luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0) added

#18 Updated by Nathan Cutler about 2 months ago

  • Status changed from Pending Backport to Resolved
