Bug #41006

cephfs-data-scan scan_links FAILED ceph_assert(p->second >= before+len)

Added by Dan van der Ster 3 months ago. Updated 13 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): tools
Labels (FS): crash
Pull request ID:
Crash signature:

Description

Running cephfs-data-scan scan_links on a test 14.2.2 cluster, I get this assertion failure:

   -12> 2019-07-30 10:48:51.575 7f8bd1b31d80  1 librados: init done
   -11> 2019-07-30 10:48:51.575 7f8bd1b31d80  4 datascan.init: resolving metadata pool 2
   -10> 2019-07-30 10:48:51.576 7f8bbcae4700  4 mgrc handle_mgr_map Got map version 65
    -9> 2019-07-30 10:48:51.576 7f8bbcae4700  4 mgrc handle_mgr_map Active mgr is now [v2:137.138.62.86:6818/4114,v1:137.138.62.86:6819/4114]
    -8> 2019-07-30 10:48:51.576 7f8bbcae4700  4 mgrc reconnect Starting new session with [v2:137.138.62.86:6818/4114,v1:137.138.62.86:6819/4114]
    -7> 2019-07-30 10:48:51.576 7f8bd1b31d80  4 datascan.init: found metadata pool 'cephfs_metadata'
    -6> 2019-07-30 10:48:51.576 7f8bd1b31d80  4 datascan.main: resolving metadata pool 2
    -5> 2019-07-30 10:48:51.576 7f8bc2af0700 10 monclient: get_auth_request con 0x557c829f4400 auth_method 0
    -4> 2019-07-30 10:48:51.577 7f8bc1aee700 10 monclient: get_auth_request con 0x557c829b3800 auth_method 0
    -3> 2019-07-30 10:48:51.584 7f8bc2af0700 10 monclient: get_auth_request con 0x557c829b3400 auth_method 0
    -2> 2019-07-30 10:48:52.743 7f8bd1b31d80 -1 mds.0.inotable: erasing 0x10000002d2a to 0x10000002f08
    -1> 2019-07-30 10:48:52.744 7f8bd1b31d80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/include/interval_set.h: In function 'void interval_set<T, Map>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; Map = std::map<inodeno_t, inodeno_t, std::less<inodeno_t>, std::allocator<std::pair<const inodeno_t, inodeno_t> > >]' thread 7f8bd1b31d80 time 2019-07-30 10:48:52.744443
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/include/interval_set.h: 527: FAILED ceph_assert(p->second >= before+len)

 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f8bc7f24046]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f8bc7f24214]
 3: (()+0x1ac8c3) [0x557c808b38c3]
 4: (InoTable::force_consume_to(inodeno_t)+0x301) [0x557c808aeb11]
 5: (DataScan::scan_links()+0x2ac3) [0x557c80840693]
 6: (DataScan::main(std::vector<char const*, std::allocator<char const*> > const&)+0x121b) [0x557c8082673b]
 7: (main()+0x211) [0x557c80825201]
 8: (__libc_start_main()+0xf5) [0x7f8bc4df5495]
 9: (()+0x1259b0) [0x557c8082c9b0]

Related issues

Copied to fs - Backport #41476: mimic: cephfs-data-scan scan_links FAILED ceph_assert(p->second >= before+len) Rejected
Copied to fs - Backport #41477: nautilus: cephfs-data-scan scan_links FAILED ceph_assert(p->second >= before+len) Resolved

History

#1 Updated by Zheng Yan 3 months ago

Looks like discontiguous free inode numbers can trigger the crash.
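
The failure mode suggested here can be illustrated with a toy model. This is a minimal sketch, assuming the free inode table behaves like a map from interval start to interval length, and that erasing a range requires the range to fit inside a single stored interval, as the text of the failed assertion (p->second >= before+len) suggests. ToyIntervalSet, erase_fails, and demo are hypothetical names, and the numbers are invented; this is not Ceph's interval_set or InoTable code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>

// Toy stand-in for an interval set: maps interval start -> interval length.
// NOT Ceph's interval_set; it only mimics the single-interval containment check.
class ToyIntervalSet {
public:
  // Add the interval [start, start+len). Assumes no overlap with stored intervals.
  void insert(std::uint64_t start, std::uint64_t len) { m_[start] = len; }

  // Erase [start, start+len). The range must lie inside ONE stored interval.
  void erase(std::uint64_t start, std::uint64_t len) {
    auto p = m_.upper_bound(start);
    if (p == m_.begin())
      throw std::logic_error("no interval starts at or before start");
    --p;  // last interval that begins at or before start
    const std::uint64_t p_start = p->first;
    const std::uint64_t p_len = p->second;
    const std::uint64_t before = start - p_start;
    // Counterpart of the failed check: p->second >= before+len.
    if (p_len < before + len)
      throw std::logic_error("range extends past the containing interval");
    const std::uint64_t after = p_len - before - len;
    m_.erase(p);
    if (before) m_[p_start] = before;     // keep the part left of the range
    if (after) m_[start + len] = after;   // keep the part right of the range
  }

  std::size_t num_intervals() const { return m_.size(); }

private:
  std::map<std::uint64_t, std::uint64_t> m_;
};

// Returns true if erasing [start, start+len) from a copy of s trips the check.
bool erase_fails(ToyIntervalSet s, std::uint64_t start, std::uint64_t len) {
  try {
    s.erase(start, len);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}

// Small demonstration with made-up inode numbers.
void demo() {
  ToyIntervalSet free_inos;
  free_inos.insert(100, 10);  // [100, 110)
  free_inos.insert(120, 10);  // [120, 130), leaving a gap at [110, 120)
  assert(!erase_fails(free_inos, 100, 5));  // fits inside one interval
  assert(erase_fails(free_inos, 100, 25));  // spans the gap, so the check trips
}
```

Calling demo() from a main function exercises both cases. In this model, a request to erase everything from the first free number up to some target only succeeds when the free numbers in between form one contiguous interval.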

#2 Updated by Patrick Donnelly 3 months ago

  • Assignee set to Zheng Yan
  • Target version set to v15.0.0
  • Start date deleted (07/30/2019)
  • Source set to Community (user)
  • Backport set to nautilus,mimic
  • Component(FS) tools added
  • Labels (FS) crash added

#3 Updated by Zheng Yan 3 months ago

  • Status changed from New to Need Review
  • Pull request ID set to 29411

#4 Updated by Patrick Donnelly 2 months ago

  • Status changed from Need Review to Pending Backport

#5 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #41476: mimic: cephfs-data-scan scan_links FAILED ceph_assert(p->second >= before+len) added

#6 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #41477: nautilus: cephfs-data-scan scan_links FAILED ceph_assert(p->second >= before+len) added

#7 Updated by Nathan Cutler 13 days ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
