Project

General

Profile

Actions

Bug #22550

open

mds: FAILED assert(probe->known_size[p->oid] <= shouldbe) when mds start

Added by jianxiong shao over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I stop the mds while coping files to the cluster, then I try to start mds later, I encounter a failed assertion.

Here is a snippet of the log:

state change up:reconnect --> up:rejoin

2017-08-03 11:53:43.634941 7f8006e85700  1 mds.0.396 rejoin_start

2017-08-03 11:53:43.635003 7f8006e85700  1 mds.0.396 rejoin_joint_start

2017-08-03 11:53:43.635435 7f8006e85700  1 mds.0.396 rejoin_done

2017-08-03 11:53:44.690199 7f8006e85700  1 mds.0.396 handle_mds_map i am now mds.0.396

2017-08-03 11:53:44.690206 7f8006e85700  1 mds.0.396 handle_mds_map state change up:rejoin --> up:active

2017-08-03 11:53:44.690216 7f8006e85700  1 mds.0.396 recovery_done -- successful recovery!

2017-08-03 11:53:44.690335 7f8006e85700  1 mds.0.396 active_start

2017-08-03 11:53:44.690494 7f8006e85700 10 mds.0.filer probe bwd 100000007e6 starting from 68 size is 68

2017-08-03 11:53:44.690500 7f8006e85700 10 mds.0.filer _probe 100000007e6 0~68

2017-08-03 11:53:44.690511 7f8006e85700 10 mds.0.filer _probe  probing 100000007e6.00000000

2017-08-03 11:53:44.690696 7f8006e85700 10 mds.0.filer probe bwd 10000000415 starting from 505817071616 size is 536870912000

2017-08-03 11:53:44.690702 7f8006e85700 10 mds.0.filer _probe 10000000415 505816285184~786432

2017-08-03 11:53:44.690706 7f8006e85700 10 mds.0.filer _probe  probing 10000000415.0001d714

2017-08-03 11:53:44.690737 7f8006e85700 10 mds.0.filer probe bwd 10000000414 starting from 511094374400 size is 536870912000

2017-08-03 11:53:44.690741 7f8006e85700 10 mds.0.filer _probe 10000000414 511092719616~1654784

2017-08-03 11:53:44.690744 7f8006e85700 10 mds.0.filer _probe  probing 10000000414.0001dbfe

2017-08-03 11:53:44.690771 7f8006e85700 10 mds.0.filer probe bwd 10000000419 starting from 507094802432 size is 536870912000

2017-08-03 11:53:44.690775 7f8006e85700 10 mds.0.filer _probe 10000000419 507091353600~3448832

2017-08-03 11:53:44.690780 7f8006e85700 10 mds.0.filer _probe  probing 10000000419.0001d844

2017-08-03 11:53:44.690872 7f8006e85700 10 mds.0.filer probe bwd 100000007f4 starting from 530814210048 size is 536870912000

2017-08-03 11:53:44.690878 7f8006e85700 10 mds.0.filer _probe 100000007f4 530810142720~4067328

2017-08-03 11:53:44.690882 7f8006e85700 10 mds.0.filer _probe  probing 100000007f4.0001ee5b

2017-08-03 11:53:44.691073 7f8006e85700 10 mds.0.filer probe bwd 100000007f1 starting from 528468410368 size is 536870912000

2017-08-03 11:53:44.691077 7f8006e85700 10 mds.0.filer _probe 100000007f1 528465526784~2883584

2017-08-03 11:53:44.691081 7f8006e85700 10 mds.0.filer _probe  probing 100000007f1.0001ec2c

2017-08-03 11:53:44.691193 7f800257b700 10 mds.0.filer _probed 10000000415 object 10000000415.0001d714 has size 4194304 mtime 2017-07-31 23:58:59.0.808972s

2017-08-03 11:53:44.691230 7f800257b700 10 mds.0.filer _probed  10000000415 object 10000000415.0001d714 should be 786432, actual is 4194304 probing_off is 505816285184 size is 536870912000

2017-08-03 11:53:44.691454 7f8006e85700 10 mds.0.filer probe bwd 100000007f5 starting from 528604708864 size is 536870912000

2017-08-03 11:53:44.691460 7f8006e85700 10 mds.0.filer _probe 100000007f5 528603938816~770048

2017-08-03 11:53:44.691464 7f8006e85700 10 mds.0.filer _probe  probing 100000007f5.0001ec4d

2017-08-03 11:53:44.691641 7f8006e85700 10 mds.0.filer probe bwd 100000007f8 starting from 528404148224 size is 536870912000

2017-08-03 11:53:44.691646 7f8006e85700 10 mds.0.filer _probe 100000007f8 528402612224~1536000

2017-08-03 11:53:44.691650 7f8006e85700 10 mds.0.filer _probe  probing 100000007f8.0001ec1d

2017-08-03 11:53:44.692121 7f8006e85700 10 mds.0.filer probe bwd 100000007f3 starting from 529241079808 size is 536870912000

2017-08-03 11:53:44.692126 7f8006e85700 10 mds.0.filer _probe 100000007f3 529237278720~3801088

2017-08-03 11:53:44.692131 7f8006e85700 10 mds.0.filer _probe  probing 100000007f3.0001ece4

2017-08-03 11:53:44.692340 7f8006e85700 10 mds.0.filer probe bwd 100000007f2 starting from 532010594304 size is 536870912000

2017-08-03 11:53:44.692346 7f8006e85700 10 mds.0.filer _probe 100000007f2 532009713664~880640

2017-08-03 11:53:44.692349 7f8006e85700 10 mds.0.filer _probe  probing 100000007f2.0001ef79

2017-08-03 11:53:44.692556 7f8006e85700 10 mds.0.filer probe bwd 100000007f6 starting from 528478908416 size is 536870912000

2017-08-03 11:53:44.692560 7f8006e85700 10 mds.0.filer _probe 100000007f6 528478109696~798720

2017-08-03 11:53:44.692564 7f8006e85700 10 mds.0.filer _probe  probing 100000007f6.0001ec2f

2017-08-03 11:53:44.692750 7f8006e85700 10 mds.0.filer probe bwd 100000007f7 starting from 536870912000 size is 536870912000

2017-08-03 11:53:44.692797 7f8006e85700 10 mds.0.filer _probe 100000007f7 536866717696~4194304

2017-08-03 11:53:44.692802 7f8006e85700 10 mds.0.filer _probe  probing 100000007f7.0001f3ff

2017-08-03 11:53:44.693026 7f800257b700 -1 osdc/Filer.cc: In function 'bool Filer::_probed(Filer::Probe*, const object_t&, uint64_t, ceph::real_time, Filer::Probe::unique_lock&)' thread 7f800257b700 time 2017-08-03 11:53:44.691242
osdc/Filer.cc: 236: FAILED assert(probe->known_size[p->oid] <= shouldbe)

ceph version 10.2.5 (94f3330e6c598dce8ff312e155cd5b08d6cb54a3)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x88) [0x559e6a363b08]

2: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::unique_lock<std::mutex>&)+0xee4) [0x559e6a1cc924]

3: (Filer::C_Probe::finish(int)+0x6c) [0x559e6a1cf00c]

4: (Context::complete(int)+0x9) [0x559e69f47a09]

5: (Finisher::finisher_thread_entry()+0x45e) [0x559e6a28470e]

6: (()+0x8182) [0x7f800d08d182]

7: (clone()+0x6d) [0x7f800b5d647d]

It seems caused by the wrong file max_size which is always the multiples of file stipe unit(here is 4M),

Actions #1

Updated by Abhishek Lekshmanan over 6 years ago

  • Project changed from Stable releases to CephFS
Actions #2

Updated by Patrick Donnelly over 6 years ago

  • Tracker changed from Tasks to Bug
  • Subject changed from FAILED assert(probe->known_size[p->oid] <= shouldbe) when mds start to mds: FAILED assert(probe->known_size[p->oid] <= shouldbe) when mds start
  • Description updated (diff)
  • Regression set to No
  • Severity set to 3 - minor
  • Release set to jewel
  • Component(FS) MDS added
Actions

Also available in: Atom PDF