Actions
Bug #42177
closedosd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc)
% Done:
0%
Source:
Tags:
Backport:
nautilus, mimic, luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
First the object is deleted,
2019-10-03T15:06:49.550+0000 7fc32b7fd700 20 osd.3 pg_epoch: 1890 pg[10.d( v 1890'22609 (1871'19600,1890'22609] local-lis/les=1886/1887 n=1065 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1890'22605 crt=1890'22609 lcod 1890'22604 mlcod 1890'22604 active+remapped+backfilling rops=1 mbc={255={}}] do_op: op osd_op(client.88309.0:359596 10.d 10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head [delete] snapc 0=[] ondisk+write+known_i f_redirected e1890) v8 ... 2019-10-03T15:06:49.550+0000 7fc32b7fd700 15 bluestore(/var/lib/ceph/osd/ceph-3) _remove 10.d_head #10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head# onode 0x5596869d9180 txc 0x559673554d00
and the transcontext io_done, but blocked on an earlier op in the sequencer,
2019-10-03T15:06:49.550+0000 7fc32b7fd700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish_io 0x559673554d00 2019-10-03T15:06:49.550+0000 7fc32b7fd700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x559673554d00 io_done
then backfill does a listing,
2019-10-03T15:06:49.562+0000 7fc32b7fd700 20 bluestore(/var/lib/ceph/osd/ceph-3) _collection_list oid #10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head# end GHMAX
the scan_range() code finds the obc in the cache, with an empty version (exists=false, but we don't notice that)
2019-10-03T15:06:49.574+0000 7fc32b7fd700 20 osd.3 pg_epoch: 1890 pg[10.d( v 1890'22610 (1871'19600,1890'22610] local-lis/les=1886/1887 n=1064 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1890'22608 crt=1890'22610 lcod 1890'22607 mlcod 1890'22607 active+remapped+backfilling mbc={255={}}] 10:ba3ebe71:::benchmark_data_smithi205_13542_object173326:head 1816'10905 2019-10-03T15:06:49.574+0000 7fc32b7fd700 20 osd.3 pg_epoch: 1890 pg[10.d( v 1890'22610 (1871'19600,1890'22610] local-lis/les=1886/1887 n=1064 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1890'22608 crt=1890'22610 lcod 1890'22607 mlcod 1890'22607 active+remapped+backfilling mbc={255={}}] 10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head 0'0 2019-10-03T15:06:49.574+0000 7fc32b7fd700 15 bluestore(/var/lib/ceph/osd/ceph-3) getattr 10.d_head #10:ba407bb4:::benchmark_data_smithi205_13542_object186954:head# _ 2019-10-03T15:06:49.574+0000 7fc32b7fd700 10 bluestore(/var/lib/ceph/osd/ceph-3) getattr 10.d_head #10:ba407bb4:::benchmark_data_smithi205_13542_object186954:head# _ = 0 2019-10-03T15:06:49.574+0000 7fc32b7fd700 20 osd.3 pg_epoch: 1890 pg[10.d( v 1890'22610 (1871'19600,1890'22610] local-lis/les=1886/1887 n=1064 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1890'22608 crt=1890'22610 lcod 1890'22607 mlcod 1890'22607 active+remapped+backfilling mbc={255={}}] 10:ba407bb4:::benchmark_data_smithi205_13542_object186954:head 1827'11735
then later, when backfill finally gets to it,
-2292> 2019-10-03T15:06:52.638+0000 7fc32b7fd700 20 osd.3 pg_epoch: 1891 pg[10.d( v 1891'22961 (1872'19900,1891'22961] local-lis/les=1886/1887 n=713 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1891'22960 crt=1891'22961 lcod 1891'22959 mlco d 1891'22959 active+remapped+backfilling mbc={255={}}] peer shard 0 backfill BackfillInfo(MAX-MAX 0 objects) -2291> 2019-10-03T15:06:52.638+0000 7fc32b7fd700 10 osd.3 pg_epoch: 1891 pg[10.d( v 1891'22961 (1872'19900,1891'22961] local-lis/les=1886/1887 n=713 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1891'22960 crt=1891'22961 lcod 1891'22959 mlcod 1891'22959 active+remapped+backfilling mbc={255={}}] get_object_context: obc NOT found in cache: 10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head -2290> 2019-10-03T15:06:52.638+0000 7fc32b7fd700 15 bluestore(/var/lib/ceph/osd/ceph-3) getattr 10.d_head #10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head# _ -2289> 2019-10-03T15:06:52.638+0000 7fc32b7fd700 10 bluestore(/var/lib/ceph/osd/ceph-3) getattr 10.d_head #10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head# _ = -2 -2288> 2019-10-03T15:06:52.638+0000 7fc32b7fd700 10 osd.3 pg_epoch: 1891 pg[10.d( v 1891'22961 (1872'19900,1891'22961] local-lis/les=1886/1887 n=713 ec=1746/1746 lis/c=1886/1874 les/c/f=1887/1875/0 sis=1886) [0,3]/[3,11] backfill=[0] r=0 lpr=1886 pi=[1874,1886)/1 luod=1891'22960 crt=1891'22961 lcod 1891'22959 mlcod 1891'22959 active+remapped+backfilling mbc={255={}}] get_object_context: no obc for soid 10:ba3eee7a:::benchmark_data_smithi205_13542_object171119:head and !can_create ... -772> 2019-10-03T15:06:52.662+0000 7fc32b7fd700 -1 /build/ceph-15.0.0-5726-g55667ef/src/osd/PrimaryLogPG.cc: In function 'uint64_t PrimaryLogPG::recover_backfill(uint64_t, ThreadPool::TPHandle&, bool*)' thread 7fc32b7fd700 time 2019-10-03T15:06:52.644799+0000 /build/ceph-15.0.0-5726-g55667ef/src/osd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc) ceph version 15.0.0-5726-g55667ef (55667ef848f8044c1c432effde953e61dc59eef4) octopus (dev) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x559664694f3a] 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x559664695115] 3: (PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&, bool*)+0x51e7) [0x55966488cb97] 4: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0xed2) [0x55966488dbf2] 5: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x249) [0x559664706269] 6: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x559664972619] 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90c) [0x559664724b7c] 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x559664d6354c] 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x559664d666c0] 10: (()+0x76db) [0x7fc34cf406db] 11: (clone()+0x3f) [0x7fc34bce088f]
i just saw this twice in a row,
/a/sage-2019-10-03_14:03:55-rados:upgrade-wip-sage2-testing-2019-10-02-0937-distro-basic-smithi/4353979
/a/sage-2019-10-03_00:05:14-rados:upgrade-wip-sage2-testing-2019-10-02-0937-distro-basic-smithi/4353003
which is surprising because i haven't seen it before. :/
Updated by Sage Weil over 4 years ago
- Status changed from 12 to Fix Under Review
- Pull request ID set to 30715
Updated by Neha Ojha over 4 years ago
- Has duplicate Bug #42176: FAILED ceph_assert(obc) in PrimaryLogPG::recover_backfill() added
Updated by Sage Weil over 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to nautilus, mimic, luminous
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42197: nautilus: osd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42198: mimic: osd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42199: luminous: osd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc) added
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Actions