Project

General

Profile

Actions

Bug #38049

closed

random osds failing in thread_name:bstore_kv_final

Added by Lawrence Smith over 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since upgrading to mimic 13.2.4 single osds are randomly failing with the following assert:

Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -67> 2019-01-25 15:57:08.414 7eff6730a700  1 get compressor snappy = 0x7effa0dd9240
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -66> 2019-01-25 15:57:08.414 7eff6730a700 10 bluestore(/var/lib/ceph/osd/ceph-225) read 6.141s3_head 3#6:82b4e9d2:::10003f4713e.00000003:head# 0x
0~1000000 = 524288
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -65> 2019-01-25 15:57:08.414 7eff6730a700 10 bluestore(/var/lib/ceph/osd/ceph-225) stat 6.141s3_head 3#6:82b4e9d2:::10003f4713e.00000003:head#
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -64> 2019-01-25 15:57:08.414 7eff6730a700 10 bluestore(/var/lib/ceph/osd/ceph-225) stat 6.141s3_head 3#6:82b4e9e0:::1000cfcc030.0000000e:head#
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -63> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore(/var/lib/ceph/osd/ceph-225).collection(6.141s3_head 0x7eff8b6fee00) get_onode oid 3#6:82b4
e9e0:::1000cfcc030.0000000e:head# key 0x83800000000000000682b4e9e0213130'00cfcc030.0000000e!='0xfffffffffffffffeffffffffffffffff'o'
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -62> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore(/var/lib/ceph/osd/ceph-225).collection(6.141s3_head 0x7eff8b6fee00)  r 0 v.len 438
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -61> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore.blob(0x7eff91796770) get_ref 0x0~80000 Blob(0x7eff91796770 blob([0x525f8f10000~70000] clen
 0x80000 -> 0x6da33 compressed) use_tracker(0x0 0x0) SharedBlob(0x7eff917967e0 sbid 0x0))
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -60> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore.blob(0x7eff91796770) get_ref init 0x80000, 80000
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -59> 2019-01-25 15:57:08.414 7eff6730a700 15 bluestore(/var/lib/ceph/osd/ceph-225) getattrs 6.141s3_head 3#6:82b4e9e0:::1000cfcc030.0000000e:head
#
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -58> 2019-01-25 15:57:08.414 7eff6730a700 10 bluestore(/var/lib/ceph/osd/ceph-225) getattrs 6.141s3_head 3#6:82b4e9e0:::1000cfcc030.0000000e:head
# = 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -57> 2019-01-25 15:57:08.414 7eff6730a700 15 bluestore(/var/lib/ceph/osd/ceph-225) read 6.141s3_head 3#6:82b4e9e0:::1000cfcc030.0000000e:head# 0x
0~1000000
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -56> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read 0x0~1000000 size 0x80000 (524288)
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -55> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read  blob Blob(0x7eff91796770 blob([0x525f8f10000~70000] 
clen 0x80000 -> 0x6da33 compressed) use_tracker(0x80000 0x80000) SharedBlob(0x7eff917967e0 sbid 0x0)) need 0x0~80000 cache has 0x[]
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -54> 2019-01-25 15:57:08.414 7eff6730a700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read  blob Blob(0x7eff91796770 blob([0x525f8f10000~70000] 
clen 0x80000 -> 0x6da33 compressed) use_tracker(0x80000 0x80000) SharedBlob(0x7eff917967e0 sbid 0x0)) need {<0x0, 0x80000> : [0x0:0~80000]}
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -53> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read  blob Blob(0x7eff8e8b60e0 blob([0x164bf7e0000~50000] 
csum crc32c/0x1000) use_tracker(0x5*0x10000 0x[10000,10000,10000,10000,10000]) SharedBlob(0x7eff8e8b6150 sbid 0x0)) need 0x{<0x0, 0x50000> : [0x0:0~50000]}
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -52> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read  blob Blob(0x7eff8e8b61c0 blob([!~50000,0x164bf830000
~10000] csum+has_unused crc32c/0x1000 unused=0x1fff) use_tracker(0x6*0x10000 0x[0,0,0,0,0,c000]) SharedBlob(0x7eff8e8b6380 sbid 0x0)) need 0x{<0x50000, 0xc000> : [0x50000:50000~c000]}
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -51> 2019-01-25 15:57:08.434 7eff65b07700 10 bluestore(/var/lib/ceph/osd/ceph-225) read 6.36fs6_head 6#6:f6ef61c0:::1000ee9b1a4.0000001b:head# 0x
0~1000000 = 376832
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -50> 2019-01-25 15:57:08.434 7eff65b07700 10 bluestore(/var/lib/ceph/osd/ceph-225) stat 6.36fs6_head 6#6:f6ef61c0:::1000ee9b1a4.0000001b:head#
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -49> 2019-01-25 15:57:08.434 7eff65b07700 10 bluestore(/var/lib/ceph/osd/ceph-225) stat 6.36fs6_head 6#6:f6ef61ec:::10008fb62c4.00000004:head#
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -48> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225).collection(6.36fs6_head 0x7eff8b716a00) get_onode oid 6#6:f6ef
61ec:::10008fb62c4.00000004:head# key 0x868000000000000006f6ef61ec213130'008fb62c4.00000004!='0xfffffffffffffffeffffffffffffffff'o'
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -47> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225).collection(6.36fs6_head 0x7eff8b716a00)  r 0 v.len 438
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -46> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore.blob(0x7eff8e8b65b0) get_ref 0x0~80000 Blob(0x7eff8e8b65b0 blob([0x164bf9a0000~70000] clen
 0x80000 -> 0x69b43 compressed) use_tracker(0x0 0x0) SharedBlob(0x7eff8e8b6620 sbid 0x0))
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -45> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore.blob(0x7eff8e8b65b0) get_ref init 0x80000, 80000
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -44> 2019-01-25 15:57:08.434 7eff65b07700 15 bluestore(/var/lib/ceph/osd/ceph-225) getattrs 6.36fs6_head 6#6:f6ef61ec:::10008fb62c4.00000004:head
#
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -43> 2019-01-25 15:57:08.434 7eff65b07700 10 bluestore(/var/lib/ceph/osd/ceph-225) getattrs 6.36fs6_head 6#6:f6ef61ec:::10008fb62c4.00000004:head
# = 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -42> 2019-01-25 15:57:08.434 7eff65b07700 15 bluestore(/var/lib/ceph/osd/ceph-225) read 6.36fs6_head 6#6:f6ef61ec:::10008fb62c4.00000004:head# 0x
0~1000000
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -41> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read 0x0~1000000 size 0x80000 (524288)
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -40> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read  blob Blob(0x7eff8e8b65b0 blob([0x164bf9a0000~70000] 
clen 0x80000 -> 0x69b43 compressed) use_tracker(0x80000 0x80000) SharedBlob(0x7eff8e8b6620 sbid 0x0)) need 0x0~80000 cache has 0x[]
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -39> 2019-01-25 15:57:08.434 7eff65b07700 20 bluestore(/var/lib/ceph/osd/ceph-225) _do_read  blob Blob(0x7eff8e8b65b0 blob([0x164bf9a0000~70000] 
clen 0x80000 -> 0x69b43 compressed) use_tracker(0x80000 0x80000) SharedBlob(0x7eff8e8b6620 sbid 0x0)) need {<0x0, 0x80000> : [0x0:0~80000]}
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -38> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.MempoolThread(0x7effa55d1b08) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_us
ed: 132811826 meta_alloc: 0 meta_used: 3696326 data_alloc: 0 data_used: 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -37> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa730b440) _trim onodes 0 / 0 buffers 0 / 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -36> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57131c0) _trim onodes 2 / 0 buffers 0 / 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -35> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57131c0) _trim considering 0x7efe17fcbb00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -34> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.sharedblob(0x7eff91796460) put 0x7eff91796460 removing self from set 0x7eff8b6fef00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -33> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.BufferSpace(0x7eff91796478 in 0x7effa57131c0) _clear
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -32> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57131c0) _trim considering 0x7efe17fccd00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -31> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57131c0) _trim  3#6:82b4e9e0:::1000cfcc030.0000000e:head# has 3 refs; skipp
ing
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -30> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim onodes 5 / 0 buffers 0 / 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -29> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim considering 0x7eff58c1af00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -28> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim  2#6:80a659e9:::1000fbf04ab.0000000a:head# has 2 refs; skipp
ing
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -27> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim considering 0x7efe0d738600
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -26> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim  2#6:80a63af2:::1000fe25508.00000042:head#b02fa has 2 refs; 
skipping
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -25> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim considering 0x7efe10372300
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -24> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim  2#6:80a659e9:::1000fbf04ab.0000000a:head#b02fc has 2 refs; 
skipping
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -23> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim considering 0x7efe0d73a100
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -22> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim  2#6:80800000::::head# has 6 refs; skipping
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -21> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim considering 0x7eff6409e200
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -20> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa57132a0) _trim  2#6:80a67c40:::1000fecf3fa.000071fb:head# has 5 refs; skipp
ing
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -19> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713380) _trim onodes 2 / 0 buffers 0 / 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -18> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713380) _trim considering 0x7efeeb060400
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -17> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713380) _trim  4#6:cb400000::::head# has 3 refs; skipping
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -16> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713380) _trim considering 0x7efec813bb00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -15> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713380) _trim  4#6:cb5868b1:::1000ff1ac08.0000450d:head# has 5 refs; skipp
ing
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -14> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713460) _trim onodes 4 / 0 buffers 0 / 0
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -13> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713460) _trim considering 0x7eff2653dc00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -12> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.sharedblob(0x7eff8e777d60) put 0x7eff8e777d60 removing self from set 0x7eff8b716b00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -11> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.BufferSpace(0x7eff8e777d78 in 0x7effa5713460) _clear
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:    -10> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713460) _trim considering 0x7eff2653df00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -9> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.sharedblob(0x7eff8e8b6000) put 0x7eff8e8b6000 removing self from set 0x7eff8b716b00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -8> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.BufferSpace(0x7eff8e8b6018 in 0x7effa5713460) _clear
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -7> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713460) _trim considering 0x7eff24f5b900
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -6> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.sharedblob(0x7eff8e8b6150) put 0x7eff8e8b6150 removing self from set 0x7eff8b716b00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -5> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.BufferSpace(0x7eff8e8b6168 in 0x7effa5713460) _clear
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -4> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.sharedblob(0x7eff8e8b6380) put 0x7eff8e8b6380 removing self from set 0x7eff8b716b00
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -3> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.BufferSpace(0x7eff8e8b6398 in 0x7effa5713460) _clear
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -2> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713460) _trim considering 0x7eff24f5c800
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:     -1> 2019-01-25 15:57:08.434 7eff97faf700 20 bluestore.2QCache(0x7effa5713460) _trim  6#6:f6ef61ec:::10008fb62c4.00000004:head# has 3 refs; skipping
Jan 25 15:57:08 kaa-115 ceph-osd[24687]:      0> 2019-01-25 15:57:08.434 7eff8fd23700 -1 *** Caught signal (Aborted) **
 in thread 7eff8fd23700 thread_name:bstore_kv_final

 ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
 1: (()+0x8cc83b) [0x55ebefd5f83b]
 2: (()+0x13230) [0x7effa982f230]
 3: (gsignal()+0x10b) [0x7effa86cdbcd]
 4: (abort()+0x148) [0x7effa86cf298]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x264) [0x7effac667547]
 6: (ceph::register_assert_context(CephContext*)+0) [0x7effac6675b0]
 7: (StupidAllocator::_insert_free(unsigned long, unsigned long)+0xbc4) [0x55ebefd3a6c0]
 8: (StupidAllocator::release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x6b) [0x55ebefd3a759]
 9: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0xc4) [0x55ebefc0fd7e]
 10: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x447) [0x55ebefc25cb1]
 11: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x90) [0x55ebefc3472a]
 12: (BlueStore::_kv_finalize_thread()+0x47d) [0x55ebefc358a9]
 13: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55ebefc8a187]
 14: (()+0x75e5) [0x7effa98235e5]
 15: (clone()+0x3f) [0x7effa8794e9f]

The affected osds can be restarted manually, but the failures happen quite frequently. Multiple (3-5) osds usually crash in quick succession, and this seems to occur during compression with snappy.


Files

ceph-osd.log.gz (279 KB) ceph-osd.log.gz Lawrence Smith, 01/25/2019 03:14 PM
ceph-osd.log.gz (292 KB) ceph-osd.log.gz Lawrence Smith, 02/06/2019 03:52 PM
Actions

Also available in: Atom PDF