Project

General

Profile

Bug #24598

ceph_test_objecstore: bluefs mount fail with overlapping op_alloc_add

Added by Sage Weil almost 6 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-06-20 14:51:13.636 7f153c5b1c40 20 bluefs _replay 0x0:  op_alloc_add  1:0x6700000~100000
2018-06-20 14:51:13.636 7f153c5b1c40 10 fbmap_alloc 0x0x55f80fca0430 init_add_free 0x6700000~100000
2018-06-20 14:51:13.636 7f153c5b1c40 10 fbmap_alloc 0x0x55f80fca0430 init_add_free done
2018-06-20 14:51:13.636 7f153c5b1c40 20 bluefs _replay 0x0:  op_alloc_add  1:0x6900000~300000
2018-06-20 14:51:13.636 7f153c5b1c40 10 fbmap_alloc 0x0x55f80fca0430 init_add_free 0x6900000~300000
2018-06-20 14:51:13.636 7f153c5b1c40 10 fbmap_alloc 0x0x55f80fca0430 init_add_free done
2018-06-20 14:51:13.636 7f153c5b1c40 20 bluefs _replay 0x0:  op_alloc_add  1:0x6b00000~200000
2018-06-20 14:51:13.637 7f153c5b1c40 -1 *** Caught signal (Aborted) **
 in thread 7f153c5b1c40 thread_name:ceph-kvstore-to

 ceph version 14.0.0-704-gaf83759a47 (af83759a47e4a60cc7b437355c7c23c430acb5cf) nautilus (dev)
 1: (()+0x467be0) [0x55f80d637be0]
 2: (()+0x11fc0) [0x7f1531114fc0]
 3: (gsignal()+0x10b) [0x7f152f7c6f2b]
 4: (abort()+0x12b) [0x7f152f7b1561]
 5: (()+0x2f068a) [0x55f80d4c068a]
 6: (BlueFS::_replay(bool, bool)+0x11b5) [0x55f80d5cb065]
 7: (BlueFS::mount()+0xff) [0x55f80d5ce38f]
 8: (BlueStore::_open_db(bool, bool)+0x81c) [0x55f80d47954c]
 9: (BlueStore::_mount(bool, bool)+0x1a3) [0x55f80d49dee3]
...

/a/sage-2018-06-19_19:38:19-rados-wip-sage3-testing-2018-06-18-1008-distro-basic-smithi/2681595

Related issues

Copied to bluestore - Backport #38778: luminous: ceph_test_objecstore: bluefs mount fail with overlapping op_alloc_add Resolved
Copied to bluestore - Backport #38779: mimic: ceph_test_objecstore: bluefs mount fail with overlapping op_alloc_add Resolved

History

#1 Updated by Sage Weil almost 6 years ago

This looks like a BitmapAllocator bug:

First, during mount,

2018-06-21 20:18:55.667 7f2cd2963a00 10 freelist enumerate_next 0x4800000~100000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free 0x4800000~100000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free done
2018-06-21 20:18:55.667 7f2cd2963a00 10 freelist enumerate_next 0x4980000~c000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free 0x4980000~c000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free done
2018-06-21 20:18:55.667 7f2cd2963a00 10 freelist enumerate_next 0x4a00000~100000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free 0x4a00000~100000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free done
2018-06-21 20:18:55.667 7f2cd2963a00 10 freelist enumerate_next 0x4bc0000~c000

then we pull out the bluefs pieces,
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_rm_free 0x4800000~100000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_rm_free done
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_rm_free 0x4a00000~100000
2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_rm_free done

Then, a bit later, we make an overlapping allocation:
2018-06-21 20:19:09.447 7f2c77040700 10 bluestore(bluestore.test_temp_dir) _balance_bluefs_freespace gifting 2097152 (2 MiB)
2018-06-21 20:19:09.447 7f2c77040700 10 fbmap_alloc 0x0x561a576fe490 allocate0x200000/100000,0,0
2018-06-21 20:19:09.447 7f2c77040700 10 fbmap_alloc 0x0x561a576fe490 allocate 0x4900000~200000/100000,0,0
2018-06-21 20:19:09.447 7f2c77040700  1 bluestore(bluestore.test_temp_dir) _balance_bluefs_freespace gifting 0x4900000~200000 to bluefs

/a/sage-2018-06-21_18:19:30-rados-wip-24598-distro-basic-smithi/2687069

#2 Updated by Sage Weil almost 6 years ago

A bit more detail:

0       2018-06-21 20:18:54.503 7f2c77841700 10 fbmap_alloc 0x0x561a5412e940 release 0x4980000~c000
0       2018-06-21 20:18:55.179 7f2cd2963a00 10 freelist enumerate_next 0x4980000~c000
0       2018-06-21 20:18:55.179 7f2cd2963a00 10 fbmap_alloc 0x0x561a54193590 init_add_free 0x4980000~c000
0       2018-06-21 20:18:55.243 7f2cd2963a00 10 freelist enumerate_next 0x4980000~c000
0       2018-06-21 20:18:55.667 7f2cd2963a00 10 freelist enumerate_next 0x4980000~c000
0       2018-06-21 20:18:55.667 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 init_add_free 0x4980000~c000
0       2018-06-21 20:18:55.827 7f2cd2963a00 10 fbmap_alloc 0x0x561a576fe490 allocate 0x4980000~c000/4000,24000,0
0       2018-06-21 20:18:55.827 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _do_alloc_write prealloc [0x4788000~8000,0x4980000~c000,0x4bc0000~c000,0x4be0000~4000]
0       2018-06-21 20:18:55.827 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _do_alloc_write blob Blob(0x561a608ad330 blob([0x478c000~4000,0x4980000~c000,0x4bc0000~c000] csum crc32c/0x1000) use_tracker(0x0 0x0) SharedBlob(0x561a608ad100 sbid 0x0))
0       2018-06-21 20:18:55.827 7f2cd2963a00 20 bluestore.blob(0x561a608ad330) get_ref 0x0~1c000 Blob(0x561a608ad330 blob([0x478c000~4000,0x4980000~c000,0x4bc0000~c000] csum crc32c/0x1000) use_tracker(0x0 0x0) SharedBlob(0x561a608ad100 sbid 0x0))
0       2018-06-21 20:18:55.827 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _do_alloc_write  lex 0x10000~1c000: 0x0~1c000 Blob(0x561a608ad330 blob([0x478c000~4000,0x4980000~c000,0x4bc0000~c000] csum crc32c/0x1000) use_tracker(0x7*0x4000 0x[4000,4000,4000,4000,4000,4000,4000]) SharedBlob(0x561a608ad100 sbid 0x0))
0       2018-06-21 20:18:55.827 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _txc_finalize_kv txc 0x561a63a90d00 allocated 0x[4788000~8000,4980000~c000,4bc0000~c000,4be0000~4000] released 0x[]
0       2018-06-21 20:18:55.827 7f2cd2963a00 10 freelist allocate 0x4980000~c000
0       2018-06-21 20:18:55.887 7f2cb50bc700 20 bluestore(bluestore.test_temp_dir) _do_read  blob Blob(0x561a608ad330 blob([0x478c000~4000,0x4980000~c000,0x4bc0000~c000] csum crc32c/0x1000) use_tracker(0x7*0x4000 0x[4000,4000,4000,4000,4000,4000,4000]) SharedBlob(0x561a608ad100 sbid 0x0)) need 0x0~1c000 cache has 0x[0~1c000]
0       2018-06-21 20:19:02.523 7f2cd2963a00 20 bluestore.blob(0x561a54423100) get_ref 0x0~1c000 Blob(0x561a54423100 blob([0x478c000~4000,0x4980000~c000,0x4bc0000~c000] csum crc32c/0x1000) use_tracker(0x0 0x0) SharedBlob(0x561a544202a0 sbid 0x0))
0       2018-06-21 20:19:02.523 7f2cd2963a00 20 bluestore.blob(0x561a54423100) put_ref 0x0~1c000 Blob(0x561a54423100 blob([0x478c000~4000,0x4980000~c000,0x4bc0000~c000] csum crc32c/0x1000) use_tracker(0x7*0x4000 0x[4000,4000,4000,4000,4000,4000,4000]) SharedBlob(0x561a544202a0 sbid 0x0))
0       2018-06-21 20:19:02.523 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _wctx_finish  blob release [0x478c000~4000,0x4980000~c000,0x4bc0000~c000]
0       2018-06-21 20:19:02.523 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _wctx_finish  release 0x4980000~c000
0       2018-06-21 20:19:02.523 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _txc_finalize_kv txc 0x561a6b9c4900 allocated 0x[] released 0x[110000~4000,120000~4000,128000~4000,6e8000~8000,133c000~4000,1358000~8000,1514000~4000,2774000~8000,2fcc000~8000,3cd0000~8000,41d8000~4000,42bc000~4000,4788000~8000,4980000~c000,4bc0000~c000,4bdc000~8000,4c78000~4000,4cdc000~8000,4d00000~c000,5128000~1c000,5600000~8000,5c08000~4000,6078000~4000,60d0000~8000,6760000~8000,6a0c000~4000,6a8c000~8000,7048000~4000,7064000~8000,7170000~18000,73f8000~10000,7928000~1c000]
0       2018-06-21 20:19:02.523 7f2cd2963a00 20 bluestore(bluestore.test_temp_dir) _txc_finalize_kv release 0x4980000~c000
0       2018-06-21 20:19:02.523 7f2cd2963a00 10 freelist release 0x4980000~c000
0       2018-06-21 20:19:02.555 7f2c7683f700 10 bluestore(bluestore.test_temp_dir) _txc_release_alloc(sync) 0x561a6b9c4900 [110000~4000,120000~4000,128000~4000,6e8000~8000,133c000~4000,1358000~8000,1514000~4000,2774000~8000,2fcc000~8000,3cd0000~8000,41d8000~4000,42bc000~4000,4788000~8000,4980000~c000,4bc0000~c000,4bdc000~8000,4c78000~4000,4cdc000~8000,4d00000~c000,5128000~1c000,5600000~8000,5c08000~4000,6078000~4000,60d0000~8000,6760000~8000,6a0c000~4000,6a8c000~8000,7048000~4000,7064000~8000,7170000~18000,73f8000~10000,7928000~1c000]
0       2018-06-21 20:19:02.555 7f2c7683f700 10 fbmap_alloc 0x0x561a576fe490 release 0x4980000~c000
80000   2018-06-21 20:19:09.447 7f2c77040700 10 fbmap_alloc 0x0x561a576fe490 allocate 0x4900000~200000/100000,0,0
80000   2018-06-21 20:19:09.447 7f2c77040700  1 bluestore(bluestore.test_temp_dir) _balance_bluefs_freespace gifting 0x4900000~200000 to bluefs

(that's the output of ceph.git/src/script/bdev_grep.pl 0x4980000 on that portion of the log). That last allocation of
0x4900000~200000 looks wrong since 0x4a00000~100000 is allocated)

#3 Updated by Igor Fedotov almost 6 years ago

  • Status changed from In Progress to Fix Under Review

#4 Updated by Igor Fedotov almost 6 years ago

  • Assignee changed from Sage Weil to Igor Fedotov

#5 Updated by Sage Weil almost 6 years ago

  • Status changed from Fix Under Review to Resolved

#6 Updated by Nathan Cutler about 5 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to mimic

#7 Updated by Nathan Cutler about 5 years ago

  • Backport changed from mimic to mimic,luminous

#8 Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38778: luminous: ceph_test_objecstore: bluefs mount fail with overlapping op_alloc_add added

#9 Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38779: mimic: ceph_test_objecstore: bluefs mount fail with overlapping op_alloc_add added

#10 Updated by Igor Fedotov almost 5 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF