
Bug #22066

bluestore osd asserts repeatedly with ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)

Added by Eric Nelson over 6 years ago. Updated over 2 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
% Done:

0%

Source:
Community (user)
Tags:
bluestore osd rocksdb wal
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This may be a dup of http://tracker.ceph.com/issues/21932 - I'm experiencing this fairly regularly on our cache-tier bluestore OSDs: three times in the last two weeks. These have separate wal/db partitions on NVMe. Once the OSDs get into this state they will not start; they just keep crashing until systemd gives up.

I'm unable to bring them up, and so far I have just resorted to blowing them away and reprovisioning them. That is suboptimal but tolerable for cache-tier OSDs; however, I really want to find a fix before we start seeing this on our SATA OSDs, where the recovery ops are far more impactful to us.

This is happening on Xenial with luminous v12.2.1 using the official ppa:
deb "https://download.ceph.com/debian-luminous/" xenial main

root@st06:~# dpkg -l | grep ceph
ii ceph 12.2.1-1xenial amd64 distributed storage and file system
pi ceph-base 12.2.1-1xenial amd64 common ceph daemon libraries and management tools
pi ceph-common 12.2.1-1xenial amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-deploy 1.5.39 all Ceph-deploy is an easy to use configuration tool
ii ceph-fuse 12.2.1-1xenial amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds 12.2.1-1xenial amd64 metadata server for the ceph distributed file system
ii ceph-mgr 12.2.1-1xenial amd64 manager for the ceph distributed storage system
ii ceph-mon 12.2.1-1xenial amd64 monitor server for the ceph storage system
ii ceph-osd 12.2.1-1xenial amd64 OSD server for the ceph storage system
ii ceph-test 12.2.1-1xenial amd64 Ceph test and benchmarking tools
ii libcephfs1 10.2.7-0ubuntu0.16.04.1 amd64 Ceph distributed file system client library
pi libcephfs2 12.2.1-1xenial amd64 Ceph distributed file system client library
pi python-cephfs 12.2.1-1xenial amd64 Python 2 libraries for the Ceph libcephfs library

2017-11-07 05:07:40.193162 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.818396 seconds old, received at 2017-11-07 05:07:09.374162: osd_op(client.32411500.0:704920 16.4 16:20c9bc55:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.261:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 36
2017-11-07 05:07:40.672879 7f5695e68700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/db_impl_write.cc:684] reusing log 57888 from recycle list

2017-11-07 05:07:40.672990 7f5695e68700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #58654. Immutable memtables: 0.

2017-11-07 05:07:40.673110 7f5695e68700 1 bluefs _allocate failed to allocate 0x100000 on bdev 0, free 0xff000; fallback to bdev 1
2017-11-07 05:07:40.911889 7f567e639700 4 rocksdb: (Original Log Time 2017/11/07-05:07:40.678363) [/build/ceph-12.2.1/src/rocksdb/db/db_impl_compaction_flush.cc:1158] Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2017-11-07 05:07:40.911954 7f567e639700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10082] Flushing memtable with next log file: 58654

2017-11-07 05:07:40.912077 7f567e639700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510031260911964, "job": 10082, "event": "flush_started", "num_memtables": 1, "num_entries": 17207993, "num_deletes": 16854639, "memory_usage": 3244877424}
2017-11-07 05:07:40.912083 7f567e639700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10082] Level-0 flush table #58655: started
2017-11-07 05:07:41.193369 7f569f67b700 0 log_channel(cluster) log [WRN] : 6 slow requests, 5 included below; oldest blocked for > 36.431733 secs
2017-11-07 05:07:41.193380 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.801090 seconds old, received at 2017-11-07 05:07:10.392216: osd_op(client.32814103.0:158479 16.8 16:10f90c19:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.132:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 36
2017-11-07 05:07:41.193403 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 36.431733 seconds old, received at 2017-11-07 05:07:04.761574: osd_op(client.32815468.0:1795982 16.4 16:2055d324:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21723796.1.98:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 36
2017-11-07 05:07:41.193406 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 32.197643 seconds old, received at 2017-11-07 05:07:08.995664: osd_op(client.32324126.0:723025 16.5b 16:dbf7165f:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.1021:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 35
2017-11-07 05:07:41.193410 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 31.876367 seconds old, received at 2017-11-07 05:07:09.316940: osd_op(client.32411449.0:1647283 16.5b 16:dab27a8b:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.452:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 35
2017-11-07 05:07:41.193414 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.220324 seconds old, received at 2017-11-07 05:07:10.972983: osd_op(client.32661681.0:159961 16.5f 16:fbe053dc:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.1016:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 30
2017-11-07 05:07:46.194177 7f569f67b700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.814295 secs
2017-11-07 05:07:46.194186 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.814295 seconds old, received at 2017-11-07 05:07:15.379818: osd_op(client.32814103.0:158503 25.377 25:eedc0af4:::82b78eeb-9530-43f7-92a8-a79b5cdf1e04.1544167.4_melon_moderation%2f998489968764662.LKPUWVcpM1yjdrdp2FNEpHmLjLfCccn2Ho6MBuBzLdVzcXdDrYhjtJgSo8yvv9XJXtYxtKE8bmvzRkFZEdWsaGitujN9MsPSHH5s28GAvRaNcoqvKTCnWswv2mFh4uA1PhZ.9884577.jpg:head [create,setxattr user.rgw.idtag (53),setxattr user.rgw.tail_tag (53),writefull 0~13797,setxattr user.rgw.manifest (609),setxattr user.rgw.acl (201),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-acl (8),setxattr user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (17),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e52949) currently waiting for blocked object
2017-11-07 05:07:47.194495 7f569f67b700 0 log_channel(cluster) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 31.814620 secs
2017-11-07 05:07:47.194601 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.749378 seconds old, received at 2017-11-07 05:07:16.445060: osd_op(client.32324117.0:1357608 25.1af 25:f5963916:::82b78eeb-9530-43f7-92a8-a79b5cdf1e04.1544167.4_melon_moderation%2f998489968763621.m7oZhQcTjnhyAL82D9Bd6tCVLmpKhmYXNMZNpvWz8sBc4QuHEeK1xvbxp9JV45SfZHsN1iF7d55JsEtUwZjv9wLW28H4n5QhBjanATGbLFBGqky2xyrpf3Bv48Dm7C2QTYA.10184398.jpg:head [create,setxattr user.rgw.idtag (54),setxattr user.rgw.tail_tag (54),writefull 0~24398,setxattr user.rgw.manifest (610),setxattr user.rgw.acl (201),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-acl (8),setxattr user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (17),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e52949) currently waiting for blocked object
2017-11-07 05:07:57.314405 7f567e639700 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f567e639700 time 2017-11-07 05:07:57.235916
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)


Related issues

Related to Ceph - Bug #21932: OSD crash on boot with assert caused by Bluefs on flush write Resolved 10/26/2017

History

#1 Updated by jianpeng ma over 6 years ago

Could you capture more debug output by adding "debug bluestore = 20" to /etc/ceph/ceph.conf? Thanks!
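
For reference, a minimal ceph.conf sketch of where that setting goes (assuming the OSD settings live under the [osd] section; restart the OSD afterwards so it takes effect):

    [osd]
    # raise bluestore logging to the most verbose level
    debug bluestore = 20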

#2 Updated by Eric Nelson over 6 years ago

Here's the stacktrace with debug bluestore = 20.

-183> 2017-11-08 16:41:10.602756 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[d958900000~4200000]
-182> 2017-11-08 16:41:10.602758 7fbdcb679e00 10 stupidalloc init_rm_free 0xe30cb00000~4200000
-181> 2017-11-08 16:41:10.602761 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[e30cb00000~4200000]
-180> 2017-11-08 16:41:10.602763 7fbdcb679e00 10 stupidalloc init_rm_free 0xe261900000~4200000
-179> 2017-11-08 16:41:10.602766 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[e261900000~4200000]
-178> 2017-11-08 16:41:10.602779 7fbdcb679e00 10 stupidalloc init_rm_free 0xdfbf300000~4200000
-177> 2017-11-08 16:41:10.602782 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[dfbf300000~4200000]
-176> 2017-11-08 16:41:10.602784 7fbdcb679e00 10 stupidalloc init_rm_free 0x10a3d100000~4200000
-175> 2017-11-08 16:41:10.602788 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[10a3d100000~4200000]
-174> 2017-11-08 16:41:10.602790 7fbdcb679e00 10 stupidalloc init_rm_free 0xe019f00000~4200000
-173> 2017-11-08 16:41:10.602793 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[e019f00000~4200000]
-172> 2017-11-08 16:41:10.602795 7fbdcb679e00 10 stupidalloc init_rm_free 0xe795d00000~4200000
-171> 2017-11-08 16:41:10.602799 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[e795d00000~4200000]
-170> 2017-11-08 16:41:10.602800 7fbdcb679e00 10 stupidalloc init_rm_free 0xe015d00000~4200000
-169> 2017-11-08 16:41:10.602804 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[e015d00000~4200000]
-168> 2017-11-08 16:41:10.602805 7fbdcb679e00 10 stupidalloc init_rm_free 0xe011b00000~4200000
-167> 2017-11-08 16:41:10.602808 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[e011b00000~4200000]
-166> 2017-11-08 16:41:10.602810 7fbdcb679e00 10 stupidalloc init_rm_free 0xdd64b00000~4300000
-165> 2017-11-08 16:41:10.602813 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[dd64b00000~4300000]
-164> 2017-11-08 16:41:10.602815 7fbdcb679e00 10 stupidalloc init_rm_free 0xdd60900000~4200000
-163> 2017-11-08 16:41:10.602817 7fbdcb679e00 20 stupidalloc init_rm_free bin 9 rm 0x[dd60900000~4200000]
-162> 2017-11-08 16:41:10.602912 7fbdcb679e00 10 bluestore(/var/lib/ceph/osd/ceph-31) _open_db set rocksdb_db_paths to db,14280766259 db.slow,1824231732633
-161> 2017-11-08 16:41:10.602986 7fbdcb679e00 0 set rocksdb option compaction_readahead_size = 2097152
-160> 2017-11-08 16:41:10.602995 7fbdcb679e00 0 set rocksdb option compression = kNoCompression
-159> 2017-11-08 16:41:10.603001 7fbdcb679e00 0 set rocksdb option max_write_buffer_number = 4
-158> 2017-11-08 16:41:10.603006 7fbdcb679e00 0 set rocksdb option min_write_buffer_number_to_merge = 1
-157> 2017-11-08 16:41:10.603010 7fbdcb679e00 0 set rocksdb option recycle_log_file_num = 4
-156> 2017-11-08 16:41:10.603013 7fbdcb679e00 0 set rocksdb option writable_file_max_buffer_size = 0
-155> 2017-11-08 16:41:10.603017 7fbdcb679e00 0 set rocksdb option write_buffer_size = 268435456
-154> 2017-11-08 16:41:10.603050 7fbdcb679e00 0 set rocksdb option compaction_readahead_size = 2097152
-153> 2017-11-08 16:41:10.603055 7fbdcb679e00 0 set rocksdb option compression = kNoCompression
-152> 2017-11-08 16:41:10.603059 7fbdcb679e00 0 set rocksdb option max_write_buffer_number = 4
-151> 2017-11-08 16:41:10.603063 7fbdcb679e00 0 set rocksdb option min_write_buffer_number_to_merge = 1
-150> 2017-11-08 16:41:10.603066 7fbdcb679e00 0 set rocksdb option recycle_log_file_num = 4
-149> 2017-11-08 16:41:10.603069 7fbdcb679e00 0 set rocksdb option writable_file_max_buffer_size = 0
-148> 2017-11-08 16:41:10.603073 7fbdcb679e00 0 set rocksdb option write_buffer_size = 268435456
-147> 2017-11-08 16:41:10.603236 7fbdcb679e00 4 rocksdb: RocksDB version: 5.4.0
-146> 2017-11-08 16:41:10.603245 7fbdcb679e00  4 rocksdb: Git sha rocksdb_build_git_sha:0
-145> 2017-11-08 16:41:10.603247 7fbdcb679e00 4 rocksdb: Compile date Sep 26 2017
-144> 2017-11-08 16:41:10.603248 7fbdcb679e00 4 rocksdb: DB SUMMARY
-143> 2017-11-08 16:41:10.603262 7fbdcb679e00  4 rocksdb: CURRENT file:  CURRENT
-142> 2017-11-08 16:41:10.603264 7fbdcb679e00  4 rocksdb: IDENTITY file:  IDENTITY
-141> 2017-11-08 16:41:10.603271 7fbdcb679e00  4 rocksdb: MANIFEST file:  MANIFEST-000250 size: 3312194 Bytes
-140> 2017-11-08 16:41:10.603274 7fbdcb679e00  4 rocksdb: SST files in db dir, Total Num: 6, files: 011542.sst 011543.sst 011544.sst 011545.sst 011547.sst 011650.sst
-139> 2017-11-08 16:41:10.604440 7fbdcb679e00  4 rocksdb: SST files in db.slow dir, Total Num: 1417, files: 000464.sst 000495.sst 000496.sst 000499.sst 000548.sst 000549.sst 000554.sst 000649.sst 000688.sst
-138> 2017-11-08 16:41:10.604474 7fbdcb679e00  4 rocksdb: Write Ahead Log file in db.wal: 011538.log size: 2943570798 ; 011649.log size: 254817662 ; 011651.log size: 252432464 ; 011652.log size: 86759229 ;
-137> 2017-11-08 16:41:10.604476 7fbdcb679e00  4 rocksdb:                         Options.error_if_exists: 0
-136> 2017-11-08 16:41:10.604477 7fbdcb679e00 4 rocksdb: Options.create_if_missing: 0
-135> 2017-11-08 16:41:10.604479 7fbdcb679e00 4 rocksdb: Options.paranoid_checks: 1
-134> 2017-11-08 16:41:10.604480 7fbdcb679e00 4 rocksdb: Options.env: 0x55a5dadee040
-133> 2017-11-08 16:41:10.604481 7fbdcb679e00 4 rocksdb: Options.info_log: 0x55a5daf150c0
-132> 2017-11-08 16:41:10.604482 7fbdcb679e00 4 rocksdb: Options.max_open_files: -1
-131> 2017-11-08 16:41:10.604483 7fbdcb679e00 4 rocksdb: Options.max_file_opening_threads: 16
-130> 2017-11-08 16:41:10.604484 7fbdcb679e00 4 rocksdb: Options.use_fsync: 0
-129> 2017-11-08 16:41:10.604485 7fbdcb679e00 4 rocksdb: Options.max_log_file_size: 0
-128> 2017-11-08 16:41:10.604486 7fbdcb679e00 4 rocksdb: Options.max_manifest_file_size: 18446744073709551615
-127> 2017-11-08 16:41:10.604488 7fbdcb679e00 4 rocksdb: Options.log_file_time_to_roll: 0
-126> 2017-11-08 16:41:10.604488 7fbdcb679e00 4 rocksdb: Options.keep_log_file_num: 1000
-125> 2017-11-08 16:41:10.604490 7fbdcb679e00 4 rocksdb: Options.recycle_log_file_num: 4
-124> 2017-11-08 16:41:10.604491 7fbdcb679e00 4 rocksdb: Options.allow_fallocate: 1
-123> 2017-11-08 16:41:10.604492 7fbdcb679e00 4 rocksdb: Options.allow_mmap_reads: 0
-122> 2017-11-08 16:41:10.604493 7fbdcb679e00 4 rocksdb: Options.allow_mmap_writes: 0
-121> 2017-11-08 16:41:10.604494 7fbdcb679e00 4 rocksdb: Options.use_direct_reads: 0
-120> 2017-11-08 16:41:10.604495 7fbdcb679e00 4 rocksdb: Options.use_direct_io_for_flush_and_compaction: 0
-119> 2017-11-08 16:41:10.604497 7fbdcb679e00 4 rocksdb: Options.create_missing_column_families: 0
-118> 2017-11-08 16:41:10.604498 7fbdcb679e00 4 rocksdb: Options.db_log_dir:
-117> 2017-11-08 16:41:10.604499 7fbdcb679e00 4 rocksdb: Options.wal_dir: db.wal
-116> 2017-11-08 16:41:10.604500 7fbdcb679e00 4 rocksdb: Options.table_cache_numshardbits: 6
-115> 2017-11-08 16:41:10.604501 7fbdcb679e00 4 rocksdb: Options.max_subcompactions: 1
-114> 2017-11-08 16:41:10.604526 7fbdcb679e00 4 rocksdb: Options.max_background_flushes: 1
-113> 2017-11-08 16:41:10.604527 7fbdcb679e00 4 rocksdb: Options.WAL_ttl_seconds: 0
-112> 2017-11-08 16:41:10.604528 7fbdcb679e00 4 rocksdb: Options.WAL_size_limit_MB: 0
-111> 2017-11-08 16:41:10.604529 7fbdcb679e00 4 rocksdb: Options.manifest_preallocation_size: 4194304
-110> 2017-11-08 16:41:10.604531 7fbdcb679e00 4 rocksdb: Options.is_fd_close_on_exec: 1
-109> 2017-11-08 16:41:10.604531 7fbdcb679e00 4 rocksdb: Options.advise_random_on_open: 1
-108> 2017-11-08 16:41:10.604532 7fbdcb679e00 4 rocksdb: Options.db_write_buffer_size: 0
-107> 2017-11-08 16:41:10.604533 7fbdcb679e00 4 rocksdb: Options.access_hint_on_compaction_start: 1
-106> 2017-11-08 16:41:10.604535 7fbdcb679e00 4 rocksdb: Options.new_table_reader_for_compaction_inputs: 1
-105> 2017-11-08 16:41:10.604536 7fbdcb679e00 4 rocksdb: Options.compaction_readahead_size: 2097152
-104> 2017-11-08 16:41:10.604537 7fbdcb679e00 4 rocksdb: Options.random_access_max_buffer_size: 1048576
-103> 2017-11-08 16:41:10.604538 7fbdcb679e00 4 rocksdb: Options.writable_file_max_buffer_size: 0
-102> 2017-11-08 16:41:10.604539 7fbdcb679e00 4 rocksdb: Options.use_adaptive_mutex: 0
-101> 2017-11-08 16:41:10.604540 7fbdcb679e00 4 rocksdb: Options.rate_limiter: (nil)
-100> 2017-11-08 16:41:10.604541 7fbdcb679e00 4 rocksdb: Options.sst_file_manager.rate_bytes_per_sec: 0
-99> 2017-11-08 16:41:10.604542 7fbdcb679e00 4 rocksdb: Options.bytes_per_sync: 0
-98> 2017-11-08 16:41:10.604543 7fbdcb679e00 4 rocksdb: Options.wal_bytes_per_sync: 0
-97> 2017-11-08 16:41:10.604544 7fbdcb679e00 4 rocksdb: Options.wal_recovery_mode: 0
-96> 2017-11-08 16:41:10.604545 7fbdcb679e00 4 rocksdb: Options.enable_thread_tracking: 0
-95> 2017-11-08 16:41:10.604546 7fbdcb679e00 4 rocksdb: Options.allow_concurrent_memtable_write: 1
-94> 2017-11-08 16:41:10.604547 7fbdcb679e00 4 rocksdb: Options.enable_write_thread_adaptive_yield: 1
-93> 2017-11-08 16:41:10.604548 7fbdcb679e00 4 rocksdb: Options.write_thread_max_yield_usec: 100
-92> 2017-11-08 16:41:10.604549 7fbdcb679e00 4 rocksdb: Options.write_thread_slow_yield_usec: 3
-91> 2017-11-08 16:41:10.604550 7fbdcb679e00 4 rocksdb: Options.row_cache: None
-90> 2017-11-08 16:41:10.604551 7fbdcb679e00 4 rocksdb: Options.wal_filter: None
-89> 2017-11-08 16:41:10.604553 7fbdcb679e00 4 rocksdb: Options.avoid_flush_during_recovery: 0
-88> 2017-11-08 16:41:10.604554 7fbdcb679e00 4 rocksdb: Options.base_background_compactions: 1
-87> 2017-11-08 16:41:10.604555 7fbdcb679e00 4 rocksdb: Options.max_background_compactions: 1
-86> 2017-11-08 16:41:10.604556 7fbdcb679e00 4 rocksdb: Options.avoid_flush_during_shutdown: 0
-85> 2017-11-08 16:41:10.604557 7fbdcb679e00 4 rocksdb: Options.delayed_write_rate : 16777216
-84> 2017-11-08 16:41:10.604558 7fbdcb679e00 4 rocksdb: Options.max_total_wal_size: 0
-83> 2017-11-08 16:41:10.604559 7fbdcb679e00 4 rocksdb: Options.delete_obsolete_files_period_micros: 21600000000
-82> 2017-11-08 16:41:10.604560 7fbdcb679e00 4 rocksdb: Options.stats_dump_period_sec: 600
-81> 2017-11-08 16:41:10.604561 7fbdcb679e00 4 rocksdb: Compression algorithms supported:
-80> 2017-11-08 16:41:10.604562 7fbdcb679e00 4 rocksdb: Snappy supported: 0
-79> 2017-11-08 16:41:10.604564 7fbdcb679e00 4 rocksdb: Zlib supported: 0
-78> 2017-11-08 16:41:10.604565 7fbdcb679e00 4 rocksdb: Bzip supported: 0
-77> 2017-11-08 16:41:10.604566 7fbdcb679e00 4 rocksdb: LZ4 supported: 0
-76> 2017-11-08 16:41:10.604567 7fbdcb679e00 4 rocksdb: ZSTD supported: 0
-75> 2017-11-08 16:41:10.604568 7fbdcb679e00 4 rocksdb: Fast CRC32 supported: 0
-74> 2017-11-08 16:41:10.604632 7fbdcb679e00 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/version_set.cc:2609] Recovering from manifest file: MANIFEST-000250
-73> 2017-11-08 16:41:10.604680 7fbdcb679e00  4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/column_family.cc:407] -------------- Options for column family [default]:
-72> 2017-11-08 16:41:10.604683 7fbdcb679e00  4 rocksdb:               Options.comparator: leveldb.BytewiseComparator
-71> 2017-11-08 16:41:10.604689 7fbdcb679e00 4 rocksdb: Options.merge_operator: .T:int64_array.b:bitwise_xor
-70> 2017-11-08 16:41:10.604690 7fbdcb679e00 4 rocksdb: Options.compaction_filter: None
-69> 2017-11-08 16:41:10.604691 7fbdcb679e00 4 rocksdb: Options.compaction_filter_factory: None
-68> 2017-11-08 16:41:10.604692 7fbdcb679e00 4 rocksdb: Options.memtable_factory: SkipListFactory
-67> 2017-11-08 16:41:10.604693 7fbdcb679e00 4 rocksdb: Options.table_factory: BlockBasedTable
-66> 2017-11-08 16:41:10.604750 7fbdcb679e00 4 rocksdb: table_factory options: flush_block_policy_factory: FlushBlockBySizePolicyFactory (0x55a5da90e168)
cache_index_and_filter_blocks: 1
cache_index_and_filter_blocks_with_high_priority: 1
pin_l0_filter_and_index_blocks_in_cache: 1
index_type: 0
hash_index_allow_collision: 1
checksum: 1
no_block_cache: 0
block_cache: 0x55a5dac01490
block_cache_name: LRUCache
block_cache_options:
capacity : 536870912
num_shard_bits : 4
strict_capacity_limit : 0
high_pri_pool_ratio: 0.000
block_cache_compressed: (nil)
persistent_cache: (nil)
block_size: 4096
block_size_deviation: 10
block_restart_interval: 16
index_block_restart_interval: 1
filter_policy: rocksdb.BuiltinBloomFilter
whole_key_filtering: 1
format_version: 2
-65> 2017-11-08 16:41:10.604757 7fbdcb679e00  4 rocksdb:        Options.write_buffer_size: 268435456
-64> 2017-11-08 16:41:10.604758 7fbdcb679e00 4 rocksdb: Options.max_write_buffer_number: 4
-63> 2017-11-08 16:41:10.604760 7fbdcb679e00 4 rocksdb: Options.compression: NoCompression
-62> 2017-11-08 16:41:10.604761 7fbdcb679e00 4 rocksdb: Options.bottommost_compression: Disabled
-61> 2017-11-08 16:41:10.604762 7fbdcb679e00 4 rocksdb: Options.prefix_extractor: nullptr
-60> 2017-11-08 16:41:10.604764 7fbdcb679e00 4 rocksdb: Options.memtable_insert_with_hint_prefix_extractor: nullptr
-59> 2017-11-08 16:41:10.604765 7fbdcb679e00 4 rocksdb: Options.num_levels: 7
-58> 2017-11-08 16:41:10.604766 7fbdcb679e00 4 rocksdb: Options.min_write_buffer_number_to_merge: 1
-57> 2017-11-08 16:41:10.604767 7fbdcb679e00 4 rocksdb: Options.max_write_buffer_number_to_maintain: 0
-56> 2017-11-08 16:41:10.604768 7fbdcb679e00 4 rocksdb: Options.compression_opts.window_bits: -14
-55> 2017-11-08 16:41:10.604769 7fbdcb679e00 4 rocksdb: Options.compression_opts.level: -1
-54> 2017-11-08 16:41:10.604770 7fbdcb679e00 4 rocksdb: Options.compression_opts.strategy: 0
-53> 2017-11-08 16:41:10.604771 7fbdcb679e00 4 rocksdb: Options.compression_opts.max_dict_bytes: 0
-52> 2017-11-08 16:41:10.604772 7fbdcb679e00 4 rocksdb: Options.level0_file_num_compaction_trigger: 4
-51> 2017-11-08 16:41:10.604773 7fbdcb679e00 4 rocksdb: Options.level0_slowdown_writes_trigger: 20
-50> 2017-11-08 16:41:10.604783 7fbdcb679e00 4 rocksdb: Options.level0_stop_writes_trigger: 36
-49> 2017-11-08 16:41:10.604785 7fbdcb679e00 4 rocksdb: Options.target_file_size_base: 67108864
-48> 2017-11-08 16:41:10.604786 7fbdcb679e00 4 rocksdb: Options.target_file_size_multiplier: 1
-47> 2017-11-08 16:41:10.604787 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_base: 268435456
-46> 2017-11-08 16:41:10.604788 7fbdcb679e00 4 rocksdb: Options.level_compaction_dynamic_level_bytes: 0
-45> 2017-11-08 16:41:10.604789 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier: 10.000000
-44> 2017-11-08 16:41:10.604792 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[0]: 1
-43> 2017-11-08 16:41:10.604793 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[1]: 1
-42> 2017-11-08 16:41:10.604794 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[2]: 1
-41> 2017-11-08 16:41:10.604795 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[3]: 1
-40> 2017-11-08 16:41:10.604796 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[4]: 1
-39> 2017-11-08 16:41:10.604797 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[5]: 1
-38> 2017-11-08 16:41:10.604798 7fbdcb679e00 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[6]: 1
-37> 2017-11-08 16:41:10.604799 7fbdcb679e00 4 rocksdb: Options.max_sequential_skip_in_iterations: 8
-36> 2017-11-08 16:41:10.604800 7fbdcb679e00 4 rocksdb: Options.max_compaction_bytes: 1677721600
-35> 2017-11-08 16:41:10.604801 7fbdcb679e00 4 rocksdb: Options.arena_block_size: 33554432
-34> 2017-11-08 16:41:10.604802 7fbdcb679e00 4 rocksdb: Options.soft_pending_compaction_bytes_limit: 68719476736
-33> 2017-11-08 16:41:10.604803 7fbdcb679e00 4 rocksdb: Options.hard_pending_compaction_bytes_limit: 274877906944
-32> 2017-11-08 16:41:10.604804 7fbdcb679e00 4 rocksdb: Options.rate_limit_delay_max_milliseconds: 100
-31> 2017-11-08 16:41:10.604805 7fbdcb679e00 4 rocksdb: Options.disable_auto_compactions: 0
-30> 2017-11-08 16:41:10.604807 7fbdcb679e00 4 rocksdb: Options.compaction_style: kCompactionStyleLevel
-29> 2017-11-08 16:41:10.604809 7fbdcb679e00 4 rocksdb: Options.compaction_pri: kByCompensatedSize
-28> 2017-11-08 16:41:10.604810 7fbdcb679e00 4 rocksdb: Options.compaction_options_universal.size_ratio: 1
-27> 2017-11-08 16:41:10.604811 7fbdcb679e00 4 rocksdb: Options.compaction_options_universal.min_merge_width: 2
-26> 2017-11-08 16:41:10.604812 7fbdcb679e00 4 rocksdb: Options.compaction_options_universal.max_merge_width: 4294967295
-25> 2017-11-08 16:41:10.604813 7fbdcb679e00 4 rocksdb: Options.compaction_options_universal.max_size_amplification_percent: 200
-24> 2017-11-08 16:41:10.604814 7fbdcb679e00 4 rocksdb: Options.compaction_options_universal.compression_size_percent: -1
-23> 2017-11-08 16:41:10.604815 7fbdcb679e00 4 rocksdb: Options.compaction_options_fifo.max_table_files_size: 1073741824
-22> 2017-11-08 16:41:10.604817 7fbdcb679e00 4 rocksdb: Options.table_properties_collectors:
-21> 2017-11-08 16:41:10.604818 7fbdcb679e00 4 rocksdb: Options.inplace_update_support: 0
-20> 2017-11-08 16:41:10.604819 7fbdcb679e00 4 rocksdb: Options.inplace_update_num_locks: 10000
-19> 2017-11-08 16:41:10.604820 7fbdcb679e00 4 rocksdb: Options.memtable_prefix_bloom_size_ratio: 0.000000
-18> 2017-11-08 16:41:10.604822 7fbdcb679e00 4 rocksdb: Options.memtable_huge_page_size: 0
-17> 2017-11-08 16:41:10.604823 7fbdcb679e00 4 rocksdb: Options.bloom_locality: 0
-16> 2017-11-08 16:41:10.604824 7fbdcb679e00 4 rocksdb: Options.max_successive_merges: 0
-15> 2017-11-08 16:41:10.604825 7fbdcb679e00 4 rocksdb: Options.optimize_filters_for_hits: 0
-14> 2017-11-08 16:41:10.604826 7fbdcb679e00 4 rocksdb: Options.paranoid_file_checks: 0
-13> 2017-11-08 16:41:10.604827 7fbdcb679e00 4 rocksdb: Options.force_consistency_checks: 0
-12> 2017-11-08 16:41:10.604828 7fbdcb679e00 4 rocksdb: Options.report_bg_io_stats: 0
-11> 2017-11-08 16:41:10.709799 7fbdcb679e00 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/version_set.cc:2859] Recovered from manifest file:db/MANIFEST-000250 succeeded,manifest_file_number is 250, next_file_number is 11650, last_sequence is 283290515, log_number is 0,prev_log_number is 0,max_column_family is 0
-10> 2017-11-08 16:41:10.709815 7fbdcb679e00  4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/version_set.cc:2867] Column family [default] (ID 0), log number is 11538
-9> 2017-11-08 16:41:10.712511 7fbdcb679e00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510159270712506, "job": 1, "event": "recovery_started", "log_files": [11538, 11649, 11651, 11652]}
-8> 2017-11-08 16:41:10.712519 7fbdcb679e00 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:482] Recovering log #11538 mode 0
-7> 2017-11-08 16:41:25.770790 7fbdc4880700 5 asok(0x55a5da95f2c0) AdminSocket: request 'get_command_descriptions' '' to 0x55a5da912550 returned 2206 bytes
-6> 2017-11-08 16:41:25.778924 7fbdc4880700 1 do_command 'perf dump' '
-5> 2017-11-08 16:41:25.779287 7fbdc4880700 1 do_command 'perf dump' 'result is 16090 bytes
-4> 2017-11-08 16:41:25.779294 7fbdc4880700 5 asok(0x55a5da95f2c0) AdminSocket: request 'perf dump' '' to 0x55a5da9121b0 returned 16090 bytes
-3> 2017-11-08 16:41:52.381627 7fbdcb679e00 5 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:815] [default] [WriteLevel0TableForRecovery] Level-0 table #11650: started
-2> 2017-11-08 16:42:06.754043 7fbdcb679e00 10 stupidalloc reserve need 0x92c00000 num_free 0x36f9fe000 num_reserved 0x0
-1> 2017-11-08 16:42:06.754073 7fbdcb679e00 10 stupidalloc allocate_int want_size 0x92c00000 alloc_unit 0x100000 hint 0x0
0> 2017-11-08 16:42:06.760491 7fbdcb679e00 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7fbdcb679e00 time 2017-11-08 16:42:06.754488
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55a5cfe213f2]
2: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x1598) [0x55a5cfdc6a78]
3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x9c0) [0x55a5cfda12c0]
4: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x124) [0x55a5cfda2cf4]
5: (BlueRocksWritableFile::Flush()+0x3d) [0x55a5cfdb8e6d]
6: (rocksdb::WritableFileWriter::Flush()+0x24c) [0x55a5d01f13ac]
7: (rocksdb::WritableFileWriter::Sync(bool)+0x3e) [0x55a5d01f25ce]
8: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIterator*, std::unique_ptr<rocksdb::InternalIterator, std::default_delete<rocksdb::InternalIterator> >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int)+0x190f) [0x55a5d021104f]
9: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xad9) [0x55a5d013cdd9]
10: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x17ec) [0x55a5d013ee2c]
11: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x8c4) [0x55a5d0140394]
12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >, rocksdb::DB*)+0xedc) [0x55a5d014175c]
13: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x6b1) [0x55a5d0142ff1]
14: (RocksDBStore::do_open(std::ostream&, bool)+0x8ff) [0x55a5cfd244df]
15: (BlueStore::_open_db(bool)+0xf73) [0x55a5cfcb0623]
16: (BlueStore::fsck(bool)+0x3e7) [0x55a5cfce0de7]
17: (BlueStore::_mount(bool)+0x1ee) [0x55a5cfce040e]
18: (OSD::init()+0x3df) [0x55a5cf858cff]
19: (main()+0x2eb8) [0x55a5cf76b1f8]
20: (__libc_start_main()+0xf0) [0x7fbdc8ae7830]
21: (_start()+0x29) [0x55a5cf7f6a59]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#3 Updated by jianpeng ma over 6 years ago

Eric Nelson, I think I suggested the wrong debug settings: it should be "debug bdev = 20" and "debug bluefs = 20" rather than "debug bluestore = 20". Could you try again with "debug bdev = 20; debug bluefs = 20"? You can attach the output as a file if the log is too large. Thanks!
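
For example (a sketch only; the osd id 31 below is just taken from the log above), the options can go in ceph.conf:

    [osd]
    debug bdev = 20
    debug bluefs = 20

or, since the daemon crashes during startup, one way is to run it once in the foreground with the overrides on the command line:

    ceph-osd -f -i 31 --debug_bdev 20 --debug_bluefs 20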

I think the reason is that the data is too large: even after rebuild_aligned_size_and_memory(), _buffers.size() can still be larger than IOV_MAX. So I pushed this PR:
https://github.com/ceph/ceph/pull/18828
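
For context, a minimal sketch of the idea behind that change (hypothetical helper name, and plain writev() for illustration rather than the aio submission path the actual patch touches): instead of asserting that a bufferlist maps to at most IOV_MAX iovecs, split the iovec array into IOV_MAX-sized batches and submit one batch at a time:

    #include <sys/uio.h>    // writev, struct iovec
    #include <limits.h>     // IOV_MAX
    #include <unistd.h>     // ssize_t
    #include <algorithm>
    #include <vector>

    // Hypothetical helper: write an arbitrarily long iovec array by
    // handing the kernel at most IOV_MAX entries per writev() call.
    ssize_t writev_in_batches(int fd, const std::vector<iovec>& iov) {
      ssize_t total = 0;
      size_t off = 0;
      while (off < iov.size()) {
        size_t n = std::min(iov.size() - off, static_cast<size_t>(IOV_MAX));
        ssize_t r = ::writev(fd, iov.data() + off, static_cast<int>(n));
        if (r < 0)
          return r;   // caller checks errno; short-write handling elided for brevity
        total += r;
        off += n;
      }
      return total;
    }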

#4 Updated by Eric Nelson over 6 years ago

No problem. For the time being some of these have been converted to bluestore OSDs with wal/db on the same device, since this is only happening on our SSDs. I will update the configs with the above, and I imagine we'll see another crash within the next few days, at which point I can get you another stacktrace with the info you need.

Thanks for the update! We may be an outlier in terms of data size, since we're pushing large video files as well as many small files, so we cover the gamut (the cluster is about two months old; the largest bucket is 62TB and the bucket with the most objects holds ~180M). We typically burst traffic into the cache tier and keep our backend storage OSDs maxed out writing the data that is constantly being evicted to disk. Indexes are also pinned to our cache-tier ruleset since they are critical in this use case.

Following that PR in hopes of a fix soon.

#5 Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)

#6 Updated by Sage Weil about 6 years ago

  • Status changed from New to Duplicate

see #21932, pending backport, should be in 12.2.3

#7 Updated by Sage Weil about 6 years ago

  • Related to Bug #21932: OSD crash on boot with assert caused by Bluefs on flush write added

#8 Updated by Ist Gab over 2 years ago

Eric Nelson wrote:

No problem. For the time being some of these have been converted to bluestore OSDs with wal/db on the same device, since this is only happening on our SSDs. I will update the configs with the above, and I imagine we'll see another crash within the next few days, at which point I can get you another stacktrace with the info you need.

Thanks for the update! We may be an outlier in terms of data size, since we're pushing large video files as well as many small files, so we cover the gamut (the cluster is about two months old; the largest bucket is 62TB and the bucket with the most objects holds ~180M). We typically burst traffic into the cache tier and keep our backend storage OSDs maxed out writing the data that is constantly being evicted to disk. Indexes are also pinned to our cache-tier ruleset since they are critical in this use case.

Following that PR in hopes of a fix soon.

Hi Eric,
What was the solution for you in the end, please? We are facing a similar issue with Octopus 15.2.14: slow requests come in and stay blocked for too long, and it simply kills the gateway until the slow ops are gone.

Ours looks like this:

2021-10-18T03:24:36.673674+0700 osd.35 (osd.35) 1403 : cluster [WRN] slow request osd_op(client.125814959.0:156386555 28.5ds0 28:bb23af14:::9213182a-14ba-48ad-bde9-289a1c0c0de8.6034919.1_%2fWHITELABEL-1%2fPAGETPYE-7%2fDEVICE-4%2fLANGUAGE-47%2fSUBTYPE-0%2f327314:head [getxattrs out=945b,stat out=16b,read 0~55491] snapc 0=[] ondisk+read+known_if_redirected e35201) initiated 2021-10-18T03:24:06.533542+0700 currently started
