Feature #13942
ceph-disk: support bluestore
Status: Closed
% Done: 0%
Description
bluestore (newstore) is based on a very small file system (with osd metadata, like the keyring, features, etc.) and one or more block devices. These devices are symlinked from the data directory, similar to how 'journal' is a symlink for the current FileStore.
ceph-disk create:
- create a small partition for osd_data
- create a large partition (remainder of disk, by default) for data
- symlink from $osd_data/block
- [optional] create a mid-size partition for metadata (rocksdb). The user probably needs to specify this size, since it'll probably be 1/Nth of the available SSD space on the host.
- symlink from $osd_data/block.db
- [optional] create a small partition for the write-ahead-log (basically the journal). default size of 128MB is sufficient.
- symlink from $osd_data/block.wal
(Note that block.db is preferable to block.wal, as that space will be used for both the wal and the sst files. Both would be used if the host has HDD, SSD, and NVMe or NVRAM.)
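The partition plan described above could be sketched as a pure sizing function. This is a hypothetical helper (the function name and the 100 MB osd_data default are assumptions), not the actual ceph-disk code:

```python
MB = 2 ** 20

def plan_bluestore_partitions(disk_bytes, osd_data_bytes=100 * MB,
                              db_bytes=None, wal_bytes=None):
    """Return (label, size_in_bytes) pairs; 'block' takes the remainder."""
    parts = [('osd data', osd_data_bytes)]   # small fs with keyring, features, ...
    if db_bytes is not None:                 # optional rocksdb metadata partition
        parts.append(('block.db', db_bytes))
    if wal_bytes is not None:                # optional write-ahead-log (128 MB default)
        parts.append(('block.wal', wal_bytes))
    remainder = disk_bytes - sum(size for _, size in parts)
    if remainder <= 0:
        raise ValueError('disk too small for requested partitions')
    parts.append(('block', remainder))       # the large data partition
    return parts
```

With db and wal partitions requested, the data partition simply gets whatever is left of the disk, matching the "remainder of disk, by default" rule above.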
ceph-disk activate:
I think we can fully generalize this to re-use the journal UUID mechanism for any subsidiary block device (s/journal/block/ or similar). Then make activate simply require that all symlinks in $osd_data resolve to devices before activating the OSD. The missing piece is that ceph-disk needs to figure out the uuid from a journal device in order to map it back to the parent osd_data device. Right now it does:
out = _check_output(
    args=[
        'ceph-osd',
        '-i', '0',   # this is ignored
        '--get-journal-uuid',
        '--osd-journal',
        path,
    ],
    close_fds=True,
)
but I think we need to replace this with some generic-ish way of identifying which OSD the device belongs to. For bluestore I can just stuff the uuid in the first block of the device? And then we can make a --get-device-uuid command that either parses the FileJournal header or a bluestore first-block-has-uuid header?
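The first-block-has-uuid idea could be prototyped like this. The 4096-byte block size and the plain-text-UUID-at-offset-zero layout are assumptions for illustration, not the actual bluestore header format:

```python
import uuid

BLOCK_SIZE = 4096  # assumed device block size

def write_first_block_uuid(dev_path, dev_uuid):
    # Stuff a text UUID into the first block of the device (hypothetical layout).
    with open(dev_path, 'r+b') as f:
        f.write(str(dev_uuid).encode('ascii').ljust(BLOCK_SIZE, b'\0'))

def get_device_uuid(dev_path):
    # What a --get-device-uuid command might do: read block 0, parse the UUID.
    with open(dev_path, 'rb') as f:
        raw = f.read(BLOCK_SIZE)
    return uuid.UUID(raw[:36].decode('ascii'))
```

ceph-disk could then map any block/block.db/block.wal device back to its parent osd_data the same way it maps a journal today.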
Updated by Sage Weil over 8 years ago
- Status changed from New to 12
https://github.com/ceph/ceph/pull/6759 fixes the block device probing part
Updated by Loïc Dachary over 8 years ago
- Status changed from 12 to In Progress
- Assignee set to Loïc Dachary
Updated by Sage Weil over 8 years ago
- Status changed from In Progress to 12
- Assignee deleted (Loïc Dachary)
For the 'ceph-disk prepare' part, I think we should keep it simple initially:
ceph-disk --osd-objectstore bluestore maindev[:dbdev[:waldev]]
and teach ceph-disk how to do the partitioning for bluestore (no generic way to ask ceph-osd that). We can leave off the db/wal devices initially, and then make activate work, so that there is something functional. Then add dbdev and waldev support last.
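Splitting the proposed maindev[:dbdev[:waldev]] argument is straightforward; a sketch (the helper name is hypothetical, and real ceph-disk option parsing may differ):

```python
def parse_bluestore_devs(spec):
    """Split 'maindev[:dbdev[:waldev]]' into (maindev, dbdev, waldev)."""
    fields = spec.split(':')
    if not 1 <= len(fields) <= 3 or not fields[0]:
        raise ValueError('expected maindev[:dbdev[:waldev]], got %r' % spec)
    maindev = fields[0]
    dbdev = fields[1] if len(fields) > 1 else None   # optional rocksdb device
    waldev = fields[2] if len(fields) > 2 else None  # optional wal device
    return maindev, dbdev, waldev
```

Leaving dbdev and waldev as None matches the plan of making the single-device case work first and adding the db/wal devices last.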
Updated by Loïc Dachary over 8 years ago
Encryption support will need to extend to the block device as well as osd-data, since the data is no longer in the osd-data partition.
Updated by Loïc Dachary over 8 years ago
- Status changed from 12 to In Progress
- Assignee set to Loïc Dachary
Updated by Loïc Dachary over 8 years ago
Rebase to master complete, make check passes, working on ceph-disk suite problems now.
Updated by Loïc Dachary over 8 years ago
bluestore fails to initialize on a ceph-disk prepared device (no external journal).
Updated by Loïc Dachary over 8 years ago
ceph.conf has
[global]
enable experimental unrecoverable data corrupting features = *
bluestore fsck on mount = true
bluestore block db size = 67108864
bluestore block wal size = 134217728
bluestore block size = 5368709120
osd objectstore = bluestore
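For reference, the byte values in that config are round binary sizes; a quick sanity check:

```python
MiB = 2 ** 20
GiB = 2 ** 30

# the sizes configured above, expressed in human units
assert 67108864 == 64 * MiB      # bluestore block db size
assert 134217728 == 128 * MiB    # bluestore block wal size
assert 5368709120 == 5 * GiB     # bluestore block size
```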
ceph-disk prepare + activate via udev lead to /var/lib/ceph/osd/ceph-2
-rw-r--r--. 1 root root       187 Jan 28 06:27 activate.monmap
-rw-r--r--. 1 ceph ceph         3 Jan 28 06:27 active
lrwxrwxrwx. 1 ceph ceph        58 Jan 28 06:27 block -> /dev/disk/by-partuuid/5596ac81-0651-4523-a896-6a21d3d78c6e
-rw-r--r--. 1 ceph ceph  67108864 Jan 28 06:27 block.db
-rw-r--r--. 1 ceph ceph        37 Jan 28 06:27 block_uuid
-rw-r--r--. 1 ceph ceph 134217728 Jan 28 06:27 block.wal
-rw-r--r--. 1 ceph ceph         2 Jan 28 06:27 bluefs
-rw-r--r--. 1 ceph ceph        37 Jan 28 06:27 ceph_fsid
-rw-r--r--. 1 ceph ceph        37 Jan 28 06:27 fsid
-rw-------. 1 ceph ceph        56 Jan 28 06:27 keyring
-rw-r--r--. 1 ceph ceph         8 Jan 28 06:27 kv_backend
-rw-r--r--. 1 ceph ceph        21 Jan 28 06:27 magic
-rw-r--r--. 1 ceph ceph         6 Jan 28 06:27 ready
-rw-r--r--. 1 root root         0 Jan 28 06:27 systemd
-rw-r--r--. 1 ceph ceph        10 Jan 28 06:27 type
-rw-r--r--. 1 ceph ceph         2 Jan 28 06:27 whoami
which shows as expected with ceph-disk list
/dev/vda :
 /dev/vda1 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb3 ceph block, for /dev/vdb1
 /dev/vdb1 ceph data, active, cluster ceph, osd.2, block /dev/vdb3
/dev/vdc other, unknown
/dev/vdd other, unknown

but the osd fails with

2016-01-27 07:03:50.862489 7f9278afc7c0 0 ceph version 10.0.2-1092-gffcedda (ffcedda1c4986ab66bbf4d57609b05304c70fe89), process ceph-osd, pid 27432 2016-01-27 07:03:50.862617 7f9278afc7c0 5 object store type is bluestore 2016-01-27 07:03:50.862636 7f9278afc7c0 -1 WARNING: experimental feature 'bluestore' is enabled Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data. 2016-01-27 07:03:50.863361 7f9278afc7c0 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6800/27432 need_addr=1 2016-01-27 07:03:50.863389 7f9278afc7c0 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6801/27432 need_addr=1 2016-01-27 07:03:50.863403 7f9278afc7c0 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6802/27432 need_addr=1 2016-01-27 07:03:50.863426 7f9278afc7c0 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6803/27432 need_addr=1 2016-01-27 07:03:50.863435 7f9278afc7c0 -1 write_pid_file: failed to open pid file 'osd.2.pid': (13) Permission denied 2016-01-27 07:03:50.888664 7f9278afc7c0 -1 WARNING: the following dangerous and experimental features are enabled: * 2016-01-27 07:03:50.894635 7f9278afc7c0 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4 2016-01-27 07:03:50.898899 7f9278afc7c0 10 load: jerasure load: lrc load: isa 2016-01-27 07:03:50.899202 7f9278afc7c0 1 bluestore(/var/lib/ceph/osd/ceph-2) _open_path using fs driver 'generic' 2016-01-27 07:03:50.899229 7f9278afc7c0 1 -- 0.0.0.0:6800/27432 messenger.start 2016-01-27 07:03:50.899288 7f9278afc7c0 1 -- :/0 messenger.start 2016-01-27 07:03:50.899325 7f9278afc7c0 1 -- 0.0.0.0:6803/27432 messenger.start 2016-01-27 07:03:50.899366 7f9278afc7c0 
1 -- 0.0.0.0:6802/27432 messenger.start 2016-01-27 07:03:50.899403 7f9278afc7c0 1 -- 0.0.0.0:6801/27432 messenger.start 2016-01-27 07:03:50.899430 7f9278afc7c0 1 -- :/0 messenger.start 2016-01-27 07:03:50.899571 7f9278afc7c0 2 osd.2 0 mounting /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal 2016-01-27 07:03:50.899580 7f9278afc7c0 1 bluestore(/var/lib/ceph/osd/ceph-2) mount path /var/lib/ceph/osd/ceph-2 2016-01-27 07:03:50.899581 7f9278afc7c0 1 bluestore(/var/lib/ceph/osd/ceph-2) fsck 2016-01-27 07:03:50.899587 7f9278afc7c0 1 bluestore(/var/lib/ceph/osd/ceph-2) _open_path using fs driver 'generic' 2016-01-27 07:03:50.900353 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block 2016-01-27 07:03:50.900429 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block) open size 10737418240 (10240 MB) block_size 4096 (4096 B) 2016-01-27 07:03:50.902459 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block.db) open path /var/lib/ceph/osd/ceph-2/block.db 2016-01-27 07:03:50.902532 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block.db) open size 67108864 (65536 kB) block_size 4096 (4096 B) 2016-01-27 07:03:50.902540 7f9278afc7c0 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-2/block.db size 65536 kB 2016-01-27 07:03:50.904531 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block 2016-01-27 07:03:50.904618 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block) open size 10737418240 (10240 MB) block_size 4096 (4096 B) 2016-01-27 07:03:50.904623 7f9278afc7c0 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 10240 MB 2016-01-27 07:03:50.905330 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block.wal) open path /var/lib/ceph/osd/ceph-2/block.wal 2016-01-27 07:03:50.905395 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block.wal) open size 134217728 (128 MB) block_size 4096 (4096 B) 2016-01-27 07:03:50.905399 7f9278afc7c0 1 bluefs add_block_device bdev 2 path 
/var/lib/ceph/osd/ceph-2/block.wal size 128 MB 2016-01-27 07:03:50.905449 7f9278afc7c0 1 bluefs mount 2016-01-27 07:03:50.907798 7f9278afc7c0 -1 WARNING: experimental feature 'rocksdb' is enabled Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data. 2016-01-27 07:03:50.907887 7f9278afc7c0 0 set rocksdb option compression = kNoCompression 2016-01-27 07:03:50.907893 7f9278afc7c0 0 set rocksdb option max_write_buffer_number = 16 2016-01-27 07:03:50.907902 7f9278afc7c0 0 set rocksdb option min_write_buffer_number_to_merge = 3 2016-01-27 07:03:50.907905 7f9278afc7c0 0 set rocksdb option recycle_log_file_num = 16 2016-01-27 07:03:50.907926 7f9278afc7c0 0 set rocksdb option compression = kNoCompression 2016-01-27 07:03:50.907929 7f9278afc7c0 0 set rocksdb option max_write_buffer_number = 16 2016-01-27 07:03:50.907932 7f9278afc7c0 0 set rocksdb option min_write_buffer_number_to_merge = 3 2016-01-27 07:03:50.907934 7f9278afc7c0 0 set rocksdb option recycle_log_file_num = 16 2016-01-27 07:03:50.908017 7f9278afc7c0 4 rocksdb: RocksDB version: 4.3.0 2016-01-27 07:03:50.908021 7f9278afc7c0 4 rocksdb: Git sha rocksdb_build_git_sha: 2016-01-27 07:03:50.908029 7f9278afc7c0 4 rocksdb: Compile date Jan 27 2016 2016-01-27 07:03:50.908030 7f9278afc7c0 4 rocksdb: DB SUMMARY 2016-01-27 07:03:50.908046 7f9278afc7c0 4 rocksdb: CURRENT file: CURRENT 2016-01-27 07:03:50.908048 7f9278afc7c0 4 rocksdb: IDENTITY file: IDENTITY 2016-01-27 07:03:50.908053 7f9278afc7c0 4 rocksdb: MANIFEST file: MANIFEST-000008 size: 110 Bytes 2016-01-27 07:03:50.908060 7f9278afc7c0 2 rocksdb: Error when reading /var/lib/ceph/osd/ceph-2/db dir 2016-01-27 07:03:50.908062 7f9278afc7c0 2 rocksdb: Error when reading /var/lib/ceph/osd/ceph-2/db.slow dir 2016-01-27 07:03:50.908068 7f9278afc7c0 4 rocksdb: Write Ahead Log file in db.wal: 000009.log size: 253 ; 2016-01-27 
07:03:50.908069 7f9278afc7c0 4 rocksdb: Options.error_if_exists: 0 2016-01-27 07:03:50.908070 7f9278afc7c0 4 rocksdb: Options.create_if_missing: 0 2016-01-27 07:03:50.908071 7f9278afc7c0 4 rocksdb: Options.paranoid_checks: 1 2016-01-27 07:03:50.908072 7f9278afc7c0 4 rocksdb: Options.env: 0x7f928507c060 2016-01-27 07:03:50.908073 7f9278afc7c0 4 rocksdb: Options.info_log: 0x7f928507c340 2016-01-27 07:03:50.908074 7f9278afc7c0 4 rocksdb: Options.max_open_files: 5000 2016-01-27 07:03:50.908075 7f9278afc7c0 4 rocksdb: Options.max_file_opening_threads: 1 2016-01-27 07:03:50.908076 7f9278afc7c0 4 rocksdb: Options.max_total_wal_size: 0 2016-01-27 07:03:50.908077 7f9278afc7c0 4 rocksdb: Options.disableDataSync: 0 2016-01-27 07:03:50.908077 7f9278afc7c0 4 rocksdb: Options.use_fsync: 0 2016-01-27 07:03:50.908078 7f9278afc7c0 4 rocksdb: Options.max_log_file_size: 0 2016-01-27 07:03:50.908079 7f9278afc7c0 4 rocksdb: Options.max_manifest_file_size: 18446744073709551615 2016-01-27 07:03:50.908080 7f9278afc7c0 4 rocksdb: Options.log_file_time_to_roll: 0 2016-01-27 07:03:50.908081 7f9278afc7c0 4 rocksdb: Options.keep_log_file_num: 1000 2016-01-27 07:03:50.908082 7f9278afc7c0 4 rocksdb: Options.recycle_log_file_num: 16 2016-01-27 07:03:50.908083 7f9278afc7c0 4 rocksdb: Options.allow_os_buffer: 1 2016-01-27 07:03:50.908083 7f9278afc7c0 4 rocksdb: Options.allow_mmap_reads: 0 2016-01-27 07:03:50.908084 7f9278afc7c0 4 rocksdb: Options.allow_fallocate: 1 2016-01-27 07:03:50.908085 7f9278afc7c0 4 rocksdb: Options.allow_mmap_writes: 0 2016-01-27 07:03:50.908086 7f9278afc7c0 4 rocksdb: Options.create_missing_column_families: 0 2016-01-27 07:03:50.908087 7f9278afc7c0 4 rocksdb: Options.db_log_dir: 2016-01-27 07:03:50.908088 7f9278afc7c0 4 rocksdb: Options.wal_dir: db.wal 2016-01-27 07:03:50.908088 7f9278afc7c0 4 rocksdb: Options.table_cache_numshardbits: 4 2016-01-27 07:03:50.908089 7f9278afc7c0 4 rocksdb: Options.delete_obsolete_files_period_micros: 21600000000 2016-01-27 07:03:50.908090 
7f9278afc7c0 4 rocksdb: Options.max_background_compactions: 1 2016-01-27 07:03:50.908091 7f9278afc7c0 4 rocksdb: Options.max_subcompactions: 1 2016-01-27 07:03:50.908092 7f9278afc7c0 4 rocksdb: Options.max_background_flushes: 1 2016-01-27 07:03:50.908092 7f9278afc7c0 4 rocksdb: Options.WAL_ttl_seconds: 0 2016-01-27 07:03:50.908093 7f9278afc7c0 4 rocksdb: Options.WAL_size_limit_MB: 0 2016-01-27 07:03:50.908094 7f9278afc7c0 4 rocksdb: Options.manifest_preallocation_size: 4194304 2016-01-27 07:03:50.908095 7f9278afc7c0 4 rocksdb: Options.allow_os_buffer: 1 2016-01-27 07:03:50.908096 7f9278afc7c0 4 rocksdb: Options.allow_mmap_reads: 0 2016-01-27 07:03:50.908096 7f9278afc7c0 4 rocksdb: Options.allow_mmap_writes: 0 2016-01-27 07:03:50.908097 7f9278afc7c0 4 rocksdb: Options.is_fd_close_on_exec: 1 2016-01-27 07:03:50.908098 7f9278afc7c0 4 rocksdb: Options.stats_dump_period_sec: 600 2016-01-27 07:03:50.908099 7f9278afc7c0 4 rocksdb: Options.advise_random_on_open: 1 2016-01-27 07:03:50.908100 7f9278afc7c0 4 rocksdb: Options.db_write_buffer_size: 0d 2016-01-27 07:03:50.908101 7f9278afc7c0 4 rocksdb: Options.access_hint_on_compaction_start: NORMAL 2016-01-27 07:03:50.908102 7f9278afc7c0 4 rocksdb: Options.new_table_reader_for_compaction_inputs: 0 2016-01-27 07:03:50.908102 7f9278afc7c0 4 rocksdb: Options.compaction_readahead_size: 0d 2016-01-27 07:03:50.908103 7f9278afc7c0 4 rocksdb: Options.random_access_max_buffer_size: 1048576d 2016-01-27 07:03:50.908104 7f9278afc7c0 4 rocksdb: Options.writable_file_max_buffer_size: 1048576d 2016-01-27 07:03:50.908105 7f9278afc7c0 4 rocksdb: Options.use_adaptive_mutex: 0 2016-01-27 07:03:50.908105 7f9278afc7c0 4 rocksdb: Options.rate_limiter: (nil) 2016-01-27 07:03:50.908107 7f9278afc7c0 4 rocksdb: Options.delete_scheduler.rate_bytes_per_sec: 0 2016-01-27 07:03:50.908109 7f9278afc7c0 4 rocksdb: Options.bytes_per_sync: 0 2016-01-27 07:03:50.908110 7f9278afc7c0 4 rocksdb: Options.wal_bytes_per_sync: 0 2016-01-27 07:03:50.908110 7f9278afc7c0 4 
rocksdb: Options.wal_recovery_mode: 0 2016-01-27 07:03:50.908111 7f9278afc7c0 4 rocksdb: Options.enable_thread_tracking: 0 2016-01-27 07:03:50.908112 7f9278afc7c0 4 rocksdb: Options.row_cache: None 2016-01-27 07:03:50.908113 7f9278afc7c0 4 rocksdb: Options.wal_filter: None 2016-01-27 07:03:50.908114 7f9278afc7c0 4 rocksdb: Compression algorithms supported: 2016-01-27 07:03:50.908115 7f9278afc7c0 4 rocksdb: Snappy supported: 1 2016-01-27 07:03:50.908116 7f9278afc7c0 4 rocksdb: Zlib supported: 1 2016-01-27 07:03:50.908117 7f9278afc7c0 4 rocksdb: Bzip supported: 0 2016-01-27 07:03:50.908118 7f9278afc7c0 4 rocksdb: LZ4 supported: 0 2016-01-27 07:03:50.908119 7f9278afc7c0 4 rocksdb: Fast CRC32 supported: 0 2016-01-27 07:03:50.909772 7f9278afc7c0 4 rocksdb: Recovering from manifest file: MANIFEST-000008 2016-01-27 07:03:50.909836 7f9278afc7c0 4 rocksdb: --------------- Options for column family [default]: 2016-01-27 07:03:50.909844 7f9278afc7c0 4 rocksdb: Options.comparator: rocksdb.InternalKeyComparator:leveldb.BytewiseComparator 2016-01-27 07:03:50.909846 7f9278afc7c0 4 rocksdb: Options.merge_operator: None 2016-01-27 07:03:50.909848 7f9278afc7c0 4 rocksdb: Options.compaction_filter: None 2016-01-27 07:03:50.909866 7f9278afc7c0 4 rocksdb: Options.compaction_filter_factory: None 2016-01-27 07:03:50.909868 7f9278afc7c0 4 rocksdb: Options.memtable_factory: SkipListFactory 2016-01-27 07:03:50.909869 7f9278afc7c0 4 rocksdb: Options.table_factory: BlockBasedTable 2016-01-27 07:03:50.909896 7f9278afc7c0 4 rocksdb: table_factory options: flush_block_policy_factory: FlushBlockBySizePolicyFactory (0x7f9284a5c0f0) cache_index_and_filter_blocks: 0 index_type: 0 hash_index_allow_collision: 1 checksum: 1 no_block_cache: 0 block_cache: 0x7f9284a79728 block_cache_size: 8388608 block_cache_compressed: (nil) block_size: 4096 block_size_deviation: 10 block_restart_interval: 16 filter_policy: nullptr whole_key_filtering: 1 skip_table_builder_flush: 0 format_version: 0 2016-01-27 
07:03:50.909902 7f9278afc7c0 4 rocksdb: Options.write_buffer_size: 4194304 2016-01-27 07:03:50.909903 7f9278afc7c0 4 rocksdb: Options.max_write_buffer_number: 16 2016-01-27 07:03:50.909905 7f9278afc7c0 4 rocksdb: Options.compression: NoCompression 2016-01-27 07:03:50.909906 7f9278afc7c0 4 rocksdb: Options.prefix_extractor: nullptr 2016-01-27 07:03:50.909907 7f9278afc7c0 4 rocksdb: Options.num_levels: 7 2016-01-27 07:03:50.909908 7f9278afc7c0 4 rocksdb: Options.min_write_buffer_number_to_merge: 3 2016-01-27 07:03:50.909909 7f9278afc7c0 4 rocksdb: Options.max_write_buffer_number_to_maintain: 0 2016-01-27 07:03:50.909910 7f9278afc7c0 4 rocksdb: Options.compression_opts.window_bits: -14 2016-01-27 07:03:50.909911 7f9278afc7c0 4 rocksdb: Options.compression_opts.level: -1 2016-01-27 07:03:50.909912 7f9278afc7c0 4 rocksdb: Options.compression_opts.strategy: 0 2016-01-27 07:03:50.909913 7f9278afc7c0 4 rocksdb: Options.level0_file_num_compaction_trigger: 4 2016-01-27 07:03:50.909914 7f9278afc7c0 4 rocksdb: Options.level0_slowdown_writes_trigger: 20 2016-01-27 07:03:50.909915 7f9278afc7c0 4 rocksdb: Options.level0_stop_writes_trigger: 24 2016-01-27 07:03:50.909916 7f9278afc7c0 4 rocksdb: Options.target_file_size_base: 2097152 2016-01-27 07:03:50.909917 7f9278afc7c0 4 rocksdb: Options.target_file_size_multiplier: 1 2016-01-27 07:03:50.909918 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_base: 10485760 2016-01-27 07:03:50.909919 7f9278afc7c0 4 rocksdb: Options.level_compaction_dynamic_level_bytes: 0 2016-01-27 07:03:50.909921 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier: 10 2016-01-27 07:03:50.909921 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[0]: 1 2016-01-27 07:03:50.909923 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[1]: 1 2016-01-27 07:03:50.909924 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[2]: 1 2016-01-27 07:03:50.909925 7f9278afc7c0 4 rocksdb: 
Options.max_bytes_for_level_multiplier_addtl[3]: 1 2016-01-27 07:03:50.909926 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[4]: 1 2016-01-27 07:03:50.909927 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[5]: 1 2016-01-27 07:03:50.909928 7f9278afc7c0 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[6]: 1 2016-01-27 07:03:50.909929 7f9278afc7c0 4 rocksdb: Options.max_sequential_skip_in_iterations: 8 2016-01-27 07:03:50.909930 7f9278afc7c0 4 rocksdb: Options.expanded_compaction_factor: 25 2016-01-27 07:03:50.909931 7f9278afc7c0 4 rocksdb: Options.source_compaction_factor: 1 2016-01-27 07:03:50.909932 7f9278afc7c0 4 rocksdb: Options.max_grandparent_overlap_factor: 10 2016-01-27 07:03:50.909933 7f9278afc7c0 4 rocksdb: Options.arena_block_size: 524288 2016-01-27 07:03:50.909934 7f9278afc7c0 4 rocksdb: Options.soft_pending_compaction_bytes_limit: 0 2016-01-27 07:03:50.909935 7f9278afc7c0 4 rocksdb: Options.hard_pending_compaction_bytes_limit: 0 2016-01-27 07:03:50.909936 7f9278afc7c0 4 rocksdb: Options.rate_limit_delay_max_milliseconds: 1000 2016-01-27 07:03:50.909937 7f9278afc7c0 4 rocksdb: Options.disable_auto_compactions: 0 2016-01-27 07:03:50.909938 7f9278afc7c0 4 rocksdb: Options.filter_deletes: 0 2016-01-27 07:03:50.909939 7f9278afc7c0 4 rocksdb: Options.verify_checksums_in_compaction: 1 2016-01-27 07:03:50.909940 7f9278afc7c0 4 rocksdb: Options.compaction_style: 0 2016-01-27 07:03:50.909941 7f9278afc7c0 4 rocksdb: Options.compaction_pri: 0 2016-01-27 07:03:50.909942 7f9278afc7c0 4 rocksdb: Options.compaction_options_universal.size_ratio: 1 2016-01-27 07:03:50.909943 7f9278afc7c0 4 rocksdb: Options.compaction_options_universal.min_merge_width: 2 2016-01-27 07:03:50.909944 7f9278afc7c0 4 rocksdb: Options.compaction_options_universal.max_merge_width: 4294967295 2016-01-27 07:03:50.909945 7f9278afc7c0 4 rocksdb: Options.compaction_options_universal.max_size_amplification_percent: 200 2016-01-27 07:03:50.909946 
7f9278afc7c0 4 rocksdb: Options.compaction_options_universal.compression_size_percent: -1 2016-01-27 07:03:50.909948 7f9278afc7c0 4 rocksdb: Options.compaction_options_fifo.max_table_files_size: 1073741824 2016-01-27 07:03:50.909949 7f9278afc7c0 4 rocksdb: Options.table_properties_collectors: 2016-01-27 07:03:50.909950 7f9278afc7c0 4 rocksdb: Options.inplace_update_support: 0 2016-01-27 07:03:50.909951 7f9278afc7c0 4 rocksdb: Options.inplace_update_num_locks: 10000 2016-01-27 07:03:50.909952 7f9278afc7c0 4 rocksdb: Options.min_partial_merge_operands: 2 2016-01-27 07:03:50.909953 7f9278afc7c0 4 rocksdb: Options.memtable_prefix_bloom_bits: 0 2016-01-27 07:03:50.909954 7f9278afc7c0 4 rocksdb: Options.memtable_prefix_bloom_probes: 6 2016-01-27 07:03:50.909954 7f9278afc7c0 4 rocksdb: Options.memtable_prefix_bloom_huge_page_tlb_size: 0 2016-01-27 07:03:50.909955 7f9278afc7c0 4 rocksdb: Options.bloom_locality: 0 2016-01-27 07:03:50.909956 7f9278afc7c0 4 rocksdb: Options.max_successive_merges: 0 2016-01-27 07:03:50.909957 7f9278afc7c0 4 rocksdb: Options.optimize_fllters_for_hits: 0 2016-01-27 07:03:50.909958 7f9278afc7c0 4 rocksdb: Options.paranoid_file_checks: 0 2016-01-27 07:03:50.909959 7f9278afc7c0 4 rocksdb: Options.compaction_measure_io_stats: 0 2016-01-27 07:03:50.911049 7f9278afc7c0 2 rocksdb: Unable to load table properties for file 4 --- NotFound: 2016-01-27 07:03:50.911077 7f9278afc7c0 4 rocksdb: Recovered from manifest file:db/MANIFEST-000008 succeeded,manifest_file_number is 8, next_file_number is 10, last_sequence is 2, log_number is 0,prev_log_number is 0,max_column_family is 0 2016-01-27 07:03:50.911083 7f9278afc7c0 4 rocksdb: Column family [default] (ID 0), log number is 7 2016-01-27 07:03:50.911162 7f9278afc7c0 -1 rocksdb: Corruption: Can't access /000004.sst: NotFound: 2016-01-27 07:03:50.911170 7f9278afc7c0 -1 bluestore(/var/lib/ceph/osd/ceph-2) _open_db erroring opening db: 2016-01-27 07:03:50.911174 7f9278afc7c0 1 bluefs umount 2016-01-27 
07:03:50.921629 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block.db) close 2016-01-27 07:03:51.165045 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block) close 2016-01-27 07:03:51.409172 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block.wal) close 2016-01-27 07:03:51.659355 7f9278afc7c0 1 bdev(/var/lib/ceph/osd/ceph-2/block) close 2016-01-27 07:03:51.907277 7f9278afc7c0 -1 osd.2 0 OSD:init: unable to mount object store 2016-01-27 07:03:51.907317 7f9278afc7c0 -1 ** ERROR: osd init failed: (5) Input/output error
Updated by Loïc Dachary over 8 years ago
Now fails with http://paste.debian.net/377343/
ceph.conf has
enable experimental unrecoverable data corrupting features = *
bluestore fsck on mount = true
bluestore block size = 5368709120
osd objectstore = bluestore
the data was populated with
-rw-r--r--. 1 root root 187 Jan 29 06:18 activate.monmap
lrwxrwxrwx. 1 ceph ceph  58 Jan 29 06:18 block -> /dev/disk/by-partuuid/f04cc152-13bd-4ef0-b4c1-940d564cfa58
-rw-r--r--. 1 ceph ceph  37 Jan 29 06:18 block_uuid
-rw-r--r--. 1 ceph ceph   2 Jan 29 06:18 bluefs
-rw-r--r--. 1 ceph ceph  37 Jan 29 06:18 ceph_fsid
-rw-r--r--. 1 ceph ceph  37 Jan 29 06:18 fsid
-rw-r--r--. 1 ceph ceph   8 Jan 29 06:18 kv_backend
-rw-r--r--. 1 ceph ceph  21 Jan 29 06:18 magic
-rw-r--r--. 1 ceph ceph  10 Jan 29 06:18 type
-rw-r--r--. 1 ceph ceph   2 Jan 29 06:18 whoami
where the block symlink was done by ceph-disk, not ceph-osd mkfs.
command_check_call(
    [
        'ceph-osd',
        '--cluster', cluster,
        '--mkfs',
        '--mkkey',
        '-i', osd_id,
        '--monmap', monmap,
        '--osd-data', path,
        '--osd-uuid', fsid,
        '--keyring', os.path.join(path, 'keyring'),
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_user(),
    ],
)
# ceph-disk list
/dev/vda :
 /dev/vda1 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb3 ceph block, for /dev/vdb1
 /dev/vdb1 ceph data, active, cluster ceph, osd.2, block /dev/vdb3
/dev/vdc other, unknown
/dev/vdd other, unknown
# sgdisk --print /dev/vdb
Disk /dev/vdb: 20971520 sectors, 10.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): CADD3707-6432-4C7C-8608-182417821543
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 20971486
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1        10487808        20971486   5.0 GiB     FFFF  ceph data
   3            2048        10487807   5.0 GiB     FFFF  ceph block
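The block symlink step mentioned above (done by ceph-disk rather than by ceph-osd mkfs) amounts to something like this sketch, with a hypothetical helper name:

```python
import os

def link_block(osd_data, part_uuid, name='block'):
    # Point $osd_data/block at the partition, via /dev/disk/by-partuuid so
    # the link survives device renumbering across reboots.
    target = os.path.join('/dev/disk/by-partuuid', part_uuid)
    link = os.path.join(osd_data, name)
    os.symlink(target, link)
    return link
```

This mirrors how 'journal' is symlinked for FileStore, which is the generalization proposed earlier in this ticket.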
Updated by Loïc Dachary over 8 years ago
<frickler> loicd: regarding http://tracker.ceph.com/issues/13942#note-12, I'm seeing the same error in current master with my CBT-based testing
<frickler> loicd: jewel is working fine for me, however, at least in that regard
<loicd> frickler: ah, interesting ! thanks for sharing. Did you ask sage about it ?
<frickler> loicd: not yet, I just tested that reverting https://github.com/ceph/ceph/pull/7223 seems to fix it, though
<loicd> frickler: good intel :-)
Updated by Loïc Dachary over 8 years ago
- Blocked by Bug #14559: bluestore broken in current master added
Updated by Sage Weil about 8 years ago
- Status changed from In Progress to Resolved