Feature #13942


ceph-disk: support bluestore

Added by Sage Weil over 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

bluestore (newstore) is based on a very small file system (with osd metadata, like the keyring, features, etc.) and one or more block devices. These devices are symlinked from the data directory, similar to how 'journal' is a symlink for the current FileStore.

ceph-disk create:

- create a small partition for osd_data
- create a large partition (remainder of disk, by default) for data
- symlink from $osd_data/block
- [optional] create a mid-size partition for metadata (rocksdb). the user will probably need to specify this size, since it'll be 1/Nth of their available SSD space on the host.
- symlink from $osd_data/block.db
- [optional] create a small partition for the write-ahead-log (basically the journal). default size of 128MB is sufficient.
- symlink from $osd_data/block.wal
(note that block.db is preferable to block.wal as the space will be used for both the wal and sst files. both would be used if the host has HDD, SSD, and NVME or NVRAM.)
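The symlinking part of the create flow above could be sketched roughly as follows. This is a minimal illustration, not the real ceph-disk code; the partitioning itself (sgdisk, ptype GUIDs) is omitted, and the helper name is made up:

```python
import os

# Hypothetical sketch of the symlink step of 'ceph-disk create' as
# described above; real ceph-disk also creates the partitions and
# tags them with the appropriate ptype GUIDs, which is not shown here.
def link_bluestore_devs(osd_data, block_dev, db_dev=None, wal_dev=None):
    """Symlink the bluestore block devices from the osd data directory."""
    links = {'block': block_dev}
    if db_dev:   # optional rocksdb partition (holds sst files, and the wal too if no block.wal)
        links['block.db'] = db_dev
    if wal_dev:  # optional write-ahead log; ~128MB is sufficient
        links['block.wal'] = wal_dev
    for name, dev in links.items():
        os.symlink(dev, os.path.join(osd_data, name))
    return sorted(links)
```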

ceph-disk activate:

I think we can fully generalize this to re-use the journal UUID for any subsidiary block device (s/journal/block/ or similar). Then, make activate simply require that all symlinks in $osd_data resolve to devices before activating the OSD. The missing piece is that ceph-disk needs to figure out the uuid from a journal device in order to map it back to the parent osd_data device. Right now it does

out = _check_output(
    args=[
        'ceph-osd',
        '-i', '0',  # this is ignored
        '--get-journal-uuid',
        '--osd-journal',
        path,
    ],
    close_fds=True,
)

but I think we need to replace this with some generic-ish way of identifying which OSD the device belongs to. For bluestore I can just stuff the uuid in the first block of the device? And then we can make a --get-device-uuid command that either parses the FileJournal header or a bluestore first-block-has-uuid header?
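The first-block-has-uuid idea could look something like this. The header layout here (a magic string followed by the raw uuid bytes) is entirely hypothetical, just to make the probing idea concrete; it is not the actual bluestore label format:

```python
import uuid

# Hypothetical marker; the real bluestore on-disk label format differs.
MAGIC = b'bluestore block device\n'

def read_device_uuid(path):
    """Probe the first block of a device for an osd uuid.

    Assumes a made-up layout: MAGIC, then the 16 raw uuid bytes.
    Returns None if the device does not carry the marker.
    """
    with open(path, 'rb') as f:
        block = f.read(4096)
    if not block.startswith(MAGIC):
        return None
    raw = block[len(MAGIC):len(MAGIC) + 16]
    return uuid.UUID(bytes=raw)
```

A --get-device-uuid command would then try this probe first and fall back to parsing the FileJournal header.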


Related issues: 1 (0 open, 1 closed)

Blocked by Ceph - Bug #14559: bluestore broken in current master (Resolved, 01/29/2016)

Actions #1

Updated by Sage Weil over 8 years ago

  • Status changed from New to 12

https://github.com/ceph/ceph/pull/6759 fixes the block device probing part

Actions #2

Updated by Loïc Dachary over 8 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Loïc Dachary
Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from In Progress to 12
  • Assignee deleted (Loïc Dachary)

For the 'ceph-disk prepare' part, I think we should keep it simple initially:

ceph-disk --osd-objectstore bluestore maindev[:dbdev[:waldev]]

and teach ceph-disk how to do the partitioning for bluestore (no generic way to ask ceph-osd that). We can leave off the db/wal devices initially, and then make activate work, so that there is something functional. Then add dbdev and waldev support last.
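Parsing the maindev[:dbdev[:waldev]] argument is straightforward; a minimal sketch (hypothetical helper name, not actual ceph-disk code):

```python
def parse_bluestore_devs(spec):
    """Split 'maindev[:dbdev[:waldev]]' into (main, db, wal).

    The db and wal entries are None when absent, matching the plan of
    supporting the main device first and adding db/wal support last.
    """
    parts = spec.split(':')
    if not 1 <= len(parts) <= 3:
        raise ValueError('expected maindev[:dbdev[:waldev]]: %r' % spec)
    main, db, wal = (parts + [None, None])[:3]
    return main, db, wal
```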

Actions #4

Updated by Loïc Dachary over 8 years ago

Encryption support will need to extend to block as well as osd-data, since the data is no longer in the osd-data partition.

Actions #5

Updated by Loïc Dachary over 8 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Loïc Dachary
Actions #6

Updated by Loïc Dachary over 8 years ago

  • Assignee deleted (Loïc Dachary)
Actions #7

Updated by Loïc Dachary over 8 years ago

  • Assignee set to Loïc Dachary
Actions #9

Updated by Loïc Dachary over 8 years ago

Rebase to master complete, make check passes, working on ceph-disk suite problems now.

Actions #10

Updated by Loïc Dachary about 8 years ago

bluestore fails to initialize on a ceph-disk prepared device (no external journal).

Actions #11

Updated by Loïc Dachary about 8 years ago

ceph.conf has

[global]
        enable experimental unrecoverable data corrupting features = *
        bluestore fsck on mount = true
        bluestore block db size = 67108864
        bluestore block wal size = 134217728
        bluestore block size = 5368709120
        osd objectstore = bluestore
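For reference, the byte counts in this conf are plain power-of-two sizes:

```python
# The bluestore size options above are raw byte counts.
MiB = 1 << 20
GiB = 1 << 30

db_size  = 67108864    # bluestore block db size  = 64 MiB
wal_size = 134217728   # bluestore block wal size = 128 MiB
blk_size = 5368709120  # bluestore block size     = 5 GiB

assert db_size == 64 * MiB
assert wal_size == 128 * MiB
assert blk_size == 5 * GiB
```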

ceph-disk prepare + activate via udev lead to /var/lib/ceph/osd/ceph-2

-rw-r--r--. 1 root root       187 Jan 28 06:27 activate.monmap
-rw-r--r--. 1 ceph ceph         3 Jan 28 06:27 active
lrwxrwxrwx. 1 ceph ceph        58 Jan 28 06:27 block -> /dev/disk/by-partuuid/5596ac81-0651-4523-a896-6a21d3d78c6e
-rw-r--r--. 1 ceph ceph  67108864 Jan 28 06:27 block.db
-rw-r--r--. 1 ceph ceph        37 Jan 28 06:27 block_uuid
-rw-r--r--. 1 ceph ceph 134217728 Jan 28 06:27 block.wal
-rw-r--r--. 1 ceph ceph         2 Jan 28 06:27 bluefs
-rw-r--r--. 1 ceph ceph        37 Jan 28 06:27 ceph_fsid
-rw-r--r--. 1 ceph ceph        37 Jan 28 06:27 fsid
-rw-------. 1 ceph ceph        56 Jan 28 06:27 keyring
-rw-r--r--. 1 ceph ceph         8 Jan 28 06:27 kv_backend
-rw-r--r--. 1 ceph ceph        21 Jan 28 06:27 magic
-rw-r--r--. 1 ceph ceph         6 Jan 28 06:27 ready
-rw-r--r--. 1 root root         0 Jan 28 06:27 systemd
-rw-r--r--. 1 ceph ceph        10 Jan 28 06:27 type
-rw-r--r--. 1 ceph ceph         2 Jan 28 06:27 whoami

which shows as expected with ceph-disk list

/dev/vda :
 /dev/vda1 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb3 ceph block, for /dev/vdb1
 /dev/vdb1 ceph data, active, cluster ceph, osd.2, block /dev/vdb3
/dev/vdc other, unknown
/dev/vdd other, unknown

but the osd fails with

<pre>
2016-01-27 07:03:50.862489 7f9278afc7c0  0 ceph version 10.0.2-1092-gffcedda (ffcedda1c4986ab66bbf4d57609b05304c70fe89), process ceph-osd, pid 27432
2016-01-27 07:03:50.862617 7f9278afc7c0  5 object store type is bluestore
2016-01-27 07:03:50.862636 7f9278afc7c0 -1 WARNING: experimental feature 'bluestore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2016-01-27 07:03:50.863361 7f9278afc7c0  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6800/27432 need_addr=1
2016-01-27 07:03:50.863389 7f9278afc7c0  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6801/27432 need_addr=1
2016-01-27 07:03:50.863403 7f9278afc7c0  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6802/27432 need_addr=1
2016-01-27 07:03:50.863426 7f9278afc7c0  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6803/27432 need_addr=1
2016-01-27 07:03:50.863435 7f9278afc7c0 -1 write_pid_file: failed to open pid file 'osd.2.pid': (13) Permission denied
2016-01-27 07:03:50.888664 7f9278afc7c0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-01-27 07:03:50.894635 7f9278afc7c0 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4 
2016-01-27 07:03:50.898899 7f9278afc7c0 10 load: jerasure load: lrc load: isa 
2016-01-27 07:03:50.899202 7f9278afc7c0  1 bluestore(/var/lib/ceph/osd/ceph-2) _open_path using fs driver 'generic'
2016-01-27 07:03:50.899229 7f9278afc7c0  1 -- 0.0.0.0:6800/27432 messenger.start
2016-01-27 07:03:50.899288 7f9278afc7c0  1 -- :/0 messenger.start
2016-01-27 07:03:50.899325 7f9278afc7c0  1 -- 0.0.0.0:6803/27432 messenger.start
2016-01-27 07:03:50.899366 7f9278afc7c0  1 -- 0.0.0.0:6802/27432 messenger.start
2016-01-27 07:03:50.899403 7f9278afc7c0  1 -- 0.0.0.0:6801/27432 messenger.start
2016-01-27 07:03:50.899430 7f9278afc7c0  1 -- :/0 messenger.start
2016-01-27 07:03:50.899571 7f9278afc7c0  2 osd.2 0 mounting /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2016-01-27 07:03:50.899580 7f9278afc7c0  1 bluestore(/var/lib/ceph/osd/ceph-2) mount path /var/lib/ceph/osd/ceph-2
2016-01-27 07:03:50.899581 7f9278afc7c0  1 bluestore(/var/lib/ceph/osd/ceph-2) fsck
2016-01-27 07:03:50.899587 7f9278afc7c0  1 bluestore(/var/lib/ceph/osd/ceph-2) _open_path using fs driver 'generic'
2016-01-27 07:03:50.900353 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2016-01-27 07:03:50.900429 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block) open size 10737418240 (10240 MB) block_size 4096 (4096 B)
2016-01-27 07:03:50.902459 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block.db) open path /var/lib/ceph/osd/ceph-2/block.db
2016-01-27 07:03:50.902532 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block.db) open size 67108864 (65536 kB) block_size 4096 (4096 B)
2016-01-27 07:03:50.902540 7f9278afc7c0  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-2/block.db size 65536 kB
2016-01-27 07:03:50.904531 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2016-01-27 07:03:50.904618 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block) open size 10737418240 (10240 MB) block_size 4096 (4096 B)
2016-01-27 07:03:50.904623 7f9278afc7c0  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 10240 MB
2016-01-27 07:03:50.905330 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block.wal) open path /var/lib/ceph/osd/ceph-2/block.wal
2016-01-27 07:03:50.905395 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block.wal) open size 134217728 (128 MB) block_size 4096 (4096 B)
2016-01-27 07:03:50.905399 7f9278afc7c0  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-2/block.wal size 128 MB
2016-01-27 07:03:50.905449 7f9278afc7c0  1 bluefs mount
2016-01-27 07:03:50.907798 7f9278afc7c0 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster.  Do not use
feature with important data.

2016-01-27 07:03:50.907887 7f9278afc7c0  0  set rocksdb option compression = kNoCompression
2016-01-27 07:03:50.907893 7f9278afc7c0  0  set rocksdb option max_write_buffer_number = 16
2016-01-27 07:03:50.907902 7f9278afc7c0  0  set rocksdb option min_write_buffer_number_to_merge = 3
2016-01-27 07:03:50.907905 7f9278afc7c0  0  set rocksdb option recycle_log_file_num = 16
2016-01-27 07:03:50.907926 7f9278afc7c0  0  set rocksdb option compression = kNoCompression
2016-01-27 07:03:50.907929 7f9278afc7c0  0  set rocksdb option max_write_buffer_number = 16
2016-01-27 07:03:50.907932 7f9278afc7c0  0  set rocksdb option min_write_buffer_number_to_merge = 3
2016-01-27 07:03:50.907934 7f9278afc7c0  0  set rocksdb option recycle_log_file_num = 16
2016-01-27 07:03:50.908017 7f9278afc7c0  4 rocksdb: RocksDB version: 4.3.0

2016-01-27 07:03:50.908021 7f9278afc7c0  4 rocksdb: Git sha rocksdb_build_git_sha:
2016-01-27 07:03:50.908029 7f9278afc7c0  4 rocksdb: Compile date Jan 27 2016
2016-01-27 07:03:50.908030 7f9278afc7c0  4 rocksdb: DB SUMMARY

2016-01-27 07:03:50.908046 7f9278afc7c0  4 rocksdb: CURRENT file:  CURRENT

2016-01-27 07:03:50.908048 7f9278afc7c0  4 rocksdb: IDENTITY file:  IDENTITY

2016-01-27 07:03:50.908053 7f9278afc7c0  4 rocksdb: MANIFEST file:  MANIFEST-000008 size: 110 Bytes

2016-01-27 07:03:50.908060 7f9278afc7c0  2 rocksdb: Error when reading /var/lib/ceph/osd/ceph-2/db dir

2016-01-27 07:03:50.908062 7f9278afc7c0  2 rocksdb: Error when reading /var/lib/ceph/osd/ceph-2/db.slow dir

2016-01-27 07:03:50.908068 7f9278afc7c0  4 rocksdb: Write Ahead Log file in db.wal: 000009.log size: 253 ; 

2016-01-27 07:03:50.908069 7f9278afc7c0  4 rocksdb:          Options.error_if_exists: 0
2016-01-27 07:03:50.908070 7f9278afc7c0  4 rocksdb:        Options.create_if_missing: 0
2016-01-27 07:03:50.908071 7f9278afc7c0  4 rocksdb:          Options.paranoid_checks: 1
2016-01-27 07:03:50.908072 7f9278afc7c0  4 rocksdb:                      Options.env: 0x7f928507c060
2016-01-27 07:03:50.908073 7f9278afc7c0  4 rocksdb:                 Options.info_log: 0x7f928507c340
2016-01-27 07:03:50.908074 7f9278afc7c0  4 rocksdb:           Options.max_open_files: 5000
2016-01-27 07:03:50.908075 7f9278afc7c0  4 rocksdb: Options.max_file_opening_threads: 1
2016-01-27 07:03:50.908076 7f9278afc7c0  4 rocksdb:       Options.max_total_wal_size: 0
2016-01-27 07:03:50.908077 7f9278afc7c0  4 rocksdb:        Options.disableDataSync: 0
2016-01-27 07:03:50.908077 7f9278afc7c0  4 rocksdb:              Options.use_fsync: 0
2016-01-27 07:03:50.908078 7f9278afc7c0  4 rocksdb:      Options.max_log_file_size: 0
2016-01-27 07:03:50.908079 7f9278afc7c0  4 rocksdb: Options.max_manifest_file_size: 18446744073709551615
2016-01-27 07:03:50.908080 7f9278afc7c0  4 rocksdb:      Options.log_file_time_to_roll: 0
2016-01-27 07:03:50.908081 7f9278afc7c0  4 rocksdb:      Options.keep_log_file_num: 1000
2016-01-27 07:03:50.908082 7f9278afc7c0  4 rocksdb:   Options.recycle_log_file_num: 16
2016-01-27 07:03:50.908083 7f9278afc7c0  4 rocksdb:        Options.allow_os_buffer: 1
2016-01-27 07:03:50.908083 7f9278afc7c0  4 rocksdb:       Options.allow_mmap_reads: 0
2016-01-27 07:03:50.908084 7f9278afc7c0  4 rocksdb:       Options.allow_fallocate: 1
2016-01-27 07:03:50.908085 7f9278afc7c0  4 rocksdb:      Options.allow_mmap_writes: 0
2016-01-27 07:03:50.908086 7f9278afc7c0  4 rocksdb:          Options.create_missing_column_families: 0
2016-01-27 07:03:50.908087 7f9278afc7c0  4 rocksdb:                              Options.db_log_dir: 
2016-01-27 07:03:50.908088 7f9278afc7c0  4 rocksdb:                                 Options.wal_dir: db.wal
2016-01-27 07:03:50.908088 7f9278afc7c0  4 rocksdb:                Options.table_cache_numshardbits: 4
2016-01-27 07:03:50.908089 7f9278afc7c0  4 rocksdb:     Options.delete_obsolete_files_period_micros: 21600000000
2016-01-27 07:03:50.908090 7f9278afc7c0  4 rocksdb:              Options.max_background_compactions: 1
2016-01-27 07:03:50.908091 7f9278afc7c0  4 rocksdb:                      Options.max_subcompactions: 1
2016-01-27 07:03:50.908092 7f9278afc7c0  4 rocksdb:                  Options.max_background_flushes: 1
2016-01-27 07:03:50.908092 7f9278afc7c0  4 rocksdb:                         Options.WAL_ttl_seconds: 0
2016-01-27 07:03:50.908093 7f9278afc7c0  4 rocksdb:                       Options.WAL_size_limit_MB: 0
2016-01-27 07:03:50.908094 7f9278afc7c0  4 rocksdb:             Options.manifest_preallocation_size: 4194304
2016-01-27 07:03:50.908095 7f9278afc7c0  4 rocksdb:                          Options.allow_os_buffer: 1
2016-01-27 07:03:50.908096 7f9278afc7c0  4 rocksdb:                         Options.allow_mmap_reads: 0
2016-01-27 07:03:50.908096 7f9278afc7c0  4 rocksdb:                        Options.allow_mmap_writes: 0
2016-01-27 07:03:50.908097 7f9278afc7c0  4 rocksdb:                      Options.is_fd_close_on_exec: 1
2016-01-27 07:03:50.908098 7f9278afc7c0  4 rocksdb:                    Options.stats_dump_period_sec: 600
2016-01-27 07:03:50.908099 7f9278afc7c0  4 rocksdb:                    Options.advise_random_on_open: 1
2016-01-27 07:03:50.908100 7f9278afc7c0  4 rocksdb:                     Options.db_write_buffer_size: 0d
2016-01-27 07:03:50.908101 7f9278afc7c0  4 rocksdb:          Options.access_hint_on_compaction_start: NORMAL
2016-01-27 07:03:50.908102 7f9278afc7c0  4 rocksdb:   Options.new_table_reader_for_compaction_inputs: 0
2016-01-27 07:03:50.908102 7f9278afc7c0  4 rocksdb:                Options.compaction_readahead_size: 0d
2016-01-27 07:03:50.908103 7f9278afc7c0  4 rocksdb:                Options.random_access_max_buffer_size: 1048576d
2016-01-27 07:03:50.908104 7f9278afc7c0  4 rocksdb:               Options.writable_file_max_buffer_size: 1048576d
2016-01-27 07:03:50.908105 7f9278afc7c0  4 rocksdb:                       Options.use_adaptive_mutex: 0
2016-01-27 07:03:50.908105 7f9278afc7c0  4 rocksdb:                             Options.rate_limiter: (nil)
2016-01-27 07:03:50.908107 7f9278afc7c0  4 rocksdb:      Options.delete_scheduler.rate_bytes_per_sec: 0
2016-01-27 07:03:50.908109 7f9278afc7c0  4 rocksdb:                           Options.bytes_per_sync: 0
2016-01-27 07:03:50.908110 7f9278afc7c0  4 rocksdb:                       Options.wal_bytes_per_sync: 0
2016-01-27 07:03:50.908110 7f9278afc7c0  4 rocksdb:                        Options.wal_recovery_mode: 0
2016-01-27 07:03:50.908111 7f9278afc7c0  4 rocksdb:                   Options.enable_thread_tracking: 0
2016-01-27 07:03:50.908112 7f9278afc7c0  4 rocksdb:                                Options.row_cache: None
2016-01-27 07:03:50.908113 7f9278afc7c0  4 rocksdb:        Options.wal_filter: None
2016-01-27 07:03:50.908114 7f9278afc7c0  4 rocksdb: Compression algorithms supported:
2016-01-27 07:03:50.908115 7f9278afc7c0  4 rocksdb:     Snappy supported: 1
2016-01-27 07:03:50.908116 7f9278afc7c0  4 rocksdb:     Zlib supported: 1
2016-01-27 07:03:50.908117 7f9278afc7c0  4 rocksdb:     Bzip supported: 0
2016-01-27 07:03:50.908118 7f9278afc7c0  4 rocksdb:     LZ4 supported: 0
2016-01-27 07:03:50.908119 7f9278afc7c0  4 rocksdb: Fast CRC32 supported: 0
2016-01-27 07:03:50.909772 7f9278afc7c0  4 rocksdb: Recovering from manifest file: MANIFEST-000008

2016-01-27 07:03:50.909836 7f9278afc7c0  4 rocksdb: --------------- Options for column family [default]:

2016-01-27 07:03:50.909844 7f9278afc7c0  4 rocksdb:               Options.comparator: rocksdb.InternalKeyComparator:leveldb.BytewiseComparator
2016-01-27 07:03:50.909846 7f9278afc7c0  4 rocksdb:           Options.merge_operator: None
2016-01-27 07:03:50.909848 7f9278afc7c0  4 rocksdb:        Options.compaction_filter: None
2016-01-27 07:03:50.909866 7f9278afc7c0  4 rocksdb:        Options.compaction_filter_factory: None
2016-01-27 07:03:50.909868 7f9278afc7c0  4 rocksdb:         Options.memtable_factory: SkipListFactory
2016-01-27 07:03:50.909869 7f9278afc7c0  4 rocksdb:            Options.table_factory: BlockBasedTable
2016-01-27 07:03:50.909896 7f9278afc7c0  4 rocksdb:            table_factory options:   flush_block_policy_factory: FlushBlockBySizePolicyFactory (0x7f9284a5c0f0)
  cache_index_and_filter_blocks: 0
  index_type: 0
  hash_index_allow_collision: 1
  checksum: 1
  no_block_cache: 0
  block_cache: 0x7f9284a79728
  block_cache_size: 8388608
  block_cache_compressed: (nil)
  block_size: 4096
  block_size_deviation: 10
  block_restart_interval: 16
  filter_policy: nullptr
  whole_key_filtering: 1
  skip_table_builder_flush: 0
  format_version: 0

2016-01-27 07:03:50.909902 7f9278afc7c0  4 rocksdb:        Options.write_buffer_size: 4194304
2016-01-27 07:03:50.909903 7f9278afc7c0  4 rocksdb:  Options.max_write_buffer_number: 16
2016-01-27 07:03:50.909905 7f9278afc7c0  4 rocksdb:          Options.compression: NoCompression
2016-01-27 07:03:50.909906 7f9278afc7c0  4 rocksdb:       Options.prefix_extractor: nullptr
2016-01-27 07:03:50.909907 7f9278afc7c0  4 rocksdb:             Options.num_levels: 7
2016-01-27 07:03:50.909908 7f9278afc7c0  4 rocksdb:        Options.min_write_buffer_number_to_merge: 3
2016-01-27 07:03:50.909909 7f9278afc7c0  4 rocksdb:     Options.max_write_buffer_number_to_maintain: 0
2016-01-27 07:03:50.909910 7f9278afc7c0  4 rocksdb:            Options.compression_opts.window_bits: -14
2016-01-27 07:03:50.909911 7f9278afc7c0  4 rocksdb:                  Options.compression_opts.level: -1
2016-01-27 07:03:50.909912 7f9278afc7c0  4 rocksdb:               Options.compression_opts.strategy: 0
2016-01-27 07:03:50.909913 7f9278afc7c0  4 rocksdb:      Options.level0_file_num_compaction_trigger: 4
2016-01-27 07:03:50.909914 7f9278afc7c0  4 rocksdb:          Options.level0_slowdown_writes_trigger: 20
2016-01-27 07:03:50.909915 7f9278afc7c0  4 rocksdb:              Options.level0_stop_writes_trigger: 24
2016-01-27 07:03:50.909916 7f9278afc7c0  4 rocksdb:                   Options.target_file_size_base: 2097152
2016-01-27 07:03:50.909917 7f9278afc7c0  4 rocksdb:             Options.target_file_size_multiplier: 1
2016-01-27 07:03:50.909918 7f9278afc7c0  4 rocksdb:                Options.max_bytes_for_level_base: 10485760
2016-01-27 07:03:50.909919 7f9278afc7c0  4 rocksdb: Options.level_compaction_dynamic_level_bytes: 0
2016-01-27 07:03:50.909921 7f9278afc7c0  4 rocksdb:          Options.max_bytes_for_level_multiplier: 10
2016-01-27 07:03:50.909921 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[0]: 1
2016-01-27 07:03:50.909923 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[1]: 1
2016-01-27 07:03:50.909924 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[2]: 1
2016-01-27 07:03:50.909925 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[3]: 1
2016-01-27 07:03:50.909926 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[4]: 1
2016-01-27 07:03:50.909927 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[5]: 1
2016-01-27 07:03:50.909928 7f9278afc7c0  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[6]: 1
2016-01-27 07:03:50.909929 7f9278afc7c0  4 rocksdb:       Options.max_sequential_skip_in_iterations: 8
2016-01-27 07:03:50.909930 7f9278afc7c0  4 rocksdb:              Options.expanded_compaction_factor: 25
2016-01-27 07:03:50.909931 7f9278afc7c0  4 rocksdb:                Options.source_compaction_factor: 1
2016-01-27 07:03:50.909932 7f9278afc7c0  4 rocksdb:          Options.max_grandparent_overlap_factor: 10
2016-01-27 07:03:50.909933 7f9278afc7c0  4 rocksdb:                        Options.arena_block_size: 524288
2016-01-27 07:03:50.909934 7f9278afc7c0  4 rocksdb:   Options.soft_pending_compaction_bytes_limit: 0
2016-01-27 07:03:50.909935 7f9278afc7c0  4 rocksdb:   Options.hard_pending_compaction_bytes_limit: 0
2016-01-27 07:03:50.909936 7f9278afc7c0  4 rocksdb:       Options.rate_limit_delay_max_milliseconds: 1000
2016-01-27 07:03:50.909937 7f9278afc7c0  4 rocksdb:                Options.disable_auto_compactions: 0
2016-01-27 07:03:50.909938 7f9278afc7c0  4 rocksdb:                           Options.filter_deletes: 0
2016-01-27 07:03:50.909939 7f9278afc7c0  4 rocksdb:           Options.verify_checksums_in_compaction: 1
2016-01-27 07:03:50.909940 7f9278afc7c0  4 rocksdb:                         Options.compaction_style: 0
2016-01-27 07:03:50.909941 7f9278afc7c0  4 rocksdb:                           Options.compaction_pri: 0
2016-01-27 07:03:50.909942 7f9278afc7c0  4 rocksdb:  Options.compaction_options_universal.size_ratio: 1
2016-01-27 07:03:50.909943 7f9278afc7c0  4 rocksdb: Options.compaction_options_universal.min_merge_width: 2
2016-01-27 07:03:50.909944 7f9278afc7c0  4 rocksdb: Options.compaction_options_universal.max_merge_width: 4294967295
2016-01-27 07:03:50.909945 7f9278afc7c0  4 rocksdb: Options.compaction_options_universal.max_size_amplification_percent: 200
2016-01-27 07:03:50.909946 7f9278afc7c0  4 rocksdb: Options.compaction_options_universal.compression_size_percent: -1
2016-01-27 07:03:50.909948 7f9278afc7c0  4 rocksdb: Options.compaction_options_fifo.max_table_files_size: 1073741824
2016-01-27 07:03:50.909949 7f9278afc7c0  4 rocksdb:                   Options.table_properties_collectors: 
2016-01-27 07:03:50.909950 7f9278afc7c0  4 rocksdb:                   Options.inplace_update_support: 0
2016-01-27 07:03:50.909951 7f9278afc7c0  4 rocksdb:                 Options.inplace_update_num_locks: 10000
2016-01-27 07:03:50.909952 7f9278afc7c0  4 rocksdb:               Options.min_partial_merge_operands: 2
2016-01-27 07:03:50.909953 7f9278afc7c0  4 rocksdb:               Options.memtable_prefix_bloom_bits: 0
2016-01-27 07:03:50.909954 7f9278afc7c0  4 rocksdb:             Options.memtable_prefix_bloom_probes: 6
2016-01-27 07:03:50.909954 7f9278afc7c0  4 rocksdb:   Options.memtable_prefix_bloom_huge_page_tlb_size: 0
2016-01-27 07:03:50.909955 7f9278afc7c0  4 rocksdb:                           Options.bloom_locality: 0
2016-01-27 07:03:50.909956 7f9278afc7c0  4 rocksdb:                    Options.max_successive_merges: 0
2016-01-27 07:03:50.909957 7f9278afc7c0  4 rocksdb:                Options.optimize_fllters_for_hits: 0
2016-01-27 07:03:50.909958 7f9278afc7c0  4 rocksdb:                Options.paranoid_file_checks: 0
2016-01-27 07:03:50.909959 7f9278afc7c0  4 rocksdb:                Options.compaction_measure_io_stats: 0
2016-01-27 07:03:50.911049 7f9278afc7c0  2 rocksdb: Unable to load table properties for file 4 --- NotFound: 

2016-01-27 07:03:50.911077 7f9278afc7c0  4 rocksdb: Recovered from manifest file:db/MANIFEST-000008 succeeded,manifest_file_number is 8, next_file_number is 10, last_sequence is 2, log_number is 0,prev_log_number is 0,max_column_family is 0

2016-01-27 07:03:50.911083 7f9278afc7c0  4 rocksdb: Column family [default] (ID 0), log number is 7

2016-01-27 07:03:50.911162 7f9278afc7c0 -1 rocksdb: Corruption: Can't access /000004.sst: NotFound: 

2016-01-27 07:03:50.911170 7f9278afc7c0 -1 bluestore(/var/lib/ceph/osd/ceph-2) _open_db erroring opening db: 
2016-01-27 07:03:50.911174 7f9278afc7c0  1 bluefs umount
2016-01-27 07:03:50.921629 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block.db) close
2016-01-27 07:03:51.165045 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block) close
2016-01-27 07:03:51.409172 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block.wal) close
2016-01-27 07:03:51.659355 7f9278afc7c0  1 bdev(/var/lib/ceph/osd/ceph-2/block) close
2016-01-27 07:03:51.907277 7f9278afc7c0 -1 osd.2 0 OSD:init: unable to mount object store
2016-01-27 07:03:51.907317 7f9278afc7c0 -1  ** ERROR: osd init failed: (5) Input/output error

</pre>

Actions #12

Updated by Loïc Dachary about 8 years ago

Now fails with http://paste.debian.net/377343/

ceph.conf has

        enable experimental unrecoverable data corrupting features = *
        bluestore fsck on mount = true
        bluestore block size = 5368709120
        osd objectstore = bluestore

the data was populated with

-rw-r--r--. 1 root root 187 Jan 29 06:18 activate.monmap
lrwxrwxrwx. 1 ceph ceph  58 Jan 29 06:18 block -> /dev/disk/by-partuuid/f04cc152-13bd-4ef0-b4c1-940d564cfa58
-rw-r--r--. 1 ceph ceph  37 Jan 29 06:18 block_uuid
-rw-r--r--. 1 ceph ceph   2 Jan 29 06:18 bluefs
-rw-r--r--. 1 ceph ceph  37 Jan 29 06:18 ceph_fsid
-rw-r--r--. 1 ceph ceph  37 Jan 29 06:18 fsid
-rw-r--r--. 1 ceph ceph   8 Jan 29 06:18 kv_backend
-rw-r--r--. 1 ceph ceph  21 Jan 29 06:18 magic
-rw-r--r--. 1 ceph ceph  10 Jan 29 06:18 type
-rw-r--r--. 1 ceph ceph   2 Jan 29 06:18 whoami

where the block symlink was done by ceph-disk, not ceph-osd mkfs.

        command_check_call(
            [
                'ceph-osd',
                '--cluster', cluster,
                '--mkfs',
                '--mkkey',
                '-i', osd_id,
                '--monmap', monmap,
                '--osd-data', path,
                '--osd-uuid', fsid,
                '--keyring', os.path.join(path, 'keyring'),
                '--setuser', get_ceph_user(),
                '--setgroup', get_ceph_user(),
            ],
        )
# ceph-disk list
/dev/vda :
 /dev/vda1 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb3 ceph block, for /dev/vdb1
 /dev/vdb1 ceph data, active, cluster ceph, osd.2, block /dev/vdb3
/dev/vdc other, unknown
/dev/vdd other, unknown
# sgdisk --print /dev/vdb
Disk /dev/vdb: 20971520 sectors, 10.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): CADD3707-6432-4C7C-8608-182417821543
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 20971486
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1        10487808        20971486   5.0 GiB     FFFF  ceph data
   3            2048        10487807   5.0 GiB     FFFF  ceph block
Actions #13

Updated by Loïc Dachary about 8 years ago

<frickler> loicd: regarding http://tracker.ceph.com/issues/13942#note-12, I'm seeing the same error in current master with my CBT-based-testing
<frickler> loicd: jewel is working fine for me, however, at least in that regard
<loicd> frickler: ah, interesting ! thanks for sharing. Did you ask sage about it ? 
<frickler> loicd: not yet, I just tested that reverting https://github.com/ceph/ceph/pull/7223 seems to fix it, though
<loicd> frickler: good intel :-)
Actions #14

Updated by Loïc Dachary about 8 years ago

  • Blocked by Bug #14559: bluestore broken in current master added
Actions #15

Updated by Sage Weil about 8 years ago

  • Status changed from In Progress to Resolved