Bug #47985

When WAL is closed, osd cannot be restarted

Added by jiaxu li over 3 years ago. Updated over 3 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Compile the master branch source code and deploy a cluster with vstart, disabling the BlueStore RocksDB WAL during deployment and placing the bluestore wal/db and the block device on different disks. After deployment completes, restarting an OSD fails with the error message "unable to read osd superblock". The problem does not reproduce every time, but the probability is high; clusters deployed without vstart hit it as well.
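
For reference, a minimal sketch of how such a deployment could be driven, assuming vstart.sh's -o option is used to inject the RocksDB override (the option string is the one captured in step 3 below; the exact invocation and OSD count are illustrative, and placing wal/db on separate disks still needs its own device setup):

    # Hypothetical vstart run: new cluster, RocksDB WAL disabled via disableWAL=true
    MON=1 OSD=5 ../src/vstart.sh -n \
      -o 'bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,disableWAL=true'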

The problem reproduces as follows:
1. Compile and install

2. Deploy the cluster

3. BlueStore WAL info
  1. ./bin/ceph daemon osd.0 config show | grep rocksdb
    *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
      "bluestore_kvbackend": "rocksdb",
      "bluestore_rocksdb_cf": "true",
      "bluestore_rocksdb_cfs": "m(3) O(3,0-13) L",
      "bluestore_rocksdb_options": "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,disableWAL=true",
      ......
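
The disableWAL=true entry at the end of bluestore_rocksdb_options is the setting under test. As a narrower check, the option can be read on its own (a sketch using the standard ceph daemon config get command; osd.0 is just the example target):

    # Show only the RocksDB option string for the running osd.0
    ./bin/ceph daemon osd.0 config get bluestore_rocksdb_options
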
4. BlueStore path info
  1. tree dev/
    dev/
    ├── mgr.x
    │   └── keyring
    ├── mon.a
    │   ├── kv_backend
    │   ├── min_mon_release
    │   └── store.db
    │       ├── 000039.log
    │       ├── 000041.sst
    │       ├── CURRENT
    │       ├── IDENTITY
    │       ├── LOCK
    │       ├── MANIFEST-000020
    │       ├── OPTIONS-000008
    │       └── OPTIONS-000023
    ├── osd0
    │   ├── bfm_blocks
    │   ├── bfm_blocks_per_key
    │   ├── bfm_bytes_per_block
    │   ├── bfm_size
    │   ├── block -> /dev/sda
    │   ├── block.db -> /dev/sdg6
    │   ├── block.wal -> /dev/sdg5
    │   ├── bluefs
    │   ├── ceph_fsid
    │   ├── fsid
    │   ├── keyring
    │   ├── kv_backend
    │   ├── magic
    │   ├── mkfs_done
    │   ├── ready
    │   ├── require_osd_release
    │   ├── type
    │   └── whoami
    ├── osd1
    │   ├── bfm_blocks
    │   ├── bfm_blocks_per_key
    │   ├── bfm_bytes_per_block
    │   ├── bfm_size
    │   ├── block -> /dev/sdb
    │   ├── block.db -> /dev/sdg8
    │   ├── block.wal -> /dev/sdg7
    │   ├── bluefs
    │   ├── ceph_fsid
    │   ├── fsid
    │   ├── keyring
    │   ├── kv_backend
    │   ├── magic
    │   ├── mkfs_done
    │   ├── ready
    │   ├── require_osd_release
    │   ├── type
    │   └── whoami
    ├── osd2
    │   ├── bfm_blocks
    │   ├── bfm_blocks_per_key
    │   ├── bfm_bytes_per_block
    │   ├── bfm_size
    │   ├── block -> /dev/sdc
    │   ├── block.db -> /dev/sdg10
    │   ├── block.wal -> /dev/sdg9
    │   ├── bluefs
    │   ├── ceph_fsid
    │   ├── fsid
    │   ├── keyring
    │   ├── kv_backend
    │   ├── magic
    │   ├── mkfs_done
    │   ├── ready
    │   ├── require_osd_release
    │   ├── type
    │   └── whoami
    ├── osd3
    │   ├── bfm_blocks
    │   ├── bfm_blocks_per_key
    │   ├── bfm_bytes_per_block
    │   ├── bfm_size
    │   ├── block -> /dev/sdd
    │   ├── block.db -> /dev/sdg12
    │   ├── block.wal -> /dev/sdg11
    │   ├── bluefs
    │   ├── ceph_fsid
    │   ├── fsid
    │   ├── keyring
    │   ├── kv_backend
    │   ├── magic
    │   ├── mkfs_done
    │   ├── ready
    │   ├── require_osd_release
    │   ├── type
    │   └── whoami
    └── osd4
        ├── bfm_blocks
        ├── bfm_blocks_per_key
        ├── bfm_bytes_per_block
        ├── bfm_size
        ├── block -> /dev/sde
        ├── block.db -> /dev/sdg14
        ├── block.wal -> /dev/sdg13
        ├── bluefs
        ├── ceph_fsid
        ├── fsid
        ├── keyring
        ├── kv_backend
        ├── magic
        ├── mkfs_done
        ├── ready
        ├── require_osd_release
        ├── type
        └── whoami

    8 directories, 101 files
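
The tree shows that each OSD's block, block.db and block.wal symlinks resolve to different disks, matching the layout described above. As a cross-check, the device labels can be dumped offline (a sketch using ceph-bluestore-tool's show-label subcommand against osd1's devices from the tree; not part of the original report):

    # Print the bluestore label of osd1's main device (repeat for block.db / block.wal)
    ./bin/ceph-bluestore-tool show-label --dev dev/osd1/block
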
5. Get the OSD process IDs
  1. ps -ef | grep ceph
    root 2719 1 1 16:36 ? 00:00:09 /home/ljx/ceph-master/build/bin/ceph-mgr -i x -c /home/ljx/ceph-master/build/ceph.conf
    root 2801 1 0 16:36 ? 00:00:06 /home/ljx/ceph-master/build/bin/ceph-mon -i a -c /home/ljx/ceph-master/build/ceph.conf
    root 3792 1 0 16:39 ? 00:00:02 ./bin/ceph-osd -i 0
    root 5262 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 1
    root 6733 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 2
    root 8203 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 3
    root 9673 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 4
    root 10384 1549 0 16:48 pts/0 00:00:00 grep --color=auto ceph
6. Restart osd.1
  1. kill -9 5262
  2. ps -ef | grep ceph
    root 2719 1 1 16:36 ? 00:00:09 /home/ljx/ceph-master/build/bin/ceph-mgr -i x -c /home/ljx/ceph-master/build/ceph.conf
    root 2801 1 0 16:36 ? 00:00:06 /home/ljx/ceph-master/build/bin/ceph-mon -i a -c /home/ljx/ceph-master/build/ceph.conf
    root 3792 1 0 16:39 ? 00:00:02 ./bin/ceph-osd -i 0
    root 6733 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 2
    root 8203 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 3
    root 9673 1 0 16:40 ? 00:00:02 ./bin/ceph-osd -i 4
    root 10389 1549 0 16:49 pts/0 00:00:00 grep --color=auto ceph
  3. ./bin/ceph-osd -i 1
    2020-10-26T16:49:09.456+0800 7f1eadb01f40 -1 WARNING: all dangerous and experimental features are enabled.
    2020-10-26T16:49:09.462+0800 7f1eadb01f40 -1 WARNING: all dangerous and experimental features are enabled.
    2020-10-26T16:49:09.466+0800 7f1eadb01f40 -1 WARNING: all dangerous and experimental features are enabled.
    2020-10-26T16:49:10.547+0800 7f1eadb01f40 -1 Falling back to public interface
    2020-10-26T16:49:13.552+0800 7f1eadb01f40 -1 bluestore(/home/ljx/ceph-master/build/dev/osd1/) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x4319887b, expected 0xa1f41464, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
    2020-10-26T16:49:13.552+0800 7f1eadb01f40 -1 bluestore(/home/ljx/ceph-master/build/dev/osd1/) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x4319887b, expected 0xa1f41464, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
    2020-10-26T16:49:13.553+0800 7f1eadb01f40 -1 bluestore(/home/ljx/ceph-master/build/dev/osd1/) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x4319887b, expected 0xa1f41464, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
    2020-10-26T16:49:13.562+0800 7f1eadb01f40 -1 bluestore(/home/ljx/ceph-master/build/dev/osd1/) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x4319887b, expected 0xa1f41464, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
    2020-10-26T16:49:13.562+0800 7f1eadb01f40 -1 osd.1 0 OSD::init() : unable to read osd superblock
    2020-10-26T16:49:14.333+0800 7f1eadb01f40 -1 ** ERROR: osd init failed: (22) Invalid argument
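
The repeated _verify_csum failures mean the osd_superblock object read back with a mismatched crc32c, which is why OSD::init() gives up. With the OSD still down, the store can be checked offline (a sketch using ceph-bluestore-tool's standard fsck subcommand; not part of the original report):

    # Offline consistency check of osd1's bluestore while the daemon is stopped
    ./bin/ceph-bluestore-tool fsck --path dev/osd1
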
Operating system version and source code version
  1. cat /etc/redhat-release
    CentOS Linux release 8.2.2004 (Core)
  2. ./bin/ceph -v
    *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
      ceph version 16.0.0-6584-gcdf596c8ca (cdf596c8ca3646908c04f922261a22058fa8730e) pacific (dev)

Files

rgw-P99.9lantecy-before-after-disable-wal.png (116 KB): rgw put P99.9 latency before/after osd disable wal. Jiaying Ren, 10/28/2020 06:00 AM