Bug #44072
Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors=missing)
Status: open
Description
Hi,
I set severity=Critical to draw attention, because I think this is a serious problem.
We have two different Luminous clusters (12.2.12). All OSD pools are replicated with size=3 and min_size=2. Both clusters are used for S3 (RadosGW).
The upgrade to Luminous was completed about 1.5 years ago. All recommended flags were set ('sortbitwise' etc.).
Until recently, all OSDs were Filestore (journal on SSD) and everything was fine.
About 3 months ago we added the first Bluestore (BS) OSDs to our small cluster.
After some time we got the 'pgs inconsistent' issue: https://tracker.ceph.com/issues/43174
After hitting this issue, we tried adding the first BS OSDs to our second cluster, but set primary-affinity=0 for them. We thought this might help.
After about a month I saw that many of the PGs on these BS OSDs had been scrubbed successfully, without any errors.
But today we got the first 'pgs inconsistent' error on the second cluster. One of the OSDs involved is BS.
Some information about our cluster settings:
ceph osd dump | head -n 12
epoch 290315
fsid {truncated}
created 2015-07-31 16:05:27.389478
modified 2020-02-11 10:03:24.517865
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 1946
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
ceph.conf
[osd]
debug filestore = 0
debug journal = 0
debug ms = 0
debug osd = 0
filestore fd cache size = 512
filestore op threads = 6
# TODO: reduce these timeouts after enable autoresharding
filestore op thread timeout = 180
filestore op thread suicide timeout = 240
osd enable op tracker = false
osd journal size = 1000
osd max backfills = 1
osd recovery max active = 1
osd recovery sleep hdd = 0.2
osd scrub begin hour = 0
osd scrub end hour = 8
osd scrub sleep = 2
osd scrub chunk min = 1
osd scrub chunk max = 2
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
osd disk threads = 6
# formerly known as 'osd op threads'
osd peering wq threads = 6
# TODO: reduce this timeout after enable autoresharding
osd op thread timeout = 120
throttler perf counter = false
# filestore OSD
[osd.0]
osd uuid = {truncated}-89fb-000000000000
host = xx
public addr = xx.xx.xx.xx
osd journal = /dev/ceph1/journal-0
# bluestore OSD
[osd.490]
osd uuid = {truncated}-89fb-000000000490
host = xxx
public addr = xx.xx.xx.xx
I am not adding more details here because all the useful information has already been written in https://tracker.ceph.com/issues/43174
This issue is very important for us because we can't continue migrating to BS. One of our clusters is big (more than 1 PB of data), so we can't quickly and easily migrate all Filestore (FS) OSDs to BS. And we can't ignore scrubbing during the migration, because scrub is very important for data consistency.
Updated by David Zafman about 4 years ago
Two questions:
Do all the objects with missing copies have names that included multi-byte characters?
Are the OSDs with missing copies always filestore or always bluestore?
Updated by Aleksandr Rudenko about 4 years ago
Hi, David
Do all the objects with missing copies have names that included multi-byte characters?
Yes, most of the missing objects have names that include multi-byte characters.
But there are objects with ASCII-only names, for example:
ovBck/SEKRETERYA 13.09.2019/SEVGI belgelerim/FATURA/OXETTE ucret.doc
This name was checked with grep.
Overall, ~90 missing objects have names that include multi-byte characters and ~3 have names that do NOT include multi-byte characters.
Are the OSDs with missing copies always filestore or always bluestore?
No. For most PGs I see missing objects on FS OSDs, but sometimes I see missing objects on BS OSDs too.
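For anyone who wants to check the FS/BS split on their own cluster, here is a minimal sketch that counts shards flagged 'missing' per backend. It assumes the JSON layout printed by `rados list-inconsistent-obj <pgid> --format=json` (an `inconsistents` list whose entries have a `shards` list with `osd` and `errors` fields). The function name `missing_by_backend`, the `backend_of` mapping, and the sample data are hypothetical and made up for illustration; they are not output from our clusters.

```python
import json
from collections import Counter


def missing_by_backend(report_json, backend_of):
    """Count shards flagged 'missing', grouped by OSD backend.

    report_json: JSON text in the assumed list-inconsistent-obj layout.
    backend_of:  hypothetical dict mapping OSD id -> 'filestore'/'bluestore'.
    """
    report = json.loads(report_json)
    counts = Counter()
    for item in report.get("inconsistents", []):
        for shard in item.get("shards", []):
            if "missing" in shard.get("errors", []):
                # OSDs absent from the mapping are counted as 'unknown'.
                counts[backend_of.get(shard["osd"], "unknown")] += 1
    return counts


def _obj(name, missing_osd, osds=(0, 12, 490)):
    """Build one made-up inconsistent-object entry for the sample below."""
    return {
        "object": {"name": name},
        "union_shard_errors": ["missing"],
        "shards": [
            {"osd": osd, "errors": ["missing"] if osd == missing_osd else []}
            for osd in osds
        ],
    }


# Made-up sample data, NOT taken from a real cluster.
sample = json.dumps({
    "epoch": 1,
    "inconsistents": [_obj("obj-a", 0), _obj("obj-b", 12), _obj("obj-c", 490)],
})
backend_of = {0: "filestore", 12: "filestore", 490: "bluestore"}

for backend, count in sorted(missing_by_backend(sample, backend_of).items()):
    print(backend, count)
```

The backend of each OSD could be filled in by hand or taken from the `osd_objectstore` field of `ceph osd metadata`.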
Updated by Aleksandr Rudenko about 4 years ago
grep for checking ASCII-only names:
grep -v -P "[^\x00-\x7F]"
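The same check can be sketched in Python for anyone who wants to split a list of object names into the two groups counted above. This is only an illustration: `split_by_ascii` is a hypothetical helper, and the second name in the sample list is made up. It relies on `str.isascii()` (Python 3.7+), which is true exactly when every character is in the range 0x00-0x7F.

```python
def split_by_ascii(names):
    """Split object names into (ascii_only, non_ascii) lists."""
    ascii_only, non_ascii = [], []
    for name in names:
        # isascii() is True when no character falls outside 0x00-0x7F,
        # i.e. when the grep class [^\x00-\x7F] would not match the name.
        if name.isascii():
            ascii_only.append(name)
        else:
            non_ascii.append(name)
    return ascii_only, non_ascii


# The first name is quoted from this report; the second is a made-up example.
names = [
    "ovBck/SEKRETERYA 13.09.2019/SEVGI belgelerim/FATURA/OXETTE ucret.doc",
    "backup/\u00fccret.doc",
]
ascii_only, non_ascii = split_by_ascii(names)
print(len(ascii_only), "ASCII-only,", len(non_ascii), "with multi-byte characters")
```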