Bug #64444 (open): No Valid allocation info on disk (empty file)

Added by Pablo Higueras 3 months ago. Updated 3 months ago.

Status: Triaged
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi!

For over a year now we have been suffering from high start-up times on our OSDs (more than an hour).

This is something that does not always happen, but it is frequent.

Here is a trace of the log file when this happens:

2024-02-12T10:32:22.378+0000 7f3f0e837540  1 freelist init
2024-02-12T10:32:22.378+0000 7f3f0e837540  1 freelist _read_cfg
2024-02-12T10:32:22.498+0000 7f3f0e837540 -1 bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)
2024-02-12T10:32:22.498+0000 7f3f0e837540  0 bluestore(/var/lib/ceph/osd/ceph-54) _init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while) ...
2024-02-12T11:55:42.607+0000 7f3f0e837540  1 HybridAllocator _spillover_range constructing fallback allocator
2024-02-12T11:56:00.555+0000 7f3f0e837540  1 bluestore::NCB::read_allocation_from_drive_on_startup::::Allocation Recovery was completed in 5018.058711 seconds, extent_count=585636576
2024-02-12T11:56:00.555+0000 7f3f0e837540  1 bluestore(/var/lib/ceph/osd/ceph-54) _init_alloc loaded 0 B in 0 extents, allocator type hybrid, capacity 0x6fc7c000000, block size 0x1000, free 0x30e6e1f6000, fragmentation 0.45
2024-02-12T11:56:00.559+0000 7f3f0e837540  4 rocksdb: [db/db_impl/db_impl.cc:446] Shutdown: canceling all background work
2024-02-12T11:56:00.807+0000 7f3f0e837540  4 rocksdb: [db/db_impl/db_impl.cc:625] Shutdown complete
2024-02-12T11:56:00.895+0000 7f3f0e837540  1 bluefs umount
2024-02-12T11:56:00.911+0000 7f3f0e837540  1 bdev(0x55f93a1b9800 /var/lib/ceph/osd/ceph-54/block) close
2024-02-12T11:56:01.131+0000 7f3f0e837540  1 bdev(0x55f93a1b9800 /var/lib/ceph/osd/ceph-54/block) open path /var/lib/ceph/osd/ceph-54/block
2024-02-12T11:56:01.135+0000 7f3f0e837540  1 bdev(0x55f93a1b9800 /var/lib/ceph/osd/ceph-54/block) open size 7681481900032 (0x6fc7c000000, 7.0 TiB) block_size 4096 (4 KiB) non-rotational device, discard supported
2024-02-12T11:56:01.135+0000 7f3f0e837540  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-54/block size 7.0 TiB
2024-02-12T11:56:01.135+0000 7f3f0e837540  1 bluefs mount
2024-02-12T11:56:01.135+0000 7f3f0e837540  1 bluefs _init_alloc shared, id 1, capacity 0x6fc7c000000, block size 0x10000
2024-02-12T11:56:04.311+0000 7f3f0e837540  1 bluefs mount shared_bdev_used = 1407900450816
2024-02-12T11:56:04.311+0000 7f3f0e837540  1 bluestore(/var/lib/ceph/osd/ceph-54) _prepare_db_environment set db_paths to db,7297407805030 db.slow,7297407805030
2024-02-12T11:56:04.427+0000 7f3f0e837540  4 rocksdb: RocksDB version: 6.15.5

There is not much information about this problem. We have been looking into this existing issue (https://tracker.ceph.com/issues/48276), but it does not seem to help us.

We are looking for any kind of help; we don't know what to do.

Thank you!

Actions #1

Updated by Igor Fedotov 3 months ago

Hi Pablo!

Long startup is caused by the need to rebuild the allocation map after a non-graceful OSD shutdown. So apparently your OSD crashed before that recovery came into play. Please check/share the logs. You might also want to compact BlueStore's DB using ceph-kvstore-tool's compact command - this could improve DB scanning times somewhat. It is not guaranteed and the effect could be temporary, though.
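
For reference, an offline compaction of a stopped OSD would look roughly like this (OSD id and data path taken from the log above; the systemd unit name assumes a package-based deployment and will differ under cephadm):

systemctl stop ceph-osd@54
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-54 compact
systemctl start ceph-osd@54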

Actions #2

Updated by Pablo Higueras 3 months ago

Igor Fedotov wrote:

Hi Pablo!

Long startup is caused by the need to rebuild the allocation map after a non-graceful OSD shutdown. So apparently your OSD crashed before that recovery came into play. Please check/share the logs. You might also want to compact BlueStore's DB using ceph-kvstore-tool's compact command - this could improve DB scanning times somewhat. It is not guaranteed and the effect could be temporary, though.

Hi Igor,

We see this issue after an OSD crash. Here are the logs of the crash:

** Compaction Stats [P] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       6.1      6.1       0.0   1.0      0.0     76.2     81.96             77.32       288    0.285       0      0
  L1      2/0   87.51 MB   0.3     28.7     6.1     22.6      22.6     -0.0       0.0   3.7    149.5    117.7    196.25            127.45       263    0.746    145M    72M
 Sum      2/0   87.51 MB   0.0     28.7     6.1     22.6      28.7      6.1       0.0   4.7    105.5    105.5 
2024-02-12T10:31:40.566+0000 7f042dd91700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f042dd91700 thread_name:safe_timer

 ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f043b1b4cf0]
 2: /usr/bin/ceph-osd(+0x90f29b) [0x55b23d30029b]
 3: /usr/bin/ceph-osd(+0x90f346) [0x55b23d300346]
 4: (ScrubQueue::collect_ripe_jobs(std::vector<boost::intrusive_ptr<ScrubQueue::ScrubJob>, std::allocator<boost::intrusive_ptr<ScrubQueue::ScrubJob> > >&, utime_t)+0x193) [0x55b23d301e03]
 5: (ScrubQueue::select_pg_and_scrub(Scrub::ScrubPreconds&)+0x11d) [0x55b23d305a5d]
 6: (OSD::sched_scrub()+0x21f) [0x55b23d08669f]
 7: (OSD::tick_without_osd_lock()+0x578) [0x55b23d0bb928]
 8: (Context::complete(int)+0xd) [0x55b23d0d6ebd]
 9: (CommonSafeTimer<std::mutex>::timer_thread()+0x12f) [0x55b23d80553f]
 10: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x55b23d806631]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f043b1aa1ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -9999> 2024-02-12T10:31:30.034+0000 7f0418d67700  1 get compressor zstd = 0x55b29e5a80c0
 -9998> 2024-02-12T10:31:30.034+0000 7f0418d67700  1 get compressor zstd = 0x55b29e5a80c0
 -9997> 2024-02-12T10:31:30.034+0000 7f0419d69700  2 osd.54 pg_epoch: 82022 pg[6.349( v 82022'155550050 (81531'155544902,82022'155550050] local-lis/les=81363/81364 n=10342068 ec=43333/180 lis/c=81363/81363 les/c/f=81364/81364/79096 sis=81363) [54,43,14] r=0 lpr=81363 crt=82022'155550050 lcod 82022'155550049 mlcod 82022'155550049 active+clean+scrubbing [ 6.349:  ] ] scrubber <WaitReplicas/>: scrub_compare_maps replica 14 has 25 items
 -9996> 2024-02-12T10:31:30.034+0000 7f0419d69700  2 osd.54 pg_epoch: 82022 pg[6.349( v 82022'155550050 (81531'155544902,82022'155550050] local-lis/les=81363/81364 n=10342068 ec=43333/180 lis/c=81363/81363 les/c/f=81364/81364/79096 sis=81363) [54,43,14] r=0 lpr=81363 crt=82022'155550050 lcod 82022'155550049 mlcod 82022'155550049 active+clean+scrubbing [ 6.349:  ] ] scrubber <WaitReplicas/>: scrub_compare_maps replica 43 has 25 items

Those crashes happen sporadically, and we don't know their root cause. Maybe you can guide us on how to track down the problem.

Regards

Actions #3

Updated by Igor Fedotov 3 months ago

Unfortunately I'm not aware of any similar issue with ScrubQueue, nor can I find any recent fixes in that area, so this is likely a new bug. Please file another ticket under the RADOS subproject.

And you'll likely suffer from long OSD startups after crashes until the above is fixed. Additionally, I'm planning a feature to make the non-persistent allocation map (and hence the need to recover it) optional - this mechanism was introduced in Quincy, IIRC. But that will take a while...
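
If it helps with the new ticket, the crash metadata collected by Ceph's crash module is usually a good starting point; roughly (the crash id below is just a placeholder):

ceph crash ls
ceph crash info <crash-id>    # prints the full backtrace and metadata for one entry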

Actions #4

Updated by Igor Fedotov 3 months ago

  • Status changed from New to Triaged

Actions #5

Updated by Igor Fedotov 3 months ago

And yeah - you might want to do an offline DB compaction on an arbitrary OSD and see whether startup time improves afterwards - just "kill -9" the OSD process after it has been started post-compaction; this will trigger the allocation map rebuild as well.
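
A minimal sketch of that experiment, assuming osd.54 and a systemd-managed OSD as above (replace the pid placeholder with the actual ceph-osd process id):

systemctl stop ceph-osd@54
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-54 compact
systemctl start ceph-osd@54
kill -9 <osd.54 pid>              # ungraceful stop, forces the allocation map rebuild
systemctl start ceph-osd@54       # time this start, e.g. by watching the _init_alloc lines in the OSD log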

