Bug #50992
src/common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0)
Description
When I use ceph-volume to add a new OSD, the following error appears:
root@yujiang-performance-dev:~/github/ceph/build# ceph-volume lvm create --bluestore --data /dev/nvme0n1
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 88631385-1de2-4168-9c4e-8b3b3e292355
Running command: /sbin/vgcreate --force --yes ceph-f32a7f97-3199-4cce-80f1-31923e35ca1f /dev/nvme0n1
 stdout: Physical volume "/dev/nvme0n1" successfully created.
 stdout: Volume group "ceph-f32a7f97-3199-4cce-80f1-31923e35ca1f" successfully created
Running command: /sbin/lvcreate --yes -l 476932 -n osd-block-88631385-1de2-4168-9c4e-8b3b3e292355 ceph-f32a7f97-3199-4cce-80f1-31923e35ca1f
 stdout: Logical volume "osd-block-88631385-1de2-4168-9c4e-8b3b3e292355" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Executable selinuxenabled not in PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
Running command: /bin/chown -h ceph:ceph /dev/ceph-f32a7f97-3199-4cce-80f1-31923e35ca1f/osd-block-88631385-1de2-4168-9c4e-8b3b3e292355
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-f32a7f97-3199-4cce-80f1-31923e35ca1f/osd-block-88631385-1de2-4168-9c4e-8b3b3e292355 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: got monmap epoch 1
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQAPVq9gdL/3MxAALNvWGRgndR/7H0i/cc6b+g==
 stdout: creating /var/lib/ceph/osd/ceph-0/keyring
added entity osd.0 auth(key=AQAPVq9gdL/3MxAALNvWGRgndR/7H0i/cc6b+g==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 88631385-1de2-4168-9c4e-8b3b3e292355 --setuser ceph --setgroup ceph
 stderr: 2021-05-27T08:19:29.280+0000 7f6f2beb3f00 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
 stderr: /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: In function 'void PriorityCache::Manager::balance()' thread 7f6f0a079700 time 2021-05-27T08:19:30.129967+0000
 stderr: /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x5650082ff179]
 stderr: 2: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 3: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 4: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 6: clone()
 stderr: *** Caught signal (Aborted) **
 stderr: in thread 7f6f0a079700 thread_name:bstore_mempool
 stderr: 2021-05-27T08:19:30.136+0000 7f6f0a079700 -1 /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: In function 'void PriorityCache::Manager::balance()' thread 7f6f0a079700 time 2021-05-27T08:19:30.129967+0000
 stderr: /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x5650082ff179]
 stderr: 2: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 3: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 4: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 6: clone()
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f6f2a01d980]
 stderr: 2: gsignal()
 stderr: 3: abort()
 stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x5650082ff1d4]
 stderr: 5: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 6: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 7: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 9: clone()
 stderr: 2021-05-27T08:19:30.140+0000 7f6f0a079700 -1 *** Caught signal (Aborted) **
 stderr: in thread 7f6f0a079700 thread_name:bstore_mempool
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f6f2a01d980]
 stderr: 2: gsignal()
 stderr: 3: abort()
 stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x5650082ff1d4]
 stderr: 5: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 6: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 7: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 9: clone()
 stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
 stderr: -2827> 2021-05-27T08:19:29.280+0000 7f6f2beb3f00 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
 stderr: -1> 2021-05-27T08:19:30.136+0000 7f6f0a079700 -1 /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: In function 'void PriorityCache::Manager::balance()' thread 7f6f0a079700 time 2021-05-27T08:19:30.129967+0000
 stderr: /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x5650082ff179]
 stderr: 2: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 3: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 4: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 6: clone()
 stderr: 0> 2021-05-27T08:19:30.140+0000 7f6f0a079700 -1 *** Caught signal (Aborted) **
 stderr: in thread 7f6f0a079700 thread_name:bstore_mempool
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f6f2a01d980]
 stderr: 2: gsignal()
 stderr: 3: abort()
 stderr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x5650082ff1d4]
 stderr: 5: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 6: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 7: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 9: clone()
 stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
 stderr: *** Caught signal (Segmentation fault) **
 stderr: in thread 7f6f0a079700 thread_name:bstore_mempool
 stderr: ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 stderr: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f6f2a01d980]
 stderr: 2: pthread_getname_np()
 stderr: 3: (ceph::logging::Log::dump_recent()+0x510) [0x565008cc5920]
 stderr: 4: /usr/bin/ceph-osd(+0x127bc40) [0x565008ab7c40]
 stderr: 5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f6f2a01d980]
 stderr: 6: gsignal()
 stderr: 7: abort()
 stderr: 8: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x5650082ff1d4]
 stderr: 9: /usr/bin/ceph-osd(+0xac3363) [0x5650082ff363]
 stderr: 10: (PriorityCache::Manager::balance()+0x444) [0x565008f43164]
 stderr: 11: (BlueStore::MempoolThread::entry()+0x831) [0x565008992df1]
 stderr: 12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f6f2a0126db]
 stderr: 13: clone()
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: purged osd.0
--> RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 88631385-1de2-4168-9c4e-8b3b3e292355 --setuser ceph --setgroup ceph
Updated by Jiang Yu almost 3 years ago
OSD log:
    -7> 2021-05-27T09:33:13.955+0000 7fead16ecf00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1622107993956506, "job": 1, "event": "recovery_started", "log_files": [3]}
    -6> 2021-05-27T09:33:13.955+0000 7fead16ecf00  4 rocksdb: [db_impl/db_impl_open.cc:758] Recovering log #3 mode 2
    -5> 2021-05-27T09:33:13.955+0000 7fead16ecf00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1622107993956669, "job": 1, "event": "recovery_finished"}
    -4> 2021-05-27T09:33:13.955+0000 7fead16ecf00  1 bluestore(/var/lib/ceph/osd/ceph-0/) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
    -3> 2021-05-27T09:33:13.955+0000 7fead16ecf00  1 bluestore(/var/lib/ceph/osd/ceph-0/) _fsck_on_open <<<START>>> check (regular) start
    -2> 2021-05-27T09:33:13.955+0000 7feaaf8b2700  5 prioritycache tune_memory target: 8589934592 mapped: 76144640 unmapped: 7626752 heap: 83771392 old mem: 134217728 new mem: 6439743411
    -1> 2021-05-27T09:33:13.959+0000 7feaaf8b2700 -1 /tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: In function 'void PriorityCache::Manager::balance()' thread 7feaaf8b2700 time 2021-05-27T09:33:13.957585+0000
/tmp/release/Ubuntu/WORKDIR/ceph-16.2.4-1-g574376ae42/src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)

 ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x557b1553e179]
 2: /usr/bin/ceph-osd(+0xac3363) [0x557b1553e363]
 3: (PriorityCache::Manager::balance()+0x444) [0x557b16182164]
 4: (BlueStore::MempoolThread::entry()+0x831) [0x557b15bd1df1]
 5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7feacf84b6db]
 6: clone()

     0> 2021-05-27T09:33:13.967+0000 7feaaf8b2700 -1 *** Caught signal (Aborted) **
 in thread 7feaaf8b2700 thread_name:bstore_mempool

 ceph version 16.2.4-1-g574376ae42 (574376ae42a35e132c77048bed3f4e36fc7cf68c) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7feacf856980]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x557b1553e1d4]
 5: /usr/bin/ceph-osd(+0xac3363) [0x557b1553e363]
 6: (PriorityCache::Manager::balance()+0x444) [0x557b16182164]
 7: (BlueStore::MempoolThread::entry()+0x831) [0x557b15bd1df1]
 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7feacf84b6db]
 9: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
Updated by Jiang Yu almost 3 years ago
I have enough memory:
              total        used        free      shared  buff/cache   available
Mem:      131922856     1921116    49080468       56620    80921272   116780212
Swap:       8388604        6400     8382204
Updated by Neha Ojha almost 3 years ago
- Project changed from RADOS to bluestore
- Subject changed from osd: bluestore _read_fsid unparsable uuid to src/common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0)
- Assignee set to Mark Nelson
Mark, can you help take a look at this?
Updated by Mira Limbeck over 2 years ago
A user in our forum had a similar issue [0], with version 16.2.6.
We were able to narrow it down to an overcommitment of memory caused by the sum of all cache ratios being > 1.
None of the caches wanted any memory, so all of it was still available at the last priority, where the remaining memory is distributed based on the ratios.
But the implementation of that final step does not take the sum of all ratios into account, which leads to the overcommitment.
From the logs:
    -6> 2021-09-29T14:17:55.689+0200 7f6060fdf700 10 prioritycache balance assigning cache bytes for PRI: 11
    -5> 2021-09-29T14:17:55.689+0200 7f6060fdf700 10 prioritycache balance_priority kv_onode pri: 11 round: 0 wanted: 0 ratio: 0.04 cur_ratios: 1.04 fair_share: 93220570 mem_avail: 2423734834
    -4> 2021-09-29T14:17:55.689+0200 7f6060fdf700 10 prioritycache balance_priority data pri: 11 round: 0 wanted: -95 ratio: 0 cur_ratios: 1.04 fair_share: 0 mem_avail: 2423734834
    -3> 2021-09-29T14:17:55.689+0200 7f6060fdf700 10 prioritycache balance_priority meta pri: 11 round: 0 wanted: -95 ratio: 0.8 cur_ratios: 1.04 fair_share: 1864411410 mem_avail: 2423734834
    -2> 2021-09-29T14:17:55.689+0200 7f6060fdf700 10 prioritycache balance_priority kv pri: 11 round: 0 wanted: 0 ratio: 0.2 cur_ratios: 1.04 fair_share: 466102852 mem_avail: 2423734834
    -1> 2021-09-29T14:17:55.689+0200 7f6060fdf700 -1 ./src/common/PriorityCache.cc: In function 'void PriorityCache::Manager::balance()' thread 7f6060fdf700 time 2021-09-29T14:17:55.690660+0200
./src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)
[0] https://forum.proxmox.com/threads/ceph-recreate-osd-failed.97044/ (German)
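The overcommitment described above can be illustrated with a small sketch (a hypothetical Python simplification, not the actual Ceph C++ code): if the final pass hands each cache `ratio * mem_avail` without normalizing by the sum of the ratios, and the ratios from the log sum to 1.04 (0.04 + 0 + 0.8 + 0.2), more memory is assigned than is available and `mem_avail` goes negative:

```python
def distribute_leftover(mem_avail, ratios):
    """Sketch of a final leftover pass that gives each cache ratio * mem_avail
    with no normalization by the sum of the ratios (hypothetical helper,
    not Ceph's actual implementation)."""
    assigned = {name: int(r * mem_avail) for name, r in ratios.items()}
    remaining = mem_avail - sum(assigned.values())
    return assigned, remaining

# Ratios and mem_avail taken from the log above; they sum to 1.04, not 1.0.
ratios = {"kv_onode": 0.04, "data": 0.0, "meta": 0.8, "kv": 0.2}
assigned, remaining = distribute_leftover(2423734834, ratios)
print(remaining)  # negative -> would trip ceph_assert(mem_avail >= 0)
```

With ratios summing to exactly 1.0 (or with normalization by the sum), `remaining` stays non-negative and the assertion cannot fire.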
Updated by Radoslaw Zarzynski 11 months ago
- Has duplicate Bug #54657: crash: void PriorityCache::Manager::balance(): assert(mem_avail >= 0) added
Updated by Raimund Sacherer 11 months ago
PR https://github.com/ceph/ceph/pull/51784
The formula checking that all the cache size ratios sum up to <= 1 did not take cache_kv_onode_ratio into account.
The last dout line also did not take cache_kv_onode_ratio into account.
I think this is part of a fix for this issue.
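The gist of that validation fix can be sketched as follows (a simplified Python sketch, not the literal patch; the ratio names follow the BlueStore config options, and the example values are assumptions):

```python
def check_cache_ratios(meta_ratio, kv_ratio, kv_onode_ratio):
    """Validate BlueStore-style cache ratios. The pre-fix check summed only
    meta_ratio + kv_ratio; including kv_onode_ratio in the sum rejects
    configurations such as 0.8 + 0.2 + 0.04 = 1.04 that previously
    slipped through and later tripped ceph_assert(mem_avail >= 0)."""
    total = meta_ratio + kv_ratio + kv_onode_ratio
    if total > 1.0:
        raise ValueError(f"cache ratios sum to {total:.2f}, which exceeds 1.0")
    return 1.0 - total  # ratio left over for the data cache

check_cache_ratios(0.45, 0.45, 0.04)   # fine: sums to 0.94
# check_cache_ratios(0.8, 0.2, 0.04)   # raises: 1.04 > 1.0, the case in this report
```

This only blocks the bad configuration up front; the balancing code itself is unchanged by this sketch.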
Updated by Igor Fedotov 3 months ago
- Is duplicate of Bug #64053: Assertion in PriorityCache::Manager::balance() at OSD startup added
Updated by Igor Fedotov 3 months ago
- Is duplicate of deleted (Bug #64053: Assertion in PriorityCache::Manager::balance() at OSD startup)
Updated by Igor Fedotov 3 months ago
- Has duplicate Bug #64053: Assertion in PriorityCache::Manager::balance() at OSD startup added
Updated by Igor Fedotov 3 months ago
- Status changed from New to Pending Backport
- Pull request ID set to 51784
Updated by Igor Fedotov 3 months ago
- Assignee changed from Mark Nelson to Igor Fedotov
Updated by Backport Bot 3 months ago
- Copied to Backport #64071: reef: src/common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) added
Updated by Backport Bot 3 months ago
- Copied to Backport #64072: quincy: src/common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) added