Bug #18926

closed

Why do OSDs not release memory?

Added by yongqiang guo about 7 years ago. Updated almost 7 years ago.

Status:
Duplicate
Priority:
High
Assignee:
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Tags:
bluestore
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
BlueStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Version: Kraken 11.2.0, BlueStore, two-way replication.

Test: stressing the cluster with fio, using the parameters "-direct=1 -iodepth 64 -thread -rw=write -ioengine=libaio -bs=1M -numjobs=1".

The memory used by each OSD keeps growing, to more than 2 GB per OSD, with no recovery in progress. I'm not sure whether this is a memory leak. After the cluster has run for about an hour, client I/O is aborted. At that point some OSDs are marked down, and I found that the corresponding ceph-osd processes had been killed. The OSD logs contain assert information like the following:

Traceback (most recent call last):
import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory

osd.8
ceph version 12.0.0 (b7d9d6eb542e2b946ac778bd3a381ce466f60f6a)
1: (()+0x8f451f) [0x7fbf662f451f]
2: (()+0xf130) [0x7fbf636a8130]
3: (gsignal()+0x37) [0x7fbf626d45d7]
4: (abort()+0x148) [0x7fbf626d5cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fbf62fd89b5]
6: (()+0x5e926) [0x7fbf62fd6926]
7: (()+0x5e953) [0x7fbf62fd6953]
8: (()+0xb5275) [0x7fbf6302d275]
9: (()+0x7df5) [0x7fbf636a0df5]
10: (clone()+0x6d) [0x7fbf627951ad]

osd.9
2017-02-10 12:35:29.980996 7fda40b70700 0 -- 14.1.2.2:6816/6834 >> 14.1.2.2:6818/16198 conn(0x7fda53e2d800 :-1 s=STATE_OPEN pgs=2 cs=1 l=0).fault initiating reconnect
2017-02-10 12:35:38.802506 7fda39989700 -1 /home/sda4/g00352/ceph-12.0.0/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7fda39989700 time 2017-02-10 12:35:38.800554
/home/sda4/g00352/ceph-12.0.0/src/os/bluestore/KernelDevice.cc: 267: FAILED assert(r >= 0)
ceph version 12.0.0 (b7d9d6eb542e2b946ac778bd3a381ce466f60f6a)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fda455ce940]
2: (KernelDevice::_aio_thread()+0x49c) [0x7fda4555a75c]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x7fda4555f79d]
4: (Thread::entry_wrapper()+0x75) [0x7fda4569d3d5]
5: (()+0x7df5) [0x7fda42921df5]
6: (clone()+0x6d) [0x7fda41a161ad]

osd.11
ceph version 12.0.0 (b7d9d6eb542e2b946ac778bd3a381ce466f60f6a)
1: (()+0x8f451f) [0x7f9174c7951f]
2: (()+0xf130) [0x7f917202d130]
3: (gsignal()+0x37) [0x7f91710595d7]
4: (abort()+0x148) [0x7f917105acc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f917195d9b5]
6: (()+0x5e926) [0x7f917195b926]
7: (()+0x5e953) [0x7f917195b953]
8: (()+0x5eb73) [0x7f917195bb73]
9: (()+0x37ded5) [0x7f9174702ed5]
10: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x2eb) [0x7f9174c804db]
11: (ceph::buffer::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int)+0x203) [0x7f9174c811c3]
12: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x11a) [0x7f9174c60a4a]
13: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, BlueStore::WriteContext*)+0xe76) [0x7f9174b9f876]
14: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x725) [0x7f9174ba1235]
15: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x10b) [0x7f9174ba1fcb]
16: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15b7) [0x7f9174ba5367]
17: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x599) [0x7f9174ba6389]
18: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0xa5) [0x7f9174969215]
19: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x3be) [0x7f9174a2b26e]
20: (ReplicatedBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x3a1) [0x7f9174a3a4f1]
21: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xc4) [0x7f9174905374]
22: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x206) [0x7f91747b93a6]
23: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f91747b9747]
24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x640) [0x7f91747decc0]
25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x98c) [0x7f9174cd50bc]
26: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f9174cda520]
27: (Thread::entry_wrapper()+0x75) [0x7f9174da13d5]
28: (()+0x7df5) [0x7f9172025df5]
29: (clone()+0x6d) [0x7f917111a1ad]


Related issues (1): 0 open, 1 closed

Related to RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue (Resolved, 02/14/2017)
