Bug #8776

closed

osd: runaway memory on dumpling

Added by Sage Weil almost 10 years ago. Updated over 9 years ago.

Status: Won't Fix
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A specific OSD in a large cluster eats RAM when started (normal is 1-2 GB; it hits 8 GB before the host starts swapping and badness ensues).

core file at teuthology:core.8776.gz

Actions #1

Updated by Sage Weil almost 10 years ago

  • Description updated (diff)
Actions #2

Updated by Sage Weil over 9 years ago

it's all here:

2014-07-30 23:38:41.260511 40349c0 10 filestore(/srv/ceph/osd/88) _do_transaction on 0x7d99c10
2014-07-30 23:38:41.269145 40349c0 20 filestore(/srv/ceph/osd/88) _check_global_replay_guard no xattr
2014-07-30 23:38:41.294781 40349c0 10 filestore(/srv/ceph/osd/88) _check_replay_guard object has 44230036.0.2 < current pos 44873239.0.0, in past, will replay
2014-07-30 23:38:41.297676 40349c0 15 filestore(/srv/ceph/osd/88) remove 12.2_head/73bee172/usage.18/head//12
2014-07-30 23:38:41.301477 40349c0 20 filestore(/srv/ceph/osd/88) lfn_unlink: clearing omap on 73bee172/usage.18/head//12 in cid 12.2_head
2014-07-30 23:38:41.318290 40349c0 10 filestore hoid: 73bee172/usage.18/head//12 not skipping op, *spos 44873239.0.0
2014-07-30 23:38:41.319436 40349c0 10 filestore  > header.spos 44230036.0.3
2014-07-30 23:38:41.321388 40349c0 20 filestore remove_map_header: removing 2873902 hoid 73bee172/usage.18/head//12
2014-07-30 23:38:41.326390 40349c0 20 filestore clear_header: clearing seq 2873902

90.26% (278,452,045B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->75.14% (231,799,826B) 0x62A5A87: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| ->74.86% (230,934,610B) 0x62A67F9: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | ->74.86% (230,932,400B) 0x62A68DE: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | | ->43.51% (134,224,562B) 0x62A6C0E: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | | | ->43.51% (134,217,696B) 0x7DF1C4: LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | ->43.51% (134,217,696B) 0x7D081B: DBObjectMap::clear_header(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |   ->43.51% (134,217,696B) 0x7D49CF: DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |     ->43.51% (134,217,696B) 0x7D7B55: DBObjectMap::clear(hobject_t const&, SequencerPosition const*) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |       ->43.51% (134,217,696B) 0x796A4A: FileStore::lfn_unlink(coll_t, hobject_t const&, SequencerPosition const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |         ->43.51% (134,217,696B) 0x796C64: FileStore::_remove(coll_t, hobject_t const&, SequencerPosition const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |           ->43.51% (134,217,696B) 0x7A6F62: FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |             ->43.51% (134,217,696B) 0x7A9D3F: FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |               ->43.51% (134,217,696B) 0x7BF454: JournalingObjectStore::journal_replay(unsigned long) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |                 ->43.51% (134,217,696B) 0x791B28: FileStore::mount() (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |                   ->43.51% (134,217,696B) 0x65A498: OSD::do_convertfs(ObjectStore*) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |                     ->43.51% (134,217,696B) 0x65AF15: OSD::convertfs(std::string const&, std::string const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |                       ->43.51% (134,217,696B) 0x5BB9C1: main (in /home/sage/sage/usr/bin/ceph-osd)
| | | | |                         
| | | | ->00.00% (6,866B) in 1+ places, all below ms_print's threshold (01.00%)
| | | | 
| | | ->31.35% (96,701,528B) 0x62A6E0B: std::string::append(std::string const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | | | ->31.35% (96,700,953B) 0x7DD890: LevelDBStore::combine_strings(std::string const&, std::string const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | ->31.35% (96,700,884B) 0x7DF15E: LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | | ->31.35% (96,700,884B) 0x7D081B: DBObjectMap::clear_header(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |   ->31.35% (96,700,884B) 0x7D49CF: DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |     ->31.35% (96,700,884B) 0x7D7B55: DBObjectMap::clear(hobject_t const&, SequencerPosition const*) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |       ->31.35% (96,700,884B) 0x796A4A: FileStore::lfn_unlink(coll_t, hobject_t const&, SequencerPosition const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |         ->31.35% (96,700,884B) 0x796C64: FileStore::_remove(coll_t, hobject_t const&, SequencerPosition const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |           ->31.35% (96,700,884B) 0x7A6F62: FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |             ->31.35% (96,700,884B) 0x7A9D3F: FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |               ->31.35% (96,700,884B) 0x7BF454: JournalingObjectStore::journal_replay(unsigned long) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |                 ->31.35% (96,700,884B) 0x791B28: FileStore::mount() (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |                   ->31.35% (96,700,884B) 0x65A498: OSD::do_convertfs(ObjectStore*) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |                     ->31.35% (96,700,884B) 0x65AF15: OSD::convertfs(std::string const&, std::string const&) (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |                       ->31.35% (96,700,884B) 0x5BB9C1: main (in /home/sage/sage/usr/bin/ceph-osd)
| | | | | |                         
| | | | | ->00.00% (69B) in 1+ places, all below ms_print's threshold (01.00%)
| | | | | 
| | | | ->00.00% (575B) in 1+ places, all below ms_print's threshold (01.00%)
| | | | 
| | | ->00.00% (6,310B) in 1+ places, all below ms_print's threshold (01.00%)
| | | 
| | ->00.00% (2,210B) in 1+ places, all below ms_print's threshold (01.00%)
| | 
| ->00.28% (865,216B) in 1+ places, all below ms_print's threshold (01.00%)
Actions #3

Updated by Samuel Just over 9 years ago

Argh, it's building up a leveldb operation to atomically remove all of the keys associated with the object. I think we can get away with doing it non-atomically, but the real problem is that the usage object got that big.
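
To illustrate the trade-off described above, here is a minimal sketch in C++. It is not Ceph or LevelDB code: `ToyStore`, `RemovalStats`, and `remove_by_prefix` are hypothetical names, and a `std::map` stands in for the key-value store. It only shows how capping the number of keys per batch bounds the key data buffered at once, at the cost of the removal no longer being atomic.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an ordered key-value store (not the LevelDB API).
using ToyStore = std::map<std::string, std::string>;

struct RemovalStats {
    std::size_t batches_committed = 0;  // number of separate commits performed
    std::size_t peak_batch_bytes = 0;   // most key bytes buffered in one batch
};

// Remove every key that starts with `prefix`.
// max_batch_keys == 0: buffer all matching keys and commit once (atomic style).
// max_batch_keys  > 0: commit after at most that many keys (non-atomic, bounded).
RemovalStats remove_by_prefix(ToyStore& store, const std::string& prefix,
                              std::size_t max_batch_keys) {
    RemovalStats stats;
    for (;;) {
        std::vector<std::string> batch;
        std::size_t batch_bytes = 0;

        // Keys sharing a prefix are contiguous in an ordered map.
        for (auto it = store.lower_bound(prefix);
             it != store.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            batch.push_back(it->first);
            batch_bytes += it->first.size();
            if (max_batch_keys != 0 && batch.size() >= max_batch_keys) break;
        }
        if (batch.empty()) break;  // nothing left under this prefix

        // "Commit" the batch: apply the buffered deletions to the store.
        for (const auto& key : batch) store.erase(key);
        ++stats.batches_committed;
        stats.peak_batch_bytes = std::max(stats.peak_batch_bytes, batch_bytes);
    }
    return stats;
}
```

With `max_batch_keys` set to 0 the buffered key data grows with the number of keys under the prefix, which is the pattern visible in the heap profile above. With a small cap the peak stays bounded, but a crash between commits would leave the object's keys partially removed, so the caller would need to tolerate that.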

Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from 12 to Won't Fix

This is the result of a very large omap object and of us building a transaction to delete its keys. The problem is the big object; there is not much we can or should do about it.
