Bug #49618

Bluestore OSD crash with tcmalloc::allocate_full_cpp_throw_oom in multisite setup with PG_DAMAGED cluster error

Added by Ist Gab 4 months ago. Updated 3 months ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: -
Tags: Multisite, sync, osd
Backport: -
Regression: No
Severity: 1 - critical
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): 9643c370a20c0d34f5e8965ae4461e2a7cf709ab4183929239bc263d0e1eef94
Crash signature (v2): -

Description

Hi,

I have a 3-DC multisite setup.

The replication is directional, HKG -> SGP -> US: the bucket is replicated from HKG to SGP, and the same bucket is replicated further from SGP to US.

The HKG -> SGP connection is pretty fast: 12.5 million objects (600 GB) transferred in 6.5 hours. Once the OSD crashed in SGP, it stopped the whole replication chain and put the cluster into a PG_DAMAGED health error.
The PG could be repaired, but the sync never restarted on its own; only a bucket sync disable/enable cycle helped (see the commands below).
I got an OSD crash in HKG as well, but not in ASH. ASH shows no errors; its replication speed is 2 million objects in 6.5 hours, which is about 90 GB of data.
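
For reference, the repair and the sync disable/enable cycle were along these lines (the PG id and bucket name are placeholders):

ceph pg repair <pgid>                                # pgid as reported by 'ceph health detail'
radosgw-admin bucket sync disable --bucket=<bucket>
radosgw-admin bucket sync enable --bucket=<bucket>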

This is the crash of the OSD:

{
"backtrace": [
"(()+0x12b20) [0x7f597d3fbb20]",
"(gsignal()+0x10f) [0x7f597c0667ff]",
"(abort()+0x127) [0x7f597c050c35]",
"(()+0x9009b) [0x7f597ca1c09b]",
"(()+0x9653c) [0x7f597ca2253c]",
"(()+0x96597) [0x7f597ca22597]",
"(()+0x967f8) [0x7f597ca227f8]",
"(()+0x19d24) [0x7f597e168d24]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) [0x7f597e18b0d6]",
"(rocksdb::Arena::AllocateNewBlock(unsigned long)+0x43) [0x5632f08ccb93]",
"(rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x4b) [0x5632f08ccc3b]",
"(rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0xb4) [0x5632f07fae94]",
"(()+0x1103e7e) [0x5632f085ae7e]",
"(rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0xcf) [0x5632f07f6f8f]",
"(rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x452) [0x5632f08520e2]",
"(rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17) [0x5632f0852e97]",
"(rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x480) [0x5632f084ac20]",
"(rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x149) [0x5632f084ebe9]",
"(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1acd) [0x5632f078a03d]",
"(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x5632f078ac11]",
"(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x8c) [0x5632f074180c]",
"(RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x87) [0x5632f0742027]",
"(BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x426) [0x5632f0226376]",
"(BlueStore::_kv_sync_thread()+0x176f) [0x5632f024bc1f]",
"(BlueStore::KVSyncThread::entry()+0x11) [0x5632f0273791]",
"(()+0x814a) [0x7f597d3f114a]",
"(clone()+0x43) [0x7f597c12bf23]"
],
"ceph_version": "15.2.9",
"crash_id": "2021-03-04T14:55:45.094048Z_3d481fd3-7573-4cb7-9b22-20784b418e64",
"entity_name": "osd.5",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-osd",
"stack_sig": "9643c370a20c0d34f5e8965ae4461e2a7cf709ab4183929239bc263d0e1eef94",
"timestamp": "2021-03-04T14:55:45.094048Z",
"utsname_hostname": "hostname",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-240.10.1.el8_3.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Jan 18 17:05:51 UTC 2021"
}

And a couple of hours before, there was an RGW crash in HKG, with a tcmalloc issue too:

{
"backtrace": [
"(()+0x12b20) [0x7ff137833b20]",
"(gsignal()+0x10f) [0x7ff135e7b7ff]",
"(abort()+0x127) [0x7ff135e65c35]",
"(()+0x9009b) [0x7ff13683109b]",
"(()+0x9653c) [0x7ff13683753c]",
"(()+0x96597) [0x7ff136837597]",
"(()+0x967f8) [0x7ff1368377f8]",
"(()+0x19d24) [0x7ff141006d24]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) [0x7ff1410290d6]",
"(RGWRados::Bucket::List::list_objects_ordered(long, std::vector<rgw_bucket_dir_entry, std::allocator<rgw_bucket_dir_entry> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, bool> > >, bool*, optional_yield)+0xd81) [0x7ff142917861]",
"(RGWListBucket::execute()+0x55c) [0x7ff142881a7c]",
"(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x11f2) [0x7ff142551962]",
"(process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)+0x3488) [0x7ff142555c48]",
"(()+0x417476) [0x7ff1424a4476]",
"(()+0x418db3) [0x7ff1424a5db3]",
"(make_fcontext()+0x2f) [0x7ff142b60d4f]"
],
"ceph_version": "15.2.9",
"crash_id": "2021-03-04T08:56:12.698456Z_5a592b7d-b183-449e-89cf-fbbbe660a3be",
"entity_name": "rgw0",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "radosgw",
"stack_sig": "4a1ba651a84b77f02fb3896f39a46230d1852d4d9a3bcd468a2ab3c6ff042d61",
"timestamp": "2021-03-04T08:56:12.698456Z",
"utsname_hostname": "hostname",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-240.10.1.el8_3.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Jan 18 17:05:51 UTC 2021"
}

Any idea what I should tune, or is it a bug?

Thank you.

History

#1 Updated by Igor Fedotov 4 months ago

It looks like an out-of-memory case. Do you have any understanding of what the RAM usage was prior to that? Have you observed unexpectedly high RAM usage (primarily for ceph-osd processes) since then?
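
If it helps, something along these lines would show memory from the allocator's side (osd.5 here is just an example id):

ceph tell osd.* heap stats        # tcmalloc heap usage, reported per OSD
ceph daemon osd.5 dump_mempools   # per-mempool usage; run on the host carrying the OSD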

#2 Updated by Ist Gab 4 months ago

Igor Fedotov wrote:

It looks like an out-of-memory case. Do you have any understanding of what the RAM usage was prior to that? Have you observed unexpectedly high RAM usage (primarily for ceph-osd processes) since then?

Hi Igor,

Thank you very much for picking up my issue.

Some background information about the setup, in case something is wrong there.
SGP and HKG each have 6 OSD nodes; each node has 6x 15 TB SSDs, 1 NVMe for the index pool, and 2 NVMe drives for WAL+DB (3 SSDs per NVMe WAL+RocksDB device).

The servers have 256 GB of memory and 88 CPU cores.

In /etc/sysctl.d/ceph-tuning.conf these are set, based on the Red Hat and Ansible playbook recommendations:

fs.aio-max-nr=1048576
fs.file-max=26234859
vm.zone_reclaim_mode=0
vm.swappiness=10
vm.min_free_kbytes=4194303

In /etc/sysconfig/ceph the tcmalloc thread cache is set to 128 MB:

TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

This is the ceph.conf on all nodes, including mgr, mon, and osd. The OSD memory target is around 31 GB.

[global]
cluster network = 192.168.198.0/24
fsid = 347b7a4f-9225-4d96-8b28-3f866e28c424
mon host = [v2:10.104.198.101:3300,v1:10.104.198.101:6789],[v2:10.104.198.102:3300,v1:10.104.198.102:6789],[v2:10.104.198.103:3300,v1:10.104.198.103:6789]
mon initial members = server,server-6s02,server-6s03
osd pool default crush rule = -1
public network = 10.104.198.0/24
rgw_relaxed_s3_bucket_names = true

rgw_dynamic_resharding = false

[mon]
mon_allow_pool_delete = true
mon_pg_warn_max_object_skew = 0
mon_osd_nearfull_ratio = 70

[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
osd_memory_target = 31490694621

[client.rgw.server.rgw0]
host = server
keyring = /var/lib/ceph/radosgw/ceph-rgw.server.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-server.rgw0.log
rgw frontends = beast endpoint=10.104.198.101:8080
rgw thread pool size = 512
rgw_zone=sin

[client.rgw.server-6s02.rgw0]
host = server-6s02
keyring = /var/lib/ceph/radosgw/ceph-rgw.server-6s02.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-server-6s02.rgw0.log
rgw frontends = beast endpoint=10.104.198.102:8080
rgw thread pool size = 512
rgw_zone=sin

[client.rgw.server-6s03.rgw0]
host = server-6s03
keyring = /var/lib/ceph/radosgw/ceph-rgw.server-6s03.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-server-6s03.rgw0.log
rgw frontends = beast endpoint=10.104.198.103:8080
rgw thread pool size = 512
rgw_zone=sin

As you've asked, please have a look at the Grafana dashboards; you can see that all nodes have plenty of free memory (each column is one node). You can also see how it is hammering the SSDs and NVMes. Yellow and green are the RocksDB+WAL NVMe drives.
This is SGP between March 4 0:00 and March 5 11:00pm


This is HKG between March 4 0:00 and March 5 11:00pm

And on the SGP cluster, after repairing the PG many times to bring the cluster out of the health error, I now have this:
HEALTH_WARN Too many repaired reads on 1 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 1 OSDs
osd.41 had 16 reads repaired

This is the index pool NVMe, just FYI. I need to restart this OSD to clear the error (restart command below). The cluster is new.
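
Clearing OSD_TOO_MANY_REPAIRS takes an OSD restart; on a systemd-managed, non-containerized deployment like this one, that would be e.g.:

systemctl restart ceph-osd@41    # osd id taken from the warning above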

#3 Updated by David Orman 4 months ago

Igor Fedotov wrote:

It looks like an out-of-memory case. Do you have any understanding of what the RAM usage was prior to that? Have you observed unexpectedly high RAM usage (primarily for ceph-osd processes) since then?

I think we are seeing the same issue with 15.2.9, on a non-RGW deployment. We noticed that on deploy the OOM killer would go wild and take down all of our servers, requiring a reboot. With the cluster up and running, we see periodic OSD crashes with this:

root@ceph01:~# ceph crash info 2021-03-08T07:17:43.057598Z_2604bc50-e7b3-4482-84cc-2bfd09e3fc98
Inferring fsid 508bbbd0-7067-11eb-baa9-bf10c01557e4
Inferring config /var/lib/ceph/508bbbd0-7067-11eb-baa9-bf10c01557e4/mon.ceph01/config
Using recent ceph image docker.io/ceph/ceph@sha256:4e710662986cf366c282323bfb4c4ca507d7e117c5ccf691a8273732073297e5
{
    "backtrace": [
        "(()+0x12b20) [0x7f2d00041b20]",
        "(gsignal()+0x10f) [0x7f2cfeca97ff]",
        "(abort()+0x127) [0x7f2cfec93c35]",
        "(()+0x9009b) [0x7f2cff65f09b]",
        "(()+0x9653c) [0x7f2cff66553c]",
        "(()+0x96597) [0x7f2cff665597]",
        "(()+0x967f8) [0x7f2cff6657f8]",
        "(()+0x19d24) [0x7f2d00daed24]",
        "(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) [0x7f2d00dd10d6]",
        "(rocksdb::Arena::AllocateNewBlock(unsigned long)+0x43) [0x563c8f378b93]",
        "(rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x4b) [0x563c8f378c3b]",
        "(rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0xb4) [0x563c8f2a6e94]",
        "(()+0x1103e7e) [0x563c8f306e7e]",
        "(rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0xcf) [0x563c8f2a2f8f]",
        "(rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x452) [0x563c8f2fe0e2]",
        "(rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17) [0x563c8f2fee97]",
        "(rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x480) [0x563c8f2f6c20]",
        "(rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x149) [0x563c8f2fabe9]",
        "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1acd) [0x563c8f23603d]",
        "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x563c8f236c11]",
        "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x8c) [0x563c8f1ed80c]",
        "(RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x87) [0x563c8f1ee027]",
        "(BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x426) [0x563c8ecd2376]",
        "(BlueStore::_kv_sync_thread()+0x176f) [0x563c8ecf7c1f]",
        "(BlueStore::KVSyncThread::entry()+0x11) [0x563c8ed1f791]",
        "(()+0x814a) [0x7f2d0003714a]",
        "(clone()+0x43) [0x7f2cfed6ef23]" 
    ],
    "ceph_version": "15.2.9",
    "crash_id": "2021-03-08T07:17:43.057598Z_2604bc50-e7b3-4482-84cc-2bfd09e3fc98",
    "entity_name": "osd.272",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "9643c370a20c0d34f5e8965ae4461e2a7cf709ab4183929239bc263d0e1eef94",
    "timestamp": "2021-03-08T07:17:43.057598Z",
    "utsname_hostname": "ceph14",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-66-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021" 
}
root@ceph01:~# 

These servers have plenty of RAM (128 GB for 24 OSDs, with 2x NVMe for DB/WAL), and graphs do not show it being completely exhausted except during deploy. Our memory target is the default:

root@ceph01:~# ceph config get osd osd_memory_target
Inferring fsid 508bbbd0-7067-11eb-baa9-bf10c01557e4
Inferring config /var/lib/ceph/508bbbd0-7067-11eb-baa9-bf10c01557e4/mon.ceph01/config
Using recent ceph image docker.io/ceph/ceph@sha256:4e710662986cf366c282323bfb4c4ca507d7e117c5ccf691a8273732073297e5
4294967296
root@ceph01:~# 

This is happening across multiple clusters deployed with 15.2.9, and it does repeat; I'd suggest this is a critical bug.

Our deployment is via cephadm, using containers (Podman 3.0.1). Host OS is Ubuntu 20.04.2.

#4 Updated by Dan van der Ster 4 months ago

https://tracker.ceph.com/issues/49387 is relevant, no?

Was 15.2.9 built against gperftools 2.8?

#5 Updated by David Orman 4 months ago

Dan van der Ster wrote:

https://tracker.ceph.com/issues/49387 is relevant, no?

Was 15.2.9 built against gperftools 2.8?

It's definitely what's installed in the 15.2.9 container:

root@ceph01:~# cephadm shell
Inferring fsid 012540a0-670e-11eb-b1f8-79b447895d28
Inferring config /var/lib/ceph/012540a0-670e-11eb-b1f8-79b447895d28/mon.ceph01/config
Using recent ceph image docker.io/ceph/ceph@sha256:4e710662986cf366c282323bfb4c4ca507d7e117c5ccf691a8273732073297e5
root@ceph01:/# rpm -qa |grep gperftools
gperftools-libs-2.8-1.el8.x86_64
root@ceph01:/# 

#6 Updated by Neha Ojha 4 months ago

  • Priority changed from Normal to Urgent

#7 Updated by Ken Dreyer 4 months ago

This build is in epel-testing: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb . Can you test it and provide feedback in Bodhi?
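
On a CentOS 8 host, pulling the testing build in and restarting the daemons should be roughly:

dnf --enablerepo=epel-testing update gperftools-libs   # fetch the gperftools build from epel-testing
systemctl restart ceph-osd.target                      # OSDs must restart to load the new library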

#8 Updated by Ist Gab 4 months ago

Ken Dreyer wrote:

This build is in epel-testing: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb . Can you test it and provide feedback in Bodhi?

You mean replace the current one, restart the OSDs...?

#9 Updated by David Orman 3 months ago

Ken Dreyer wrote:

This build is in epel-testing: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb . Can you test it and provide feedback in Bodhi?

We've tested this against multiple tools that depend on it without issue (and left feedback). We use the container images for Ceph, so we will have to wait for those to be rebuilt to validate that it fixes our frequent OSD crashes. We definitely agree this is an urgent-priority/critical fix, so hopefully new images will be built quickly and pushed to Docker Hub.

#10 Updated by David Orman 3 months ago

The new (old) version of gperftools has been promoted to stable in EPEL 8 (thanks Ken!!). Now we just need a new build of the Docker images with this version, and hopefully we will stop seeing the OSDs fail.

#11 Updated by Nathan Cutler 3 months ago

So this is a bug in tcmalloc/gperftools 2.8? Has it been reported somewhere?

#12 Updated by Kenneth Waegeman 3 months ago

We are also seeing something that looks like this issue:

   -43> 2021-03-18T13:17:27.866+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3299598336 unmapped: 10960896 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -42> 2021-03-18T13:17:28.128+0000 7efe89e09700 10 monclient: tick
   -41> 2021-03-18T13:17:28.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:16:58.129834+0000)
   -40> 2021-03-18T13:17:28.868+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3299041280 unmapped: 11517952 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -39> 2021-03-18T13:17:29.128+0000 7efe89e09700 10 monclient: tick
   -38> 2021-03-18T13:17:29.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:16:59.129925+0000)
   -37> 2021-03-18T13:17:29.869+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3302498304 unmapped: 8060928 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -36> 2021-03-18T13:17:30.128+0000 7efe89e09700 10 monclient: tick
   -35> 2021-03-18T13:17:30.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:00.130029+0000)
   -34> 2021-03-18T13:17:30.821+0000 7efe8da37700  5 bluestore.MempoolThread(0x557d106aca98) _resize_shards cache_size: 2845415832 kv_alloc: 2097152000 kv_used: 2097112656 meta_alloc: 436207616 meta_used: 167025670 data_alloc: 268435456 data_used: 0
   -33> 2021-03-18T13:17:30.871+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3298582528 unmapped: 11976704 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -32> 2021-03-18T13:17:31.077+0000 7efe7a416700  5 osd.454 364578 heartbeat osd_stat(store_statfs(0x208345c0000/0x251180000/0x3a381200000, data 0x1735c87ffb7/0x198fbab0000, compress 0x0/0x0/0x0, omap 0x5df0efc3, meta 0x1f327103d), peers [4,29,102,126,127,158,171,175,185,216,257,262,264,287,316,368,399,438,453,455,544,587,589,632,644,660] op hist [])
   -31> 2021-03-18T13:17:31.128+0000 7efe89e09700 10 monclient: tick
   -30> 2021-03-18T13:17:31.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:01.130113+0000)
   -29> 2021-03-18T13:17:31.873+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3299115008 unmapped: 11444224 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -28> 2021-03-18T13:17:32.128+0000 7efe89e09700 10 monclient: tick
   -27> 2021-03-18T13:17:32.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:02.130198+0000)
   -26> 2021-03-18T13:17:32.874+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3300278272 unmapped: 10280960 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -25> 2021-03-18T13:17:33.128+0000 7efe89e09700 10 monclient: tick
   -24> 2021-03-18T13:17:33.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:03.130284+0000)
   -23> 2021-03-18T13:17:33.377+0000 7efe7a416700  5 osd.454 364578 heartbeat osd_stat(store_statfs(0x20833750000/0x2511c0000/0x3a381200000, data 0x1735d5b5fb7/0x198fc8e0000, compress 0x0/0x0/0x0, omap 0x5df0efc3, meta 0x1f32b103d), peers [4,29,102,126,127,158,171,175,185,216,257,262,264,287,316,368,399,438,453,455,544,587,589,632,644,660] op hist [])
   -22> 2021-03-18T13:17:33.876+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3298582528 unmapped: 11976704 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -21> 2021-03-18T13:17:34.128+0000 7efe89e09700 10 monclient: tick
   -20> 2021-03-18T13:17:34.128+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:04.130366+0000)
   -19> 2021-03-18T13:17:34.878+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3298959360 unmapped: 11599872 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -18> 2021-03-18T13:17:35.129+0000 7efe89e09700 10 monclient: tick
   -17> 2021-03-18T13:17:35.129+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:05.130458+0000)
   -16> 2021-03-18T13:17:35.829+0000 7efe8da37700  5 bluestore.MempoolThread(0x557d106aca98) _resize_shards cache_size: 2845415832 kv_alloc: 2097152000 kv_used: 2097117616 meta_alloc: 436207616 meta_used: 167344047 data_alloc: 268435456 data_used: 0
   -15> 2021-03-18T13:17:35.879+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3298230272 unmapped: 12328960 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
   -14> 2021-03-18T13:17:36.129+0000 7efe89e09700 10 monclient: tick
   -13> 2021-03-18T13:17:36.129+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:06.130547+0000)
   -12> 2021-03-18T13:17:36.266+0000 7efe8922e700  5 bluestore(/var/lib/ceph/osd/ceph-454) _kv_sync_thread utilization: idle 3.515473814s of 10.023804225s, submitted: 532
   -11> 2021-03-18T13:17:36.863+0000 7efe98cc4700 10 monclient: handle_auth_request added challenge on 0x557d55de2c00
   -10> 2021-03-18T13:17:36.863+0000 7efe98cc4700 10 monclient: handle_auth_request added challenge on 0x557d56917800
    -9> 2021-03-18T13:17:36.881+0000 7efe8da37700  5 prioritycache tune_memory target: 4294967296 mapped: 3298557952 unmapped: 12001280 heap: 3310559232 old mem: 2845415832 new mem: 2845415832
    -8> 2021-03-18T13:17:37.129+0000 7efe89e09700 10 monclient: tick
    -7> 2021-03-18T13:17:37.129+0000 7efe89e09700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-18T13:17:07.130632+0000)
    -6> 2021-03-18T13:17:37.400+0000 7efe8922e700  4 rocksdb: [db/db_impl_write.cc:1470] [default] New memtable created with log file: #18706. Immutable memtables: 0.

    -5> 2021-03-18T13:17:37.400+0000 7efe8f440700  4 rocksdb: (Original Log Time 2021/03/18-13:17:37.402100) [db/db_impl_compaction_flush.cc:2073] Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots available 1, flush slots scheduled 1, compaction slots scheduled 0
    -4> 2021-03-18T13:17:37.400+0000 7efe8f440700  4 rocksdb: [db/flush_job.cc:322] [default] [JOB 3] Flushing memtable with next log file: 18706

    -3> 2021-03-18T13:17:37.400+0000 7efe8f440700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1616073457402164, "job": 3, "event": "flush_started", "num_memtables": 1, "num_entries": 378327, "num_deletes": 71668, "total_data_size": 253217583, "memory_usage": 260049840, "flush_reason": "Write Buffer Full"}
    -2> 2021-03-18T13:17:37.400+0000 7efe8f440700  4 rocksdb: [db/flush_job.cc:351] [default] [JOB 3] Level-0 flush table #18707: started
    -1> 2021-03-18T13:17:37.477+0000 7efe7a416700  5 osd.454 364578 heartbeat osd_stat(store_statfs(0x20831240000/0x251290000/0x3a381200000, data 0x1735f76d5b7/0x198fed20000, compress 0x0/0x0/0x0, omap 0x5df0efc3, meta 0x1f338103d), peers [4,29,102,126,127,158,171,175,185,216,257,262,264,287,316,368,399,438,453,455,544,587,589,632,644,660] op hist [0,1,0,0,0,1,1])
     0> 2021-03-18T13:17:37.483+0000 7efe8922e700 -1 *** Caught signal (Aborted) **
 in thread 7efe8922e700 thread_name:bstore_kv_sync

 ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)
 1: (()+0x12b20) [0x7efe9ccb3b20]
 2: (gsignal()+0x10f) [0x7efe9b91b7ff]
 3: (abort()+0x127) [0x7efe9b905c35]
 4: (()+0x9009b) [0x7efe9c2d109b]
 5: (()+0x9653c) [0x7efe9c2d753c]
 6: (()+0x96597) [0x7efe9c2d7597]
 7: (()+0x967f8) [0x7efe9c2d77f8]
 8: (()+0x19d24) [0x7efe9da20d24]
 9: (tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) [0x7efe9da430d6]
 10: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x43) [0x557d060a7b93]
 11: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x4b) [0x557d060a7c3b]
 12: (rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0xb4) [0x557d05fd5e94]
 13: (()+0x1103e7e) [0x557d06035e7e]
 14: (rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0xcf) [0x557d05fd1f8f]
 15: (rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x452) [0x557d0602d0e2]
 16: (rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17) [0x557d0602de97]
 17: (rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x480) [0x557d06025c20]
 18: (rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x149) [0x557d06029be9]
 19: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1acd) [0x557d05f6503d]
 20: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x557d05f65c11]
 21: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x8c) [0x557d05f1c80c]
 22: (RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x87) [0x557d05f1d027]
 23: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x426) [0x557d05a01376]
 24: (BlueStore::_kv_sync_thread()+0x176f) [0x557d05a26c1f]
 25: (BlueStore::KVSyncThread::entry()+0x11) [0x557d05a4e791]
 26: (()+0x814a) [0x7efe9cca914a]
 27: (clone()+0x43) [0x7efe9b9e0f23]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

OSDs are crashing now and then.

Thanks!

#13 Updated by Josh Durgin 3 months ago

  • Status changed from New to Resolved

This is fixed by the downgraded version of tcmalloc now in EPEL and the 15.2.10 container images.
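
For cephadm/container deployments, picking up the fixed 15.2.10 image is along the lines of:

ceph orch upgrade start --ceph-version 15.2.10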

#14 Updated by Sage Weil 3 months ago

  • Crash signature (v1) updated (diff)
