Bug #21332 (closed): OSD Caught signal with StupidAllocator on 12.2.0

Added by Eric Eastman over 6 years ago. Updated over 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

This was originally posted to the ceph-users mailing list: https://www.spinics.net/lists/ceph-users/msg38646.html

I am seeing OOM issues on some of the OSD nodes I am testing with BlueStore on 12.2.0, so I decided to try the StupidAllocator to see whether it has a smaller memory footprint, by setting the following in my ceph.conf:

bluefs_allocator = stupid
bluestore_cache_size_hdd = 1073741824
bluestore_cache_size_ssd = 1073741824
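
For reference, whether the running OSDs actually picked up these values can be checked through the admin socket (osd.10 below is just one of the affected OSD IDs; any local OSD works the same way):

# query the running daemon's effective configuration via its admin socket
ceph daemon osd.10 config get bluefs_allocator
ceph daemon osd.10 config get bluestore_cache_size_hdd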

With these settings I am no longer seeing OOM errors, but on the node with these settings I have seen multiple Aborted messages in my log files:

grep  Abort *log
ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.10.log:     0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:     0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:     0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:     0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **

All log files can be downloaded from: ftp://ftp.keepertech.com/outgoing/eric/ceph_logs/osd-logs-9-Sep-17.tar.gz

A couple of examples of the Aborts are:

2017-09-09 12:39:27.044074 7f27f5f20700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504975167035909, "job": 86, "event": "flush_started", "num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553, "memory_usage": 260049176}
2017-09-09 12:39:27.044088 7f27f5f20700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 86] Level-0 flush table #1825: started
2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7( v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842 ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0 lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184 active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap head
2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
 in thread 7f2816f45700 thread_name:msgr-worker-2

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0xa562f4) [0x5634e14882f4]
 2: (()+0x11390) [0x7f281b2c5390]
 3: (gsignal()+0x38) [0x7f281a261428]
 4: (abort()+0x16a) [0x7f281a26302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d]
 6: (()+0x8d6b6) [0x7f281aba16b6]
 7: (()+0x8d701) [0x7f281aba1701]
 8: (()+0xb8d38) [0x7f281abccd38]
 9: (()+0x76ba) [0x7f281b2bb6ba]
 10: (clone()+0x6d) [0x7f281a33282d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2017-09-09 12:39:05.878006 7f2817746700  1 -- 172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506 ==== osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2 ==== 998+0+46 (52256346 0 1629833233) 0x56359eb29000 con 0x563510c02000
 -9999> 2017-09-09 12:39:05.878065 7f2816f45700  1 -- 10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580 ==== osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded) ondisk+write+known_if_redirected+full_force e3521) v8 ==== 305+0+366 (2883828331 0 2609552142) 0x56355d9eb0c0 con 0x56355f455000

Second example:

2017-09-09 07:10:58.135527 7fa2d56a0700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10] Flushing memtable with next log file: 2773

2017-09-09 07:10:58.262058 7fa2d56a0700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504955458135538, "job": 10, "event": "flush_started", "num_memtables": 1, "num_entries": 935059, "num_deletes": 175946, "memory_usage": 260049888}
2017-09-09 07:10:58.262077 7fa2d56a0700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10] Level-0 flush table #2774: started
2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
 in thread 7fa2e96c8700 thread_name:bstore_kv_sync

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0xa562f4) [0x5579585362f4]
 2: (()+0x11390) [0x7fa2faa45390]
 3: (gsignal()+0x38) [0x7fa2f99e1428]
 4: (abort()+0x16a) [0x7fa2f99e302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d]
 6: (()+0x8d6b6) [0x7fa2fa3216b6]
 7: (()+0x8d701) [0x7fa2fa321701]
 8: (()+0x8d919) [0x7fa2fa321919]
 9: (()+0x1230f) [0x7fa2fb60b30f]
 10: (operator new[](unsigned long)+0x4e7) [0x7fa2fb62f4b7]
 11: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x70) [0x557958939150]
 12: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x45) [0x5579589392d5]
 13: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x100) [0x557958939460]
 14: (rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x175) [0x5579588a80d5]
 15: (()+0xe01d03) [0x5579588e1d03]
 16: (()+0xe024dd) [0x5579588e24dd]
 17: (rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0x109) [0x5579588a3629]
 18: (rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x39c) [0x5579588db6cc]
 19: (rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x5b7) [0x5579588d4fa7]
 20: (rocksdb::WriteBatchInternal::InsertInto(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool)+0x14b) [0x5579588d8dcb]
 21: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x14a7) [0x55795899e2c7]
 22: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x2a) [0x55795899ed2a]
 23: (RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xaf) [0x55795847751f]
 24: (BlueStore::_kv_sync_thread()+0x23dc) [0x55795841150c]
 25: (BlueStore::KVSyncThread::entry()+0xd) [0x55795845453d]
 26: (()+0x76ba) [0x7fa2faa3b6ba]
 27: (clone()+0x6d) [0x7fa2f9ab282d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
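
For anyone reading these traces without the binary at hand: as a rough sketch, the anonymous ()+0x... frames can usually be resolved with addr2line, assuming the exact matching ceph-osd build and its debug symbols (e.g. the dbg package) are installed:

# resolve an in-binary offset from the trace (frame 1 above: ()+0xa562f4)
# -C demangles C++ symbols, -f prints the enclosing function name
addr2line -Cfe /usr/bin/ceph-osd 0xa562f4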

System info:
ceph -v   
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)   

OS: Ubuntu 16.04 using a 4.13.0 kernel   
uname -a   
Linux ede-c2-osd04 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux   

free   
              total        used        free      shared  buff/cache   available   
Mem:       16427036    15095176     1190364          44      141496      101164   
Swap:        999420      737216      262204

lsblk
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
cciss/c0d5     104:80   0 68.3G  0 disk 
|-cciss/c0d5p2 104:82   0 68.2G  0 part 
`-cciss/c0d5p1 104:81   0  100M  0 part /var/lib/ceph/osd/ceph-10
cciss/c0d3     104:48   0 68.3G  0 disk 
|-cciss/c0d3p1 104:49   0  100M  0 part /var/lib/ceph/osd/ceph-7
`-cciss/c0d3p2 104:50   0 68.2G  0 part 
cciss/c0d1     104:16   0 68.3G  0 disk 
|-cciss/c0d1p1 104:17   0  100M  0 part /var/lib/ceph/osd/ceph-3
`-cciss/c0d1p2 104:18   0 68.2G  0 part 
sr0             11:0    1 1024M  0 rom  
cciss/c0d4     104:64   0 68.3G  0 disk 
|-cciss/c0d4p1 104:65   0  100M  0 part /var/lib/ceph/osd/ceph-11
`-cciss/c0d4p2 104:66   0 68.2G  0 part 
cciss/c0d2     104:32   0 68.3G  0 disk 
|-cciss/c0d2p2 104:34   0 68.2G  0 part 
`-cciss/c0d2p1 104:33   0  100M  0 part /var/lib/ceph/osd/ceph-9
cciss/c0d0     104:0    0 68.3G  0 disk 
|-cciss/c0d0p5 104:5    0  976M  0 part [SWAP]
|-cciss/c0d0p1 104:1    0 19.1G  0 part /
|-cciss/c0d0p6 104:6    0 48.3G  0 part /var
`-cciss/c0d0p2 104:2    0    1K  0 part 

ceph -s
  cluster:
    id:     6d8aae1e-1125-11e5-a708-001b78e265be
    health: HEALTH_WARN
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
    mgr: ede-c2-mon03(active), standbys: ede-c2-mon01, ede-c2-mon02
    mds: cephfs-1/1/1 up  {0=ede-c2-mon03=up:active}, 2 up:standby
    osd: 19 osds: 19 up, 19 in

  data:
    pools:   4 pools, 1792 pgs
    objects: 1628k objects, 221 GB
    usage:   719 GB used, 577 GB / 1296 GB avail
    pgs:     1792 active+clean

  io:
    client:   54130 B/s rd, 4254 kB/s wr, 24 op/s rd, 718 op/s wr

ceph health detail
HEALTH_WARN 1 MDSs behind on trimming
MDS_TRIM 1 MDSs behind on trimming
    mdsede-c2-mon03(mds.0): Behind on trimming (49174/30) max_segments: 30, num_segments: 49174
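
For reference, the max_segments value above corresponds to the mds_log_max_segments option; one possible runtime adjustment (the value 200 here is purely illustrative, not a recommendation from this report) would be:

# raise the MDS journal segment limit at runtime (illustrative value)
ceph tell mds.0 injectargs '--mds_log_max_segments 200'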


Workload: 3 CephFS clients, all using the kernel CephFS mount (kernel 4.13.0), all creating, checking, and deleting files. CephFS snapshots are taken hourly, with a cron-based process that deletes snapshots as needed to keep the file system less than 70% full.
NOTE: CephFS snapshots are enabled.
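
A minimal sketch of that snapshot handling, assuming a hypothetical mount point /mnt/cephfs (CephFS snapshots are created and removed through the .snap pseudo-directory):

# taken hourly from cron
mkdir /mnt/cephfs/.snap/hourly-$(date +%Y%m%d%H)
# cleanup pass removes old snapshots while the file system is over 70% full
rmdir /mnt/cephfs/.snap/<old-snapshot-name>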
