Bug #21332
OSD Caught signal with StupidAllocator on 12.2.0 (closed)
Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Description
This was originally posted to the ceph-users mailing list: https://www.spinics.net/lists/ceph-users/msg38646.html
I am seeing OOM issues on some of my OSD nodes while testing BlueStore on 12.2.0, so I decided to try the StupidAllocator to see whether it has a smaller memory footprint, setting the following in my ceph.conf:
bluefs_allocator = stupid
bluestore_cache_size_hdd = 1073741824
bluestore_cache_size_ssd = 1073741824
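For reference, one way to confirm these values took effect on a running OSD is to query the admin socket (the OSD id below is just an example):
ceph daemon osd.10 config get bluefs_allocator
ceph daemon osd.10 config get bluestore_cache_size_hdd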
With these settings I am no longer seeing OOM errors, but on the node with these settings I have seen multiple Aborted messages in my log files:
grep Abort *log
ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.10.log:     0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:     0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:     0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:     0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **
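To see the context around each abort rather than just the matching lines, the same grep can be run with context flags (the line counts are arbitrary):
grep -B5 -A30 'Caught signal (Aborted)' ceph-osd.*.log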
All log files can be downloaded from: ftp://ftp.keepertech.com/outgoing/eric/ceph_logs/osd-logs-9-Sep-17.tar.gz
A couple of examples of the Aborts are:
2017-09-09 12:39:27.044074 7f27f5f20700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504975167035909, "job": 86, "event": "flush_started", "num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553, "memory_usage": 260049176}
2017-09-09 12:39:27.044088 7f27f5f20700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 86] Level-0 flush table #1825: started
2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7( v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842 ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0 lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184 active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap head
2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
 in thread 7f2816f45700 thread_name:msgr-worker-2

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0xa562f4) [0x5634e14882f4]
 2: (()+0x11390) [0x7f281b2c5390]
 3: (gsignal()+0x38) [0x7f281a261428]
 4: (abort()+0x16a) [0x7f281a26302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d]
 6: (()+0x8d6b6) [0x7f281aba16b6]
 7: (()+0x8d701) [0x7f281aba1701]
 8: (()+0xb8d38) [0x7f281abccd38]
 9: (()+0x76ba) [0x7f281b2bb6ba]
 10: (clone()+0x6d) [0x7f281a33282d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2017-09-09 12:39:05.878006 7f2817746700  1 -- 172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506 ==== osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2 ==== 998+0+46 (52256346 0 1629833233) 0x56359eb29000 con 0x563510c02000
 -9999> 2017-09-09 12:39:05.878065 7f2816f45700  1 -- 10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580 ==== osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded) ondisk+write+known_if_redirected+full_force e3521) v8 ==== 305+0+366 (2883828331 0 2609552142) 0x56355d9eb0c0 con 0x56355f455000
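Per the NOTE in the trace, the anonymous frames need the binary to decode. Assuming the matching ceph-osd build (ideally with its debug symbols installed) is at hand, a frame offset such as the 0xa562f4 in frame 1 can be resolved with addr2line:
addr2line -Cfe /usr/bin/ceph-osd 0xa562f4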
Second example:
2017-09-09 07:10:58.135527 7fa2d56a0700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10] Flushing memtable with next log file: 2773
2017-09-09 07:10:58.262058 7fa2d56a0700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504955458135538, "job": 10, "event": "flush_started", "num_memtables": 1, "num_entries": 935059, "num_deletes": 175946, "memory_usage": 260049888}
2017-09-09 07:10:58.262077 7fa2d56a0700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10] Level-0 flush table #2774: started
2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
 in thread 7fa2e96c8700 thread_name:bstore_kv_sync

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0xa562f4) [0x5579585362f4]
 2: (()+0x11390) [0x7fa2faa45390]
 3: (gsignal()+0x38) [0x7fa2f99e1428]
 4: (abort()+0x16a) [0x7fa2f99e302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d]
 6: (()+0x8d6b6) [0x7fa2fa3216b6]
 7: (()+0x8d701) [0x7fa2fa321701]
 8: (()+0x8d919) [0x7fa2fa321919]
 9: (()+0x1230f) [0x7fa2fb60b30f]
 10: (operator new[](unsigned long)+0x4e7) [0x7fa2fb62f4b7]
 11: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x70) [0x557958939150]
 12: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x45) [0x5579589392d5]
 13: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x100) [0x557958939460]
 14: (rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x175) [0x5579588a80d5]
 15: (()+0xe01d03) [0x5579588e1d03]
 16: (()+0xe024dd) [0x5579588e24dd]
 17: (rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0x109) [0x5579588a3629]
 18: (rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x39c) [0x5579588db6cc]
 19: (rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x5b7) [0x5579588d4fa7]
 20: (rocksdb::WriteBatchInternal::InsertInto(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool)+0x14b) [0x5579588d8dcb]
 21: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x14a7) [0x55795899e2c7]
 22: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x2a) [0x55795899ed2a]
 23: (RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xaf) [0x55795847751f]
 24: (BlueStore::_kv_sync_thread()+0x23dc) [0x55795841150c]
 25: (BlueStore::KVSyncThread::entry()+0xd) [0x55795845453d]
 26: (()+0x76ba) [0x7fa2faa3b6ba]
 27: (clone()+0x6d) [0x7fa2f9ab282d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
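This second trace aborts while allocating a new RocksDB memtable block (frames 10-14), which points at an out-of-memory condition during a write. Purely as an illustrative sketch and not a confirmed fix, the memtable footprint could be reduced via bluestore_rocksdb_options in ceph.conf using stock RocksDB options such as write_buffer_size; note that this option replaces Ceph's entire default RocksDB option string, and the 32 MiB value here is an arbitrary example:
bluestore_rocksdb_options = "write_buffer_size=33554432,max_write_buffer_number=2"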
System info:
ceph -v
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

OS: Ubuntu 16.04 using a 4.13.0 kernel

uname -a
Linux ede-c2-osd04 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

free
              total        used        free      shared  buff/cache   available
Mem:       16427036    15095176     1190364          44      141496      101164
Swap:        999420      737216      262204

lsblk
NAME             MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
cciss/c0d5       104:80   0 68.3G  0 disk
|-cciss/c0d5p2   104:82   0 68.2G  0 part
`-cciss/c0d5p1   104:81   0  100M  0 part /var/lib/ceph/osd/ceph-10
cciss/c0d3       104:48   0 68.3G  0 disk
|-cciss/c0d3p1   104:49   0  100M  0 part /var/lib/ceph/osd/ceph-7
`-cciss/c0d3p2   104:50   0 68.2G  0 part
cciss/c0d1       104:16   0 68.3G  0 disk
|-cciss/c0d1p1   104:17   0  100M  0 part /var/lib/ceph/osd/ceph-3
`-cciss/c0d1p2   104:18   0 68.2G  0 part
sr0               11:0    1 1024M  0 rom
cciss/c0d4       104:64   0 68.3G  0 disk
|-cciss/c0d4p1   104:65   0  100M  0 part /var/lib/ceph/osd/ceph-11
`-cciss/c0d4p2   104:66   0 68.2G  0 part
cciss/c0d2       104:32   0 68.3G  0 disk
|-cciss/c0d2p2   104:34   0 68.2G  0 part
`-cciss/c0d2p1   104:33   0  100M  0 part /var/lib/ceph/osd/ceph-9
cciss/c0d0       104:0    0 68.3G  0 disk
|-cciss/c0d0p5   104:5    0  976M  0 part [SWAP]
|-cciss/c0d0p1   104:1    0 19.1G  0 part /
|-cciss/c0d0p6   104:6    0 48.3G  0 part /var
`-cciss/c0d0p2   104:2    0    1K  0 part

root@ede-c2-osd04:~# ceph -s
  cluster:
    id:     6d8aae1e-1125-11e5-a708-001b78e265be
    health: HEALTH_WARN
            1 MDSs behind on trimming
  services:
    mon: 3 daemons, quorum ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
    mgr: ede-c2-mon03(active), standbys: ede-c2-mon01, ede-c2-mon02
    mds: cephfs-1/1/1 up {0=ede-c2-mon03=up:active}, 2 up:standby
    osd: 19 osds: 19 up, 19 in
  data:
    pools:   4 pools, 1792 pgs
    objects: 1628k objects, 221 GB
    usage:   719 GB used, 577 GB / 1296 GB avail
    pgs:     1792 active+clean
  io:
    client: 54130 B/s rd, 4254 kB/s wr, 24 op/s rd, 718 op/s wr

ceph health detail
HEALTH_WARN 1 MDSs behind on trimming
MDS_TRIM 1 MDSs behind on trimming
    mdsede-c2-mon03(mds.0): Behind on trimming (49174/30) max_segments: 30, num_segments: 49174
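To see where the OSD's memory is actually going (BlueStore caches vs. everything else), the per-pool allocation stats can be dumped from the admin socket (again, the OSD id is just an example):
ceph daemon osd.10 dump_mempools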
Workload: 3 CephFS clients, all using the kernel CephFS mount (kernel 4.13.0), all creating, checking, and deleting files. A CephFS snapshot is taken hourly, with a cron-based process that deletes snapshots as needed to keep the file system less than 70% full.
NOTE: CephFS snapshots are enabled.
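For reference, a minimal sketch of the hourly snapshot rotation described above; the mount point, snapshot naming, and exact cleanup loop are assumptions, while the 70% threshold and the mkdir/rmdir-on-.snap mechanism come from the report and standard CephFS semantics:

#!/bin/sh
# Hypothetical sketch of the snapshot rotation described in the workload above.
FS=/mnt/cephfs                                  # assumed mount point
mkdir "$FS/.snap/$(date +%Y-%m-%d-%H)"          # CephFS snapshots are created with mkdir in .snap
# Remove the oldest snapshots while the file system is 70% full or more.
while [ "$(df --output=pcent "$FS" | tail -1 | tr -dc 0-9)" -ge 70 ]; do
    oldest=$(ls "$FS/.snap" | head -1)          # timestamped names sort oldest-first
    [ -n "$oldest" ] || break
    rmdir "$FS/.snap/$oldest"                   # snapshots are removed with rmdir
done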