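Other OSDs have started crashing with the same symptoms. It seems to be related to OSD memory usage: the abort is an uncaught ceph::buffer::bad_alloc thrown from ceph::buffer::create_aligned() while KernelDevice::aio_write() rebuilds an aligned buffer for a BlueStore write.
Our environment: CentOS 7, Jewel (10.2.2) binaries from the official Ceph repositories.
Logs from the failed OSD: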
авг 10 16:53:39 mstor01 ceph-osd[2352550]: terminate called after throwing an instance of 'ceph::buffer::bad_alloc'
авг 10 16:53:39 mstor01 ceph-osd[2352550]: what(): buffer::bad_alloc
авг 10 16:53:39 mstor01 ceph-osd[2352550]: *** Caught signal (Aborted) **
авг 10 16:53:39 mstor01 ceph-osd[2352550]: in thread 7f3f3e6f9700 thread_name:tp_osd_tp
авг 10 16:53:39 mstor01 ceph-osd[2352550]: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 1: (()+0x91341a) [0x7f3f5bc7d41a]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 2: (()+0xf100) [0x7f3f59cb3100]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 3: (gsignal()+0x37) [0x7f3f582755f7]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 4: (abort()+0x148) [0x7f3f58276ce8]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f3f58b7a9d5]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 6: (()+0x5e946) [0x7f3f58b78946]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 7: (()+0x5e973) [0x7f3f58b78973]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 8: (()+0x5eb93) [0x7f3f58b78b93]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 9: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x26d) [0x7f3f5bd8740d]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 10: (ceph::buffer::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int)+0x1f3) [0x7f3f5bd88043]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 11: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x29f) [0x7f3f5baf2c9f]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 12: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::bu
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 13: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buff
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 14: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1127) [0x7f3f5ba05317]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 15: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, Threa
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 16: (ReplicatedPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<OpRequest>)+0x8c) [0x7f3f5b89c4fc]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 17: (ReplicatedBackend::sub_op_modify(std::shared_ptr<OpRequest>)+0xc2a) [0x7f3f5b8e3d5a]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 18: (ReplicatedBackend::handle_message(std::shared_ptr<OpRequest>)+0x3e3) [0x7f3f5b8e4703]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 19: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x100) [0x7f3f5b83d810]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 20: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7f3f5b6f2a8d]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 21: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x7f3f5b6f2cdd]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 22: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x7f3f5b6f7809]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 23: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x7f3f5bd6d557]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 24: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f3f5bd6f4c0]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 25: (()+0x7dc5) [0x7f3f59cabdc5]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 26: (clone()+0x6d) [0x7f3f58336ced]
авг 10 16:53:39 mstor01 ceph-osd[2352550]: 2016-08-10 16:53:39.112928 7f3f3e6f9700 -1 *** Caught signal (Aborted) **
авг 10 16:53:39 mstor01 ceph-osd[2352550]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
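To check the memory hypothesis before the next crash, something like the following could be run periodically on each OSD host. It is a minimal, hypothetical helper (not part of Ceph) that reads VmRSS from /proc/<pid>/status for every process whose /proc/<pid>/comm is "ceph-osd". It assumes a Linux host with /proc mounted and Python 3.6+ (CentOS 7 ships Python 2.7 by default, so a python3 package would be needed). The function names parse_vmrss_kib and osd_rss are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical helper: report resident memory (VmRSS) of running ceph-osd processes.

Assumes Linux with /proc mounted and that the OSD main thread's comm is
"ceph-osd". Not part of Ceph; intended to be run from cron or a loop to see
whether OSD RSS grows before a bad_alloc abort.
"""
import os
from typing import Dict, Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Return VmRSS in KiB from the contents of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None  # kernel threads and zombies have no VmRSS line


def osd_rss() -> Dict[int, int]:
    """Map PID -> VmRSS (KiB) for every process whose comm is 'ceph-osd'."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open("/proc/%s/comm" % entry) as f:
                if f.read().strip() != "ceph-osd":
                    continue
            with open("/proc/%s/status" % entry) as f:
                rss = parse_vmrss_kib(f.read())
        except OSError:
            continue  # the process exited while we were reading it
        if rss is not None:
            result[int(entry)] = rss
    return result


if __name__ == "__main__":
    for pid, kib in sorted(osd_rss().items()):
        print(f"ceph-osd pid={pid} rss={kib / 1024:.1f} MiB")
```

Logging this output with timestamps alongside the journal would show whether an OSD's RSS climbs steadily before it aborts, or whether the allocation fails at a roughly constant footprint.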