Bug #40712
ceph-mon crash with assert(err == 0) after rocksdb->get
Description
(1) I found a very strange problem in our environment: ceph-mon crashed with the following error in its log:
2019-07-03 11:01:27 +0800 | ceph | 2019-07-03 11:01:27.488607 7f19cb0ca700 -1 /root/rpmbuild/BUILD/ceph-12.2.5-4/src/mon/Monitor.cc: In function 'bool Monitor::_scrub(ScrubResult*, std::pair<std::basic_string<char>, std::basic_string<char> >*, int*)' thread 7f19cb0ca700 time 2019-07-03 11:01:27.481338
2019-07-03 11:01:27 +0800 | ceph | /root/rpmbuild/BUILD/ceph-12.2.5-4/src/mon/Monitor.cc: 5370: FAILED assert(err == 0)
2019-07-03 11:01:27 +0800 | ceph | ceph version 12.2.5-4 (af918d94d1cbc02a20d22a946b5cfb2e8d43f809) luminous (stable)
2019-07-03 11:01:27 +0800 | ceph | 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x56070668d9e0]
2019-07-03 11:01:27 +0800 | ceph | 2: (Monitor::_scrub(ScrubResult*, std::pair<std::string, std::string>*, int*)+0xc11) [0x560706434691]
2019-07-03 11:01:27 +0800 | ceph | 3: (Monitor::handle_scrub(boost::intrusive_ptr<MonOpRequest>)+0x22f) [0x56070644186f]
2019-07-03 11:01:27 +0800 | ceph | 4: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xc08) [0x56070645d688]
2019-07-03 11:01:27 +0800 | ceph | 5: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x56070645e61b]
2019-07-03 11:01:27 +0800 | ceph | 6: (Monitor::ms_dispatch(Message*)+0x23) [0x56070648a783]
2019-07-03 11:01:27 +0800 | ceph | 7: (DispatchQueue::entry()+0x792) [0x56070693b3e2]
2019-07-03 11:01:27 +0800 | ceph | 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x56070673494d]
2019-07-03 11:01:27 +0800 | ceph | 9: (()+0x7e25) [0x7f19d3eaee25]
2019-07-03 11:01:27 +0800 | ceph | 10: (clone()+0x6d) [0x7f19d144934d]
2019-07-03 11:01:27 +0800 | ceph | NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2019-07-03 11:01:27 +0800 | ceph | --- begin dump of recent events ---
(2) That suggests the problem is in rocksdb, so I tried to debug the store.db with ceph-kvstore-tool as below:
[root@atest-guest ceph0705]# ceph-kvstore-tool rocksdb node-2/ceph/ceph/mon/mon/ceph-node-2/store.db/ list|grep auth|grep 10851
auth 10851
[root@atest-guest ceph0705]# ceph-kvstore-tool rocksdb node-2/ceph/ceph/mon/mon/ceph-node-2/store.db/ get auth 10851
(auth, 10851) does not exist
I found that some keys can be listed by ceph-kvstore-tool, but trying to get them fails with "does not exist". All of the
affected keys are listed below:
auth 10851
auth 10852
auth 10853
auth 10854
auth 10855
auth 10856
auth 10857
auth 10858
(3) As a next step, I wrote a simple test program against rocksdb to get this value.
It gives the same result: I can list the key with an iterator, but rocksdb returns NotFound from db->Get():
#include <cassert>
#include <cstdio>
#include <iostream>
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"
using namespace rocksdb;
std::string kDBPath = "/data/ceph0705/node-2/ceph/ceph/mon/mon/ceph-node-2/store.db/";
int main() {
  DB* db;
  Options options;
  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
  options.IncreaseParallelism();
  // Open the DB
  Status s = DB::Open(options, kDBPath, &db);
  assert(s.ok());
  std::string value;
  std::string prefix("auth");
  std::string key("10851");
  // Build the raw key as prefix + NUL byte + key
  std::string combined_key = prefix;
  combined_key.push_back(0);
  combined_key.append(key);
  // Point lookup of the key
  Status st = db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), rocksdb::Slice(combined_key), &value);
  fprintf(stdout, "Result of Get() with key(auth10851) is: %d\n", st.code());
  // Full scan looking for the same key
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    if (combined_key == it->key().ToString())
      std::cout << "But we can find this key in iterator: " << it->key().ToString() << std::endl;
  }
  assert(it->status().ok()); // Check for any errors found during the scan
  delete it;
  delete db;
  return 0;
}
result:
[root@atest-guest examples]# ./simple_example
Result of Get() with key(auth10851) is: 1
But we can find this key in iterator: auth10851
[root@atest-guest examples]#
I believe this is a rocksdb problem, but I want to report it here to see whether anyone has met a similar problem in ceph-mon.
Thanks a lot
Related issues
History
#1 Updated by Yang Dongsheng over 4 years ago
I also opened an issue in rocksdb: https://github.com/facebook/rocksdb/issues/5558, and I attached the db file in this Issue.
#2 Updated by huang jun over 4 years ago
We hit this problem recently.
We are inclined to think it is related more to rocksdb than to ceph.
#3 Updated by Neha Ojha over 3 years ago
- Related to Bug #40777: hit assert in AuthMonitor::update_from_paxos added