Bug #40712

ceph-mon crash with assert(err == 0) after rocksdb->get

Added by Yang Dongsheng over 4 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(1) I found a very strange problem in our environment: ceph-mon crashed with the following error in the log:

2019-07-03 11:01:27 +0800 | ceph | 2019-07-03 11:01:27.488607 7f19cb0ca700 -1 /root/rpmbuild/BUILD/ceph-12.2.5-4/src/mon/Monitor.cc: In function 'bool Monitor::_scrub(ScrubResult*, std::pair<std::basic_string<char>, std::basic_string<char> >*, int*)' thread 7f19cb0ca700 time 2019-07-03 11:01:27.481338
2019-07-03 11:01:27 +0800 | ceph | /root/rpmbuild/BUILD/ceph-12.2.5-4/src/mon/Monitor.cc: 5370: FAILED assert(err == 0)
2019-07-03 11:01:27 +0800 | ceph |  ceph version 12.2.5-4 (af918d94d1cbc02a20d22a946b5cfb2e8d43f809) luminous (stable)
2019-07-03 11:01:27 +0800 | ceph |  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x56070668d9e0]
2019-07-03 11:01:27 +0800 | ceph |  2: (Monitor::_scrub(ScrubResult*, std::pair<std::string, std::string>*, int*)+0xc11) [0x560706434691]
2019-07-03 11:01:27 +0800 | ceph |  3: (Monitor::handle_scrub(boost::intrusive_ptr<MonOpRequest>)+0x22f) [0x56070644186f]
2019-07-03 11:01:27 +0800 | ceph |  4: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xc08) [0x56070645d688]
2019-07-03 11:01:27 +0800 | ceph |  5: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x56070645e61b]
2019-07-03 11:01:27 +0800 | ceph |  6: (Monitor::ms_dispatch(Message*)+0x23) [0x56070648a783]
2019-07-03 11:01:27 +0800 | ceph |  7: (DispatchQueue::entry()+0x792) [0x56070693b3e2]
2019-07-03 11:01:27 +0800 | ceph |  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x56070673494d]
2019-07-03 11:01:27 +0800 | ceph |  9: (()+0x7e25) [0x7f19d3eaee25]
2019-07-03 11:01:27 +0800 | ceph |  10: (clone()+0x6d) [0x7f19d144934d]
2019-07-03 11:01:27 +0800 | ceph |  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2019-07-03 11:01:27 +0800 | ceph | --- begin dump of recent events ---

(2) The failed assert checks the result of a store get(), which points to a problem in RocksDB, so I tried to debug store.db with ceph-kvstore-tool as follows:
[root@atest-guest ceph0705]# ceph-kvstore-tool rocksdb node-2/ceph/ceph/mon/mon/ceph-node-2/store.db/ list|grep auth|grep 10851
auth 10851
[root@atest-guest ceph0705]# ceph-kvstore-tool rocksdb node-2/ceph/ceph/mon/mon/ceph-node-2/store.db/ get auth 10851
(auth, 10851) does not exist

I found that some keys can be listed with ceph-kvstore-tool, but trying to get them fails with "does not exist". These are all of the affected keys:

auth    10851
auth    10852
auth    10853
auth    10854
auth    10855
auth    10856
auth    10857
auth    10858
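ceph-kvstore-tool prints each entry as a prefix and a key (`auth 10851`), but in the underlying RocksDB database they are one raw key: the prefix bytes, a single NUL byte, then the key bytes. This is the same layout the test program in step (3) builds by hand with `push_back(0)`. A minimal sketch of encoding and decoding that layout follows; `combine_key` and `split_key` are illustrative names, not Ceph functions.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Build the raw RocksDB key for a (prefix, key) pair:
// prefix bytes, one NUL separator, then the key bytes.
std::string combine_key(const std::string& prefix, const std::string& key) {
  std::string out = prefix;
  out.push_back('\0');  // separator between prefix and key
  out.append(key);
  return out;
}

// Split a raw key back into (prefix, key) at the first NUL byte.
// Assumes the prefix itself never contains NUL; returns std::nullopt
// if the raw key has no separator at all.
std::optional<std::pair<std::string, std::string>> split_key(const std::string& raw) {
  std::size_t pos = raw.find('\0');
  if (pos == std::string::npos) {
    return std::nullopt;
  }
  return std::make_pair(raw.substr(0, pos), raw.substr(pos + 1));
}
```

Because the separator is a NUL byte, printing such a key on a terminal usually shows `auth10851` with the separator invisible, which matches the iterator output in step (3).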

(3) Next, I wrote a simple RocksDB test program to read this value. The result was the same: I can find the key with an iterator, but db->Get() returns NotFound:

#include <cassert>
#include <cstdio>
#include <string>
#include <iostream>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace rocksdb;

std::string kDBPath = "/data/ceph0705/node-2/ceph/ceph/mon/mon/ceph-node-2/store.db/";

int main() {
  DB* db;
  Options options;
  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
  options.IncreaseParallelism();

  // Open the existing mon store (create_if_missing defaults to false)
  Status s = DB::Open(options, kDBPath, &db);
  assert(s.ok());

  // Ceph stores each entry under one raw key: prefix + '\0' + key
  std::string value;
  std::string prefix("auth");
  std::string key("10851");
  std::string combined_key = prefix;
  combined_key.push_back(0);
  combined_key.append(key);

  // Point lookup; the status code is 0 for OK and 1 for NotFound
  Status st = db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), rocksdb::Slice(combined_key), &value);
  fprintf(stdout, "Result of Get() with key(auth10851) is: %d\n", static_cast<int>(st.code()));

  // Full scan: report the same key if the iterator can see it
  // (the printed key contains the NUL separator, which the terminal does not show)
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    if (combined_key == it->key().ToString())
        std::cout << "But we can find this key in iterator: " << it->key().ToString() << std::endl;
  }
  assert(it->status().ok()); // Check for any errors found during the scan
  delete it;

  delete db;
  return 0;
}

Result:
[root@atest-guest examples]# ./simple_example
Result of Get() with key(auth10851) is: 1
But we can find this key in iterator: auth10851
[root@atest-guest examples]#

I believe this is a RocksDB problem, but I am reporting it here to see whether anyone else has hit a similar problem with ceph-mon.

Thanks a lot.


Related issues

Related to RADOS - Bug #40777: hit assert in AuthMonitor::update_from_paxos (status: New)

History

#1 Updated by Yang Dongsheng over 4 years ago

I also opened an issue in RocksDB, https://github.com/facebook/rocksdb/issues/5558, and attached the DB file to that issue.

#2 Updated by huang jun over 4 years ago

We have hit this problem recently as well.
We are inclined to think it is related more to RocksDB than to Ceph.

#3 Updated by Neha Ojha over 3 years ago

  • Related to Bug #40777: hit assert in AuthMonitor::update_from_paxos added
