Bug #42173

_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory

Added by super xor 5 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags: 14.2.4
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature:

Description

-4> 2019-10-03 17:58:44.023 7fde2e2f9700 5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4545611..4546321) is_readable = 1 - now=2019-10-03 17:58:44.024231 lease_expire=0.000000 has v0 lc 4546321
-3> 2019-10-03 17:58:44.023 7fde2e2f9700 5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4545611..4546321) is_readable = 1 - now=2019-10-03 17:58:44.024259 lease_expire=0.000000 has v0 lc 4546321
-2> 2019-10-03 17:58:44.027 7fde2e2f9700 -1 mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257325 get_full_from_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory
-1> 2019-10-03 17:58:44.031 7fde2e2f9700 -1 /build/ceph-14.2.4/src/mon/OSDMonitor.cc: In function 'int OSDMonitor::get_full_from_pinned_map(version_t, ceph::bufferlist&)' thread 7fde2e2f9700 time 2019-10-03 17:58:44.032979
/build/ceph-14.2.4/src/mon/OSDMonitor.cc: 3932: FAILED ceph_assert(err == 0)

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fde39d4864e]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fde39d48829]
3: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
4: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
5: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
6: (PaxosService::maybe_trim()+0x473) [0x707443]
7: (Monitor::tick()+0xa9) [0x5ecf39]
8: (C_MonContext::finish(int)+0x39) [0x5c3f29]
9: (Context::complete(int)+0x9) [0x6070d9]
10: (SafeTimer::timer_thread()+0x190) [0x7fde39ddd580]
11: (SafeTimerThread::entry()+0xd) [0x7fde39ddee4d]
12: (()+0x76ba) [0x7fde38b436ba]
13: (clone()+0x6d) [0x7fde3836c41d]

0> 2019-10-03 17:58:44.031 7fde2e2f9700 -1 *** Caught signal (Aborted) **
in thread 7fde2e2f9700 thread_name:safe_timer

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0x11390) [0x7fde38b4d390]
2: (gsignal()+0x38) [0x7fde3829a428]
3: (abort()+0x16a) [0x7fde3829c02a]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7fde39d4869f]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fde39d48829]
6: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
7: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
8: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
9: (PaxosService::maybe_trim()+0x473) [0x707443]
10: (Monitor::tick()+0xa9) [0x5ecf39]
11: (C_MonContext::finish(int)+0x39) [0x5c3f29]
12: (Context::complete(int)+0x9) [0x6070d9]
13: (SafeTimer::timer_thread()+0x190) [0x7fde39ddd580]
14: (SafeTimerThread::entry()+0xd) [0x7fde39ddee4d]
15: (()+0x76ba) [0x7fde38b436ba]
16: (clone()+0x6d) [0x7fde3836c41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
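
For triage, the two lines that matter in the dump above are the `get_full_from_pinned_map` error and the failed assertion. A minimal sketch of pulling them out of a mon log with Python follows. The regular expressions are written against the two lines quoted in this report only, the `summarize` helper is hypothetical, and reading the `e257325` field as the OSDMonitor's current osdmap epoch is an assumption.

```python
import re

# Two lines copied verbatim from the crash dump in this report.
LOG = """\
-2> 2019-10-03 17:58:44.027 7fde2e2f9700 -1 mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257325 get_full_from_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory
/build/ceph-14.2.4/src/mon/OSDMonitor.cc: 3932: FAILED ceph_assert(err == 0)
"""

# Matches the "closest pinned map ... not available" error line.
PINNED_RE = re.compile(
    r"\.osd e(?P<epoch>\d+) get_full_from_pinned_map closest pinned map "
    r"ver (?P<ver>\d+) not available! error: \((?P<errno>\d+)\) (?P<msg>.+)"
)
# Matches the "<file>: <line>: FAILED ceph_assert(...)" line.
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED (?P<expr>ceph_assert\(.*\))"
)


def summarize(log_text):
    """Collect the missing pinned map version and the failed assert, if present."""
    summary = {}
    for line in log_text.splitlines():
        m = PINNED_RE.search(line)
        if m:
            # Assumption: "e<N>" is the OSDMonitor's current osdmap epoch.
            summary["osdmap_epoch"] = int(m["epoch"])
            summary["missing_pinned_version"] = int(m["ver"])
            summary["errno"] = int(m["errno"])
            summary["error"] = m["msg"].strip()
        m = ASSERT_RE.search(line)
        if m:
            summary["assert"] = f'{m["file"]}:{m["line"]} {m["expr"]}'
    return summary


summary = summarize(LOG)
print(summary)
```

Run against a full mon log, the same function would report only the last matching error and assertion, which is usually the one that precedes the abort.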

History

#1 Updated by Greg Farnum 5 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Status changed from New to Need More Info
  • Component(RADOS) Monitor added

This came up in "[ceph-users] mon sudden crash loop - pinned map" as well; is that perhaps the same cluster?

Were any repairs run against the monitor's disk store (the RocksDB instance in particular)? Have there been previous issues that might have led to it actually missing data that should be present?

#2 Updated by super xor 5 months ago

Hi,
Sorry, yes, I forgot to elaborate.
We had an earlier issue where the mon kept crashing because of apparent RocksDB corruption (cause unknown). The only way to get it working again was to run a RocksDB repair; two days later, the issue reported here appeared.
Everything seems to work, but the mon crashes quickly (in under 1 minute). There are no data issues; the problem is that CephFS can't be mounted.

#3 Updated by Josh Durgin 5 months ago

  • Status changed from Need More Info to Closed

A RocksDB repair isn't guaranteed to get all the data back; it sounds like it lost some maps in this case. For further advice, we can continue on the mailing list thread.
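
To illustrate why a lost map turns into a crash rather than a recoverable error, here is a toy Python model of the failure seen in the log. It is not Ceph's code: the `ToyMapStore` class, its method names, and every version number other than 252615 are assumptions made for illustration. The only behaviour taken from this report is that the closest pinned map is looked up, the read returns error 2 (ENOENT), and the caller asserts `err == 0`.

```python
import bisect
import errno


class ToyMapStore:
    """Toy stand-in for a store holding pinned full maps keyed by version."""

    def __init__(self, pinned_versions, full_maps):
        # Versions that the (hypothetical) manifest lists as pinned.
        self.pinned = sorted(pinned_versions)
        # Versions whose full map is actually present in the store.
        self.full_maps = dict(full_maps)

    def closest_pinned_at_or_below(self, version):
        """Return the highest pinned version <= version, or None."""
        i = bisect.bisect_right(self.pinned, version)
        return self.pinned[i - 1] if i else None

    def get_full_from_pinned(self, version):
        """Return (err, pinned_version); err is 0 or -ENOENT."""
        base = self.closest_pinned_at_or_below(version)
        if base is None or base not in self.full_maps:
            return -errno.ENOENT, base
        return 0, base


pinned = [252000, 252615, 253000]  # 252000 and 253000 are made-up versions

# Healthy store: every pinned version has its full map.
healthy = ToyMapStore(pinned, {v: b"full-map" for v in pinned})

# Store after a lossy repair: the manifest still lists 252615,
# but the full map for it is gone.
repaired = ToyMapStore(pinned, {252000: b"full-map", 253000: b"full-map"})

for name, store in (("healthy", healthy), ("repaired", repaired)):
    err, base = store.get_full_from_pinned(252700)  # 252700 is hypothetical
    print(name, "closest pinned:", base, "err:", err)
    # The real monitor does ceph_assert(err == 0) at this point, so a
    # non-zero err aborts the daemon instead of being handled.
```

In this model the repaired store keeps pointing at 252615 on every lookup, which is consistent with the mon aborting again shortly after each restart.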
