Bug #25146

closed

"rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-distro-basic-smithi

Added by Yuri Weinstein over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/mimic-x
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is on the new mimic-x suite https://github.com/ceph/ceph/pull/23292
Run: http://pulpito.ceph.com/yuriw-2018-07-27_21:19:33-upgrade:mimic-x:parallel-master-distro-basic-smithi/
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2018-07-27_21:19:33-upgrade:mimic-x:parallel-master-distro-basic-smithi/2826004/teuthology.log

2018-07-27T21:41:17.751 INFO:tasks.ceph.mon.c.smithi031.stderr:2018-07-27 21:41:17.752 7fa1e5fdf140 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: while stat a file for size: /var/lib/ceph/mon/ceph-c/store.db/000000.sst: No such file or directory
2018-07-27T21:41:17.751 INFO:tasks.ceph.mon.c.smithi031.stderr:
2018-07-27T21:41:17.751 INFO:tasks.ceph.mon.c.smithi031.stderr:2018-07-27 21:41:17.752 7fa1e5fdf140 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-c': (22) Invalid argument

Per Josh: "the monitors are getting 'Unable to load table properties for file 0 --- IO error: While open a file for random read: /var/lib/ceph/mon/ceph-a/store.db/000000.sst: No such file or directory' after upgrading to master"


Related issues 1 (0 open, 1 closed)

Has duplicate: RADOS - Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite (Duplicate, Brad Hubbard, 11/11/2018)

Actions #1

Updated by Patrick Donnelly over 5 years ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) Monitor added
Actions #2

Updated by Kefu Chai over 5 years ago

i created a vstart.sh cluster using the mimic branch, and ceph-monstore-tool from master is able to open it just fine.

$ bin/ceph-monstore-tool ~/dev/ceph/build/dev/mon.a/ dump-keys
...
2018-08-01 14:27:22.935 7f9e16c6df80  1 rocksdb: do_open column families: [default]
2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: RocksDB version: 5.14.0

2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: Git sha rocksdb_build_git_sha:@9090ae3ecfbf9b50a398a5d8b178f14b88dc047e@
2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: Compile date Jul 27 2018
...
2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: SST files in /home/kefu/dev/ceph/build/dev/mon.a/store.db dir, Total Num: 3, files: 000004.sst 000007.sst 000013.sst
...
2018-08-01 14:27:23.067 7f9e16c6df80  4 rocksdb: [/var/ssd/ceph/src/rocksdb/db/db_impl_open.cc:1219] DB pointer 0x559081bc1000
auth / 1
auth / 2
auth / 3
auth / 4
auth / 5
auth / 6
auth / 7
auth / 8
auth / first_committed
auth / format_version
...

rerunning the test at http://pulpito.ceph.com/kchai-2018-08-01_06:21:53-upgrade:mimic-x:parallel-master-distro-basic-smithi/

Actions #3

Updated by Kefu Chai over 5 years ago

it's a regression in rocksdb. the rocksdb in mimic (eaee6d3beab3429232ceb188377a3f94e844fca7) is f4a857da0b720691effc524469f6db895ad00d8e, which contains https://github.com/facebook/rocksdb/commit/73f21a7b2177aeb82b9f518222e2b9ea8fbb7c4f. this commit is fine per se, but an older rocksdb that does not contain it will not be able to open a db file created with this change.

this is because 73f21a7b2177aeb82b9f518222e2b9ea8fbb7c4f creates a dummy entry in the manifest to record the deleted WALs, expecting that recovery will skip it. so this change is not forward compatible, i.e. an old rocksdb is not necessarily able to open a store created by a new rocksdb.

the dummy entry looks like:

$20 = {fd = {table_reader = 0x0, packed_number_and_path_id = 0, file_size = 0}, smallest = {rep_ = "dummy_key\001\000\000\000\000\000\000"},
  largest = {rep_ = "dummy_key\001\000\000\000\000\000\000"}, smallest_seqno = 0, largest_seqno = 0, table_reader_handle = 0x0, stats = {
    num_reads_sampled = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 0}, <No data fields>}},
  compensated_file_size = 0, num_entries = 0, num_deletions = 0, raw_key_size = 0, raw_value_size = 0, refs = 1, being_compacted = false,
  init_stats_from_file = false, marked_for_compaction = false}
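
To connect the dump above with the reported error, here is a minimal sketch of how a packed value of 0 could turn into a lookup of "000000.sst". The 62-bit mask, the shift, and the six-digit file name pattern are assumptions about how rocksdb packs file numbers and names table files; the helper names `unpack` and `table_file_name` are hypothetical and exist only for this illustration.

```python
from typing import Tuple

# Assumption: the low 62 bits of packed_number_and_path_id hold the file
# number and the top 2 bits hold the path id.
FILE_NUMBER_MASK = 0x3FFFFFFFFFFFFFFF


def unpack(packed_number_and_path_id: int) -> Tuple[int, int]:
    """Split the packed field into (file_number, path_id)."""
    file_number = packed_number_and_path_id & FILE_NUMBER_MASK
    path_id = packed_number_and_path_id >> 62
    return file_number, path_id


def table_file_name(db_dir: str, file_number: int) -> str:
    """Build a table file path, zero-padding the number to six digits."""
    return f"{db_dir}/{file_number:06d}.sst"


# The dummy entry in the dump has packed_number_and_path_id = 0.
number, path_id = unpack(0)
print(table_file_name("/var/lib/ceph/mon/ceph-c/store.db", number))
```

Under these assumptions the script prints /var/lib/ceph/mon/ceph-c/store.db/000000.sst, the same path the monitor failed to stat.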

in the context of Ceph's use case, that's sort of fine, because we do not support downgrade.

but when rocksdb identified that this change is not forward compatible, they decided to revert it in https://github.com/facebook/rocksdb/pull/3762. so the revert makes rocksdb not backward compatible. in other words, a new rocksdb is not able to open a store created by the old rocksdb that wrote the dummy entry.

and we do have the revert in master! that's why we are suffering from a fix from rocksdb upstream.

this is embarrassing, because mimic is already released. i don't really want our users to have to rebuild their OSDs or monitors to be forward compatible with master. we probably have to bite the bullet and keep the reverted change in our fork and maintain it, unless we can persuade upstream to revert https://github.com/facebook/rocksdb/pull/3762 .

Actions #4

Updated by Kefu Chai over 5 years ago

an alternative option is to whip up a tool that rebuilds the manifest and removes the dummy File4 entry carrying the kDeletedLogNumberHack custom_tag. see also rocksdb/db/repair.cc .

Actions #5

Updated by Nathan Cutler over 5 years ago

  • Subject changed from "rocksdb: Corruption: Can't access /000000.sst" in pgrade:mimic-x:parallel-master-distro-basic-smithi to "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-distro-basic-smithi
Actions #6

Updated by Sage Weil over 5 years ago

another option would be to only partially revert, and keep just the bits that ignore the older deleted log files.

Actions #7

Updated by Sage Weil over 5 years ago

  • Status changed from New to 12
  • Assignee deleted (Sage Weil)
  • Priority changed from Normal to Urgent

I think we need to fix this sooner rather than later. My suggestion is to incorporate enough of the original rocksdb changes to interpret the newer MANIFEST entries, but do not generate new ones, so that we can silently "upgrade" from the mimic store.dbs with the patch back to the traditional format.

My guess is that rocksdb would take such a patch upstream, too?
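
A toy model of the "read the newer entries, never write them" idea may help make the suggestion concrete. It is only a sketch under stated assumptions: `FileEntry`, `is_dummy`, `load_entries`, and `write_entries` are hypothetical names, not rocksdb APIs, the dummy check is a heuristic taken from the gdb dump in comment #3, and the file sizes are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileEntry:
    """Hypothetical stand-in for one new-file record in a MANIFEST."""
    file_number: int
    file_size: int
    smallest_key: bytes


def is_dummy(entry: FileEntry) -> bool:
    # Assumed heuristic: file number 0, size 0, key starting with "dummy_key",
    # as seen in the dump of the dummy entry.
    return (entry.file_number == 0 and entry.file_size == 0
            and entry.smallest_key.startswith(b"dummy_key"))


def load_entries(raw: List[FileEntry]) -> List[FileEntry]:
    """Read path: tolerate dummy records by dropping them."""
    return [e for e in raw if not is_dummy(e)]


def write_entries(live: List[FileEntry]) -> List[FileEntry]:
    """Write path: refuse to emit a dummy record into the next MANIFEST."""
    if any(is_dummy(e) for e in live):
        raise ValueError("dummy entries must not be written back")
    return list(live)


# A store written by the mimic rocksdb: one dummy record plus real files.
mimic_manifest = [
    FileEntry(0, 0, b"dummy_key\x01\x00\x00\x00\x00\x00\x00"),
    FileEntry(4, 1024, b"auth"),   # sizes are illustrative only
    FileEntry(7, 2048, b"auth"),
]
upgraded = write_entries(load_entries(mimic_manifest))
print([e.file_number for e in upgraded])
```

The point of splitting the read and write paths is that a store passes through the tolerant reader once and comes out in the traditional format, so no later version needs to know about the dummy record.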

Actions #8

Updated by Sage Weil over 5 years ago

  • Assignee set to Radoslaw Zarzynski
Actions #9

Updated by Radoslaw Zarzynski over 5 years ago

  • Status changed from 12 to In Progress

Very early fix: https://github.com/rzarzynski/rocksdb/tree/wip-bug-25146.

The case appears more complicated, as the change has been reintroduced to RocksDB (see https://github.com/facebook/rocksdb/pull/3765) but in a modified form. VersionEdit now uses a different encoding/decoding that doesn't understand the original format of https://github.com/facebook/rocksdb/pull/3488. The value 0x03 of CustomTag has been reused. All of these variants were/are in our master: https://gist.github.com/rzarzynski/24a753176f7cf2d2c1fc173d8da763dc.
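
Why a reused tag value is a problem can be shown with a very small sketch. kDeletedLogNumberHack is the name used earlier in this ticket; the name kMinLogNumberToKeepHack for the reintroduced variant is an assumption, and the two dictionaries and the `describe` helper are hypothetical, for illustration only.

```python
from typing import Dict

# Two tag tables that assign different meanings to the same value.
# The name on the second line is an assumption, not taken from this ticket.
ORIGINAL_TAGS: Dict[int, str] = {0x03: "kDeletedLogNumberHack"}
REINTRODUCED_TAGS: Dict[int, str] = {0x03: "kMinLogNumberToKeepHack"}


def describe(tag: int, table: Dict[int, str]) -> str:
    """Return the meaning a decoder built on `table` attaches to a tag."""
    return table.get(tag, "unknown")


tag_on_disk = 0x03
print(describe(tag_on_disk, ORIGINAL_TAGS))
print(describe(tag_on_disk, REINTRODUCED_TAGS))
```

The same byte on disk is read as two different things depending on which rocksdb wrote the decoder, so the tag alone cannot tell a reader which payload layout follows it.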

Actions #10

Updated by Radoslaw Zarzynski over 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #11

Updated by Brad Hubbard over 5 years ago

  • Has duplicate Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite added
Actions #13

Updated by Kefu Chai over 5 years ago

  • Status changed from Fix Under Review to Resolved