Bug #25146

"rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-distro-basic-smithi

Added by Yuri Weinstein about 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
Start date: 07/28/2018
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/mimic-x
Component(RADOS): Monitor
Pull request ID:

Description

This is on the new mimic-x suite https://github.com/ceph/ceph/pull/23292
Run: http://pulpito.ceph.com/yuriw-2018-07-27_21:19:33-upgrade:mimic-x:parallel-master-distro-basic-smithi/
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2018-07-27_21:19:33-upgrade:mimic-x:parallel-master-distro-basic-smithi/2826004/teuthology.log

2018-07-27T21:41:17.751 INFO:tasks.ceph.mon.c.smithi031.stderr:2018-07-27 21:41:17.752 7fa1e5fdf140 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: while stat a file for size: /var/lib/ceph/mon/ceph-c/store.db/000000.sst: No such file or directory
2018-07-27T21:41:17.751 INFO:tasks.ceph.mon.c.smithi031.stderr:
2018-07-27T21:41:17.751 INFO:tasks.ceph.mon.c.smithi031.stderr:2018-07-27 21:41:17.752 7fa1e5fdf140 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-c': (22) Invalid argument

Per Josh: the monitors are getting "Unable to load table properties for file 0 --- IO error: While open a file for random read: /var/lib/ceph/mon/ceph-a/store.db/000000.sst: No such file or directory" after upgrading to master.


Related issues

Duplicated by RADOS - Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite Duplicate 11/11/2018

History

#1 Updated by Patrick Donnelly about 1 year ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) Monitor added

#2 Updated by Kefu Chai about 1 year ago

i created a vstart.sh cluster using the mimic branch, and ceph-monstore-tool from master is able to open it just fine.

$ bin/ceph-monstore-tool ~/dev/ceph/build/dev/mon.a/ dump-keys
...
2018-08-01 14:27:22.935 7f9e16c6df80  1 rocksdb: do_open column families: [default]
2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: RocksDB version: 5.14.0

2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: Git sha rocksdb_build_git_sha:@9090ae3ecfbf9b50a398a5d8b178f14b88dc047e@
2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: Compile date Jul 27 2018
...
2018-08-01 14:27:22.935 7f9e16c6df80  4 rocksdb: SST files in /home/kefu/dev/ceph/build/dev/mon.a/store.db dir, Total Num: 3, files: 000004.sst 000007.sst 000013.sst
...
2018-08-01 14:27:23.067 7f9e16c6df80  4 rocksdb: [/var/ssd/ceph/src/rocksdb/db/db_impl_open.cc:1219] DB pointer 0x559081bc1000
auth / 1
auth / 2
auth / 3
auth / 4
auth / 5
auth / 6
auth / 7
auth / 8
auth / first_committed
auth / format_version
...

rerunning the test at http://pulpito.ceph.com/kchai-2018-08-01_06:21:53-upgrade:mimic-x:parallel-master-distro-basic-smithi/

#3 Updated by Kefu Chai about 1 year ago

it's a regression in rocksdb. the rocksdb in mimic (eaee6d3beab3429232ceb188377a3f94e844fca7) is f4a857da0b720691effc524469f6db895ad00d8e, which contains https://github.com/facebook/rocksdb/commit/73f21a7b2177aeb82b9f518222e2b9ea8fbb7c4f. this commit is fine per se. but older rocksdb not containing this commit will not be able to open the db file created by this change.

73f21a7b2177aeb82b9f518222e2b9ea8fbb7c4f creates a dummy entry in the manifest to record the deleted WALs, expecting that recovery will skip it. so this change is not forward compatible, i.e. old rocksdb is not necessarily able to open a store created by new rocksdb.

the dummy entry looks like:

$20 = {fd = {table_reader = 0x0, packed_number_and_path_id = 0, file_size = 0}, smallest = {rep_ = "dummy_key\001\000\000\000\000\000\000"},
  largest = {rep_ = "dummy_key\001\000\000\000\000\000\000"}, smallest_seqno = 0, largest_seqno = 0, table_reader_handle = 0x0, stats = {
    num_reads_sampled = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 0}, <No data fields>}},
  compensated_file_size = 0, num_entries = 0, num_deletions = 0, raw_key_size = 0, raw_value_size = 0, refs = 1, being_compacted = false,
  init_stats_from_file = false, marked_for_compaction = false}
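
A minimal sketch of why this dummy entry surfaces as "000000.sst" in the error. It is a toy model, not RocksDB code: it assumes the packed value keeps the file number in the low 62 bits and the path id in the top 2 bits, and that table file names are the file number zero-padded to six digits. The helper names are hypothetical.

```python
from typing import Tuple

# Assumed layout: low 62 bits hold the file number, top 2 bits the path id.
FILE_NUMBER_BITS = 62
FILE_NUMBER_MASK = (1 << FILE_NUMBER_BITS) - 1


def unpack(packed: int) -> Tuple[int, int]:
    """Split a packed value into (file_number, path_id)."""
    return packed & FILE_NUMBER_MASK, packed >> FILE_NUMBER_BITS


def table_file_name(db_dir: str, file_number: int) -> str:
    """Build an SST path from a zero-padded six-digit file number."""
    return f"{db_dir}/{file_number:06d}.sst"


# The dummy entry above has packed_number_and_path_id = 0.
number, path_id = unpack(0)
print(table_file_name("/var/lib/ceph/mon/ceph-c/store.db", number))
```

A reader that does not recognize the entry as a dummy would treat file number 0 as a real table and try to stat a file that was never written.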

in the context of Ceph's use case, it's sort of fine, because we do not support downgrade.

but when rocksdb identified that this change is not forward compatible, they decided to revert it in https://github.com/facebook/rocksdb/pull/3762. so the revert makes rocksdb not backward compatible. in other words, new rocksdb (with the revert) is not able to open a store created by old rocksdb (with the original change).

and we do have the revert in master! that's why we are suffering from a fix from rocksdb upstream.

this is embarrassing, because mimic is released. i don't really want our users to rebuild their OSDs or monitors to be forward compatible with master. we probably have to bite the bullet and keep the reverted change in our fork and maintain it, unless we can persuade the upstream to revert https://github.com/facebook/rocksdb/pull/3762 .

#4 Updated by Kefu Chai about 1 year ago

an alternative option is to whip up a tool to rebuild the manifest to remove the dummy File4 with kDeletedLogNumberHack custom_tag. see also rocksdb/db/repair.cc .
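
A rough illustration of what such a tool would do, assuming the manifest can be viewed as a list of file records. The real MANIFEST is a binary log of VersionEdit records, so `NewFileRecord` and `strip_dummy_entries` are hypothetical stand-ins; only the tag name comes from the comment above.

```python
from dataclasses import dataclass
from typing import List

# Tag name taken from the discussion; its representation here is an assumption.
DELETED_LOG_NUMBER_HACK = "kDeletedLogNumberHack"


@dataclass(frozen=True)
class NewFileRecord:
    """Hypothetical, simplified view of one new-file record in a manifest."""
    file_number: int
    file_size: int
    custom_tag: str = ""


def strip_dummy_entries(records: List[NewFileRecord]) -> List[NewFileRecord]:
    """Return the records without the dummy entries carrying the hack tag."""
    return [r for r in records if r.custom_tag != DELETED_LOG_NUMBER_HACK]


manifest = [
    NewFileRecord(4, 1024),
    NewFileRecord(0, 0, DELETED_LOG_NUMBER_HACK),  # dummy entry, no real file
    NewFileRecord(7, 2048),
]
cleaned = strip_dummy_entries(manifest)
print([r.file_number for r in cleaned])
```

An actual tool would also have to re-encode the surviving records into a valid manifest, which this sketch does not attempt.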

#5 Updated by Nathan Cutler about 1 year ago

  • Subject changed from "rocksdb: Corruption: Can't access /000000.sst" in pgrade:mimic-x:parallel-master-distro-basic-smithi to "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-distro-basic-smithi

#6 Updated by Sage Weil about 1 year ago

another option would be to only partially revert, and keep just the bits that ignore the older deleted log files.

#7 Updated by Sage Weil about 1 year ago

  • Status changed from New to Verified
  • Assignee deleted (Sage Weil)
  • Priority changed from Normal to Urgent

I think we need to fix this sooner rather than later. My suggestion is to incorporate enough of the original rocksdb changes to interpret the newer MANIFEST entries, but do not generate new ones, so that we can silently "upgrade" from the mimic store.dbs with the patch back to the traditional format.

My guess is that rocksdb would take such a patch upstream, too?
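
The suggestion above can be sketched as a reader that understands the dummy entries paired with a writer that never produces them. This is a toy model under stated assumptions: the `Record` layout, the `deleted_log_number` payload, and all function names are hypothetical and do not reflect the actual RocksDB encoding.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

HACK_TAG = "kDeletedLogNumberHack"  # name from the discussion; layout is assumed


@dataclass(frozen=True)
class Record:
    """Hypothetical manifest record."""
    file_number: int
    custom_tag: str = ""
    deleted_log_number: Optional[int] = None  # assumed payload of a dummy entry


def strict_read(records: Iterable[Record]) -> List[str]:
    """Reader without the patch: every record is taken to be a real SST file."""
    return [f"{r.file_number:06d}.sst" for r in records]


def tolerant_read(records: Iterable[Record]) -> Tuple[List[str], int]:
    """Patched reader: dummy entries only raise the log number, they are not files."""
    files: List[str] = []
    min_log = 0
    for r in records:
        if r.custom_tag == HACK_TAG:
            min_log = max(min_log, r.deleted_log_number or 0)
        else:
            files.append(f"{r.file_number:06d}.sst")
    return files, min_log


def write_traditional(files: Iterable[str]) -> List[Record]:
    """Writer side of the patch: never emits hack-tagged entries."""
    return [Record(int(name.split(".")[0])) for name in files]


mimic_store = [Record(4), Record(0, HACK_TAG, 12), Record(7)]
print(strict_read(mimic_store))       # includes the nonexistent 000000.sst
files, min_log = tolerant_read(mimic_store)
print(files, min_log)
rewritten = write_traditional(files)  # traditional format, no dummy entries
```

Reading with `tolerant_read` and writing back with `write_traditional` models the silent "upgrade" of a mimic store.db to the traditional format.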

#8 Updated by Sage Weil about 1 year ago

  • Assignee set to Radoslaw Zarzynski

#9 Updated by Radoslaw Zarzynski 12 months ago

  • Status changed from Verified to In Progress

Very early fix: https://github.com/rzarzynski/rocksdb/tree/wip-bug-25146.

The case appears more complicated, as the change has been reintroduced to RocksDB (see https://github.com/facebook/rocksdb/pull/3765) but in a modified form. VersionEdit now uses a different encoding that doesn't understand the original format of https://github.com/facebook/rocksdb/pull/3488. The value 0x03 of CustomTag has been reused. All of these variants were/are in our master: https://gist.github.com/rzarzynski/24a753176f7cf2d2c1fc173d8da763dc.

#10 Updated by Radoslaw Zarzynski 11 months ago

  • Status changed from In Progress to Need Review

#11 Updated by Brad Hubbard 9 months ago

  • Duplicated by Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite added

#13 Updated by Kefu Chai 9 months ago

  • Status changed from Need Review to Resolved
