Bug #10402

closed

ceph-objectstore-tool import may need to add an OSDMap for old epoch

Added by David Zafman over 9 years ago. Updated over 9 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Failure while waiting for osd.5, which received the last import, to restart:

2014-12-19 23:24:14,498.498 INFO:teuthology.orchestra.run.plana02:Running: 'sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --journal-path /var/lib/ceph/osd/ceph-5/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op import --file /home/ubuntu/cephtest/data/exp.0.e5.4'

2014-12-19 23:24:27.752699 7f567b1c6880 15 filestore(/var/lib/ceph/osd/ceph-5) get_omap_iterator meta/51716277/pglog_0.dc/0//-1
2014-12-19 23:24:27.752789 7f567b1c6880 10 read_log done
2014-12-19 23:24:27.752808 7f567b1c6880 10 osd.5 pg_epoch: 4034 pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=0 pi=4011-4032/2 crt=0'0 inactive] handle_loaded
2014-12-19 23:24:27.752822 7f567b1c6880 5 osd.5 pg_epoch: 4034 pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=0 pi=4011-4032/2 crt=0'0 inactive NOTIFY] exit Initial 0.000303 0 0.000000
2014-12-19 23:24:27.752834 7f567b1c6880 5 osd.5 pg_epoch: 4034 pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=0 pi=4011-4032/2 crt=0'0 inactive NOTIFY] enter Reset
2014-12-19 23:24:27.752844 7f567b1c6880 20 osd.5 pg_epoch: 4034 pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=0 pi=4011-4032/2 crt=0'0 inactive NOTIFY] set_last_peering_reset 4034
2014-12-19 23:24:27.752852 7f567b1c6880 10 osd.5 pg_epoch: 4034 pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=4034 pi=4011-4032/2 crt=0'0 inactive NOTIFY] Clearing blocked outgoing recovery messages
2014-12-19 23:24:27.752861 7f567b1c6880 10 osd.5 pg_epoch: 4034 pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=4034 pi=4011-4032/2 crt=0'0 inactive NOTIFY] Not blocking outgoing recovery messages
2014-12-19 23:24:27.752870 7f567b1c6880 10 osd.5 4035 load_pgs loaded pg[0.dc( empty local-les=4034 n=0 ec=1 les/c 4034/4034 4031/4033/4033) [1]/[1,5] r=1 lpr=4034 pi=4011-4032/2 crt=0'0 inactive NOTIFY] log((0'0,0'0], crt=0'0)
2014-12-19 23:24:27.752880 7f567b1c6880 10 osd.5 4035 pgid 0.e5 coll 0.e5_head
2014-12-19 23:24:27.752890 7f567b1c6880 15 filestore(/var/lib/ceph/osd/ceph-5) collection_getattr /var/lib/ceph/osd/ceph-5/current/0.e5_head 'info'
2014-12-19 23:24:27.752911 7f567b1c6880 10 filestore(/var/lib/ceph/osd/ceph-5) collection_getattr /var/lib/ceph/osd/ceph-5/current/0.e5_head 'info' = 1
2014-12-19 23:24:27.752920 7f567b1c6880 15 filestore(/var/lib/ceph/osd/ceph-5) omap_get_values meta/16ef7597/infos/head//-1
2014-12-19 23:24:27.752988 7f567b1c6880 20 osd.5 0 get_map 3840 - loading and decoding 0x46c1800
2014-12-19 23:24:27.752995 7f567b1c6880 15 filestore(/var/lib/ceph/osd/ceph-5) read meta/a081208/osdmap.3840/0//-1 0~0
2014-12-19 23:24:27.753036 7f567b1c6880 10 filestore(/var/lib/ceph/osd/ceph-5) error opening file /var/lib/ceph/osd/ceph-5/current/meta/DIR_8/osdmap.3840__0_0A081208__none with flags=2: (2) No such file or directory
2014-12-19 23:24:27.753052 7f567b1c6880 10 filestore(/var/lib/ceph/osd/ceph-5) FileStore::read(meta/a081208/osdmap.3840/0//-1) open error: (2) No such file or directory
2014-12-19 23:24:27.757670 7f567b1c6880 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f567b1c6880 time 2014-12-19 23:24:27.753063
osd/OSD.h: 715: FAILED assert(ret)

ceph version 0.89-830-g35eb9d8 (35eb9d85ae2fc92eacbc1783efb1d8bd25188ee1)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xb9c625]
2: (OSDService::get_map(unsigned int)+0x3f) [0x6f0def]
3: (OSD::load_pgs()+0x1cf8) [0x6ab208]
4: (OSD::init()+0x730) [0x6ac3f0]
5: (main()+0x279c) [0x63661c]

The import should check that the OSDMap for the epoch of the importing PG is already present on the OSD, or else it should be added to the OSD. Also, as a sanity check, if the import epoch > cluster epoch we should just abort. This shouldn't happen because we now check the cluster UUID, unless the cluster was re-created but somehow kept the UUID.
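The check proposed above can be sketched roughly as follows. This is only an illustration in Python, not ceph-objectstore-tool code: `ImportAction`, `check_import_epoch`, and the idea of passing the stored OSDMap epochs as a plain set are all hypothetical names and simplifications chosen for the example.

```python
from enum import Enum


class ImportAction(Enum):
    """Hypothetical outcomes of the pre-import epoch check."""
    PROCEED = "proceed"        # required OSDMap is already on the OSD
    ADD_OSDMAP = "add_osdmap"  # OSDMap for the import epoch must be added first
    ABORT = "abort"            # import epoch is newer than the cluster epoch


def check_import_epoch(import_epoch, cluster_epoch, stored_epochs):
    """Decide what to do before importing a PG exported at import_epoch.

    stored_epochs is assumed to be the set of OSDMap epochs the target
    OSD still holds (older maps may have been trimmed).
    """
    # Sanity check: an export cannot come from the cluster's future.
    if import_epoch > cluster_epoch:
        return ImportAction.ABORT
    # The map was trimmed (or never stored) on this OSD.
    if import_epoch not in stored_epochs:
        return ImportAction.ADD_OSDMAP
    return ImportAction.PROCEED
```

With the epochs from the log above (map 3840 requested, OSD at epoch 4035) and an assumed set of stored maps that no longer includes 3840, `check_import_epoch(3840, 4035, {4034, 4035})` would report that the OSDMap has to be added before the OSD can load the PG.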

Actions #1

Updated by Yuri Weinstein over 9 years ago

Moved to bug #10422

Actions #2

Updated by Yuri Weinstein over 9 years ago

Moved to bug #10422

Actions #3

Updated by David Zafman over 9 years ago

Yuri Weinstein wrote:

Looks like this issue was caught in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-21_17:13:02-upgrade:firefly-x-next-distro-basic-multi/671671/ job:

[...]

Yuri's comment about asserts looks like 10422, not this bug.

Actions #4

Updated by David Zafman over 9 years ago

  • Status changed from 12 to Rejected

We are rebuilding past intervals on all imports and setting the pg epoch and same_interval_since to the OSD epoch. This guarantees that no trimmed map from the past would be required.
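As a rough illustration of why this removes the dependency on old maps, here is a simplified Python model. It is not the actual import code: `PGInfo` and `rebase_on_import` are hypothetical names, only the two fields mentioned above are modeled, and the rebuilding of past intervals is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PGInfo:
    """Hypothetical, stripped-down stand-in for the PG metadata."""
    epoch: int                # epoch recorded for the PG
    same_interval_since: int  # first epoch of the current interval


def rebase_on_import(info, osd_epoch):
    """Return a copy of info with both epochs set to the OSD's epoch.

    After this, loading the PG only needs the OSDMap at osd_epoch,
    which the OSD holds by definition, rather than a possibly trimmed
    map from the epoch at which the PG was exported.
    """
    return replace(info, epoch=osd_epoch, same_interval_since=osd_epoch)
```

For example, a PG exported at epoch 3840 and imported into an OSD at epoch 4035 would come out with both fields equal to 4035, so the lookup of osdmap.3840 seen in the log would not be needed.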
