Bug #11110 (closed): updated history.last_epoch_started but not info.last_epoch_started

Added by Samuel Just about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Immediate
% Done: 0%
Source: other
Severity: 3 - minor

Description

Reported on list:

2015-03-13 16:15:48.128081 7f3c2cc53700 5 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
2015-03-13 16:15:48.128094 7f3c2cc53700 5 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
2015-03-13 16:15:48.128108 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] _calc_past_interval_range: already have past intervals back to 66791
2015-03-13 16:15:48.128122 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67050-67050 up [6689,1919,2329](6689) acting [6689](6689))
2015-03-13 16:15:48.128134 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67048-67049 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.128148 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67047-67047 up [6689,1919,2329](6689) acting [6689](6689))
2015-03-13 16:15:48.128161 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67045-67046 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.128175 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67044-67044 up [6689,1919,2329](6689) acting [6689](6689))
2015-03-13 16:15:48.128188 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67037-67043 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.128202 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67034-67036 up [1919,2329](1919) acting [1919,2329](1919) maybe_went_rw)
2015-03-13 16:15:48.128216 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67031-67033 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.128229 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67028-67030 up [1919,2329](1919) acting [1919,2329](1919) maybe_went_rw)
2015-03-13 16:15:48.128243 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67025-67027 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.128258 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(67022-67024 up [1919,2329](1919) acting [1919,2329](1919) maybe_went_rw)
2015-03-13 16:15:48.128271 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(66796-67021 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.128293 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(66792-66795 up [6689,1919](6689) acting [6689,1919](6689))
2015-03-13 16:15:48.128305 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(66787-66791 up [6689](6689) acting [6689](6689) maybe_went_rw)
2015-03-13 16:15:48.128317 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior final: probe 1919,2329,6689 down blocked_by {}
2015-03-13 16:15:48.128329 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] up_thru 67050 < same_since 67051, must notify monitor
2015-03-13 16:15:48.128343 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: querying info from osd.1919
2015-03-13 16:15:48.128361 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: querying info from osd.2329
2015-03-13 16:15:48.128376 7f3c2cc53700 15 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] publish_stats_to_osd 67051:20493
2015-03-13 16:15:48.128391 7f3c2cc53700 20 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] handle_activate_map: Not dirtying info: last_persisted is 67050 while current is 67051
2015-03-13 16:15:48.128399 7f3c2cc53700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 67051 epoch_requested: 67051 NullEvt
2015-03-13 16:15:48.129402 7f3c37664700 20 osd.6689 67051 _dispatch 0x43a4ec0 pg_notify(75.45(30) epoch 67051) v5
2015-03-13 16:15:48.130427 7f3c37664700 20 osd.6689 67051 _dispatch 0x43a0b40 pg_notify(75.45(30) epoch 67051) v5
2015-03-13 16:15:48.142378 7f3c2e055700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 67051 epoch_requested: 67051 MNotifyRec from 2329 notify: (query_epoch:67051, epoch_sent:67051, info:75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)) features: 0x3ffffffffffff
2015-03-13 16:15:48.142398 7f3c2e055700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] got osd.2329 75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)
2015-03-13 16:15:48.142429 7f3c2e055700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] update_heartbeat_peers 1919,2329,6689 unchanged
2015-03-13 16:15:48.142441 7f3c2e055700 20 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 2329 features: 3ffffffffffff
2015-03-13 16:15:48.142460 7f3c2e055700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 67051 epoch_requested: 67051 MNotifyRec from 2329 notify: (query_epoch:67051, epoch_sent:67051, info:75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)) features: 0x3ffffffffffff
2015-03-13 16:15:48.142472 7f3c2e055700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] got dup osd.2329 info 75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037), identical to ours
2015-03-13 16:15:48.219295 7f3c37664700 20 osd.6689 67051 _dispatch 0xe5efbc0 pg_notify(75.45(23) epoch 67051) v5
2015-03-13 16:15:48.219397 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 67051 epoch_requested: 67051 MNotifyRec from 1919 notify: (query_epoch:67051, epoch_sent:67051, info:75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)) features: 0x3ffffffffffff
2015-03-13 16:15:48.219436 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] got osd.1919 75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)
2015-03-13 16:15:48.219465 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] update_heartbeat_peers 1919,2329,6689 unchanged
2015-03-13 16:15:48.219478 7f3c2c252700 20 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 1919 features: 3ffffffffffff
2015-03-13 16:15:48.219492 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: last maybe_went_rw interval was interval(67048-67049 up [6689,1919,2329](6689) acting [6689,1919,2329](6689) maybe_went_rw)
2015-03-13 16:15:48.219506 7f3c2c252700 20 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Common features: 3ffffffffffff
2015-03-13 16:15:48.219522 7f3c2c252700 5 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetInfo 0.091427 4 0.000490
2015-03-13 16:15:48.219539 7f3c2c252700 5 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
2015-03-13 16:15:48.219564 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] calc_acting osd.1919 75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)
2015-03-13 16:15:48.219582 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] calc_acting osd.2329 75.45( v 66245'4028 (48932'1001,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)
2015-03-13 16:15:48.219595 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] calc_acting osd.6689 75.45( v 66245'4028 (49044'1025,66245'4028] local-les=61515 n=3994 ec=48759 les/c 66791/66791 67037/67051/67037)
2015-03-13 16:15:48.219609 7f3c2c252700 10 osd.6689 =67051 pi=66787-67050/14 crt=66226'4026 lcod 0'0 mlcod 0'0 peering] choose_acting failed

This can cause a pg to incorrectly become unavailable.
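
One way to see the mismatch described above on an affected PG is to compare the two last_epoch_started values reported by pg query; a minimal sketch (the PG id is taken from the log above, and the grep just pulls those fields out of the JSON output, which also includes per-peer copies):

# Show info.last_epoch_started and history.last_epoch_started for pg 75.45.
# On an affected PG, the info value lags behind the history value.
ceph pg 75.45 query | grep '"last_epoch_started"'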


Files

ceph.zip (957 KB), Dan van der Ster, 03/13/2015 09:20 PM
ceph-osd.6689.log.gz (256 KB), Dan van der Ster, 03/16/2015 12:10 PM

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #11687: stuck incomplete (Resolved, Samuel Just, 05/20/2015)

Actions #1

Updated by Samuel Just about 9 years ago

  • Description updated (diff)
Actions #2

Updated by Dan van der Ster about 9 years ago

Here's the ceph.log for the day. The network shit hits the fan around 09:36. The first 3 PGs go incomplete at 09:39:16.013190. I set nodown/noout at 09:48. By around 10:17 the network is good again -- and 10 PGs are incomplete. Here are those 10 now... they all have the same symptoms as pg 75.45:

pg 75.f6bc is incomplete, acting [7329,6461,2204]
pg 75.e8f6 is incomplete, acting [7351,4313,6746]
pg 75.ccce is incomplete, acting [7348,1678,7081]
pg 75.8da8 is incomplete, acting [1176,3694,571]
pg 75.8bce is incomplete, acting [7347,7007,2727]
pg 75.848a is incomplete, acting [635,7333,2239]
pg 75.65d3 is incomplete, acting [7344,1878,5646]
pg 75.511c is incomplete, acting [4178,1232,2036]
pg 75.306 is incomplete, acting [2081,4198,1736]
pg 75.45 is incomplete, acting [6689,1919,2329]

pool 75 'testbb3' replicated size 3 min_size 1 crush_ruleset 5 object_hash rjenkins pg_num 65536 pgp_num 65536 last_change 48759 flags hashpspool stripe_width 0

Triple confirmed all OSDs & MONs are running ceph-0.93-76.gc35f422.el6.x86_64.
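
For reference, a listing like the one above can be pulled from a live cluster with standard commands; a quick sketch (the pool id and the per-host package check are specific to this cluster and only illustrative):

# PGs currently reported incomplete, with their acting sets.
ceph health detail | grep 'is incomplete'

# Definition of the affected pool (pool 75 / testbb3 here).
ceph osd dump | grep "^pool 75 "

# Installed package version, run on each OSD/MON host.
rpm -q ceph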

Actions #3

Updated by Dan van der Ster about 9 years ago

Here it is...

Actions #4

Updated by Samuel Just about 9 years ago

https://github.com/athanatos/ceph/tree/wip-11110 might explain it. Would you consider attempting to reproduce it with logging on a clean pool? If I can get logging from where the last_epoch_started values become silly, I can confirm that my guess is right. wip-11110 also has a patch introducing an osd_find_best_info_ignore_history_les config option which, when set on all of the primary osds for those pgs, should cause the primary to ignore the offending history.last_epoch_started value and go active (I have not yet tried this config value; it might destroy the cluster, etc.).
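
As a rough sketch of how that option could be applied to the primary of an affected PG (assuming a build containing the wip-11110 branch; whether a runtime injection is enough, versus restarting the OSD with the option in ceph.conf, is an assumption here):

# Hypothetical: inject the workaround option into the primary OSD of
# pg 75.45 (osd.6689 in this report) at runtime.
ceph tell osd.6689 injectargs '--osd_find_best_info_ignore_history_les=true'

# Alternatively, persist it in ceph.conf under [osd] and restart the OSD:
#   osd find best info ignore history les = true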

Actions #5

Updated by Dan van der Ster about 9 years ago

Hi Sam,
Enabling that new option and then removing it is causing one of your new asserts to trigger (after we run ceph osd pg-temp 75.45 6689):

     0> 2015-03-16 13:00:26.286619 7f0fed725700 -1 osd/PG.cc: In function 'void PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)'
 thread 7f0fed725700 time 2015-03-16 13:00:26.284909
osd/PG.cc: 287: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

 ceph version 0.93-102-g182bea0 (182bea04322537853f1d731141188fafb2b5dcc9)
 1: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)+0x514) [0x853db4]
 2: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0x232) [0x854002]
 3: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_i
mpl(boost::statechart::event_base const&, void const*)+0x201) [0x88d6e1]
 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::proces
s_queued_events()+0xdf) [0x87575f]
 5: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x337) [0x83b277]
 6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x37c) [0x67d7cc]
 7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x16) [0x6d7b26]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xbb48a6]
 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb64f0]
 10: /lib64/libpthread.so.0() [0x323e0079d1]
 11: (clone()+0x6d) [0x323dce88fd]

pg query is here: http://pastebin.com/NM38d4Ew
osd.6689 log is attached.

Actions #6

Updated by Samuel Just about 9 years ago

I repushed a version with the asserts disabled when that config is enabled (7751c35d7297330159efaa11cf75fc429ac588e9). You'll want to enable the config on all osds which are handling the pg, I think.
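
A small sketch of doing that for a single PG: look up which OSDs are in its up/acting set, then set the option on each of those OSDs through its admin socket (the socket path below is the usual default for a package install and is an assumption here):

# Which OSDs are handling the PG?
ceph pg map 75.45

# On each host carrying one of those OSDs, e.g. osd.6689:
ceph --admin-daemon /var/run/ceph/ceph-osd.6689.asok \
    config set osd_find_best_info_ignore_history_les true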

Actions #7

Updated by Samuel Just about 9 years ago

Ok. I have generated incomplete pgs with:

#!/bin/bash

# Tear down any previous vstart cluster and start a fresh one.
./stop.sh

rm -rf dev/*
mkdir dev

( ( ./stop.sh; CEPH_NUM_OSD=2 ./vstart.sh --localhost -n -x -d ; ) ) 2>&1
./ceph osd set nodown
./ceph osd set noup
./ceph osd pool create foo 128 128
./ceph osd pool set foo size 2
sleep 10
./rados -p foo bench 20 write -b 1
sleep 10
./ceph osd down 0
sleep 20
# Delay activation and make find_best_info ignore the primary's info.
for i in 0 1 2; do
    ./ceph --admin-daemon out/osd.$i.asok config set osd_debug_delay_activate 1000
    ./ceph --admin-daemon out/osd.$i.asok config set osd_debug_delay_activate_prob 100
    ./ceph --admin-daemon out/osd.$i.asok config set osd_debug_find_best_info_ignore_primary 1
done
./ceph osd unset noup
sleep 20
./ceph osd set noup
./ceph osd down 0
sleep 20
# Restore normal activation behaviour before letting the OSDs back in.
for i in 0 1 2; do
    ./ceph --admin-daemon out/osd.$i.asok config set osd_debug_delay_activate 0
    ./ceph --admin-daemon out/osd.$i.asok config set osd_debug_delay_activate_prob 0
done
./ceph osd unset noup
./ceph -w

Note that this requires patches to delay peering at the right time and to avoid preferentially selecting the primary as the best log (which, at least in simple cases, avoids the problem since it keeps the info the same -- that probably explains why we haven't seen it in our own testing). I haven't been able to work out a plausible sequence without the osd_debug_find_best_info_ignore_primary knob set yet (though it's not really hard to imagine that one exists). I also still need to confirm that wip-11110 fixes it.

A log covering a healthy pg going to the incomplete state, from all involved osds, would help massively. Unfortunately, if this is indeed the mechanism, you'll need a lot of pgs to make it likely that you hit the conditions. You'll want to age the pgs a little bit (a few writes on all pgs) and then shuffle the acting set around without writes going on in the pool.
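
What that data collection might look like on a test pool, as a rough sketch (the pool name, debug levels, and the way the acting sets are shuffled here are all illustrative assumptions, not a prescribed procedure):

# Verbose peering logs on every OSD, so the healthy -> incomplete
# transition gets captured.
ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'

# Age the pgs a little: a few small writes spread across all pgs.
rados -p testpool bench 30 write -b 4096 --no-cleanup

# Shuffle the acting sets with no client writes in flight, e.g. by
# repeatedly marking OSDs down and letting them rejoin.
for osd in 0 1 2; do
    ceph osd down $osd
    sleep 30
done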

Actions #8

Updated by Samuel Just about 9 years ago

wip-11110 does seem to work with this test.

Actions #9

Updated by Samuel Just about 9 years ago

The osd_find_best_info_ignore_history_les option did appear to successfully clean it up (though I haven't tried it combined with the asserts from the rest of wip-11110).

Actions #10

Updated by Samuel Just about 9 years ago

  • Status changed from New to Resolved

A fix is merged into hammer.

Actions #11

Updated by Yuri Weinstein about 9 years ago

  • Status changed from Resolved to New

See the same in
Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-19_17:05:03-upgrade:giant-x-hammer-distro-basic-multi/
Jobs: ['812075', '812079', '812081']
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-19_17:05:03-upgrade:giant-x-hammer-distro-basic-multi/812075/

2015-03-20T06:59:02.293 INFO:tasks.ceph.osd.1.plana50.stderr:osd/PG.cc: In function 'void PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)' thread 7fd61de65700 time 2015-03-20 06:59:02.227682
2015-03-20T06:59:02.293 INFO:tasks.ceph.osd.1.plana50.stderr:osd/PG.cc: 288: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)
2015-03-20T06:59:02.349 INFO:tasks.ceph.osd.1.plana50.stderr: ceph version 0.93-158-gcfecd12 (cfecd125fe1846187285a640328f53ff70d33cca)
2015-03-20T06:59:02.349 INFO:tasks.ceph.osd.1.plana50.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaee7af]
2015-03-20T06:59:02.349 INFO:tasks.ceph.osd.1.plana50.stderr: 2: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)+0x4fa) [0x79557a]
2015-03-20T06:59:02.350 INFO:tasks.ceph.osd.1.plana50.stderr: 3: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0x2ad) [0x7b763d]
2015-03-20T06:59:02.350 INFO:tasks.ceph.osd.1.plana50.stderr: 4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::GotLog>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::transition<PG::RecoveryState::IsIncomplete, PG::RecoveryState::Incomplete, boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>, &(boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>::no_function(PG::RecoveryState::IsIncomplete const&))> >, boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x104) [0x7ec134]
2015-03-20T06:59:02.350 INFO:tasks.ceph.osd.1.plana50.stderr: 5: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x8d) [0x7ec27d]
2015-03-20T06:59:02.351 INFO:tasks.ceph.osd.1.plana50.stderr: 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x51) [0x7ca7f1]
2015-03-20T06:59:02.351 INFO:tasks.ceph.osd.1.plana50.stderr: 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x2b) [0x7ca9cb]
2015-03-20T06:59:02.351 INFO:tasks.ceph.osd.1.plana50.stderr: 8: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x77f3d3]
2015-03-20T06:59:02.351 INFO:tasks.ceph.osd.1.plana50.stderr: 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x274) [0x671454]
2015-03-20T06:59:02.352 INFO:tasks.ceph.osd.1.plana50.stderr: 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6c6ec2]
2015-03-20T06:59:02.352 INFO:tasks.ceph.osd.1.plana50.stderr: 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaddc8e]
2015-03-20T06:59:02.352 INFO:tasks.ceph.osd.1.plana50.stderr: 12: (ThreadPool::WorkThread::entry()+0x10) [0xae0a90]
2015-03-20T06:59:02.353 INFO:tasks.ceph.osd.1.plana50.stderr: 13: (()+0x7e9a) [0x7fd6363a0e9a]
2015-03-20T06:59:02.353 INFO:tasks.ceph.osd.1.plana50.stderr: 14: (clone()+0x6d) [0x7fd634b493fd]
2015-03-20T06:59:02.353 INFO:tasks.ceph.osd.1.plana50.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #12

Updated by Yuri Weinstein about 9 years ago

Yuri Weinstein wrote:

See the same in
Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-19_17:05:03-upgrade:giant-x-hammer-distro-basic-multi/
Jobs: ['812075', '812079', '812081']
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-19_17:05:03-upgrade:giant-x-hammer-distro-basic-multi/812075/

[...]

Maybe the latest commits were not picked up by this run?

Actions #13

Updated by Samuel Just about 9 years ago

  • Status changed from New to Resolved

Fixed in 9a2ff34d75cc69759584fb802f903068669f6233.
