Project

General

Profile

Actions

Bug #10718

closed

osd/PGLog.h: 279: FAILED assert(log.log.size() == log_keys_debug.size())

Added by Samuel Just about 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
firefly
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 0.91-1098-gd3b7616 (d3b76165be4b0c59ff7c292f6534a3aa4e2cd27a)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xae582f]
2: ceph-osd() [0x73b5d2]
3: (PGLog::write_log(ObjectStore::Transaction&, coll_t const&, ghobject_t const&)+0x110) [0x7363d0]
4: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x670b90]
5: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6c65d2]
6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xad4d7e]
7: (ThreadPool::WorkThread::entry()+0x10) [0xad7b10]
8: (()+0x7e9a) [0x7fdde5530e9a]
9: (clone()+0x6d) [0x7fdde3cd93fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

21> 2015-01-31 01:18:24.547157 7fdddb15a700 20 journal do_write latency 0.007027
-20> 2015-01-31 01:18:24.547164 7fdddb15a700 20 journal do_write queueing finishers through seq 44180
-19> 2015-01-31 01:18:24.547165 7fdddb15a700 10 journal queue_completions_thru seq 44180 queueing seq 44180 0x5683840 lat 0.007077
-18> 2015-01-31 01:18:24.547174 7fdddb15a700 5 journal put_throttle finished 1 ops and 1123 bytes, now 0 ops and 0 bytes
-17> 2015-01-31 01:18:24.547181 7fdddb15a700 20 journal write_thread_entry going to sleep
-16> 2015-01-31 01:18:24.547182 7fddda959700 5 filestore(/var/lib/ceph/osd/ceph-2) _journaled_ahead 0x707d0a0 seq 44180 osr(0.74 0x53eb030) 0x6bb7e00
-15> 2015-01-31 01:18:24.547190 7fddda959700 5 filestore(/var/lib/ceph/osd/ceph-2) queue_op 0x707d0a0 seq 44180 osr(0.74 0x53eb030) 1109 bytes (queue has 1 ops and 1109 bytes)
-14> 2015-01-31 01:18:24.547200 7fddda959700 10 filestore(/var/lib/ceph/osd/ceph-2) queueing ondisk 0x5c0d500
-13> 2015-01-31 01:18:24.547206 7fddda158700 10 journal op_apply_start 44180 open_ops 0 -> 1
-12> 2015-01-31 01:18:24.547210 7fddda158700 5 filestore(/var/lib/ceph/osd/ceph-2) _do_op 0x707d0a0 seq 44180 osr(0.74 0x53eb030)/0x53eb030 start
-11> 2015-01-31 01:18:24.547215 7fddda158700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_transaction on 0x6bb7e00
-10> 2015-01-31 01:18:24.547225 7fddda158700 15 filestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys 0.74_head/74//head//0
-9> 2015-01-31 01:18:24.547261 7fddda158700 10 filestore oid: 74//head//0 not skipping op, *spos 44180.0.1
-8> 2015-01-31 01:18:24.547263 7fddda158700 10 filestore > header.spos 0.0.0
-7> 2015-01-31 01:18:24.547312 7fddda158700 20 filestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys 0.74_head/74//head//0 = 0
-6> 2015-01-31 01:18:24.547317 7fddda158700 10 journal op_apply_finish 44180 open_ops 1 -> 0, max_applied_seq 44179 -> 44180
-5> 2015-01-31 01:18:24.547319 7fddda158700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_op 0x707d0a0 seq 44180 r = 0, finisher 0x6069aa0 0
-4> 2015-01-31 01:18:24.547322 7fddda158700 10 filestore(/var/lib/ceph/osd/ceph-2) _finish_op 0x707d0a0 seq 44180 osr(0.74 0x53eb030)/0x53eb030
-3> 2015-01-31 01:18:24.547337 7fddd9156700 10 osd.2 pg_epoch: 1099 pg[0.74( empty local-les=1097 n=0 ec=1 les/c 1065/1068 1094/1094/1094) [4,3,2] r=2 lpr=1094 pi=1064-1093/1 crt=0'0 inactive] _activate_committed 1099 telling primary
-2> 2015-01-31 01:18:24.547356 7fddd9156700 20 osd.2 1099 share_map_peer 0x5df6420 already has epoch 1099
-1> 2015-01-31 01:18:24.547358 7fddd9156700 1 -
10.214.132.2:6809/14117 --> 10.214.132.23:6801/10057 -- pg_info(1 pgs e1099:0.74) v4 -- ?+0 0x6dbc000 con 0x5df6420
0> 2015-01-31 01:18:24.547373 7fddd9156700 10 osd.2 pg_epoch: 1099 pg[0.74( empty local-les=1097 n=0 ec=1 les/c 1065/1068 1094/1094/1094) [4,3,2] r=2 lpr=1094 pi=1064-1093/1 crt=0'0 active] flushed
--- logging levels ---

wip-sam-working with min_size and degraded writes branches and some other bug fixes. I suspect this is not related to those however. Possibly of interest:

-51> 2015-01-31 01:18:24.540220 7fddcc7f6700 10 merge_log log((0'0,760'153], crt=760'153) from osd.0 into log((696'148,760'153], crt=696'148)
-50> 2015-01-31 01:18:24.540222 7fddcc7f6700 10 merge_log extending tail to 0'0
-49> 2015-01-31 01:18:24.540225 7fddcc7f6700 15 11'56 (0'0) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:1153 2015-01-31 00:42:48.963228
-48> 2015-01-31 01:18:24.540231 7fddcc7f6700 15 11'57 (11'56) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:1154 2015-01-31 00:42:48.981092
-47> 2015-01-31 01:18:24.540235 7fddcc7f6700 15 11'58 (11'57) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:1155 2015-01-31 00:42:48.981162
-46> 2015-01-31 01:18:24.540238 7fddcc7f6700 15 369'119 (11'58) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5299 2015-01-31 00:56:19.901271
-45> 2015-01-31 01:18:24.540242 7fddcc7f6700 15 369'120 (369'119) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5300 2015-01-31 00:56:19.920461
-44> 2015-01-31 01:18:24.540246 7fddcc7f6700 15 369'121 (369'120) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5301 2015-01-31 00:56:19.930051
-43> 2015-01-31 01:18:24.540249 7fddcc7f6700 15 369'122 (369'121) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5302 2015-01-31 00:56:19.930118
-42> 2015-01-31 01:18:24.540253 7fddcc7f6700 15 389'123 (369'122) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5506 2015-01-31 00:56:50.851106
-41> 2015-01-31 01:18:24.540257 7fddcc7f6700 15 389'124 (389'123) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5507 2015-01-31 00:56:50.873646
-40> 2015-01-31 01:18:24.540261 7fddcc7f6700 15 389'125 (389'124) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5508 2015-01-31 00:56:50.889520
-39> 2015-01-31 01:18:24.540264 7fddcc7f6700 15 389'126 (389'125) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:5509 2015-01-31 00:56:50.889587
-38> 2015-01-31 01:18:24.540269 7fddcc7f6700 15 497'131 (389'126) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:6170 2015-01-31 00:59:59.387999
-37> 2015-01-31 01:18:24.540273 7fddcc7f6700 15 497'132 (497'131) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:6171 2015-01-31 00:59:59.410810
-36> 2015-01-31 01:18:24.540276 7fddcc7f6700 15 497'133 (497'132) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:6172 2015-01-31 00:59:59.423980
-35> 2015-01-31 01:18:24.540280 7fddcc7f6700 15 497'134 (497'133) modify c7bf9e62/plana7618503-230/head//1 by client.4119.0:6173 2015-01-31 00:59:59.424047
-34> 2015-01-31 01:18:24.540284 7fddcc7f6700 15 623'140 (497'134) delete c7bf9e62/plana7618503-230/head//1 by client.4119.0:7479 2015-01-31 01:05:22.824251

ubuntu@teuthology:/a/samuelj-2015-01-30_20:24:13-rados-wip-sam-working-wip-testing-vanilla-fixes-basic-multi/733097/remote

I'm going to add a branch into the mix to hopefully give me some more insight.


Related issues 1 (0 open1 closed)

Related to Ceph - Bug #12652: osd/PGLog.cc: 732: FAILED assert(log.log.size() == log_keys_debug.size())Resolved08/09/2015

Actions
Actions #1

Updated by Samuel Just about 9 years ago

  • Status changed from New to Can't reproduce
Actions #2

Updated by Samuel Just about 9 years ago

  • Status changed from Can't reproduce to 12
Actions #3

Updated by Sage Weil about 9 years ago

  • Status changed from 12 to Pending Backport
  • Source changed from other to Q/A
  • Backport set to firefly
Actions #6

Updated by Loïc Dachary almost 9 years ago

  • 69d9339 PGLog: improve PGLog::check() debugging (in firefly),
Actions #7

Updated by Loïc Dachary almost 9 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF