Bug #14438

"FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)" in upgrade:hammer-x-jewel-distro-basic-openstack

Added by Yuri Weinstein almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/hammer-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-20_10:02:02-upgrade:hammer-x-jewel-distro-basic-openstack/
Job: 7344
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-01-20_10:02:02-upgrade:hammer-x-jewel-distro-basic-openstack/7344/teuthology.log

zgrep -i "^ ceph version" /a/teuthology-2016-01-20_10:02:02-upgrade:hammer-x-jewel-distro-basic-openstack/7344/remote/target092194/log/ceph-osd.0.log.gz

194879580-2016-01-20 10:51:52.846913 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 334
194879673-2016-01-20 10:51:52.852331 7f5023f797c0 -1 osd/OSD.cc: In function 'void OSD::build_past_intervals_parallel()' thread 7f5023f797c0 time 2016-01-20 10:51:52.846933
194879836-osd/OSD.cc: 3151: FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)
194879931-
194879932: ceph version 10.0.2-29-ge7107f8 (e7107f87151e9674502088bbe2d183ff4528647f)
194880008- 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f5023a651eb]
194880105- 2: (OSD::build_past_intervals_parallel()+0x1677) [0x7f50234a68d7]
194880172- 3: (OSD::load_pgs()+0x554) [0x7f50234aad64]
194880217- 4: (OSD::init()+0xf24) [0x7f50234bb214]
194880258- 5: (main()+0x297e) [0x7f502342c2de]
194880295- 6: (__libc_start_main()+0xf5) [0x7f50202a2ec5]
194880343- 7: (()+0x3137b7) [0x7f502346b7b7]
194880378- NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
194880471-
194880472---- begin dump of recent events ---
194880508- -9999> 2016-01-20 10:51:52.562241 7f5023f797c0 20 _collection_list_partial start:0:-1/00000000//0 end:GHMAX-64 ls.size 0
194880630- -9998> 2016-01-20 10:51:52.562289 7f5023f797c0 20 list_by_hash_bitwise prefix D0000000
194880718- -9997> 2016-01-20 10:51:52.562301 7f5023f797c0 20 list_by_hash_bitwise prefix D0000000 ob 0:3/0000000d//head
194880828- -9996> 2016-01-20 10:51:52.562310 7f5023f797c0 20 list_by_hash_bitwise prefix DC7170A1
194880916- -9995> 2016-01-20 10:51:52.562318 7f5023f797c0 20 list_by_hash_bitwise prefix DC7170A1 ob 0:3/1a0717cd/target092002.ovh.sepia.ceph.com11169-23/9d
194881063- -9994> 2016-01-20 10:51:52.562327 7f5023f797c0 20 list_by_hash_bitwise prefix DC7170A1 ob 0:3/1a0717cd/target092002.ovh.sepia.ceph.com11169-23/head
194881212- -9993> 2016-01-20 10:51:52.562336 7f5023f797c0 20 filestore(/var/lib/ceph/osd/ceph-0) objects: 0x7ffc5fb66fd0
194881323- -9992> 2016-01-20 10:51:52.562345 7f5023f797c0 20 osd.0 334  clearing temps in 0.10_head pgid 0.10
194881423- -9991> 2016-01-20 10:51:52.562355 7f5023f797c0 20 filestore(/var/lib/ceph/osd/ceph-0) collection_list pool is 0 shard is 255 pgid 0.10
194881559- -9990> 2016-01-20 10:51:52.562364 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) collection_list first checking temp pool
--
196661778-   -17> 2016-01-20 10:51:52.846229 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 330
196661879-   -16> 2016-01-20 10:51:52.846238 7f5023f797c0 20 osd.0 0 get_map 330 - loading and decoding 0x7f5029a8ba80
196661988-   -15> 2016-01-20 10:51:52.846247 7f5023f797c0 15 filestore(/var/lib/ceph/osd/ceph-0) read meta/-1/ac968fa5/osdmap.330/0 0~0
196662114-   -14> 2016-01-20 10:51:52.846291 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read meta/-1/ac968fa5/osdmap.330/0 0~7056/7056
196662259-   -13> 2016-01-20 10:51:52.846305 7f5023f797c0 10 osd.0 0 add_map_bl 330 7056 bytes
196662344-   -12> 2016-01-20 10:51:52.846360 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 331
196662445-   -11> 2016-01-20 10:51:52.846373 7f5023f797c0 20 osd.0 0 get_map 331 - loading and decoding 0x7f5029a8ad00
196662554-   -10> 2016-01-20 10:51:52.846382 7f5023f797c0 15 filestore(/var/lib/ceph/osd/ceph-0) read meta/-1/ac968f75/osdmap.331/0 0~0
196662680-    -9> 2016-01-20 10:51:52.846435 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read meta/-1/ac968f75/osdmap.331/0 0~7056/7056
196662825-    -8> 2016-01-20 10:51:52.846448 7f5023f797c0 10 osd.0 0 add_map_bl 331 7056 bytes
196662910-    -7> 2016-01-20 10:51:52.846504 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 332
196663011-    -6> 2016-01-20 10:51:52.846521 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 333
196663112-    -5> 2016-01-20 10:51:52.846530 7f5023f797c0 20 osd.0 0 get_map 333 - loading and decoding 0x7f5029a89f80
196663221-    -4> 2016-01-20 10:51:52.846539 7f5023f797c0 15 filestore(/var/lib/ceph/osd/ceph-0) read meta/-1/ac968dd5/osdmap.333/0 0~0
196663347-    -3> 2016-01-20 10:51:52.846583 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read meta/-1/ac968dd5/osdmap.333/0 0~7056/7056
196663492-    -2> 2016-01-20 10:51:52.846609 7f5023f797c0 10 osd.0 0 add_map_bl 333 7056 bytes
196663577-    -1> 2016-01-20 10:51:52.846913 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 334
196663678-     0> 2016-01-20 10:51:52.852331 7f5023f797c0 -1 osd/OSD.cc: In function 'void OSD::build_past_intervals_parallel()' thread 7f5023f797c0 time 2016-01-20 10:51:52.846933


Related issues

Copied to Ceph - Backport #15315: hammer: "FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)" in upgrade:hammer-x-jewel-distro-basic-openstack Resolved

History

#1 Updated by Yuri Weinstein almost 7 years ago

  • Description updated (diff)

#2 Updated by Samuel Just almost 7 years ago

  • Assignee set to David Zafman
  • Priority changed from Normal to Urgent

#3 Updated by David Zafman almost 7 years ago

We should create a commit that changes the assert(pg->info.history.same_interval_since == p.same_interval_since) so that it reports which pg is affected and the value of p.same_interval_since. There were 2 pg imports into osd.0. The earlier pg 0.11 was already fixed 7 minutes earlier. pg 1.16 was imported 20 seconds before osd.0 was restarted and hit the assert.
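For illustration only, here is a minimal standalone sketch of the kind of diagnostic this comment proposes: print the pg and both same_interval_since values before failing. It is not the code in wip-diag-14438 or in any merged commit. The types PgHistory and IntervalCursor, the helper names, and the epoch values in the usage note are hypothetical stand-ins, not Ceph's real definitions. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT Ceph's real types.
using epoch_t = std::uint32_t;

struct PgHistory {
  epoch_t same_interval_since;  // value stored in the pg's info.history
};

struct IntervalCursor {         // plays the role of `p` in the failing assert
  epoch_t same_interval_since;
};

// Returns a diagnostic message when the two values disagree,
// or std::nullopt when they match.
std::optional<std::string> interval_mismatch(const std::string& pgid,
                                             const PgHistory& history,
                                             const IntervalCursor& p) {
  if (history.same_interval_since == p.same_interval_since) {
    return std::nullopt;
  }
  std::ostringstream out;
  out << "pg " << pgid
      << " info.history.same_interval_since=" << history.same_interval_since
      << " != p.same_interval_since=" << p.same_interval_since;
  return out.str();
}

// Emits the context first, then fails exactly where the bare assert would have.
void check_interval(const std::string& pgid,
                    const PgHistory& history,
                    const IntervalCursor& p) {
  if (auto msg = interval_mismatch(pgid, history, p)) {
    std::cerr << *msg << std::endl;
    assert(false && "same_interval_since mismatch");
  }
}
```

With made-up epochs, check_interval("1.16", PgHistory{330}, IntervalCursor{334}) would write the pg id and both values to stderr before aborting, so the log identifies the offending pg without needing a core dump.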

#4 Updated by David Zafman almost 7 years ago

  • Status changed from New to In Progress

Trying to reproduce with branch wip-diag-14438 containing diagnostics.

#5 Updated by David Zafman almost 7 years ago

db3dc4d3e2a74cfd366cd2ef4d0c9c4581269113 was merged to jewel branch to get further diagnostics if this problem recurs.

#6 Updated by David Zafman almost 7 years ago

  • Status changed from In Progress to Need More Info

#7 Updated by Yuri Weinstein almost 7 years ago

  • Release set to infernalis

Also in run:
http://pulpito.ceph.com/teuthology-2016-02-25_17:10:10-upgrade:hammer-x-infernalis-distro-basic-vps/
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-02-25_17:10:10-upgrade:hammer-x-infernalis-distro-basic-vps/27453/teuthology.log

2016-02-25T20:07:39.118 INFO:tasks.ceph.osd.2.vpm111.stderr:osd/OSD.cc: In function 'void OSD::build_past_intervals_parallel()' thread 7f4ffd222940 time 2016-02-26 04:07:39.115536
2016-02-25T20:07:39.119 INFO:tasks.ceph.osd.2.vpm111.stderr:osd/OSD.cc: 3097: FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: ceph version 9.2.0-132-gd148ed6 (d148ed64e4b865baceeadbe999dcba2ca956d932)
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f4ffcd1e27b]
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: 2: (OSD::build_past_intervals_parallel()+0x1638) [0x7f4ffc791508]
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: 3: (OSD::load_pgs()+0x12fb) [0x7f4ffc793dab]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 4: (OSD::init()+0xe74) [0x7f4ffc7a31d4]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 5: (main()+0x2954) [0x7f4ffc728084]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 6: (__libc_start_main()+0xf5) [0x7f4ff95aaec5]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 7: (()+0x2f7ec7) [0x7f4ffc757ec7]
2016-02-25T20:07:39.144 INFO:tasks.ceph.osd.2.vpm111.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#8 Updated by David Zafman almost 7 years ago

Unfortunately, I added the diagnostics only to the jewel branch, so they provided no additional information for this latest failure on infernalis.

#9 Updated by David Zafman over 6 years ago

  • Status changed from Need More Info to Fix Under Review

#11 Updated by Samuel Just over 6 years ago

  • Status changed from Fix Under Review to 12

PR closed?

#12 Updated by David Zafman over 6 years ago

  • Status changed from 12 to Resolved

0fd674bbf0c17a673be40123645adee3d64375a0

#13 Updated by David Zafman over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to hammer

#14 Updated by David Zafman over 6 years ago

  • Copied to Backport #15315: hammer: "FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)" in upgrade:hammer-x-jewel-distro-basic-openstack added

#15 Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved
