Bug #14438
"FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)" in upgrade:hammer-x-jewel-distro-basic-openstack
Description
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-20_10:02:02-upgrade:hammer-x-jewel-distro-basic-openstack/
Job: 7344
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-01-20_10:02:02-upgrade:hammer-x-jewel-distro-basic-openstack/7344/teuthology.log
zgrep -i "^ ceph version" /a/teuthology-2016-01-20_10:02:02-upgrade:hammer-x-jewel-distro-basic-openstack/7344/remote/target092194/log/ceph-osd.0.log.gz
2016-01-20 10:51:52.846913 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 334
2016-01-20 10:51:52.852331 7f5023f797c0 -1 osd/OSD.cc: In function 'void OSD::build_past_intervals_parallel()' thread 7f5023f797c0 time 2016-01-20 10:51:52.846933
osd/OSD.cc: 3151: FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)

 ceph version 10.0.2-29-ge7107f8 (e7107f87151e9674502088bbe2d183ff4528647f)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f5023a651eb]
 2: (OSD::build_past_intervals_parallel()+0x1677) [0x7f50234a68d7]
 3: (OSD::load_pgs()+0x554) [0x7f50234aad64]
 4: (OSD::init()+0xf24) [0x7f50234bb214]
 5: (main()+0x297e) [0x7f502342c2de]
 6: (__libc_start_main()+0xf5) [0x7f50202a2ec5]
 7: (()+0x3137b7) [0x7f502346b7b7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -9999> 2016-01-20 10:51:52.562241 7f5023f797c0 20 _collection_list_partial start:0:-1/00000000//0 end:GHMAX-64 ls.size 0
 -9998> 2016-01-20 10:51:52.562289 7f5023f797c0 20 list_by_hash_bitwise prefix D0000000
 -9997> 2016-01-20 10:51:52.562301 7f5023f797c0 20 list_by_hash_bitwise prefix D0000000 ob 0:3/0000000d//head
 -9996> 2016-01-20 10:51:52.562310 7f5023f797c0 20 list_by_hash_bitwise prefix DC7170A1
 -9995> 2016-01-20 10:51:52.562318 7f5023f797c0 20 list_by_hash_bitwise prefix DC7170A1 ob 0:3/1a0717cd/target092002.ovh.sepia.ceph.com11169-23/9d
 -9994> 2016-01-20 10:51:52.562327 7f5023f797c0 20 list_by_hash_bitwise prefix DC7170A1 ob 0:3/1a0717cd/target092002.ovh.sepia.ceph.com11169-23/head
 -9993> 2016-01-20 10:51:52.562336 7f5023f797c0 20 filestore(/var/lib/ceph/osd/ceph-0) objects: 0x7ffc5fb66fd0
 -9992> 2016-01-20 10:51:52.562345 7f5023f797c0 20 osd.0 334 clearing temps in 0.10_head pgid 0.10
 -9991> 2016-01-20 10:51:52.562355 7f5023f797c0 20 filestore(/var/lib/ceph/osd/ceph-0) collection_list pool is 0 shard is 255 pgid 0.10
 -9990> 2016-01-20 10:51:52.562364 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) collection_list first checking temp pool
--
 -17> 2016-01-20 10:51:52.846229 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 330
 -16> 2016-01-20 10:51:52.846238 7f5023f797c0 20 osd.0 0 get_map 330 - loading and decoding 0x7f5029a8ba80
 -15> 2016-01-20 10:51:52.846247 7f5023f797c0 15 filestore(/var/lib/ceph/osd/ceph-0) read meta/-1/ac968fa5/osdmap.330/0 0~0
 -14> 2016-01-20 10:51:52.846291 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read meta/-1/ac968fa5/osdmap.330/0 0~7056/7056
 -13> 2016-01-20 10:51:52.846305 7f5023f797c0 10 osd.0 0 add_map_bl 330 7056 bytes
 -12> 2016-01-20 10:51:52.846360 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 331
 -11> 2016-01-20 10:51:52.846373 7f5023f797c0 20 osd.0 0 get_map 331 - loading and decoding 0x7f5029a8ad00
 -10> 2016-01-20 10:51:52.846382 7f5023f797c0 15 filestore(/var/lib/ceph/osd/ceph-0) read meta/-1/ac968f75/osdmap.331/0 0~0
 -9> 2016-01-20 10:51:52.846435 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read meta/-1/ac968f75/osdmap.331/0 0~7056/7056
 -8> 2016-01-20 10:51:52.846448 7f5023f797c0 10 osd.0 0 add_map_bl 331 7056 bytes
 -7> 2016-01-20 10:51:52.846504 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 332
 -6> 2016-01-20 10:51:52.846521 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 333
 -5> 2016-01-20 10:51:52.846530 7f5023f797c0 20 osd.0 0 get_map 333 - loading and decoding 0x7f5029a89f80
 -4> 2016-01-20 10:51:52.846539 7f5023f797c0 15 filestore(/var/lib/ceph/osd/ceph-0) read meta/-1/ac968dd5/osdmap.333/0 0~0
 -3> 2016-01-20 10:51:52.846583 7f5023f797c0 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read meta/-1/ac968dd5/osdmap.333/0 0~7056/7056
 -2> 2016-01-20 10:51:52.846609 7f5023f797c0 10 osd.0 0 add_map_bl 333 7056 bytes
 -1> 2016-01-20 10:51:52.846913 7f5023f797c0 10 osd.0 334 build_past_intervals_parallel epoch 334
 0> 2016-01-20 10:51:52.852331 7f5023f797c0 -1 osd/OSD.cc: In function 'void OSD::build_past_intervals_parallel()' thread 7f5023f797c0 time 2016-01-20 10:51:52.846933
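For readers unfamiliar with this code path: as far as we can tell from the log above and the jewel-era source, OSD::build_past_intervals_parallel() replays the stored OSD maps epoch by epoch (the "build_past_intervals_parallel epoch 330" through "epoch 334" lines), treats the epoch of any mapping change as the start of a new interval, and finally asserts that the recomputed p.same_interval_since equals the value stored in the PG's history. The C++ below is a deliberately simplified model of that check, not Ceph code: MapSnapshot, recompute_same_interval_since() and same_interval_since_consistent() are hypothetical names, and the rule that only an acting-set change starts a new interval is an assumption made for brevity (the real interval test also considers the up set, the primary and some pool changes, among other things).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model; not Ceph's OSDMap or PastIntervals API.
using epoch_t = std::uint32_t;

struct MapSnapshot {
  epoch_t epoch;            // OSD map epoch
  std::vector<int> acting;  // acting set of one PG in that epoch
};

// Replays maps in epoch order and returns the epoch at which the PG's
// current interval began. `since` is the interval start already known at
// maps.front(). Simplification: only an acting-set change starts a new
// interval in this model.
epoch_t recompute_same_interval_since(const std::vector<MapSnapshot>& maps,
                                      epoch_t since) {
  for (std::size_t i = 1; i < maps.size(); ++i) {
    assert(maps[i].epoch > maps[i - 1].epoch && "maps must be in epoch order");
    if (maps[i].acting != maps[i - 1].acting) since = maps[i].epoch;
  }
  return since;
}

// The invariant the failing assert enforces, restated for this model:
// the stored value must match the value recomputed from the maps.
bool same_interval_since_consistent(epoch_t stored,
                                    const std::vector<MapSnapshot>& maps,
                                    epoch_t since) {
  return stored == recompute_same_interval_since(maps, since);
}
```

In this model, a PG whose stored same_interval_since disagrees with what the replayed maps imply fails the comparison, which is the condition the crash above reports.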
Updated by Samuel Just over 8 years ago
- Assignee set to David Zafman
- Priority changed from Normal to Urgent
Updated by David Zafman over 8 years ago
We should create a commit that changes assert(pg->info.history.same_interval_since == p.same_interval_since) to also output which PG failed and the value of p.same_interval_since. There were two PG imports into osd.0. The first one, pg 0.11, had already been fixed 7 minutes earlier. pg 1.16 was imported 20 seconds before osd.0 was restarted and hit the assert.
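A minimal C++ sketch of the kind of diagnostic proposed above, assuming nothing about the change that was actually merged: log the PG id and both epochs before asserting, so the next crash identifies which PG diverged. The types PgHistory, PgInfo and PastIntervalState, epoch_t as a plain uint32_t, and both function names are hypothetical stand-ins, not Ceph's real definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins; not Ceph's real types.
using epoch_t = std::uint32_t;

struct PgHistory {
  epoch_t same_interval_since = 0;  // value persisted with the PG
};

struct PgInfo {
  std::string pgid;  // e.g. "1.16"
  PgHistory history;
};

struct PastIntervalState {
  epoch_t same_interval_since = 0;  // value recomputed from the OSD maps
};

// Returns "" when the stored and recomputed epochs agree; otherwise a
// message naming the PG and both values.
std::string check_same_interval_since(const PgInfo& info,
                                      const PastIntervalState& p) {
  if (info.history.same_interval_since == p.same_interval_since) return "";
  std::ostringstream os;
  os << "pg " << info.pgid << " history.same_interval_since "
     << info.history.same_interval_since
     << " != recomputed same_interval_since " << p.same_interval_since;
  return os.str();
}

// Log the details first, then assert, so the crash log names the PG.
// Note: the standard assert is compiled out under NDEBUG.
void assert_same_interval_since(const PgInfo& info,
                                const PastIntervalState& p) {
  const std::string err = check_same_interval_since(info, p);
  if (!err.empty()) std::cerr << "same_interval_since mismatch: " << err << '\n';
  assert(err.empty() && "same_interval_since mismatch");
}
```

With made-up epochs, a PG "1.16" storing 330 while the maps imply 334 would log "pg 1.16 history.same_interval_since 330 != recomputed same_interval_since 334" before the assert fires. A production check should not rely on the standard assert, since it disappears in NDEBUG builds.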
Updated by David Zafman over 8 years ago
- Status changed from New to In Progress
Trying to reproduce with branch wip-diag-14438 containing diagnostics.
Updated by David Zafman about 8 years ago
Commit db3dc4d3e2a74cfd366cd2ef4d0c9c4581269113 was merged to the jewel branch to provide further diagnostics if this problem recurs.
Updated by David Zafman about 8 years ago
- Status changed from In Progress to Need More Info
Updated by Yuri Weinstein about 8 years ago
- Release set to infernalis
Also in run:
http://pulpito.ceph.com/teuthology-2016-02-25_17:10:10-upgrade:hammer-x-infernalis-distro-basic-vps/
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-02-25_17:10:10-upgrade:hammer-x-infernalis-distro-basic-vps/27453/teuthology.log
2016-02-25T20:07:39.118 INFO:tasks.ceph.osd.2.vpm111.stderr:osd/OSD.cc: In function 'void OSD::build_past_intervals_parallel()' thread 7f4ffd222940 time 2016-02-26 04:07:39.115536
2016-02-25T20:07:39.119 INFO:tasks.ceph.osd.2.vpm111.stderr:osd/OSD.cc: 3097: FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: ceph version 9.2.0-132-gd148ed6 (d148ed64e4b865baceeadbe999dcba2ca956d932)
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f4ffcd1e27b]
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: 2: (OSD::build_past_intervals_parallel()+0x1638) [0x7f4ffc791508]
2016-02-25T20:07:39.142 INFO:tasks.ceph.osd.2.vpm111.stderr: 3: (OSD::load_pgs()+0x12fb) [0x7f4ffc793dab]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 4: (OSD::init()+0xe74) [0x7f4ffc7a31d4]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 5: (main()+0x2954) [0x7f4ffc728084]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 6: (__libc_start_main()+0xf5) [0x7f4ff95aaec5]
2016-02-25T20:07:39.143 INFO:tasks.ceph.osd.2.vpm111.stderr: 7: (()+0x2f7ec7) [0x7f4ffc757ec7]
2016-02-25T20:07:39.144 INFO:tasks.ceph.osd.2.vpm111.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by David Zafman about 8 years ago
Unfortunately, I added the diagnostics only to the jewel branch, so they provided no extra information for this latest failure on infernalis.
Updated by David Zafman about 8 years ago
- Status changed from Need More Info to Fix Under Review
Updated by David Zafman about 8 years ago
- Status changed from 12 to Resolved
0fd674bbf0c17a673be40123645adee3d64375a0
Updated by David Zafman about 8 years ago
- Status changed from Resolved to Pending Backport
- Backport set to hammer
Updated by David Zafman about 8 years ago
- Copied to Backport #15315: hammer: "FAILED assert(pg->info.history.same_interval_since == p.same_interval_since)" in upgrade:hammer-x-jewel-distro-basic-openstack added
Updated by Nathan Cutler almost 8 years ago
- Status changed from Pending Backport to Resolved