Bug #26958

osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().objects.find( so

Added by Sage Weil over 1 year ago. Updated 4 months ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Backport: nautilus, mimic, luminous
Regression: No
Severity: 3 - minor

Description

2018-08-17 09:44:38.587 7f744a7c4700 10 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] _handle_message: 0xfdb8540
2018-08-17 09:44:38.587 7f744a7c4700 20 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] do_op: op osd_op(client.4318.0:305955 3.e 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head [read 0~1048576] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected e271) v8
2018-08-17 09:44:38.587 7f744a7c4700 20 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] op_has_sufficient_caps session=0x1083c580 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=1 need_write_cap=0 classes=[] -> yes
2018-08-17 09:44:38.587 7f744a7c4700 10 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] do_op osd_op(client.4318.0:305955 3.e 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head [read 0~1048576] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected e271) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected
2018-08-17 09:44:38.587 7f744a7c4700  7 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] object 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head v 27'328, recovering.
2018-08-17 09:44:38.587 7f744a7c4700 10 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] start_recovery_op 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head
2018-08-17 09:44:38.587 7f744a7c4700 10 osd.5 272 start_recovery_op pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 rops=1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head (0/3 rops)
2018-08-17 09:44:38.587 7f744a7c4700 10 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 rops=1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] recover_object: 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head
2018-08-17 09:44:38.587 7f744a7c4700  7 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 rops=1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] pull 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head v 27'328 on osds 1,4 from osd.1
2018-08-17 09:44:38.587 7f744a7c4700 10 osd.5 pg_epoch: 272 pg[3.e( v 41'1238 lc 21'45 (0'0,41'1238] local-lis/les=271/272 n=1238 ec=18/18 lis/c 271/89 les/c/f 272/90/0 270/271/271) [5,4] r=0 lpr=271 pi=[89,271)/1 rops=1 crt=41'1238 mlcod 0'0 active+recovering m=38 mbc={255={(1+1)=38}}] pulling soid 3:7e3cf7a0:::benchmark_data_smithi196_13013_object5325:head from osd 1 at version 0'0 rather than at version 27'328
2018-08-17 09:44:38.595 7f744a7c4700 -1 /build/ceph-14.0.0-2276-g18b3b5c/src/osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::prepare_pull(eversion_t, const hobject_t&, ObjectContextRef, ReplicatedBackend::RPGHandle*)' thread 7f744a7c4700 time 2018-08-17 09:44:38.588929
2018-08-17 09:44:38.603 7f744a7c4700 -1 *** Caught signal (Aborted) **

/a/sage-2018-08-16_19:51:38-rados-wip-sage3-testing-2018-08-16-1315-distro-basic-smithi/2910807
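The last log lines before the abort show the primary choosing osd.1 as the pull source for an object it needs at 27'328, then finding that its own record of osd.1's missing set says osd.1 only has 0'0. The assertion in the title then requires the object's PG log entry to be a LOST_REVERT. The Python sketch below models that decision for illustration only; it is not Ceph's C++ code. The names `parse_eversion`, `LogEntry` and `choose_pull_version`, and the dict shapes, are hypothetical. Because the assertion text in the title is truncated, only its two visible conditions are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# An eversion as printed in OSD logs: "epoch'version", e.g. "27'328".
EVersion = Tuple[int, int]


def parse_eversion(text: str) -> EVersion:
    """Parse an "epoch'version" string into a comparable (epoch, version) tuple."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


@dataclass
class LogEntry:
    # Hypothetical stand-in for a PG log entry; only the op name is modeled.
    op: str  # e.g. "MODIFY" or "LOST_REVERT"


def choose_pull_version(
    need: EVersion,
    peer_missing: Dict[str, EVersion],  # hypothetical: object -> version the peer has
    log_objects: Dict[str, LogEntry],   # hypothetical: object -> its PG log entry
    soid: str,
) -> EVersion:
    """Return the version of `soid` to pull from the chosen peer.

    If the primary believes the peer is itself missing the object, this sketch
    only accepts the LOST_REVERT case, where pulling an older version is
    intended; otherwise it raises AssertionError, analogous to the FAILED assert.
    """
    have: Optional[EVersion] = peer_missing.get(soid)
    if have is None:
        return need  # peer is not missing the object: pull the needed version
    print(f"pulling soid {soid} at version {have[0]}'{have[1]} "
          f"rather than at version {need[0]}'{need[1]}")
    entry = log_objects.get(soid)
    assert entry is not None and entry.op == "LOST_REVERT", \
        "peer is missing the object but the PG log has no LOST_REVERT for it"
    return have
```

With the values from the log (needed 27'328, osd.1 believed to have 0'0), the call returns 0'0 only if the object's log entry is a LOST_REVERT and raises otherwise.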

It looks like the primary's peer_missing is wrong somehow?
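The hypothesis is that two views held by the primary disagree: osd.1 is listed as a source for the object ("on osds 1,4"), yet the primary's peer_missing for osd.1 says it only has 0'0. A hypothetical debugging aid could cross-check the two views before a pull source is chosen. This sketch is illustrative only; `find_inconsistent_sources` and its input shapes are assumptions, not Ceph APIs. Note that in the LOST_REVERT case a source can legitimately appear in both views, so a hit is a lead, not proof of a bug.

```python
from typing import Dict, List, Set, Tuple

EVersion = Tuple[int, int]  # (epoch, version)


def find_inconsistent_sources(
    sources: Dict[str, Set[int]],                  # object -> OSD ids believed to hold it
    peer_missing: Dict[int, Dict[str, EVersion]],  # OSD id -> {object -> version it has}
) -> List[Tuple[str, int, EVersion]]:
    """List (object, osd, have) triples where an OSD is both a source and missing the object."""
    problems = []
    for soid, osds in sources.items():
        for osd in sorted(osds):
            have = peer_missing.get(osd, {}).get(soid)
            if have is not None:
                problems.append((soid, osd, have))
    return problems
```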


Related issues

Copied to RADOS - Backport #39537: luminous: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().object Resolved
Copied to RADOS - Backport #39538: mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().objects.f Resolved
Copied to RADOS - Backport #39539: nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().object Resolved

History

#1 Updated by Sage Weil over 1 year ago

/a/sage-2018-10-12_13:16:07-rados-wip-sage-testing-2018-10-11-1437-distro-basic-smithi/3131789

#3 Updated by Neha Ojha 10 months ago

  • Status changed from 12 to Fix Under Review
  • Pull request ID set to 27702

#4 Updated by Sage Weil 10 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus, mimic, luminous

#5 Updated by Nathan Cutler 10 months ago

DEBUG:root:backports for 26958 is nautilus, mimic, luminous {'nautilus', 'luminous', 'mimic'}
DEBUG:urllib3.connectionpool:http://tracker.ceph.com:80 "GET /issues/26958/relations.json?key=9626798fb97bf94f35f9e942844685df2bb2cb37&issue_id=26958&limit=100&offset=0 HTTP/1.1" 200 16
DEBUG:urllib3.connectionpool:http://tracker.ceph.com:80 "POST /projects/36/issues.json?key=9626798fb97bf94f35f9e942844685df2bb2cb37 HTTP/1.1" 422 62
Traceback (most recent call last):
  File "/home/smithfarm/bin/backport-create-issue", line 258, in <module>
    iterate_over_backports(redmine, issues, args.dry_run)
  File "/home/smithfarm/bin/backport-create-issue", line 230, in iterate_over_backports
    update_relations(r, issue, dry_run)
  File "/home/smithfarm/bin/backport-create-issue", line 202, in update_relations
    "value": release,
  File "/usr/lib/python3.6/site-packages/redminelib/managers/base.py", line 162, in create
    response = self.redmine.engine.request(self.resource_class.http_method_create, url, data=request)
  File "/usr/lib/python3.6/site-packages/redminelib/engines/base.py", line 76, in request
    return self.process_response(self.session.request(method, url, **kwargs))
  File "/usr/lib/python3.6/site-packages/redminelib/engines/base.py", line 166, in process_response
    raise exceptions.ValidationError(', '.join(': '.join(e) if isinstance(e, list) else e for e in errors))
redminelib.exceptions.ValidationError: Subject is too long (maximum is 255 characters)

Apparently, the title of this bug is close to 255 characters long, so the backport-create-issue script fails when it prepends "nautilus: " to it: the resulting subject exceeds Redmine's 255-character limit. This, obviously, is a bug in the script.
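One way the script could avoid the ValidationError is to truncate the parent title so that the release prefix plus the title fits within the limit. This is a minimal sketch; `backport_subject` is a hypothetical helper, not the script's actual fix, and it counts Python characters, which is assumed to match how Redmine measures the subject.

```python
MAX_SUBJECT = 255  # limit reported by Redmine in the ValidationError above


def backport_subject(release: str, parent_subject: str, limit: int = MAX_SUBJECT) -> str:
    """Build "<release>: <parent subject>", truncating the parent subject to fit `limit`."""
    prefix = f"{release}: "
    if len(prefix) >= limit:
        raise ValueError("release prefix alone exceeds the subject limit")
    return prefix + parent_subject[: limit - len(prefix)]
```

Truncating keeps the release prefix intact, which matters because the prefix is what distinguishes the backport issues from each other.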

#6 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #39537: luminous: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().object added

#7 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #39538: mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().objects.f added

#8 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #39539: nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().object added

#9 Updated by Nathan Cutler 4 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
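The --resolve-parent rule described here (resolve the parent once every backport is "Resolved" or "Rejected") can be sketched as follows. `can_resolve_parent` is a hypothetical name, the script's real implementation may differ, and the requirement that at least one backport exists is an assumption of this sketch.

```python
from typing import Iterable

TERMINAL_STATUSES = frozenset({"Resolved", "Rejected"})


def can_resolve_parent(backport_statuses: Iterable[str]) -> bool:
    """True if there is at least one backport and every backport is Resolved or Rejected."""
    statuses = list(backport_statuses)
    return bool(statuses) and all(s in TERMINAL_STATUSES for s in statuses)
```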