Bug #3072

osd/ReplicatedPG.cc: 3548: FAILED assert(waiting_for_ondisk.begin()->first == repop->v)

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor

Description

2012-09-03 16:55:13.818212 7f41c1ce8700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)' thread 7f41c1ce8700 time 2012-09-03 16:55:13.816130
osd/ReplicatedPG.cc: 3548: FAILED assert(waiting_for_ondisk.begin()->first == repop->v)

 ceph version 0.51-391-ge094090 (commit:e09409087f521464ae10e6787bd116b8327c9ff7)
 1: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x6e0) [0x5514d0]
 2: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x1d4) [0x5527d4]
 3: (ReplicatedPG::sub_op_modify_reply(std::tr1::shared_ptr<OpRequest>)+0x172) [0x555042]
 4: (ReplicatedPG::do_sub_op_reply(std::tr1::shared_ptr<OpRequest>)+0x82) [0x586042]
 5: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x325) [0x655435]
 6: (OSD::dequeue_op(PG*)+0x27c) [0x5ba1dc]
 7: (ThreadPool::worker()+0x523) [0x7cbff3]
 8: (ThreadPool::WorkThread::entry()+0xd) [0x5fa15d]
 9: (()+0x7e9a) [0x7f41d256ee9a]
 10: (clone()+0x6d) [0x7f41d09124bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/sage-2012-09-03_14:23:13-regression-wip-3070-testing-basic/14933$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 995fc068ddf675260098c60591989bf2ee184338
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    fs: xfs
    log-whitelist:
    - slow request
    sha1: e09409087f521464ae10e6787bd116b8327c9ff7
  workunit:
    sha1: e09409087f521464ae10e6787bd116b8327c9ff7
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana73.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoJnvRI1V0OJuQI9SosOedC7mj9O627LjoQWPKilJiBbHduPe1byBaKrgwTeEghl43VNf+EBs1+MwVH7zlDolnwN4tAlW9bRpC2SzURJfhZskp2CSQY3l8ca7a5f0J3hdOhx47oSSapN7O2cqmPzwlL/+MrFKGi+ITT613nUtzCjduZRPdhjyqZ0cQWeb0p1neDw5hbDBKd+HAH+ek/E6DK2PaqN6YAtmIgP76q0fQ85Omd0oDlmGXpKe3jlxlPT0W/5KD1+mpobPsh/EF2qar7IG/WqHHJ6NZAcXbdZ4KiMf9erP+Pk4KkD5SJ+e3GF7OEOwXtahKIIR1An4P2GD
  ubuntu@plana75.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/sBKIbaWlkUEbStD0wYVUj2aEuiP8WB0B4h4oyzOJaWaKSTPAK2hzAxEDVOkG1JhpR2JrfXitDtA7MW48NvP77Ov/EvOnTHBeTE7mvWL0D2d4/YUoqhF+RLojHgFNOE0FsVEc/2rhARYX9/4VL5YQ1kaE4dKeRqLxn/eA6BoW5+NDbdQ1Bt6qWNSTXYC2qs09do6wUXHbB+KE1Obay4QTGf77QA+ueVnAnKmYym5c5kGMqb7DD+I/OZyUcOWTCQ4sDpo2nh0GpHATqAAWXeFMSpJ0sVQmR5ByTpKsoRV3QxmxlNHBJVDrBoGbw7O0z8AisuwOfqzrOO5M3Q+16Gen
  ubuntu@plana76.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCoNlhzO4H0dC4/dcXgYUv2BA0azBZ8DQhoLeW3PWTI6bkmNKB5dR40BVtf7XcTwoLYsbKYKKf3hwQOo53TQNCOpiPjk2DyAIWUbth6JYraLWcDoIjsdGOjHi3hdjcCkIS3SzG/ZTgK8UwOu8fU2wdcuLNIV02jvU1t5n46Vw2Q1I8h4Am38x7+91UF41/E/ibHHxjFDbZMiqB75IWBaQumQzMmWkoH6z2xHQyWIxLUoHemxXGDCW2Mv4vaaHPsCfWuzbLLX+jhQRbOapWbaalM0/B2V0tieQGrunTlr9Pmve3+1Rga4RwB4cTYpmxYj+MvZqOSktky6Fpd2Dta+dr/
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
      write: 100
    ops: 4000

Related issues

Duplicated by Ceph - Bug #2956: osd:FAILED assert(waiting_for_ondisk.begin()->first == repop->v) Resolved 08/16/2012

Associated revisions

Revision 0aad5462 (diff)
Added by Samuel Just over 11 years ago

ReplicatedPG: fill in user log entry last after snapdir tran

The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.

This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.

Signed-off-by: Samuel Just
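
The mismatch described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ceph code: `LogEntry`, `replay_wait_version`, `repop_version`, and `waiters_line_up` are hypothetical names, versions are reduced to plain integers, and the repop is assumed to be tagged with the version of the last entry in the log entry vector. A replayed op looks up its request id to decide which version to wait on, and the final comparison mirrors the shape of the failed assert.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in; not Ceph's real log entry type.
struct LogEntry {
    uint64_t version;   // e.g. 35 or 36
    std::string reqid;  // empty for entries without a request id (e.g. snapdir)
};

// Version a replayed op would wait on: the entry that carries its request id.
// Returns false if no entry carries that request id.
bool replay_wait_version(const std::vector<LogEntry>& log,
                         const std::string& reqid, uint64_t* out) {
    for (const auto& e : log) {
        if (!e.reqid.empty() && e.reqid == reqid) {
            *out = e.version;
            return true;
        }
    }
    return false;
}

// Assumption: the repop is tagged with the version of the last log entry.
uint64_t repop_version(const std::vector<LogEntry>& log) {
    return log.back().version;
}

// Same shape as the failed assert: the oldest waiter must sit at the
// version the repop is tagged with.
bool waiters_line_up(const std::vector<LogEntry>& log, const std::string& reqid) {
    std::map<uint64_t, std::vector<std::string>> waiting_for_ondisk;
    uint64_t v = 0;
    if (!replay_wait_version(log, reqid, &v)) {
        return true;  // nothing was queued, so nothing can mismatch
    }
    waiting_for_ondisk[v].push_back(reqid);
    return waiting_for_ondisk.begin()->first == repop_version(log);
}
```

In this model, a log of `{35, "client.0:1"}, {36, ""}` (user entry first) makes the replay wait on 35 while the repop is tagged 36, so the check fails. With the order `{35, ""}, {36, "client.0:1"}` (user entry last) both sides agree on 36.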

Revision 03136d05 (diff)
Added by Sage Weil over 11 years ago

osd: fill in user log entry last after snapdir tran

Reorder the snapdir logic and ctx->at_version adjustments so that they
come before filling in the object_info_t, user_versions, and related
fields. Adjust at_version after appending the log entry, so that it
points to the next position/version we will write at, culminating in
the actual user event.

The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.

This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.

Signed-off-by: Sage Weil
Reviewed-by: Samuel Just
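
The ordering this commit message describes can be sketched as follows. It is a minimal illustration only: `Entry`, `Ctx`, and `append_entries` are hypothetical names, not the actual ReplicatedPG code, and versions are reduced to plain integers. The snapdir entry (if any) is appended first, `at_version` is advanced after each append, and the user entry carrying the request id is appended last.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not Ceph's real structures.
struct Entry {
    uint64_t version;
    std::string what;   // "snapdir" or "user"
    std::string reqid;  // only the user entry carries a request id
};

struct Ctx {
    uint64_t at_version;     // next version to write at
    std::vector<Entry> log;  // log entries produced by this op
};

// Append the snapdir entry (if any) first and advance at_version after the
// append, so the user entry is written last, at the final version.
void append_entries(Ctx& ctx, bool touch_snapdir, const std::string& reqid) {
    if (touch_snapdir) {
        ctx.log.push_back(Entry{ctx.at_version, "snapdir", ""});
        ++ctx.at_version;  // now points at the next version to write
    }
    ctx.log.push_back(Entry{ctx.at_version, "user", reqid});
}
```

Starting from `at_version` 35 with a snapdir update, this yields a snapdir entry at 35 and the user entry at 36, so the request id sits on the last entry in the vector.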

Revision 656ab158 (diff)
Added by Sage Weil over 11 years ago

osd: fill in user log entry last after snapdir tran

Reorder the snapdir logic and ctx->at_version adjustments so that they
come before filling in the object_info_t, user_versions, and related
fields. Adjust at_version after appending the log entry, so that it
points to the next position/version we will write at, culminating in
the actual user event.

The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.

This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.

Signed-off-by: Sage Weil
Reviewed-by: Samuel Just

History

#1 Updated by Sage Weil over 11 years ago

ubuntu@teuthology:/a/sage-h/15556 and 15540

#2 Updated by Samuel Just over 11 years ago

  • Status changed from New to 7

0aad5462eb79be0427004f2442903bb56c2057c1 should take care of this one.

#3 Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved
