Bug #3072
osd/ReplicatedPG.cc: 3548: FAILED assert(waiting_for_ondisk.begin()->first == repop->v)
Description
2012-09-03 16:55:13.818212 7f41c1ce8700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)' thread 7f41c1ce8700 time 2012-09-03 16:55:13.816130
osd/ReplicatedPG.cc: 3548: FAILED assert(waiting_for_ondisk.begin()->first == repop->v)
 ceph version 0.51-391-ge094090 (commit:e09409087f521464ae10e6787bd116b8327c9ff7)
 1: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x6e0) [0x5514d0]
 2: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x1d4) [0x5527d4]
 3: (ReplicatedPG::sub_op_modify_reply(std::tr1::shared_ptr<OpRequest>)+0x172) [0x555042]
 4: (ReplicatedPG::do_sub_op_reply(std::tr1::shared_ptr<OpRequest>)+0x82) [0x586042]
 5: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x325) [0x655435]
 6: (OSD::dequeue_op(PG*)+0x27c) [0x5ba1dc]
 7: (ThreadPool::worker()+0x523) [0x7cbff3]
 8: (ThreadPool::WorkThread::entry()+0xd) [0x5fa15d]
 9: (()+0x7e9a) [0x7f41d256ee9a]
 10: (clone()+0x6d) [0x7f41d09124bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ubuntu@teuthology:/a/sage-2012-09-03_14:23:13-regression-wip-3070-testing-basic/14933$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 995fc068ddf675260098c60591989bf2ee184338
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    fs: xfs
    log-whitelist:
    - slow request
    sha1: e09409087f521464ae10e6787bd116b8327c9ff7
  workunit:
    sha1: e09409087f521464ae10e6787bd116b8327c9ff7
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana73.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoJnvRI1V0OJuQI9SosOedC7mj9O627LjoQWPKilJiBbHduPe1byBaKrgwTeEghl43VNf+EBs1+MwVH7zlDolnwN4tAlW9bRpC2SzURJfhZskp2CSQY3l8ca7a5f0J3hdOhx47oSSapN7O2cqmPzwlL/+MrFKGi+ITT613nUtzCjduZRPdhjyqZ0cQWeb0p1neDw5hbDBKd+HAH+ek/E6DK2PaqN6YAtmIgP76q0fQ85Omd0oDlmGXpKe3jlxlPT0W/5KD1+mpobPsh/EF2qar7IG/WqHHJ6NZAcXbdZ4KiMf9erP+Pk4KkD5SJ+e3GF7OEOwXtahKIIR1An4P2GD
  ubuntu@plana75.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/sBKIbaWlkUEbStD0wYVUj2aEuiP8WB0B4h4oyzOJaWaKSTPAK2hzAxEDVOkG1JhpR2JrfXitDtA7MW48NvP77Ov/EvOnTHBeTE7mvWL0D2d4/YUoqhF+RLojHgFNOE0FsVEc/2rhARYX9/4VL5YQ1kaE4dKeRqLxn/eA6BoW5+NDbdQ1Bt6qWNSTXYC2qs09do6wUXHbB+KE1Obay4QTGf77QA+ueVnAnKmYym5c5kGMqb7DD+I/OZyUcOWTCQ4sDpo2nh0GpHATqAAWXeFMSpJ0sVQmR5ByTpKsoRV3QxmxlNHBJVDrBoGbw7O0z8AisuwOfqzrOO5M3Q+16Gen
  ubuntu@plana76.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCoNlhzO4H0dC4/dcXgYUv2BA0azBZ8DQhoLeW3PWTI6bkmNKB5dR40BVtf7XcTwoLYsbKYKKf3hwQOo53TQNCOpiPjk2DyAIWUbth6JYraLWcDoIjsdGOjHi3hdjcCkIS3SzG/ZTgK8UwOu8fU2wdcuLNIV02jvU1t5n46Vw2Q1I8h4Am38x7+91UF41/E/ibHHxjFDbZMiqB75IWBaQumQzMmWkoH6z2xHQyWIxLUoHemxXGDCW2Mv4vaaHPsCfWuzbLLX+jhQRbOapWbaalM0/B2V0tieQGrunTlr9Pmve3+1Rga4RwB4cTYpmxYj+MvZqOSktky6Fpd2Dta+dr/
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
      write: 100
    ops: 4000
Associated revisions
ReplicatedPG: fill in user log entry last after snapdir tran
The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.
This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.
Signed-off-by: Samuel Just <sam.just@inktank.com>
osd: fill in user log entry last after snapdir tran
Reorder the snapdir logic and ctx->at_version adjustments prior to filling
in the object_info_t and user_versions and all that stuff. Adjust
at_version after appending the log entry (so that it points to the next
position/version we will write at.. culminating in the actual user
event).
The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.
This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
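The ordering problem described in the commit messages above can be sketched with a small model. This is an illustration only, not the Ceph code: LogEntry, append_entry, build_entries, repop_version and replay_version are hypothetical names, versions are reduced to plain integers, and the sketch assumes (as the commit message implies) that the repop is tagged with the version of the last entry in the log entry vector.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a PG log entry; not the real Ceph type.
struct LogEntry {
    uint64_t version;                  // position of this entry in the log
    std::optional<std::string> reqid;  // set only on the user (client) entry
};

// Append one entry at at_version, then advance at_version so that it
// points to the next position/version to be written.
void append_entry(std::vector<LogEntry>& log, uint64_t& at_version,
                  std::optional<std::string> reqid) {
    log.push_back({at_version, std::move(reqid)});
    ++at_version;
}

// Build the entries for one op that also writes a snapdir entry.
// user_last == true models the ordering described by the fix;
// user_last == false models the ordering that led to the failed assert.
std::vector<LogEntry> build_entries(uint64_t at_version,
                                    const std::string& reqid,
                                    bool user_last) {
    std::vector<LogEntry> log;
    if (user_last) {
        append_entry(log, at_version, std::nullopt);  // snapdir entry first
        append_entry(log, at_version, reqid);         // user entry last
    } else {
        append_entry(log, at_version, reqid);         // user entry first
        append_entry(log, at_version, std::nullopt);  // snapdir entry last
    }
    return log;
}

// Assumption: the repop carries the version of the last entry in the vector.
uint64_t repop_version(const std::vector<LogEntry>& log) {
    return log.back().version;
}

// A replayed request looks up its request id to decide which version to wait on.
std::optional<uint64_t> replay_version(const std::vector<LogEntry>& log,
                                       const std::string& reqid) {
    for (const auto& e : log) {
        if (e.reqid && *e.reqid == reqid) {
            return e.version;
        }
    }
    return std::nullopt;
}
```

In this model, starting at version 35 with the user entry first, the repop is tagged 36 while the replay finds 35, which matches the mismatch described for this bug. With the user entry last, both resolve to 36.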
History
#1 Updated by Sage Weil over 11 years ago
ubuntu@teuthology:/a/sage-h/15556 and 15540
#2 Updated by Samuel Just over 11 years ago
- Status changed from New to 7
0aad5462eb79be0427004f2442903bb56c2057c1 should take care of this one.
#3 Updated by Sage Weil over 11 years ago
- Status changed from 7 to Resolved