Bug #7907
osd: rollback to head didn't mark_unrollbackable
0%
Description
[ERR] 161.38 push f2203eb8/plana6225762-445/head//161
archive_path: /var/lib/teuthworker/archive/teuthology-2014-03-28_22:35:22-upgrade:dumpling-x:stress-split-firefly---basic-plana/152001 description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-import-export.yaml 6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml} email: null job_id: '152001' last_in_suite: false machine_type: plana name: teuthology-2014-03-28_22:35:22-upgrade:dumpling-x:stress-split-firefly---basic-plana nuke-on-error: true os_type: ubuntu os_version: '12.04' overrides: admin_socket: branch: firefly ceph: conf: mon: debug mon: 20 debug ms: 1 debug paxos: 20 mon warn on legacy crush tunables: false osd: debug filestore: 20 debug journal: 20 debug ms: 1 debug osd: 20 log-whitelist: - slow request - wrongly marked me down - objects unfound and apparently lost - log bound mismatch sha1: 780fa91ce7aa3aba6bc2605fa034db70225282a8 ceph-deploy: branch: dev: firefly conf: client: log file: /var/log/ceph/ceph-$name.$pid.log mon: debug mon: 1 debug ms: 20 debug paxos: 20 osd default pool size: 2 install: ceph: sha1: 780fa91ce7aa3aba6bc2605fa034db70225282a8 s3tests: branch: master workunit: sha1: 780fa91ce7aa3aba6bc2605fa034db70225282a8 owner: scheduled_teuthology@teuthology roles: - - mon.a - mon.b - mds.a - osd.0 - osd.1 - osd.2 - - osd.3 - osd.4 - osd.5 - mon.c - - client.0 targets: ubuntu@plana62.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSC3e8HvnLeaoPWNNoKjTmvf5ebfvXKGCgTNCjPeI9Qtl9pnLXo5XniCH6tTFWpYMQLtesPysgoQxCHEOEzui/YHwXfq3JPWIPMlANLNBH1e6RRqjHC+mIzSNr0nP295AUFZycyAIxRDRUWwz8IWaPtEjqohd+oojP/uBklTaHyDEZ84nEabMXy1giwPwjQeEYrTNBtuTDTkOZryrrP6XxoBEKK/P3ZyD54ouSdUb2K3XVzVqrNdf4+pUh6TuctCPpbmewEuW4tygAPZhaNmHujMkhT7DsfIcTFR/dDig0ubYlGExyiOsrXF1eRHLKGPMYl6Y7ZFTLOOkD1OBVeJ8/ ubuntu@plana69.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCnoYSVTeXHpYqSOxvnFLq3x0YBY1ZYRCnJPWGlyYubMHT60crIyLLB9hc1rEbWXRtkIj4hpKNiVTcoFC1UOpJrDqfJmY7XfC5SYVeJMLKlnMprkaCgC/z/4OfavzP7zZAkShnvLrKj5ZQ2C2Jni2Ox6Y3tG/DgkRH7nyqZF54VaoJnatPYcvoh43f9m2yRtPiqytn7B1umPsoWwcLlprmbYFmJpphaZBc1IvYUTANTbi7ItYLN6stj3RQw5uZHeE64u/buVUgubPSKsQi31QVEAPdbbHS1v+5WUX6KX18Tgs5vV9M+g0+1wKgWe5Hy4OrPrnLv5r378+auy+ulTkG1 ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUaRgJmTQ9QH0nmaEOyIo3FlyrkJEZseFyHVdtziq4VgksZNdWf9ebVaQPjBfuk94VHrKvPN5udeXU5/DIvTGOYFUdK0QbZ8RdzyvMQiTxnEVxmPk4xcqhxktymFQc4BpyNa+/xgTXhCvG6rLCRucNfwMbgnMqpGlkUZOIa7MdHJTRf0kY0gBGwNF0gRExU/4Ub0TAc++0psIK/P5EwbJRgozdzDDOQo13T4LtEOswmbPXYWPLdZivWs4byKpvPP5XJPB6qRLUpgKfKfL9g6waaOVOf6RIO6TnWlYnIBkLyxi4NfY0pWWI4tvAkAYVCz8foUsSzLLMR+zhaUDAHusX tasks: - internal.lock_machines: - 3 - plana - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - internal.check_ceph_data: null - internal.vm_setup: null - internal.base: null - internal.archive: null - internal.coredump: null - internal.sudo: null - internal.syslog: null - internal.timer: null - chef: null - clock.check: null - install: branch: dumpling - ceph: fs: xfs - install.upgrade: osd.0: null - ceph.restart: daemons: - osd.0 - osd.1 - osd.2 - thrashosds: chance_pgnum_grow: 1 chance_pgpnum_fix: 1 thrash_primary_affinity: false timeout: 1200 - ceph.restart: daemons: - mon.a wait-for-healthy: false wait-for-osds-up: true - workunit: branch: dumpling clients: client.0: - rbd/import_export.sh env: RBD_CREATE_ARGS: --new-format - ceph.restart: daemons: - mon.b wait-for-healthy: false wait-for-osds-up: true - workunit: branch: dumpling clients: client.0: - rbd/test_librbd.sh - install.upgrade: mon.c: null - ceph.restart: daemons: - mon.c wait-for-healthy: false wait-for-osds-up: true - ceph.wait_for_mon_quorum: - a - b - c - workunit: branch: dumpling clients: client.0: - rados/test-upgrade-firefly.sh - workunit: branch: dumpling clients: client.0: - rbd/test_librbd_python.sh - rgw: client.0: idle_timeout: 120 - swift: client.0: rgw_server: client.0 - rados: clients: - client.0 objects: 500 op_weights: delete: 50 read: 100 rollback: 50 snap_create: 50 snap_remove: 50 write: 100 ops: 4000 teuthology_branch: firefly verbose: true worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.16943
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-import-export.yaml 6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml} duration: 6608.918809175491 failure_reason: '"2014-03-29 05:24:23.501635 osd.2 10.214.133.36:6810/5684 1 : [ERR] 161.38 push f2203eb8/plana6225762-445/head//161 v 1840''266 failed because local copy is 2186''396" in cluster log' flavor: basic owner: scheduled_teuthology@teuthology success: false
Associated revisions
osd/ReplicatedPG: mark_unrollbackable when _rollback_to head
We fell into the case in _rollback_to where we just set ctx->modify = true
and don't explicitly mark the ctx and unrollbackable. Later, we screw up
in proc_replica_log as a result because we think we can rollback this
update to the head when in reality we cannot.
Fixes: #7907
Signed-off-by: Sage Weil <sage@inktank.com>
History
#1 Updated by Ian Colle almost 10 years ago
- Priority changed from Normal to Urgent
#2 Updated by Sage Weil almost 10 years ago
- Assignee set to Sage Weil
#3 Updated by Sage Weil almost 10 years ago
- Subject changed from "failed because local copy is 2186''396" in cluster log" in upgrade:dumpling-x:stress-split-firefly---basic-plana to osd: rollback to head didn't mark_unrollbackable
- Status changed from New to 7
- Assignee changed from Sage Weil to Samuel Just
- Source changed from other to Q/A
the original modify:
2014-03-29 05:22:27.680866 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] populate_obc_watchers f2203eb8/plana6225762-445/head//161 2014-03-29 05:22:27.680881 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc f2203eb8/plana6225762-445/h ead//161 2014-03-29 05:22:27.680893 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] get_object_context 0x5493000 f2203eb8/plana6225762-445/head//161 rwstate(none n= 0 w=0) oi:f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185) 0 -> 1 read f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185) 2014-03-29 05:22:27.680914 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] find_object_context f2203eb8/plana6225762-445/head//161 @head oi=f2203eb8/plana6 225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185) 2014-03-29 05:22:27.680951 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] execute_ctx 0x4e9d100 2014-03-29 05:22:27.680978 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_op f2203eb8/plana6225762-445/head//161 [rollback 1ad] ov 1840'266 av 2186'395 snapc 1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] snapset aa=[aa,a9,a8,a7,9e,78]:[]+head 2014-03-29 05:22:27.680997 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_osd_op f2203eb8/plana6225762-445/head//161 [rollback 1ad] 2014-03-29 05:22:27.681009 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_osd_op rollback 1ad 2014-03-29 05:22:27.681021 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] _rollback_to f2203eb8/plana6225762-445/head//161 snapid 1ad 2014-03-29 05:22:27.681035 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] find_object_context f2203eb8/plana6225762-445/1ad//161 @1ad snapset aa=[aa,a9,a8 ,a7,9e,78]:[]+head 2014-03-29 05:22:27.681050 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] get_object_context 0x5493000 f2203eb8/plana6225762-445/head//161 rwstate(write n =1 w=0) oi:f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185) 2014-03-29 05:22:27.681072 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] find_object_context f2203eb8/plana6225762-445/head//161 want 1ad > snapset seq aa -- HIT f2203eb8/plana6225762-445/head//161 2014-03-29 05:22:27.681087 7f94a9404700 15 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_osd_op_effects on session 0x3833e00 2014-03-29 05:22:27.681100 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] make_writeable f2203eb8/plana6225762-445/head//161 snapset=0x37fd1f8 snapc=1dc= [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] 2014-03-29 05:22:27.681146 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] cloning v 1840'266 to f2203eb8/plana6225762-445/1dc//161 v 2186'395 snaps=[1dc, 1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] 2014-03-29 05:22:27.681173 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] make_writeable f2203eb8/plana6225762-445/head//161 done, snapset=1dc=[1dc,1da,1d 8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]:[1dc]+head 2014-03-29 05:22:27.681188 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] finish_ctx f2203eb8/plana6225762-445/head//161 0x4e9d100 op modify 2014-03-29 05:22:27.681204 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] set mtime to 2014-03-29 05:22:27.679945 2014-03-29 05:22:27.681225 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] final snapset 1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6, 19d,19a,194,178]:[1dc]+head in f2203eb8/plana6225762-445/head//161 2014-03-29 05:22:27.681247 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] zeroing write result code 0 2014-03-29 05:22:27.681261 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] op order client.4650 tid 4039 (first) 2014-03-29 05:22:27.681273 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] new_repop rep_tid 453 on osd_op(client.4650.0:4039 plana6225762-445 [rollback 1a d] 161.f2203eb8 snapc 1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] ack+write e2186) v4 2014-03-29 05:22:27.681293 7f94a9404700 7 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] issue_repop rep_tid 453 o f2203eb8/plana6225762-445/head//161 2014-03-29 05:22:27.681342 7f94a9404700 1 -- 10.214.133.36:6802/1932 --> osd.1 10.214.133.36:6815/23465 -- osd_sub_op(client.4650.0:4039 161.38 f2203eb8/plana6225762-445/head//161 [] v 2186'396 snapset=0=[]:[] snapc=0=[]) v9 -- ?+1401 0x2be0000 2014-03-29 05:22:27.681382 7f94a9404700 15 filestore(/var/lib/ceph/osd/ceph-2) omap_get_values meta/a468ec03/snapmapper/0//-1 2014-03-29 05:22:27.681663 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] append_log log((0'0,2186'394], crt=2127'344) [2186'395 (1840'266) clone f2203 eb8/plana6225762-445/1dc//161 by unknown.0.0:0 2014-03-29 05:06:33.366515 snaps [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178],2186'396 (1840'266) modify f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945] 2014-03-29 05:22:27.681713 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'395 (0'0,2186'395] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 luod=2186'394 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] add_log_entry 2186'395 (1840'266) clone f2203eb8/plana6225762-4 45/1dc//161 by unknown.0.0:0 2014-03-29 05:06:33.366515 snaps [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] 2014-03-29 05:22:27.681746 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 luod=2186'394 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] add_log_entry 2186'396 (1840'266) modify f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945 2014-03-29 05:22:27.681768 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 luod=2186'394 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] append_log adding 2 keys
later, osd.2 is down and has the latest version '396. the new primary is osd.3, and peers at last_update '394.
later osd.3 comes back up and we merge the replica log:
2014-03-29 05:24:14.080256 7f6cf43b2700 10 osd.3 pg_epoch: 2236 pg[161.38( v 2186'394 lc 1273'21 (0'0,2186'394] local-les=2235 n=27 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2020-2232/11 crt=2186'394 lcod 0'0 mlcod 0'0 inactive m=10] proc_replica_log for osd.(2,255): 161.38( v 218 6'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) log((0'0,2186'396], crt=2127'344) missing(0) 2014-03-29 05:24:14.080305 7f6cf43b2700 10 proc_replica_log for osd.(2,255): 161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) log((0'0,2186'396], crt=2127'344) missing(0) 2014-03-29 05:24:14.080319 7f6cf43b2700 20 merge_log cut point (usually last shared) is 2186'394 (2183'390) modify d0ef05f8/plana6225762-104/snapdir//161 by unknown.0.0:0 2014-03-29 05:22:24.896728 2014-03-29 05:24:14.093867 7f6cf43b2700 10 _merge_object_divergent_entries: merging hoid f2203eb8/plana6225762-445/1dc//161 entries: 2186'395 (1840'266) clone f2203eb8/plana6225762-445/1dc//161 by unknown.0.0:0 2014-03-29 05:06:33.366515 snaps [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] 2014-03-29 05:24:14.093897 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/1dc//161 prior_version: 1840'266 first_divergent_update: 2186'395 last_divergent_update: 2186'395 2014-03-29 05:24:14.093906 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/1dc//161 has no more recent entries in log 2014-03-29 05:24:14.093909 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/1dc//161 prior_version or op type indicates creation, deleting 2014-03-29 05:24:14.093911 7f6cf43b2700 10 _merge_object_divergent_entries: merging hoid f2203eb8/plana6225762-445/head//161 entries: 2186'396 (1840'266) modify f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945 2014-03-29 05:24:14.093918 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 prior_version: 1840'266 first_divergent_update: 2186'396 last_divergent_update: 2186'396 2014-03-29 05:24:14.093922 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 has no more recent entries in log 2014-03-29 05:24:14.093924 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 must be rolled back or recovered, attempting to rollback 2014-03-29 05:24:14.093926 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 rolling back 2186'396 (1840'266) modify f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945 2014-03-29 05:24:14.093932 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 rolled back
and the ops get rolled back, which should not happen.
later, the primary tries to pull the old version of the object from osd.2 and that fails:
2014-03-29 05:24:23.501084 7ff5d53d9700 10 osd.2 2240 dequeue_op 0x4a30a50 prio 63 cost 1049576 latency 0.000380 MOSDPGPull(161.38 2240 [PullOp(f2203eb8/plana6225762-445/head//161, recovery_info: ObjectRecoveryInfo(f2203eb8/plana6225762-445/head//161@1840'266, copy_subset: [0~18446744073709551615], clone_subset: {} ), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 pg pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] 2014-03-29 05:24:23.501151 7ff5d53d9700 10 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] can_handle_while_inactive: 0x4a30a50 2014-03-29 05:24:23.501170 7ff5d53d9700 10 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] handle_message: 0x4a30a50 2014-03-29 05:24:23.501238 7ff5d53d9700 10 filestore(/var/lib/ceph/osd/ceph-2) stat 161.38_head/f2203eb8/plana6225762-445/head//161 = 0 (size 3432998) 2014-03-29 05:24:23.501253 7ff5d53d9700 7 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] send_push_op f2203eb8/plana6225762-445/head//161 v 1840'266 size 3432998 recovery _info: ObjectRecoveryInfo(f2203eb8/plana6225762-445/head//161@1840'266, copy_subset: [0~3432998], clone_subset: {}) 2014-03-29 05:24:23.501273 7ff5d53d9700 15 filestore(/var/lib/ceph/osd/ceph-2) omap_get_header 161.38_head/f2203eb8/plana6225762-445/head//161 2014-03-29 05:24:23.501367 7ff5d53d9700 20 filestore lookup_parent: parent 7392 for seq 7547 2014-03-29 05:24:23.501400 7ff5d53d9700 20 filestore lookup_parent: parent seq is 7392 with parent 0 2014-03-29 05:24:23.501429 7ff5d53d9700 15 filestore(/var/lib/ceph/osd/ceph-2) getattrs 161.38_head/f2203eb8/plana6225762-445/head//161 2014-03-29 05:24:23.501504 7ff5d53d9700 20 filestore(/var/lib/ceph/osd/ceph-2) fgetattrs 75 getting '_' 2014-03-29 05:24:23.501520 7ff5d53d9700 20 filestore(/var/lib/ceph/osd/ceph-2) fgetattrs 75 getting 'snapset' 2014-03-29 05:24:23.501601 7ff5d53d9700 10 filestore(/var/lib/ceph/osd/ceph-2) getattrs 161.38_head/f2203eb8/plana6225762-445/head//161 = 0 2014-03-29 05:24:23.501633 7ff5d53d9700 0 log [ERR] : 161.38 push f2203eb8/plana6225762-445/head//161 v 1840'266 failed because local copy is 2186'396 2014-03-29 05:24:23.501647 7ff5d53d9700 20 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] send_pushes: sending push PushOp(f2203eb8/plana6225762-445/head//161, version: 0'0, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(0//0//-1@0'0, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)) to osd.(3,255) 2014-03-29 05:24:23.501678 7ff5d53d9700 1 -- 10.214.133.36:6811/5684 --> 10.214.132.9:6804/24034 -- MOSDPGPush(161.38 2240 [PushOp(f2203eb8/plana6225762-445/head//161, version: 0'0, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(0//0//-1@0'0, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x49c4600 con 0x3d22dc0
primary sees the ENOENT and recovers (by pulling from another replica).
not clear why the rollback didn't mark the ObjectModDesc as can't rollback, given ReplicatedPG::_rollback_to() line 4606.
oh, it's teh ctx->modify = true case where we just clone the head.
#4 Updated by Sage Weil almost 10 years ago
- Status changed from 7 to Resolved