Bug #7907

osd: rollback to head didn't mark_unrollbackable

Added by Yuri Weinstein almost 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-28_22:35:22-upgrade:dumpling-x:stress-split-firefly---basic-plana/152001/

 [ERR] 161.38 push f2203eb8/plana6225762-445/head//161
archive_path: /var/lib/teuthworker/archive/teuthology-2014-03-28_22:35:22-upgrade:dumpling-x:stress-split-firefly---basic-plana/152001
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-import-export.yaml
  6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
email: null
job_id: '152001'
last_in_suite: false
machine_type: plana
name: teuthology-2014-03-28_22:35:22-upgrade:dumpling-x:stress-split-firefly---basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 780fa91ce7aa3aba6bc2605fa034db70225282a8
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 780fa91ce7aa3aba6bc2605fa034db70225282a8
  s3tests:
    branch: master
  workunit:
    sha1: 780fa91ce7aa3aba6bc2605fa034db70225282a8
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@plana62.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSC3e8HvnLeaoPWNNoKjTmvf5ebfvXKGCgTNCjPeI9Qtl9pnLXo5XniCH6tTFWpYMQLtesPysgoQxCHEOEzui/YHwXfq3JPWIPMlANLNBH1e6RRqjHC+mIzSNr0nP295AUFZycyAIxRDRUWwz8IWaPtEjqohd+oojP/uBklTaHyDEZ84nEabMXy1giwPwjQeEYrTNBtuTDTkOZryrrP6XxoBEKK/P3ZyD54ouSdUb2K3XVzVqrNdf4+pUh6TuctCPpbmewEuW4tygAPZhaNmHujMkhT7DsfIcTFR/dDig0ubYlGExyiOsrXF1eRHLKGPMYl6Y7ZFTLOOkD1OBVeJ8/
  ubuntu@plana69.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCnoYSVTeXHpYqSOxvnFLq3x0YBY1ZYRCnJPWGlyYubMHT60crIyLLB9hc1rEbWXRtkIj4hpKNiVTcoFC1UOpJrDqfJmY7XfC5SYVeJMLKlnMprkaCgC/z/4OfavzP7zZAkShnvLrKj5ZQ2C2Jni2Ox6Y3tG/DgkRH7nyqZF54VaoJnatPYcvoh43f9m2yRtPiqytn7B1umPsoWwcLlprmbYFmJpphaZBc1IvYUTANTbi7ItYLN6stj3RQw5uZHeE64u/buVUgubPSKsQi31QVEAPdbbHS1v+5WUX6KX18Tgs5vV9M+g0+1wKgWe5Hy4OrPrnLv5r378+auy+ulTkG1
  ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUaRgJmTQ9QH0nmaEOyIo3FlyrkJEZseFyHVdtziq4VgksZNdWf9ebVaQPjBfuk94VHrKvPN5udeXU5/DIvTGOYFUdK0QbZ8RdzyvMQiTxnEVxmPk4xcqhxktymFQc4BpyNa+/xgTXhCvG6rLCRucNfwMbgnMqpGlkUZOIa7MdHJTRf0kY0gBGwNF0gRExU/4Ub0TAc++0psIK/P5EwbJRgozdzDDOQo13T4LtEOswmbPXYWPLdZivWs4byKpvPP5XJPB6qRLUpgKfKfL9g6waaOVOf6RIO6TnWlYnIBkLyxi4NfY0pWWI4tvAkAYVCz8foUsSzLLMR+zhaUDAHusX
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 120
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.16943
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-import-export.yaml
  6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
duration: 6608.918809175491
failure_reason: '"2014-03-29 05:24:23.501635 osd.2 10.214.133.36:6810/5684 1 : [ERR]
  161.38 push f2203eb8/plana6225762-445/head//161 v 1840''266 failed because local
  copy is 2186''396" in cluster log'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false

Associated revisions

Revision d4d39a01 (diff)
Added by Sage Weil almost 10 years ago

osd/ReplicatedPG: mark_unrollbackable when _rollback_to head

We fell into the case in _rollback_to where we just set ctx->modify = true
and don't explicitly mark the ctx as unrollbackable. Later, we screw up
in proc_replica_log as a result because we think we can roll back this
update to the head when in reality we cannot.

Fixes: #7907
Signed-off-by: Sage Weil <>
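
To make the shape of the change concrete, here is a minimal, self-contained C++ model of the branch the commit describes. It is not the actual ReplicatedPG code: the OpContext/ObjectModDesc stand-ins and the mod_desc field are illustrative assumptions; only the idea (set modify and also mark the op unrollbackable when the rollback target is the head) comes from the commit message.

// Toy model, not Ceph source.
#include <iostream>

struct ObjectModDesc {                       // stand-in for Ceph's ObjectModDesc
  bool can_local_rollback = true;
  void mark_unrollbackable() { can_local_rollback = false; }
};

struct OpContext {                           // stand-in for the op context ("ctx")
  bool modify = false;
  ObjectModDesc mod_desc;
};

// Rollback whose target resolves to the current head: nothing to copy back,
// so the op is recorded as a plain modify of the head...
void rollback_to_head(OpContext* ctx) {
  ctx->modify = true;
  ctx->mod_desc.mark_unrollbackable();       // ...and (the fix) it is marked as
                                             // not locally rollbackable
}

int main() {
  OpContext ctx;
  rollback_to_head(&ctx);
  std::cout << "can_local_rollback=" << ctx.mod_desc.can_local_rollback << "\n";
  return 0;
}

Without that last call, peering later treats the divergent head update as something the replica can undo on its own, which is exactly the failure analyzed below.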

History

#1 Updated by Ian Colle almost 10 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Sage Weil almost 10 years ago

  • Assignee set to Sage Weil

#3 Updated by Sage Weil almost 10 years ago

  • Subject changed from "failed because local copy is 2186''396" in cluster log" in upgrade:dumpling-x:stress-split-firefly---basic-plana to osd: rollback to head didn't mark_unrollbackable
  • Status changed from New to 7
  • Assignee changed from Sage Weil to Samuel Just
  • Source changed from other to Q/A

the original modify:

2014-03-29 05:22:27.680866 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] populate_obc_watchers f2203eb8/plana6225762-445/head//161
2014-03-29 05:22:27.680881 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] ReplicatedPG::check_blacklisted_obc_watchers for obc f2203eb8/plana6225762-445/head//161
2014-03-29 05:22:27.680893 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] get_object_context 0x5493000 f2203eb8/plana6225762-445/head//161 rwstate(none n=0 w=0) oi:f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185) 0 -> 1 read f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185)
2014-03-29 05:22:27.680914 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] find_object_context f2203eb8/plana6225762-445/head//161 @head oi=f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185)
2014-03-29 05:22:27.680951 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] execute_ctx 0x4e9d100
2014-03-29 05:22:27.680978 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_op f2203eb8/plana6225762-445/head//161 [rollback 1ad] ov 1840'266 av 2186'395 snapc 1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] snapset aa=[aa,a9,a8,a7,9e,78]:[]+head
2014-03-29 05:22:27.680997 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_osd_op f2203eb8/plana6225762-445/head//161 [rollback 1ad]
2014-03-29 05:22:27.681009 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_osd_op  rollback 1ad
2014-03-29 05:22:27.681021 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] _rollback_to f2203eb8/plana6225762-445/head//161 snapid 1ad
2014-03-29 05:22:27.681035 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] find_object_context f2203eb8/plana6225762-445/1ad//161 @1ad snapset aa=[aa,a9,a8,a7,9e,78]:[]+head
2014-03-29 05:22:27.681050 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] get_object_context 0x5493000 f2203eb8/plana6225762-445/head//161 rwstate(write n=1 w=0) oi:f2203eb8/plana6225762-445/head//161(1840'266 client.4650.0:2296 wrlock_by=unknown.0.0:0 dirty s 3432998 uv185)
2014-03-29 05:22:27.681072 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] find_object_context  f2203eb8/plana6225762-445/head//161 want 1ad > snapset seq aa -- HIT f2203eb8/plana6225762-445/head//161
2014-03-29 05:22:27.681087 7f94a9404700 15 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] do_osd_op_effects on session 0x3833e00
2014-03-29 05:22:27.681100 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] make_writeable f2203eb8/plana6225762-445/head//161 snapset=0x37fd1f8  snapc=1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]
2014-03-29 05:22:27.681146 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean]  cloning v 1840'266 to f2203eb8/plana6225762-445/1dc//161 v 2186'395 snaps=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]
2014-03-29 05:22:27.681173 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] make_writeable f2203eb8/plana6225762-445/head//161 done, snapset=1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]:[1dc]+head
2014-03-29 05:22:27.681188 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] finish_ctx f2203eb8/plana6225762-445/head//161 0x4e9d100 op modify  
2014-03-29 05:22:27.681204 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean]  set mtime to 2014-03-29 05:22:27.679945
2014-03-29 05:22:27.681225 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=27 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean]  final snapset 1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]:[1dc]+head in f2203eb8/plana6225762-445/head//161
2014-03-29 05:22:27.681247 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean]  zeroing write result code 0
2014-03-29 05:22:27.681261 7f94a9404700 20 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean]  op order client.4650 tid 4039 (first)
2014-03-29 05:22:27.681273 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] new_repop rep_tid 453 on osd_op(client.4650.0:4039 plana6225762-445 [rollback 1ad] 161.f2203eb8 snapc 1dc=[1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178] ack+write e2186) v4
2014-03-29 05:22:27.681293 7f94a9404700  7 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] issue_repop rep_tid 453 o f2203eb8/plana6225762-445/head//161
2014-03-29 05:22:27.681342 7f94a9404700  1 -- 10.214.133.36:6802/1932 --> osd.1 10.214.133.36:6815/23465 -- osd_sub_op(client.4650.0:4039 161.38 f2203eb8/plana6225762-445/head//161 [] v 2186'396 snapset=0=[]:[] snapc=0=[]) v9 -- ?+1401 0x2be0000
2014-03-29 05:22:27.681382 7f94a9404700 15 filestore(/var/lib/ceph/osd/ceph-2) omap_get_values meta/a468ec03/snapmapper/0//-1
2014-03-29 05:22:27.681663 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'394 (0'0,2186'394] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] append_log log((0'0,2186'394], crt=2127'344) [2186'395 (1840'266) clone    f2203eb8/plana6225762-445/1dc//161 by unknown.0.0:0 2014-03-29 05:06:33.366515 snaps [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178],2186'396 (1840'266) modify   f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945]
2014-03-29 05:22:27.681713 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'395 (0'0,2186'395] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 luod=2186'394 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] add_log_entry 2186'395 (1840'266) clone    f2203eb8/plana6225762-445/1dc//161 by unknown.0.0:0 2014-03-29 05:06:33.366515 snaps [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]
2014-03-29 05:22:27.681746 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 luod=2186'394 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] add_log_entry 2186'396 (1840'266) modify   f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945
2014-03-29 05:22:27.681768 7f94a9404700 10 osd.2 pg_epoch: 2186 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2137/2137/2137) [2,1] r=0 lpr=2137 luod=2186'394 crt=2127'344 lcod 2183'392 mlcod 2183'392 active+clean] append_log  adding 2 keys

later, osd.2 is down and has the latest version '396. the new primary is osd.3, and peers at last_update '394.

later osd.2 comes back up and the new primary (osd.3) merges its replica log:

2014-03-29 05:24:14.080256 7f6cf43b2700 10 osd.3 pg_epoch: 2236 pg[161.38( v 2186'394 lc 1273'21 (0'0,2186'394] local-les=2235 n=27 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2020-2232/11 crt=2186'394 lcod 0'0 mlcod 0'0 inactive m=10] proc_replica_log for osd.(2,255): 161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) log((0'0,2186'396], crt=2127'344) missing(0)
2014-03-29 05:24:14.080305 7f6cf43b2700 10 proc_replica_log for osd.(2,255): 161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) log((0'0,2186'396], crt=2127'344) missing(0)
2014-03-29 05:24:14.080319 7f6cf43b2700 20 merge_log cut point (usually last shared) is 2186'394 (2183'390) modify   d0ef05f8/plana6225762-104/snapdir//161 by unknown.0.0:0 2014-03-29 05:22:24.896728
2014-03-29 05:24:14.093867 7f6cf43b2700 10 _merge_object_divergent_entries: merging hoid f2203eb8/plana6225762-445/1dc//161 entries: 2186'395 (1840'266) clone    f2203eb8/plana6225762-445/1dc//161 by unknown.0.0:0 2014-03-29 05:06:33.366515 snaps [1dc,1da,1d8,1d7,1ce,1cd,1cc,1ca,1c7,1bd,1b2,1af,1ad,1ac,1a6,19d,19a,194,178]
2014-03-29 05:24:14.093897 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/1dc//161 prior_version: 1840'266 first_divergent_update: 2186'395 last_divergent_update: 2186'395
2014-03-29 05:24:14.093906 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/1dc//161 has no more recent entries in log
2014-03-29 05:24:14.093909 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/1dc//161 prior_version or op type indicates creation, deleting
2014-03-29 05:24:14.093911 7f6cf43b2700 10 _merge_object_divergent_entries: merging hoid f2203eb8/plana6225762-445/head//161 entries: 2186'396 (1840'266) modify   f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945
2014-03-29 05:24:14.093918 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 prior_version: 1840'266 first_divergent_update: 2186'396 last_divergent_update: 2186'396
2014-03-29 05:24:14.093922 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 has no more recent entries in log
2014-03-29 05:24:14.093924 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 must be rolled back or recovered, attempting to rollback
2014-03-29 05:24:14.093926 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 rolling back 2186'396 (1840'266) modify   f2203eb8/plana6225762-445/head//161 by client.4650.0:4039 2014-03-29 05:22:27.679945
2014-03-29 05:24:14.093932 7f6cf43b2700 10 _merge_object_divergent_entries: hoid f2203eb8/plana6225762-445/head//161 rolled back

and the ops get rolled back, which should not happen.
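
The decision visible in the _merge_object_divergent_entries output above can be modeled roughly as follows. This is a simplified sketch, not the real PGLog code, and the names are illustrative: a divergent object is rolled back in place only if every divergent entry was recorded as locally rollbackable; otherwise it has to be marked missing and recovered from an authoritative copy. Because the 2186'396 head modify was wrongly left rollbackable, the rollback path was taken.

// Toy model, not Ceph source.
#include <iostream>
#include <string>
#include <vector>

// One divergent PG log entry for an object, reduced to what matters here.
struct DivergentEntry {
  std::string op;            // "modify", "clone", ...
  bool can_local_rollback;   // what its ObjectModDesc recorded
};

enum class Action { RollBack, MarkMissingAndRecover };

// Simplified decision: roll back only if every divergent entry can be undone.
Action merge_divergent(const std::vector<DivergentEntry>& entries) {
  for (const auto& e : entries)
    if (!e.can_local_rollback)
      return Action::MarkMissingAndRecover;  // what this bug should have hit
  return Action::RollBack;                   // what actually happened here
}

int main() {
  // The 2186'396 head modify was (incorrectly) recorded as rollbackable.
  std::vector<DivergentEntry> divergent{{"modify", true}};
  std::cout << (merge_divergent(divergent) == Action::RollBack
                    ? "roll back in place\n"
                    : "mark missing and recover\n");
  return 0;
}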

later, the primary tries to pull the old version of the object from osd.2 and that fails:

2014-03-29 05:24:23.501084 7ff5d53d9700 10 osd.2 2240 dequeue_op 0x4a30a50 prio 63 cost 1049576 latency 0.000380 MOSDPGPull(161.38 2240 [PullOp(f2203eb8/plana6225762-445/head//161, recovery_info: ObjectRecoveryInfo(f2203eb8/plana6225762-445/head//161@1840'266, copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 pg pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY]
2014-03-29 05:24:23.501151 7ff5d53d9700 10 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] can_handle_while_inactive: 0x4a30a50
2014-03-29 05:24:23.501170 7ff5d53d9700 10 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] handle_message: 0x4a30a50
2014-03-29 05:24:23.501238 7ff5d53d9700 10 filestore(/var/lib/ceph/osd/ceph-2) stat 161.38_head/f2203eb8/plana6225762-445/head//161 = 0 (size 3432998)
2014-03-29 05:24:23.501253 7ff5d53d9700  7 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] send_push_op f2203eb8/plana6225762-445/head//161 v 1840'266 size 3432998 recovery_info: ObjectRecoveryInfo(f2203eb8/plana6225762-445/head//161@1840'266, copy_subset: [0~3432998], clone_subset: {})
2014-03-29 05:24:23.501273 7ff5d53d9700 15 filestore(/var/lib/ceph/osd/ceph-2) omap_get_header 161.38_head/f2203eb8/plana6225762-445/head//161
2014-03-29 05:24:23.501367 7ff5d53d9700 20 filestore lookup_parent: parent 7392 for seq 7547
2014-03-29 05:24:23.501400 7ff5d53d9700 20 filestore lookup_parent: parent seq is 7392 with parent 0
2014-03-29 05:24:23.501429 7ff5d53d9700 15 filestore(/var/lib/ceph/osd/ceph-2) getattrs 161.38_head/f2203eb8/plana6225762-445/head//161
2014-03-29 05:24:23.501504 7ff5d53d9700 20 filestore(/var/lib/ceph/osd/ceph-2) fgetattrs 75 getting '_'
2014-03-29 05:24:23.501520 7ff5d53d9700 20 filestore(/var/lib/ceph/osd/ceph-2) fgetattrs 75 getting 'snapset'
2014-03-29 05:24:23.501601 7ff5d53d9700 10 filestore(/var/lib/ceph/osd/ceph-2) getattrs 161.38_head/f2203eb8/plana6225762-445/head//161 = 0
2014-03-29 05:24:23.501633 7ff5d53d9700  0 log [ERR] : 161.38 push f2203eb8/plana6225762-445/head//161 v 1840'266 failed because local copy is 2186'396
2014-03-29 05:24:23.501647 7ff5d53d9700 20 osd.2 pg_epoch: 2240 pg[161.38( v 2186'396 (0'0,2186'396] local-les=2138 n=28 ec=1260 les/c 2138/2139 2231/2233/2218) [3,5] r=-1 lpr=2233 pi=2137-2232/8 crt=2127'344 lcod 0'0 inactive NOTIFY] send_pushes: sending push PushOp(f2203eb8/plana6225762-445/head//161, version: 0'0, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(0//0//-1@0'0, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)) to osd.(3,255)
2014-03-29 05:24:23.501678 7ff5d53d9700  1 -- 10.214.133.36:6811/5684 --> 10.214.132.9:6804/24034 -- MOSDPGPush(161.38 2240 [PushOp(f2203eb8/plana6225762-445/head//161, version: 0'0, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(0//0//-1@0'0, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x49c4600 con 0x3d22dc0

primary sees the ENOENT and recovers (by pulling from another replica).
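
The "[ERR] ... failed because local copy is ..." message amounts to a version sanity check on the pushing OSD: it is asked to push 1840'266 but its head is already at 2186'396. A rough, self-contained model of that check (illustrative only; the real ReplicatedBackend code and message format may differ):

// Toy model, not Ceph source.
#include <cstdio>

// Minimal stand-in for the epoch'version pairs seen in the log.
struct EVersion {
  unsigned epoch;
  unsigned version;
};

static bool same(const EVersion& a, const EVersion& b) {
  return a.epoch == b.epoch && a.version == b.version;
}

// The pusher refuses when the version it is asked to send no longer matches
// its local copy, which is what osd.2 reports in the cluster log above.
bool send_push(const EVersion& requested, const EVersion& local) {
  if (!same(requested, local)) {
    std::printf("[ERR] push v %u'%u failed because local copy is %u'%u\n",
                requested.epoch, requested.version,
                local.epoch, local.version);
    return false;
  }
  return true;
}

int main() {
  send_push({1840, 266}, {2186, 396});   // versions from the log above
  return 0;
}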

not clear why the rollback didn't mark the ObjectModDesc as unrollbackable, given ReplicatedPG::_rollback_to() line 4606.

oh, it's the ctx->modify = true case where we just clone the head.
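
For context on why this rollback landed on the head at all: the find_object_context line above shows "want 1ad > snapset seq aa -- HIT", i.e. the requested snapid is newer than every existing clone, so the head itself is the rollback target. A trivial model of that comparison (illustrative only, not the real find_object_context logic):

// Toy model, not Ceph source.
#include <iostream>

// A snapid newer than the newest clone (the snapset's seq) means the head
// already reflects that snapshot, so the rollback resolves to the head.
bool rollback_resolves_to_head(unsigned want_snap, unsigned snapset_seq) {
  return want_snap > snapset_seq;
}

int main() {
  // Values from the log above: want 1ad, snapset seq aa.
  std::cout << std::boolalpha
            << rollback_resolves_to_head(0x1ad, 0xaa) << "\n";   // true
  return 0;
}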

#4 Updated by Sage Weil almost 10 years ago

  • Status changed from 7 to Resolved
