Bug #8206 (closed)

"osd.4 ...[ERR] : 3.14 push" in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps

Added by Yuri Weinstein about 10 years ago. Updated almost 10 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This one was not reproduced on a manual re-run.

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-23_20:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/212257/

Error from remotes/*vpm109*/log/ceph-osd.4*:

420353638-2014-04-24 12:37:44.345702 7f0ed18ad700 10 osd.4 1398 do_waiters -- start
420353712-2014-04-24 12:37:44.345706 7f0ed18ad700 10 osd.4 1398 do_waiters -- finish
420353787-2014-04-24 12:37:44.345707 7f0ed18ad700 20 osd.4 1398 _dispatch 0x2702200 MOSDPGPull(3.14 1398 [PullOp(1ad3a2f4/vpm1114148-5/30a//3, recovery_info: ObjectRecoveryInfo(1ad3a2f4/vpm1114148-5/30a//3@1348'1682, copy_subset: [0~55,593869~584885,1889996~303873], clone_subset: {1ad3a2f4/vpm1114148-5/2d8//3=[55~593814,1178754~711242]}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v1
420354256-2014-04-24 12:37:44.345761 7f0ed18ad700 10 osd.4 1398 handle_replica_op MOSDPGPull(3.14 1398 [PullOp(1ad3a2f4/vpm1114148-5/30a//3, recovery_info: ObjectRecoveryInfo(1ad3a2f4/vpm1114148-5/30a//3@1348'1682, copy_subset: [0~55,593869~584885,1889996~303873], clone_subset: {1ad3a2f4/vpm1114148-5/2d8//3=[55~593814,1178754~711242]}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v1 epoch 1398
420354734-2014-04-24 12:37:44.345769 7f0ed18ad700 15 osd.4 1398 require_same_or_newer_map 1398 (i am 1398) 0x2702200
420354841-2014-04-24 12:37:44.345773 7f0ed18ad700 20 osd.4 1398 _share_map_incoming osd.3 10.214.139.33:6809/4325 1398
420354950-2014-04-24 12:37:44.345781 7f0ed18ad700 15 osd.4 1398 enqueue_op 0x31ed870 prio 10 cost 1049576 latency 0.000200 MOSDPGPull(3.14 1398 [PullOp(1ad3a2f4/vpm1114148-5/30a//3, recovery_info: ObjectRecoveryInfo(1ad3a2f4/vpm1114148-5/30a//3@1348'1682, copy_subset: [0~55,593869~584885,1889996~303873], clone_subset: {1ad3a2f4/vpm1114148-5/2d8//3=[55~593814,1178754~711242]}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v1
420355458-2014-04-24 12:37:44.345801 7f0ed18ad700 10 osd.4 1398 do_waiters -- start
420355532-2014-04-24 12:37:44.345803 7f0ed18ad700 10 osd.4 1398 do_waiters -- finish
420355607-2014-04-24 12:37:44.345821 7f0ecb8a1700 10 osd.4 1398 dequeue_op 0x31ed870 prio 10 cost 1049576 latency 0.000239 MOSDPGPull(3.14 1398 [PullOp(1ad3a2f4/vpm1114148-5/30a//3, recovery_info: ObjectRecoveryInfo(1ad3a2f4/vpm1114148-5/30a//3@1348'1682, copy_subset: [0~55,593869~584885,1889996~303873], clone_subset: {1ad3a2f4/vpm1114148-5/2d8//3=[55~593814,1178754~711242]}), recovery_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v1 pg pg[3.14( v 1347'1681 (0'0,1347'1681] local-les=1289 n=12 ec=9 les/c 1289/1325 1364/1364/1351) [3,2] r=-1 lpr=1364 pi=1222-1363/4 crt=1287'1635 lcod 0'0 inactive NOTIFY]
420356287-2014-04-24 12:37:44.345852 7f0ecb8a1700 10 osd.4 pg_epoch: 1398 pg[3.14( v 1347'1681 (0'0,1347'1681] local-les=1289 n=12 ec=9 les/c 1289/1325 1364/1364/1351) [3,2] r=-1 lpr=1364 pi=1222-1363/4 crt=1287'1635 lcod 0'0 inactive NOTIFY] can_handle_while_inactive: 0x31ed870
420356557-2014-04-24 12:37:44.345863 7f0ecb8a1700 10 osd.4 pg_epoch: 1398 pg[3.14( v 1347'1681 (0'0,1347'1681] local-les=1289 n=12 ec=9 les/c 1289/1325 1364/1364/1351) [3,2] r=-1 lpr=1364 pi=1222-1363/4 crt=1287'1635 lcod 0'0 inactive NOTIFY] handle_message: 0x31ed870
420356816-2014-04-24 12:37:44.345916 7f0ecb8a1700 10 filestore(/var/lib/ceph/osd/ceph-4) stat 3.14_head/1ad3a2f4/vpm1114148-5/30a//3 = 0 (size 2193869)
420356958-2014-04-24 12:37:44.345921 7f0ecb8a1700  7 osd.4 pg_epoch: 1398 pg[3.14( v 1347'1681 (0'0,1347'1681] local-les=1289 n=12 ec=9 les/c 1289/1325 1364/1364/1351) [3,2] r=-1 lpr=1364 pi=1222-1363/4 crt=1287'1635 lcod 0'0 inactive NOTIFY] send_push_op 1ad3a2f4/vpm1114148-5/30a//3 v 1348'1682 size 0 recovery_info: ObjectRecoveryInfo(1ad3a2f4/vpm1114148-5/30a//3@1348'1682, copy_subset: [0~55,593869~584885,1889996~303873], clone_subset: {1ad3a2f4/vpm1114148-5/2d8//3=[55~593814,1178754~711242]})
420357449-2014-04-24 12:37:44.345934 7f0ecb8a1700 15 filestore(/var/lib/ceph/osd/ceph-4) omap_get_header 3.14_head/1ad3a2f4/vpm1114148-5/30a//3
420357583-2014-04-24 12:37:44.346010 7f0ecb8a1700 20 filestore lookup_parent: parent 1252 for seq 1294
420357676-2014-04-24 12:37:44.346026 7f0ecb8a1700 20 filestore lookup_parent: parent seq is 1252 with parent 0
420357777-2014-04-24 12:37:44.346042 7f0ecb8a1700 15 filestore(/var/lib/ceph/osd/ceph-4) getattrs 3.14_head/1ad3a2f4/vpm1114148-5/30a//3
420357904-2014-04-24 12:37:44.346148 7f0ecb8a1700 20 filestore(/var/lib/ceph/osd/ceph-4) fgetattrs 140 getting '_'
420358009-2014-04-24 12:37:44.346206 7f0ecb8a1700 10 filestore(/var/lib/ceph/osd/ceph-4) getattrs 3.14_head/1ad3a2f4/vpm1114148-5/30a//3 = 0
420358140:2014-04-24 12:37:44.346222 7f0ecb8a1700  0 log [ERR] : 3.14 push 1ad3a2f4/vpm1114148-5/30a//3 v 1348'1682 failed because local copy is 1345'1674
420358285-2014-04-24 12:37:44.346231 7f0ecb8a1700 20 osd.4 pg_epoch: 1398 pg[3.14( v 1347'1681 (0'0,1347'1681] local-les=1289 n=12 ec=9 les/c 1289/1325 1364/1364/1351) [3,2] r=-1 lpr=1364 pi=1222-1363/4 crt=1287'1635 lcod 0'0 inactive NOTIFY] send_pushes: sending push PushOp(1ad3a2f4/vpm1114148-5/30a//3, version: 0'0, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 1, recovery_info: ObjectRecoveryInfo(0//0//-1@0'0, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)) to osd.3
420359042-2014-04-24 12:37:44.346248 7f0ecb8a1700  1 -- 10.214.139.33:6807/8713 --> 10.214.139.33:6809/4325 -- MOSDPGPush(3.14 1398 [PushOp(1ad3a2f4/vpm1114148-5/30a//3, version: 0'0, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 1, recovery_info: ObjectRecoveryInfo(0//0//-1@0'0, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x4107a00 con 0x28de000
420359690-2014-04-24 12:37:44.346263 7f0ecb8a1700 10 osd.4 1398 dequeue_op 0x31ed870 finish
420359772-2014-04-24 12:37:44.386633 7f0ecf0a8700  1 -- 10.214.139.33:6813/8713 <== osd.1 10.214.138.183:0/25219 8 ==== osd_ping(ping e1398 stamp 2014-04-24 12:37:44.384702) v2 ==== 47+0+0 (1351087799 0 0) 0x2625380 con 0x40b8160
420359992-2014-04-24 12:37:44.386669 7f0ecf0a8700  1 -- 10.214.139.33:6813/8713 --> 10.214.138.183:0/25219 -- osd_ping(ping_reply e1398 stamp 2014-04-24 12:37:44.384702) v2 -- ?+0 0x3246700 con 0x40b8160
420360186-2014-04-24 12:37:44.386682 7f0ecf0a8700 20 osd.4 1398 _share_map_outgoing 0x3f31dc0 already has epoch 1398
420360293-2014-04-24 12:37:44.386704 7f0ed00aa700  1 -- 10.214.139.33:6815/8713 <== osd.1 10.214.138.183:0/25219 8 ==== osd_ping(ping e1398 stamp 2014-04-24 12:37:44.384702) v2 ==== 47+0+0 (1351087799 0 0) 0x2cb6e00 con 0x40b8000
420360513-2014-04-24 12:37:44.386716 7f0ed00aa700  1 -- 10.214.139.33:6815/8713 --> 10.214.138.183:0/25219 -- osd_ping(ping_reply e1398 stamp 2014-04-24 12:37:44.384702) v2 -- ?+0 0x412b1c0 con 0x40b8000
420360707-2014-04-24 12:37:44.386729 7f0ed00aa700 20 osd.4 1398 _share_map_outgoing 0x3f31dc0 already has epoch 1398
420360814-2014-04-24 12:37:44.445035 7f0ec989d700 20 osd.4 1398 update_osd_stat osd_stat(318 MB used, 199 GB avail, 199 GB total, peers [0,1,2,3,5]/[] op hist [])
420360967-2014-04-24 12:37:44.445137 7f0ec989d700  5 osd.4 1398 heartbeat: osd_stat(318 MB used, 199 GB avail, 199 GB total, peers [0,1,2,3,5]/[] op hist [])
420361115-2014-04-24 12:37:44.445149 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.138.183:6807/28671 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x3246e00 con 0x3eb1c60
420361303-2014-04-24 12:37:44.445176 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.138.183:6808/28671 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x3246c40 con 0x3eb1dc0
420361491-2014-04-24 12:37:44.445208 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.138.183:6802/25219 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x3246a80 con 0x411d160
420361679-2014-04-24 12:37:44.445225 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.138.183:6803/25219 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x32468c0 con 0x411d000
420361867-2014-04-24 12:37:44.445236 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.138.183:6819/5554 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x32d2e00 con 0x411d9a0
420362054-2014-04-24 12:37:44.445246 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.138.183:6820/5554 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x32d2c40 con 0x411d840
420362241-2014-04-24 12:37:44.445255 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.139.33:6810/4325 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x32d2a80 con 0x411d580
420362427-2014-04-24 12:37:44.445288 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.139.33:6811/4325 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x32d28c0 con 0x411d420
420362613-2014-04-24 12:37:44.445303 7f0ec989d700  1 -- 10.214.139.33:0/8713 --> 10.214.139.33:6802/8592 -- osd_ping(ping e1398 stamp 2014-04-24 12:37:44.445142) v2 -- ?+0 0x32d2700 con 0x411d2c0
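
What the [ERR] line means here: osd.3 sent osd.4 an MOSDPGPull for object 1ad3a2f4/vpm1114148-5/30a//3 at version 1348'1682, but osd.4's local copy of that object is only at 1345'1674, so it logs the error and replies with an empty PushOp. The versions are "epoch'version" (eversion_t) pairs. Below is a minimal Python sketch of the comparison implied by the message, purely as a reading aid; parse_eversion and the check itself are illustrative and are not Ceph's actual code.

def parse_eversion(s):
    # "1348'1682" -> (epoch 1348, version 1682)
    epoch, version = s.split("'")
    return int(epoch), int(version)

requested = parse_eversion("1348'1682")  # version osd.3 asked osd.4 to push
local = parse_eversion("1345'1674")      # version osd.4 actually has on disk

if local != requested:
    # mirrors the logged error: the local copy is older than the requested version
    print("3.14 push failed because local copy is %d'%d" % local)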

archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-23_20:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/212257
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
email: null
job_id: '212257'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-23_20:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
  s3tests:
    branch: master
  workunit:
    sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm109.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6LUuoGtMT0sV29Ffm20LDDiZgdfMIu8YJutYOaMTR+adHkyJbReRrHStrYf4t5MZ636DMbJKkoUgXWlnT1ViBlV39nYOXu9XW3RCam79boWZugzXDy29pBN3k7HhKRTCvOeF0buPghX3LsmlxIilz2iLgO7RxU7YODVdrqmyyhKSsJ+RREkwngg34t+7LzRJUnBWDg5l/7pxmArMoQYGUblEhX3LUbQ6BqclIPl+b03Y0DdkzJm8mpbGnW49KeW3MEnC08PgdSqbUIKPsNLs15YyvUcAfhhjXqgoFTIR86hV8TcpgUI9trcYear5doCsDvBuwoXo97b8KGdjmSzNR
  ubuntu@vpm111.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC12QTe4w76NOlm4nWWEbXEU890wJocROYIVdo2wsioFjWdOyJD6B9SXTdIF7SWr4jopOOl7zAb0GthcV3V7GVEMoKRl192JdOkqhtwpLraW7J7fMlCz8Ka4m8G17wliBNa8if9Xy5c0Iqd13lmH6xFzTBRADmxPd1k29zg1j7N4PipD11+7v688Fs9HWSUfV79PkZ9jS/LYW2DVWm4p0RSprjBG2z7XkdRg3aB4U2I6x+vdnyHbFFoR7xHJZS0Q4q/PGQWEdvNBYBaHwdJLrPloXVcxp+OwslWVAR5JCDGMWg429pw4k+Wt0zzvRPc/bBqOlaTZ4lrzKTEWef6XyrH
  ubuntu@vpm115.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbdcKXgCVdiyx6KXkXWDHWeHmhOtc0R0/wkHo6m3qrlwqiw618NQdIc3cK3Pmsn8P/DzZSQfKg/wLfxnW1aQncUZVEnTOZylqmEbgX+qEAJ1UliefVB9wnjzgs4Ey/siWGCnTBcEj5Cf0RfFje2TrCqfT5/vn51Y8ff+/EGiIzuz9BBhMuc07nntO7ZMJPqxADary4MBu715cdbVXfv3O89jebOmX/3QYuekL/Pv02rpeFMCO9OdanTnySC6Eu9eNB4Uv5OUwdyHHfQ6NbwyWGUatYygtCyiE9cF2iSIj4Hji3H1KGrWyK3hxrXuUrNp7fHNrdYMj9s5QLeI9XUXAB
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.30588
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
duration: 8927.462427139282
failure_reason: '"2014-04-24 12:37:44.346224 osd.4 10.214.139.33:6804/8713 3 : [ERR]
  3.14 push 1ad3a2f4/vpm1114148-5/30a//3 v 1348''1682 failed because local copy is
  1345''1674" in cluster log'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
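
The failure_reason above is why teuthology marked the run as failed: the push [ERR] appeared in the cluster log and does not match any entry in the ceph log-whitelist configured for this job (slow request, wrongly marked me down, objects unfound and apparently lost, log bound mismatch). The following is a minimal Python sketch of that kind of scan, assuming simple substring whitelist matching; the function name and matching rules are assumptions, not teuthology's actual implementation.

import re

whitelist = [
    "slow request",
    "wrongly marked me down",
    "objects unfound and apparently lost",
    "log bound mismatch",
]

def offending_lines(cluster_log):
    marker = re.compile(r"\[(ERR|WRN|SEC)\]")
    # keep ERR/WRN/SEC lines that match no whitelist entry
    return [line for line in cluster_log
            if marker.search(line) and not any(w in line for w in whitelist)]

log = [
    "2014-04-24 12:37:44.346224 osd.4 10.214.139.33:6804/8713 3 : [ERR] "
    "3.14 push 1ad3a2f4/vpm1114148-5/30a//3 v 1348'1682 failed because local copy is 1345'1674",
]
print(offending_lines(log))  # non-empty, so the job is reported as failed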

Related issues (1 total: 0 open, 1 closed)

Is duplicate of Ceph - Bug #8162: osd: dumpling advances last_backfill prematurely (Resolved, Samuel Just, 04/19/2014)

#1 - Updated by Samuel Just almost 10 years ago

  • Status changed from New to Duplicate