Project

General

Profile

Actions

Bug #8396

closed

osd: message delayed in Session misdirected after split

Added by Tamilarasi muthamizhan almost 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

logs:ubuntu@teuthology:/a/teuthology-2014-05-18_02:30:21-rados-master-testing-basic-plana/260734

ceph version 0.80-469-g991f7f1

2014-05-18T11:36:19.886 INFO:teuthology.orchestra.run.plana39.stderr:config set <var> <val> [<val>...]
2014-05-18T11:36:19.886 INFO:teuthology.orchestra.run.plana39.stderr:version
2014-05-18T11:36:19.887 INFO:teuthology.orchestra.run.plana39.stderr:2
2014-05-18T11:36:19.887 INFO:teuthology.orchestra.run.plana39.stderr:config get <var>
2014-05-18T11:36:19.887 INFO:teuthology.orchestra.run.plana39.stderr:0
2014-05-18T11:36:19.888 INFO:teuthology.orchestra.run.plana39.stderr:admin_socket: invalid command
2014-05-18T11:36:19.891 INFO:teuthology.task.thrashosds.ceph_manager:waiting on admin_socket for 3, ['dump_ops_in_flight']
2014-05-18T11:36:20.140 INFO:teuthology.task.ceph.osd.3.plana39.stderr:2014-05-18 11:36:20.139436 7f38ca252780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-05-18T11:36:24.745 INFO:teuthology.task.rados.rados.0.plana59.stderr:3885: Error: oid 432 read returned error code -2
2014-05-18T11:36:24.794 INFO:teuthology.task.rados.rados.0.plana59.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fcf5dffb700 time 2014-05-18 11:36:24.743525
2014-05-18T11:36:24.794 INFO:teuthology.task.rados.rados.0.plana59.stderr:./test/osd/RadosModel.h: 1045: FAILED assert(0)
2014-05-18T11:36:24.833 INFO:teuthology.task.rados.rados.0.plana59.stderr: ceph version 0.80-469-g991f7f1 (991f7f15a6e107b33a24bbef1169f21eb7fcce2c)
2014-05-18T11:36:24.833 INFO:teuthology.task.rados.rados.0.plana59.stderr: 1: (ReadOp::_finish(TestOp::CallbackInfo*)+0x12f) [0x41b03f]
2014-05-18T11:36:24.833 INFO:teuthology.task.rados.rados.0.plana59.stderr: 2: (librados::C_AioComplete::finish(int)+0x1d) [0x7fcf68cb694d]
2014-05-18T11:36:24.833 INFO:teuthology.task.rados.rados.0.plana59.stderr: 3: (Context::complete(int)+0x9) [0x7fcf68c94409]
2014-05-18T11:36:24.834 INFO:teuthology.task.rados.rados.0.plana59.stderr: 4: (Finisher::finisher_thread_entry()+0x1c0) [0x7fcf68d47490]
2014-05-18T11:36:24.834 INFO:teuthology.task.rados.rados.0.plana59.stderr: 5: (()+0x7e9a) [0x7fcf688ebe9a]
2014-05-18T11:36:24.834 INFO:teuthology.task.rados.rados.0.plana59.stderr: 6: (clone()+0x6d) [0x7fcf681023fd]
2014-05-18T11:36:24.834 INFO:teuthology.task.rados.rados.0.plana59.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

archive_path: /var/lib/teuthworker/archive/teuthology-2014-05-18_02:30:21-rados-master-testing-basic-plana/260734
branch: master
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml
  thrashers/pggrow.yaml workloads/readwrite.yaml}
email: null
job_id: '260734'
kernel:
  kdb: true
  sha1: 335cb91ce950ce0e12294af671c64a468d89194c
last_in_suite: false
machine_type: plana
name: teuthology-2014-05-18_02:30:21-rados-master-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject internal delays: 0.002
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
        osd sloppy crc: true
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 991f7f15a6e107b33a24bbef1169f21eb7fcce2c
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 991f7f15a6e107b33a24bbef1169f21eb7fcce2c
  s3tests:
    branch: master
  workunit:
    sha1: 991f7f15a6e107b33a24bbef1169f21eb7fcce2c
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
  - client.0
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.1
suite: rados
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
teuthology_branch: master
verbose: true


Related issues 3 (0 open3 closed)

Has duplicate Ceph - Bug #8522: ENOENT on read from ceph_test_rados + osd thrashing Duplicate06/03/2014

Actions
Has duplicate Ceph - Bug #8613: ENOENT from ceph_test_radosDuplicate06/16/2014

Actions
Blocks Ceph - Bug #8505: OSD osd/OSD.cc: 6222: FAILED assert(p->second.empty())ResolvedSamuel Just06/02/2014

Actions
Actions #1

Updated by Samuel Just almost 10 years ago

  • Priority changed from High to Urgent
Actions #2

Updated by Samuel Just almost 10 years ago

Hmm... op misdirected to the pre-split pgid (3.7 not 3.47)... I suspect the new queuing code.

Actions #3

Updated by Samuel Just almost 10 years ago

It's the waiting_for_pg queue. It used to get reprocessed through the op queue, but now gets directly enqueue'd into the pg queue.

Actions #4

Updated by Samuel Just almost 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Samuel Just
Actions #5

Updated by Samuel Just almost 10 years ago

  • Status changed from In Progress to 7
Actions #6

Updated by Greg Farnum almost 10 years ago

  • Status changed from 7 to Resolved

Merged into master as of commit: 25466839589813047c975e44e67e14f34e32139e

Actions #7

Updated by Samuel Just almost 10 years ago

  • Status changed from Resolved to 12

reverting

Actions #8

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/a/sage-2014-07-24_11:53:12-rados-master-testing-basic-plana/376677

Actions #9

Updated by Sage Weil over 9 years ago

  • Subject changed from assert in ReadOp::_finish to osd: message delayed in Session misdirected after split
Actions #10

Updated by Sage Weil over 9 years ago

very likely another instance, but i didn't look closely.

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basic-plana/380416$ grep 'oid 419' teuthology.log 
2014-07-27T18:46:45.851 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: 419: write initial oid 419
2014-07-27T18:46:46.445 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: update_object_version oid 419 v 65 (ObjNum 418 snap 0 seq_num 418) dirty exists
2014-07-27T18:51:55.132 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: 1046: read oid 419 snap -1
2014-07-27T19:01:58.428 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: 1835: read oid 419 snap -1
2014-07-27T19:03:08.596 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: 1902: read oid 419 snap -1
2014-07-27T19:19:34.867 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: append oid 419 current snap is 0
2014-07-27T19:19:42.713 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: update_object_version oid 419 v 165 (ObjNum 1704 snap 0 seq_num 1704) dirty exists
2014-07-27T19:20:52.849 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: 2826: read oid 419 snap -1
2014-07-27T19:26:51.491 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: append oid 419 current snap is 0
2014-07-27T19:26:56.633 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: update_object_version oid 419 v 176 (ObjNum 1904 snap 0 seq_num 1904) dirty exists
2014-07-27T19:39:14.354 INFO:teuthology.task.rados.rados.0.out:[10.214.133.30]: 3452: read oid 419 snap -1
2014-07-27T19:39:39.875 INFO:teuthology.task.rados.rados.0.err:[10.214.133.30]: 3452: Error: oid 419 read returned error code -2
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basic-plana/380416$ grep -A2 desc summary.yaml 
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml
  thrashers/morepggrow.yaml workloads/ec-readwrite.yaml}
duration: 4551.11395406723
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basic-plana/380416$

Actions #11

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392256

Actions #12

Updated by Samuel Just over 9 years ago

  • Status changed from 12 to Resolved
Actions

Also available in: Atom PDF