Bug #6207 (closed): Found incorrect object contents

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-09-01T21:45:47.962 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]: Found incorrect object contents ObjNum: 148 snap: 45 seqnum: 148 prefix: plana3710235-OID: 45 snap 45
2013-09-01T21:45:47.962 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]: , expected ObjNum: 326 snap: 107 seqnum: 326 prefix: plana3710235-OID: 45 snap 107
2013-09-01T21:45:47.962 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  oid 45
2013-09-01T21:45:47.962 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]: Object 45 contents ObjNum: 148 snap: 45 seqnum: 148 prefix: plana3710235-OID: 45 snap 45
2013-09-01T21:45:47.962 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  corrupt
2013-09-01T21:45:47.962 INFO:teuthology.task.rados.rados.0.out:[10.214.131.3]: incorrect buffer at pos 4
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]: ./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f0facd77700 time 2013-09-01 21:46:00.402859
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]: ./test/osd/RadosModel.h: 1053: FAILED assert(0)
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  ceph version 0.67.2-22-ga708c8a (a708c8ab52e5b1476405a1f817c23b8845fbaab3)
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  1: (ReadOp::_finish(TestOp::CallbackInfo*)+0x1421) [0x414ef1]
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  2: (librados::C_AioComplete::finish(int)+0x1d) [0x7f0fb2b08a4d]
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  3: (Context::complete(int)+0xa) [0x7f0fb2ae94fa]
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  4: (Finisher::finisher_thread_entry()+0x1c0) [0x7f0fb2b8cd10]
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  5: (()+0x7e9a) [0x7f0fb276ae9a]
2013-09-01T21:45:47.963 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  6: (clone()+0x6d) [0x7f0fb1f81ccd]
2013-09-01T21:45:47.964 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-09-01T21:45:47.964 INFO:teuthology.task.rados.rados.0.err:[10.214.131.3]: terminate called after throwing an instance of 'ceph::FailedAssertion'
ubuntu@teuthology:/a/teuthology-2013-09-01_20:00:14-rados-dumpling-testing-basic-plana/16572$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 263cbbcaf605e359a46e30889595d82629f82080
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject internal delays: 0.002
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon min osdmap epochs: 2
      osd:
        osd map cache size: 1
    fs: ext4
    log-whitelist:
    - slow request
    sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
  s3tests:
    branch: master
  workunit:
    sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    chance_test_map_discontinuity: 0.5
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: dumpling
Actions #1

Updated by Ian Colle over 10 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Sage Weil over 10 years ago

full logs here: ubuntu@teuthology:/var/lib/teuthworker/archive/sage-bug-6222-b/20456

Actions #3

Updated by Samuel Just over 10 years ago

The problem offset appears not to have changed since the prior successful read, and kern.log had:

2013-09-04T08:57:47.637749-07:00 plana40 kernel: [16102.849264] device fsid 09558d82-c22f-4b2d-94c2-03dd7b0fc4c2 devid 1 transid 7 /dev/sdd
2013-09-04T08:57:47.657682-07:00 plana40 kernel: [16102.871685] btrfs flagging fs with big metadata feature
2013-09-04T08:57:47.887731-07:00 plana40 kernel: [16103.098433] device fsid e568d361-7658-4543-8122-f60136cd4955 devid 1 transid 7 /dev/sdb
2013-09-04T08:57:47.897670-07:00 plana40 kernel: [16103.112006] btrfs flagging fs with big metadata feature
2013-09-04T08:57:48.127733-07:00 plana40 kernel: [16103.334890] device fsid 6336ab6e-c306-44a5-9379-7ee01160d270 devid 1 transid 7 /dev/sdc
2013-09-04T08:57:48.137675-07:00 plana40 kernel: [16103.348147] btrfs flagging fs with big metadata feature
2013-09-04T08:57:48.922264-07:00 plana40 kernel: [16103.934210]
2013-09-04T08:57:48.922280-07:00 plana40 kernel: [16103.935720] =====================================
2013-09-04T08:57:48.922331-07:00 plana40 kernel: [16103.940439] [ BUG: bad unlock balance detected! ]
2013-09-04T08:57:48.922333-07:00 plana40 kernel: [16103.945159] 3.11.0-rc5-ceph-00074-g263cbbc #1 Not tainted
2013-09-04T08:57:48.922335-07:00 plana40 kernel: [16103.950574] -------------------------------------
2013-09-04T08:57:48.922337-07:00 plana40 kernel: [16103.955294] ceph-osd/29077 is trying to release lock (sb_internal) at:
2013-09-04T08:57:48.922340-07:00 plana40 kernel: [16103.961885] [<ffffffffa012efd8>] btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs]
2013-09-04T08:57:48.922342-07:00 plana40 kernel: [16103.969574] but there are no more locks to release!
2013-09-04T08:57:48.922343-07:00 plana40 kernel: [16103.974469]
2013-09-04T08:57:48.922347-07:00 plana40 kernel: [16103.974469] other info that might help us debug this:
2013-09-04T08:57:48.922350-07:00 plana40 kernel: [16103.981030] no locks held by ceph-osd/29077.
2013-09-04T08:57:48.922367-07:00 plana40 kernel: [16103.985316]
2013-09-04T08:57:48.922369-07:00 plana40 kernel: [16103.985316] stack backtrace:
2013-09-04T08:57:48.922371-07:00 plana40 kernel: [16103.989704] CPU: 1 PID: 29077 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00074-g263cbbc #1
2013-09-04T08:57:48.922373-07:00 plana40 kernel: [16103.998002] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
2013-09-04T08:57:48.922375-07:00 plana40 kernel: [16104.005519] ffffffffa012efd8 ffff8802247e1c68 ffffffff81642d85 0000000000000007
2013-09-04T08:57:48.922377-07:00 plana40 kernel: [16104.012989] ffff88000732bf20 ffff8802247e1c98 ffffffff810ab89e 0000000000000050
2013-09-04T08:57:48.922381-07:00 plana40 kernel: [16104.021586] 0000000000000246 ffff88017485edd0 ffffffffa012efd8 ffff8802247e1ce8
2013-09-04T08:57:48.922383-07:00 plana40 kernel: [16104.029059] Call Trace:
2013-09-04T08:57:48.922386-07:00 plana40 kernel: [16104.031535] [<ffffffffa012efd8>] ? btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs]
2013-09-04T08:57:48.922388-07:00 plana40 kernel: [16104.039491] [<ffffffff81642d85>] dump_stack+0x46/0x58
2013-09-04T08:57:48.922390-07:00 plana40 kernel: [16104.044651] [<ffffffff810ab89e>] print_unlock_imbalance_bug+0xfe/0x110
2013-09-04T08:57:48.922393-07:00 plana40 kernel: [16104.051300] [<ffffffffa012efd8>] ? btrfs_commit_transaction_async+0x1c8/0x2c0 [btrfs]
2013-09-04T08:57:48.922395-07:00 plana40 kernel: [16104.059253] [<ffffffff810aeafe>] lock_release+0x15e/0x220
2013-09-04T08:57:48.922397-07:00 plana40 kernel: [16104.064771] [<ffffffffa012efff>] btrfs_commit_transaction_async+0x1ef/0x2c0 [btrfs]
2013-09-04T08:57:48.922402-07:00 plana40 kernel: [16104.072552] [<ffffffff810af6f5>] ? trace_hardirqs_on_caller+0x105/0x1d0
2013-09-04T08:57:48.922404-07:00 plana40 kernel: [16104.079289] [<ffffffffa015d907>] btrfs_ioctl_start_sync+0x47/0xc0 [btrfs]
2013-09-04T08:57:48.922406-07:00 plana40 kernel: [16104.086201] [<ffffffffa0162575>] btrfs_ioctl+0xe55/0x1af0 [btrfs]
2013-09-04T08:57:48.922409-07:00 plana40 kernel: [16104.092404] [<ffffffff8164ab2b>] ? _raw_spin_unlock+0x2b/0x40
2013-09-04T08:57:48.922411-07:00 plana40 kernel: [16104.098256] [<ffffffff810b40b5>] ? do_futex+0xa45/0xbb0
2013-09-04T08:57:48.922413-07:00 plana40 kernel: [16104.103590] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
2013-09-04T08:57:48.922415-07:00 plana40 kernel: [16104.109008] [<ffffffff811922a6>] do_vfs_ioctl+0x96/0x560
2013-09-04T08:57:48.922418-07:00 plana40 kernel: [16104.114425] [<ffffffff8119cf6e>] ? fget_light+0x9e/0x130
2013-09-04T08:57:48.922420-07:00 plana40 kernel: [16104.119842] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
2013-09-04T08:57:48.922422-07:00 plana40 kernel: [16104.125260] [<ffffffff81192801>] SyS_ioctl+0x91/0xb0
2013-09-04T08:57:48.922425-07:00 plana40 kernel: [16104.130331] [<ffffffff813338fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
2013-09-04T08:57:48.922427-07:00 plana40 kernel: [16104.136794] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
2013-09-04T09:04:22.067668-07:00 plana40 kernel: [16497.521078] BTRFS debug (device sdc): unlinked 1 orphans
2013-09-04T09:08:06.457718-07:00 plana40 kernel: [16722.042874] BTRFS debug (device sdc): unlinked 1 orphans
2013-09-04T09:08:11.702927-07:00 plana40 kernel: [16727.295732] BTRFS debug (device sdc): unlinked 1 orphans
2013-09-04T09:08:17.032895-07:00 plana40 kernel: [16732.628876] BTRFS debug (device sdc): unlinked 1 orphans
2013-09-04T09:08:22.167754-07:00 plana40 kernel: [16737.762392] BTRFS debug (device sdc): unlinked 1 orphans

Actions #4

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-08_01:00:04-rados-master-testing-basic-plana/25850

Actions #5

Updated by Greg Farnum over 10 years ago

[ The comments from here through #11 do not apply to any instances of this bug on master or dumpling. Please disregard. ]

Sam, did you correlate that btrfs lock warning with the error sequence in your example at all?

/a/sage-bug-6207-c/1 has an example of it failing, but we don't have a good read happening before the failed one (this is on wip-copyfrom-big instead of master, though). It does involve the read happening from a new primary after the write, so there's a lot of room for things to go wrong in there.

Actions #6

Updated by Greg Farnum over 10 years ago

I looked through that failure some more and gathered up some notes that I suspect will lead one of you to know what's going on. Verbatim:

plana046077-199 is bad object
403659 is bad offset
client.4108.0:1859 is the op that writes that offset
preceding op 1858 (but not this one) was a retry on the same object, got an ack but not an ondisk prior to processing this one (using btrfs!)
they commit to journal at the same time and are responded to in order
this happens on osd.0 (replicating to osd.4) and the read comes from osd.5

osd.4 backfill and replicated write serving:
ObjectRecoveryInfo(e255c7f1/plana046077-199/head//3@17'1497, copy_subset: [0~56], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:56, data_complete:true, omap_recovered_to:, omap_complete:true)
and does not attempt any other pulls.
And then osd.0 sends its subops for 1858, 1859, 1860? I don't understand recovery very well :/
and then osd.4 restarts
I see the 3 recovered log entries listed on boot

osd.5 has the first mention of the object at client op 2440. It delays serving them a while, but I don't ever see the object being recovered in the log?
osd.0 has plana046077-199 in a backfill list, and osd.5 seems to have set it as the beginning backfill point, but osd.0 looks like it skips the object for some reason? Maybe because osd.5 is reporting the version as 50'1934 and that's the newest one osd.0 has? Perhaps osd.5 incorrectly wrote over its object version when taking the log entries?

So it sure looks to me like osd.5 is filling in the local versions it has incorrectly for some reason. I also note that it looks like it creates the PG fresh during this, so it didn't have any values of its own set to start with; maybe there's an obvious place to look for unset values that are getting inappropriately overridden?

Actions #7

Updated by Greg Farnum over 10 years ago

http://fpaste.org/39993/79377326/ contains the output on osd.0 during backfill that concerns me if you don't want to go dig through the full logs to start with. I'm quite certain that osd.5, while it's gotten log events, has never seen the full plana046077-199 object.

Actions #8

Updated by Sage Weil over 10 years ago

Which run are these notes referring to?

Actions #9

Updated by Greg Farnum over 10 years ago

/a/sage-bug-6207-c/1

Actions #10

Updated by Sage Weil over 10 years ago

Pushed what I think is a fix to wip-copyfrom-big. It should get squashed down once we verify.

Actions #11

Updated by Greg Farnum over 10 years ago

The comments from me and Sage about wip-copyfrom-big sadly refer to a different issue that would not have been present in dumpling or master. Please disregard!

Actions #12

Updated by Samuel Just over 10 years ago

slider:/home/samuelj/interesting_logs/13-09-27-04:48:13-btrfsmaybezero has a possible example. notes.txt contains a brief explanation.

Actions #13

Updated by Sage Weil over 10 years ago

I think we should implement opportunistic CRCs in FileStore.

Any time we do a write, we record in an xattr the CRC for any complete blocks (4k, 64k, whatever you configure). On any read, we verify whatever CRCs we happen to have.

This would be pretty easy, and very useful for debugging (and/or for the paranoid) to identify fs corruption. We would need to disable the read-side checks during journal replay, since the xattr could be out of sync with the file data.
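
A minimal sketch of that idea, assuming a Linux host with xattr support; the xattr key, helper names, and on-disk encoding below are invented for illustration and are not FileStore's actual implementation:

import os
import zlib

BLOCK = 4096                         # block size: 4k here; 64k or whatever you configure
XATTR = b"user.opportunistic_crc"    # hypothetical xattr key, not FileStore's actual name

def _load_crcs(path):
    try:
        raw = os.getxattr(path, XATTR).decode()
        return {int(k): int(v) for k, v in (e.split(":") for e in raw.split(",") if e)}
    except OSError:                  # no xattr yet -> nothing recorded
        return {}

def _save_crcs(path, crcs):
    os.setxattr(path, XATTR, ",".join("%d:%d" % kv for kv in crcs.items()).encode())

def write_with_crc(path, offset, data):
    """Write, then record a CRC for every block the write completely covers."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    crcs = _load_crcs(path)
    first = (offset + BLOCK - 1) // BLOCK        # first fully covered block
    last = (offset + len(data)) // BLOCK         # one past the last fully covered block
    for b in range(first, last):
        crcs[b] = zlib.crc32(data[b * BLOCK - offset:(b + 1) * BLOCK - offset])
    _save_crcs(path, crcs)

def read_with_check(path, offset, length):
    """Read, verifying whichever block CRCs we happen to have.

    (Per the comment above, a real implementation would skip this check during
    journal replay, since the xattr can be out of sync with the file data.)
    """
    crcs = _load_crcs(path)
    with open(path, "rb") as f:
        for b in range(offset // BLOCK, (offset + length + BLOCK - 1) // BLOCK):
            if b in crcs:
                f.seek(b * BLOCK)
                if zlib.crc32(f.read(BLOCK)) != crcs[b]:
                    raise IOError("CRC mismatch in block %d of %s" % (b, path))
        f.seek(offset)
        return f.read(length)

A real implementation would also need to invalidate stale CRCs on truncate or clone; the point is only that checks are opportunistic, covering whichever complete blocks happen to have a recorded CRC.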

Actions #14

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-27_23:00:04-rados-master-testing-basic-plana/21804

Actions #15

Updated by Samuel Just over 10 years ago

Opportunistic checksums are enabled by default for xfs and btrfs in the nightlies at the moment. Nothing so far?

Actions #16

Updated by Samuel Just over 10 years ago

  • Status changed from New to Need More Info
Actions #17

Updated by Samuel Just over 10 years ago

  • Status changed from Need More Info to Resolved

This can be explained by prematurely updating last_backfill based on backfill_pos and backfills_in_flight.

ad5655beb2444667026b33b274744de0dc1bbdb7
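
For illustration only (hypothetical pseudocode, not the code in that commit), the invariant at stake: last_backfill may only advance up to the oldest object whose push is still in flight, otherwise a target can be treated as having an object for which it holds a log entry but no data.

def advance_last_backfill(backfill_pos, backfills_in_flight):
    """Illustrative only: pick the new last_backfill bound for a backfill target.

    Objects strictly below the returned bound are known to be fully pushed to
    the target.  Advancing straight to backfill_pos while pushes are still in
    flight marks objects complete that the target has not received yet, which
    is the premature update described above.
    """
    if backfills_in_flight:
        # hold the boundary at the oldest object still being pushed
        return min(backfills_in_flight)
    return backfill_pos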
