Bug #10534 (closed): full-object read crc 0x######## != expected 0x######## on

Added by David Zafman over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-01-13 18:55:18,979.979 INFO:tasks.ceph.osd.0.plana02.stderr:2015-01-13 18:55:18.860647 7fe6d209b700 -1 log_channel(default) log [ERR] : 1.67 full-object read crc 0x52b5f03f != expected 0x4fae88b7 on 31c93c67/plana0213876-8/head//1
2015-01-13 18:55:18,989.989 INFO:tasks.rados.rados.0.plana02.stderr:8308: Error: oid 8 read returned error code -5
2015-01-13 18:55:18,994.994 INFO:tasks.thrashosds.thrasher:Recovered, killing an osd
2015-01-13 18:55:18,995.995 INFO:tasks.thrashosds.thrasher:Killing osd 1, live_osds are [4, 0, 3, 5, 2, 1]
2015-01-13 18:55:18,996.996 INFO:tasks.rados.rados.0.plana02.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f4b38ff9700 time 2015-01-13 18:55:18.969589
2015-01-13 18:55:18,996.996 INFO:tasks.rados.rados.0.plana02.stderr:./test/osd/RadosModel.h: 1067: FAILED assert(0)
2015-01-13 18:55:18,997.997 INFO:tasks.rados.rados.0.plana02.stderr: ceph version 0.90-940-gd159586 (d15958631b52a116ef596523fac5ffb8d77ead4e)
2015-01-13 18:55:18,997.997 INFO:tasks.rados.rados.0.plana02.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x4e0e65]
2015-01-13 18:55:18,997.997 INFO:tasks.rados.rados.0.plana02.stderr: 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xff) [0x4d0eff]
2015-01-13 18:55:18,998.998 INFO:tasks.rados.rados.0.plana02.stderr: 3: (()+0x7ff8d) [0x7f4b4582cf8d]
2015-01-13 18:55:18,998.998 INFO:tasks.rados.rados.0.plana02.stderr: 4: (()+0x5c259) [0x7f4b45809259]
2015-01-13 18:55:18,998.998 INFO:tasks.rados.rados.0.plana02.stderr: 5: (()+0x125868) [0x7f4b458d2868]
2015-01-13 18:55:18,998.998 INFO:tasks.rados.rados.0.plana02.stderr: 6: (()+0x7df3) [0x7f4b44bf9df3]
2015-01-13 18:55:18,999.999 INFO:tasks.rados.rados.0.plana02.stderr: 7: (clone()+0x6d) [0x7f4b43cfb3dd]
2015-01-13 18:55:18,999.999 INFO:tasks.rados.rados.0.plana02.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-01-13 18:55:18,999.999 INFO:tasks.rados.rados.0.plana02.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
Traceback (most recent call last):
  File "/home/dzafman/src/teuthology/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 390, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/dzafman/src/ceph-qa-suite_master/tasks/rados.py", line 179, in thread
    run.wait(tests.itervalues())
  File "/home/dzafman/src/teuthology/teuthology/orchestra/run.py", line 391, in wait
    proc.wait()
  File "/home/dzafman/src/teuthology/teuthology/orchestra/run.py", line 103, in wait
    raise CommandCrashedError(command=self.command)
CommandCrashedError: Command crashed: 'CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --op read 100 --op write 0 --op delete 50 --max-ops 40000 --objects 50 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op snap_create 50 --op snap_remove 50 --op rollback 50 --op setattr 25 --op rmattr 25 --op copy_from 50 --op append 100 --pool unique_pool_0'
<Greenlet at 0x3117cd0: thread> failed with CommandCrashedError

I have reproduced this on the ceph master branch by running a single rados:thrash job on xfs using the master branch of ceph-qa-suite. I found the object plana0213876-8 that got the read-crc error on osd.2, and another replica of it on osd.0. Both have exactly the same object_info_t (see below), and the md5sums of the data are also the same.

[ubuntu@plana02 ~]$ md5sum /var/lib/ceph/osd/ceph-0/current/1.67_head/plana0213876-8__head_31C93C67__1
bbdbef60799393ca24d66a6344c56a5b /var/lib/ceph/osd/ceph-0/current/1.67_head/plana0213876-8__head_31C93C67__1
[ubuntu@plana02 ~]$ md5sum /var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1
bbdbef60799393ca24d66a6344c56a5b /var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1

{ "oid": { "oid": "plana0213876-8",
"key": "",
"snapid": -2,
"hash": 835271783,
"max": 0,
"pool": 1,
"namespace": ""},
"version": "2380'1669",
"prior_version": "0'0",
"last_reqid": "client.4128.0:14753",
"user_version": 3102,
"size": 9911803,
"mtime": "2015-01-13 18:55:08.395923",
"local_mtime": "2015-01-13 18:55:08.573775",
"lost": 0,
"flags": 52,
"wrlock_by": "unknown.0.0:0",
"snaps": [],
"truncate_seq": 0,
"truncate_size": 0,
"data_digest": 1336838327,
"omap_digest": 4294967295,
"watchers": {}}

The data_digest in hex is 0x4FAE88B7, which matches the expected value in the read-crc error.
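
As a quick check (not part of the original report), the decimal digest fields above convert to the hex values seen in the errors; a trivial Python snippet to verify:

# Convert the decimal digest fields from the object_info_t dump above
# into hex, to compare against the values in the read-crc/scrub errors.
data_digest = 1336838327
omap_digest = 4294967295
print(hex(data_digest))   # 0x4fae88b7 -> the "expected" digest in the error
print(hex(omap_digest))   # 0xffffffff, i.e. -1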

I picked another random object and it has the same content and the same object_info_t digest values.

[ubuntu@plana02 ~]$ md5sum /var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1
bbdbef60799393ca24d66a6344c56a5b /var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1

{ "oid": { "oid": "plana0213876-29",
"key": "",
"snapid": -2,
"hash": 1902236845,
"max": 0,
"pool": 1,
"namespace": ""},
"version": "2382'1056",
"prior_version": "0'0",
"last_reqid": "client.4128.0:14777",
"user_version": 3102,
"size": 9911803,
"mtime": "2015-01-13 18:55:12.870849",
"local_mtime": "2015-01-13 18:55:13.044068",
"lost": 0,
"flags": 52,
"wrlock_by": "unknown.0.0:0",
"snaps": [],
"truncate_seq": 0,
"truncate_size": 0,
"data_digest": 1336838327,
"omap_digest": 4294967295,
"watchers": {}}

Actions #1

Updated by Samuel Just over 9 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by David Zafman over 9 years ago


It is a coincidence that I picked the only other object that happens to be the same length and have the same content. Also, one replica of one of the 2 objects has the correct data.

[ubuntu@plana02 ~]$ find /var/lib/ceph -name '*plana*' -ls | grep 9911803
67108984 1488 -rw-r--r--   1 root     root      9911803 Jan 13 18:55 /var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1
134353421 1488 -rw-r--r--   1 root     root      9911803 Jan 13 18:55 /var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1
134337341 1488 -rw-r--r--   1 root     root      9911803 Jan 13 18:55 /var/lib/ceph/osd/ceph-0/current/1.67_head/plana0213876-8__head_31C93C67__1
201326829 9680 -rw-r--r--   1 root     root      9911803 Jan 13 18:55 /var/lib/ceph/osd/ceph-0/current/1.ad_head/plana0213876-29__head_7161D4AD__1
[ubuntu@plana02 ~]$ md5sum `find /var/lib/ceph -name '*plana*'` | egrep 'plana0213876-8|plana0213876-29'
bbdbef60799393ca24d66a6344c56a5b  /var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1
bbdbef60799393ca24d66a6344c56a5b  /var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1
bbdbef60799393ca24d66a6344c56a5b  /var/lib/ceph/osd/ceph-0/current/1.67_head/plana0213876-8__head_31C93C67__1
e93314bed6cde6d6a464628c4bb1fdb9  /var/lib/ceph/osd/ceph-0/current/1.ad_head/plana0213876-29__head_7161D4AD__1

2015-01-14 10:55:00.939164 7f30e39dd700 10 osd.0 pg_epoch: 2393 pg[1.67( v 2380'1669 (0'0,2380'1669] local-les=2393 n=1 ec=8 les/c 2393/2393 2392/2392/2389) [0,2] r=0 lpr=2392 crt=2297'1664 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] be_select_auth_object: rejecting osd 0 for obj 31c93c67/plana0213876-8/head//1, data digest mismatch 0x52b5f03f != 0x4fae88b7
2015-01-14 10:55:00.939186 7f30e39dd700 10 osd.0 pg_epoch: 2393 pg[1.67( v 2380'1669 (0'0,2380'1669] local-les=2393 n=1 ec=8 les/c 2393/2393 2392/2392/2389) [0,2] r=0 lpr=2392 crt=2297'1664 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] be_select_auth_object: rejecting osd 2 for obj 31c93c67/plana0213876-8/head//1, data digest mismatch 0x52b5f03f != 0x4fae88b7
2015-01-14 10:57:42.286053 7fe1cfe5e700 10 osd.2 pg_epoch: 2393 pg[1.ad( v 2382'1056 (0'0,2382'1056] local-les=2393 n=1 ec=8 les/c 2393/2393 2392/2392/2392) [2,0] r=0 lpr=2392 crt=2335'1055 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] be_select_auth_object: rejecting osd 2 for obj 7161d4ad/plana0213876-29/head//1, data digest mismatch 0x52b5f03f != 0x4fae88b7

The 3 bad files each have either a hole or zero bytes at the start, and many fewer allocated blocks than the good data file.

[ubuntu@plana02 ~]$ od -cx  /var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1 | less
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
           0000    0000    0000    0000    0000    0000    0000    0000
*
40000000 246 371 302   v 313 326   w   * 344 304   ~ 313 260   J  \b 230
           f9a6    76c2    d6cb    2a77    c4e4    cb7e    4ab0    9808

[ubuntu@plana02 ~]$ stat /var/lib/ceph/osd/ceph-0/current/1.ad_head/plana0213876-29__head_7161D4AD__1
  File: ‘/var/lib/ceph/osd/ceph-0/current/1.ad_head/plana0213876-29__head_7161D4AD__1’
  Size: 9911803         Blocks: 19360      IO Block: 4096   regular file
Device: 830h/2096d      Inode: 201326829   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-01-13 18:55:13.119139616 -0800
Modify: 2015-01-13 18:55:13.143139989 -0800
Change: 2015-01-13 18:55:13.447144717 -0800
 Birth: -
[ubuntu@plana02 ~]$ stat /var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1 /var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1 /var/lib/ceph/osd/ceph-0/current/1.67_head/plana0213876-8__head_31C93C67__1
  File: ‘/var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1’
  Size: 9911803         Blocks: 2976       IO Block: 4096   regular file
Device: 820h/2080d      Inode: 67108984    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-01-13 18:55:08.755071755 -0800
Modify: 2015-01-13 18:55:08.755071755 -0800
Change: 2015-01-13 18:55:08.947074741 -0800
 Birth: -
  File: ‘/var/lib/ceph/osd/ceph-2/current/1.ad_head/plana0213876-29__head_7161D4AD__1’
  Size: 9911803         Blocks: 2976       IO Block: 4096   regular file
Device: 820h/2080d      Inode: 134353421   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-01-13 18:55:13.243141544 -0800
Modify: 2015-01-13 18:55:13.243141544 -0800
Change: 2015-01-13 18:55:13.351143224 -0800
 Birth: -
  File: ‘/var/lib/ceph/osd/ceph-0/current/1.67_head/plana0213876-8__head_31C93C67__1’
  Size: 9911803         Blocks: 2976       IO Block: 4096   regular file
Device: 830h/2096d      Inode: 134337341   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-01-13 18:55:08.811072625 -0800
Modify: 2015-01-13 18:55:08.811072625 -0800
Change: 2015-01-13 18:55:08.891073869 -0800
 Birth: -
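
The od/stat output above amounts to checking two things: the file is heavily under-allocated relative to its size, and its leading bytes are zero. A minimal sketch of that check (not from the original report; the path is just one of the replicas listed above):

# Rough check for the symptom above: a replica whose first chunk was never
# written shows far fewer allocated blocks than its size implies, and its
# leading bytes read back as zeros.
import os

def looks_truncated_at_start(path, probe=4096):
    st = os.stat(path)
    allocated = st.st_blocks * 512      # st_blocks is in 512-byte units
    sparse = allocated < st.st_size     # e.g. 2976*512 bytes vs size 9911803
    with open(path, "rb") as f:
        leading_zeros = f.read(probe) == b"\0" * probe
    return sparse and leading_zeros

print(looks_truncated_at_start(
    "/var/lib/ceph/osd/ceph-2/current/1.67_head/plana0213876-8__head_31C93C67__1"))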

Actions #3

Updated by David Zafman over 9 years ago

This object, which has one good and one bad replica, has an identical object_info_t on both osd.0 and osd.2:

{ "oid": { "oid": "plana0213876-29",
"key": "",
"snapid": -2,
"hash": 1902236845,
"max": 0,
"pool": 1,
"namespace": ""},
"version": "2382'1056",
"prior_version": "0'0",
"last_reqid": "client.4128.0:14777",
"user_version": 3102,
"size": 9911803,
"mtime": "2015-01-13 18:55:12.870849",
"local_mtime": "2015-01-13 18:55:13.044068",
"lost": 0,
"flags": 52,
"wrlock_by": "unknown.0.0:0",
"snaps": [],
"truncate_seq": 0,
"truncate_size": 0,
"data_digest": 1336838327,
"omap_digest": 4294967295,
"watchers": {}}

Actions #4

Updated by David Zafman over 9 years ago

The yaml file that reproduced this issue:

description: rados:thrash/{clusters/fixed-2.yaml fs/ext4.yaml msgr-failures/few.yaml
  thrashers/mapgap.yaml workloads/ec-rados-plugin=jerasure-k=2-m=1.yaml}
kernel: &id001
  kdb: true
  branch: testing
interactive-on-error: true
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon min osdmap epochs: 2
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
        osd map cache size: 1
    fs: xfs
    log-whitelist:
    - slow request
    branch: master
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      branch: master
  s3tests:
    branch: master
  workunit:
    branch: master
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
  - client.0
- - mon.b
  - osd.3
  - osd.4
  - osd.5
  - client.1
#suite: rados:thrash
suite_branch: master
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
    - osd_map_cache_size
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    chance_test_map_discontinuity: 0.5
    timeout: 1800
- rados:
    clients:
    - client.0
    ec_pool: false
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rmattr: 25
      rollback: 50
      setattr: 25
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 40000
teuthology_branch: master

Actions #5

Updated by David Zafman over 9 years ago

Also seen as a corrupt test file:

2015-01-15 12:19:05,431.431 INFO:tasks.rados.rados.0.plana02.stderr:1718: oid 1 contents (ObjNum 524 snap 190 seq_num 524) corrupt
2015-01-15 12:19:05,431.431 INFO:tasks.rados.rados.0.plana02.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa2e3fff700 time 2015-01-15 12:19:05.415463
2015-01-15 12:19:05,431.431 INFO:tasks.rados.rados.0.plana02.stderr:./test/osd/RadosModel.h: 1102: FAILED assert(0)

CommandCrashedError: Command crashed: 'CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --op read 100 --op write 0 --op delete 50 --max-ops 40000 --objects 50 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op snap_create 50 --op snap_remove 50 --op rollback 50 --op setattr 25 --op rmattr 25 --op copy_from 50 --op append 100 --pool unique_pool_0'
<Greenlet at 0x1b7e050: thread> failed with CommandCrashedError

A deep scrub sees data that doesn't match the digest:

2015-01-15 14:12:50.943131 osd.3 [ERR] deep-scrub 1.6 23c52c06/plana0213726-1/dc//1 on disk data digest 0x438ccf63 != 0xe5868739
2015-01-15 14:12:50.943159 osd.3 [ERR] deep-scrub 1.6 23c52c06/plana0213726-1/cc//1 on disk data digest 0x438ccf63 != 0xe5868739

Actions #6

Updated by David Zafman over 9 years ago

I reproduced this without snapshots and found a case where a single replica on a single osd was bad. A touch, write, and setattr happened before the create_collection("1.c_TEMP"), so they all failed with ENOENT. The good osd happened to have created the collection at a much earlier time.

2015-01-16 16:31:16.739055 7f51087c0700 10 osd.0 pg_epoch: 19 pg[1.c( v 16'42 (0'0,16'42] local-les=6 n=2 ec=5 les/c 6/6 5/5/5) [0,4] r=0 lpr=19 pi=5-18/1 luod=0'0 crt=13'36 lcod 16'41 mlcod 0'0 active] start_peering_interval: check_new_interval output: generate_past_intervals interval(5-18 up [3,0](3) acting [3,0](3)): not rw, up_thru 17 up_from 3 last_epoch_clean 6
2015-01-16 16:34:12.964657 7f5115123700 10 journal op_apply_start 1093 open_ops 0 -> 1
2015-01-16 16:34:12.964663 7f5115123700 5 filestore(/var/lib/ceph/osd/ceph-0) _do_op 0x4c52f30 seq 1093 osr(1.c 0x42787f0)/0x42787f0 start
2015-01-16 16:34:12.964667 7f5115123700 10 filestore(/var/lib/ceph/osd/ceph-0) _do_transaction on 0x5521180
2015-01-16 16:34:12.964676 7f5115123700 15 filestore(/var/lib/ceph/osd/ceph-0) touch 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1
2015-01-16 16:34:12.964762 7f5115123700 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 0/temp_1.c_0_4116_1/head//-1 in index: (2) No such file or directory
2015-01-16 16:34:12.964802 7f5115123700 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 0/temp_1.c_0_4116_1/head//-1 in index: (2) No such file or directory
2015-01-16 16:34:12.964807 7f5115123700 10 filestore(/var/lib/ceph/osd/ceph-0) setattrs 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1 = -2
2015-01-16 16:34:12.964812 7f5115123700 15 filestore(/var/lib/ceph/osd/ceph-0) write 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1 0~8388608
2015-01-16 16:34:12.964829 7f5115123700 0 filestore(/var/lib/ceph/osd/ceph-0) write couldn't open 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1: (2) No such file or directory
2015-01-16 16:34:12.964832 7f5115123700 10 filestore(/var/lib/ceph/osd/ceph-0) write 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1 0~8388608 = -2
2015-01-16 16:34:12.964847 7f5115123700 15 filestore(/var/lib/ceph/osd/ceph-0) create_collection /var/lib/ceph/osd/ceph-0/current/1.c_TEMP
2015-01-16 16:34:12.964899 7f5115123700 10 filestore(/var/lib/ceph/osd/ceph-0) create_collection /var/lib/ceph/osd/ceph-0/current/1.c_TEMP = 0

A later transaction writing a later offset in the file looks OK:

2015-01-16 16:34:13.030240 7f5115924700 10 journal op_apply_start 1094 open_ops 0 -> 1
2015-01-16 16:34:13.030246 7f5115924700 5 filestore(/var/lib/ceph/osd/ceph-0) _do_op 0x4fb6b80 seq 1094 osr(1.c 0x42787f0)/0x42787f0 start
2015-01-16 16:34:13.030250 7f5115924700 10 filestore(/var/lib/ceph/osd/ceph-0) _do_transaction on 0x53e3f00
2015-01-16 16:34:13.030260 7f5115924700 15 filestore(/var/lib/ceph/osd/ceph-0) remove 1.c_head/dc7fcd4c/plana0215983-30/head//1
2015-01-16 16:34:13.031024 7f5115924700 10 filestore(/var/lib/ceph/osd/ceph-0) remove 1.c_head/dc7fcd4c/plana0215983-30/head//1 = 0
2015-01-16 16:34:13.031033 7f5115924700 15 filestore(/var/lib/ceph/osd/ceph-0) write 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1 8388608~469563
2015-01-16 16:34:13.031490 7f5115924700 10 filestore(/var/lib/ceph/osd/ceph-0) write 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1 8388608~469563 = 469563
2015-01-16 16:34:13.031498 7f5115924700 15 filestore(/var/lib/ceph/osd/ceph-0) _collection_move_rename 1.c_head/dc7fcd4c/plana0215983-30/head//1 from 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1
2015-01-16 16:34:13.147573 7f5115924700 10 filestore(/var/lib/ceph/osd/ceph-0) _collection_move_rename 1.c_head/dc7fcd4c/plana0215983-30/head//1 from 1.c_TEMP/0/temp_1.c_0_4116_1/head//-1 = 0

Actions #7

Updated by David Zafman over 9 years ago

  • Status changed from 12 to 7
Actions #8

Updated by David Zafman over 9 years ago

An OP_MKCOLL is being put into the transaction out of order. Under simple conditions the data somehow still gets written, but I assume that on log replay whatever ends up writing the first chunk of data does not happen again.


   -60> 2015-01-17 13:07:11.187671 7f7720ff9700  0 filestore(/home/dzafman/ceph/src/dev/osd2)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "touch",
            "collection": "1.5_TEMP",
            "oid": "0\/temp_1.5_0_4109_1\/head\/\/-1" 
        },
        {
            "op_num": 1,
            "op_name": "write",
            "collection": "1.5_TEMP",
            "oid": "0\/temp_1.5_0_4109_1\/head\/\/-1",
            "length": 8388608,
            "offset": 0,
            "bufferlist length": 8388608
        },
        {
            "op_num": 2,
            "op_name": "mkcoll",
            "collection": "1.5_TEMP" 
        },
        {
            "op_num": 3,
            "op_name": "omap_setkeys",
            "collection": "1.5_head",
            "oid": "5\/\/head\/\/1",
            "attr_lens": {}
        },
        {
            "op_num": 4,
            "op_name": "omap_setkeys",
            "collection": "1.5_head",
            "oid": "5\/\/head\/\/1",
            "attr_lens": {
                "_epoch": 4,
                "_info": 729
            }
        }
    ]
}
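
To make the ordering problem concrete, here is a hypothetical mock replay (not Ceph code) of the ops from the dump above. Ops are applied in op_num order, so the touch and write into 1.5_TEMP run before the mkcoll that should create it and fail with ENOENT, matching the filestore errors in comment #6:

# Hypothetical mock replay of the transaction dump above (not Ceph code).
import errno

collections = {"1.5_head": {}}   # 1.5_TEMP does not exist yet

ops = [
    ("touch",  "1.5_TEMP", "0/temp_1.5_0_4109_1/head//-1"),
    ("write",  "1.5_TEMP", "0/temp_1.5_0_4109_1/head//-1"),
    ("mkcoll", "1.5_TEMP"),
]

for op in ops:
    name, coll = op[0], op[1]
    if name == "mkcoll":
        collections.setdefault(coll, {})
        print("mkcoll", coll, "= 0")
    elif coll not in collections:
        print(name, op[2], "= -%d (ENOENT)" % errno.ENOENT)
    else:
        collections[coll][op[2]] = True
        print(name, op[2], "= 0")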

I think this is where the create_collection() is being added to the transaction:

#0  ObjectStore::Transaction::create_collection (this=0x7f47ad7f8950, cid=...) at ./os/ObjectStore.h:1187
#1  0x00000000017a5cf3 in PGBackend::get_temp_coll (this=0x7f476805fe60, t=0x7f47ad7f8950) at osd/PGBackend.cc:104
#2  0x0000000001883f01 in ReplicatedBackend::submit_transaction (this=0x7f476805fe60, soid=..., at_version=..., _t=0x7f4788017ca0, trim_to=..., trim_rollback_to=...,
    log_entries=std::vector of length 0, capacity 0, hset_history=..., on_local_applied_sync=0x7f478805a010, on_all_acked=0x7f478805d0d0, on_all_commit=0x7f478800eca0, tid=34, reqid=..., orig_op=
    std::tr1::shared_ptr (empty) 0x0) at osd/ReplicatedBackend.cc:577
#3  0x00000000016fe3c1 in ReplicatedPG::issue_repop (this=0x7f47680a7010, repop=0x7f4788019940, now=...) at osd/ReplicatedPG.cc:7516
#4  0x00000000016ffaba in ReplicatedPG::simple_repop_submit (this=0x7f47680a7010, repop=0x7f4788019940) at osd/ReplicatedPG.cc:7703
#5  0x00000000016f30e5 in ReplicatedPG::process_copy_chunk (this=0x7f47680a7010, oid=..., tid=1, r=0) at osd/ReplicatedPG.cc:6343
#6  0x000000000174abfc in C_Copyfrom::finish (this=0x7f47a400afa0, r=0) at osd/ReplicatedPG.cc:6000
#7  0x00000000014514fb in Context::complete (this=0x7f47a400afa0, r=0) at ./include/Context.h:64
#8  0x000000000196d7ee in Finisher::finisher_thread_entry (this=0x4a4ab00) at common/Finisher.cc:59
#9  0x000000000145fd70 in Finisher::FinisherThread::entry (this=0x4a4ac18) at ./common/Finisher.h:46
#10 0x0000000001a9a1aa in Thread::entry_wrapper (this=0x4a4ac18) at common/Thread.cc:61
#11 0x0000000001a9a11c in Thread::_entry_func (arg=0x4a4ac18) at common/Thread.cc:45
#12 0x00007f47f42ace9a in start_thread (arg=0x7f47ad7fa700) at pthread_create.c:308
#13 0x00007f47f28b273d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
Actions #9

Updated by Samuel Just over 9 years ago

Looks like d427ca35404a30e1f428859c3274e030f2f83ef6 reversed the order of localt (which contains the create_collection) and op_t (which contains all of the operations on the object in the temp collection).
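
Illustrative only (hypothetical list stand-ins, not the Ceph API): if the backend appends op_t ahead of localt, the mkcoll from localt lands after the touch/write it was supposed to enable, which is exactly the op order in the dump in comment #8:

localt = [("mkcoll", "1.5_TEMP")]
op_t = [("touch", "1.5_TEMP"), ("write", "1.5_TEMP", 0, 8388608)]

print(op_t + localt)    # reversed order: writes before mkcoll (broken)
print(localt + op_t)    # order the writes rely on: mkcoll first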

Actions #10

Updated by David Zafman over 9 years ago

  • Assignee changed from David Zafman to Dong Yuan
Actions #11

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Resolved