Bug #13381

closed

osd/SnapMapper.cc: 282: FAILED assert(check(oid)) on hammer->jewel upgrade

Added by Sage Weil over 8 years ago. Updated almost 3 years ago.

Status: Won't Fix
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/firefly-hammer-x, upgrade/hammer-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -14> 2015-10-05 19:30:49.183599 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0 = 0
   -13> 2015-10-05 19:30:49.183608 7f8e7f200700 20 snap_mapper.remove_oid 0/3473053c/1000e081c63.0000009d/head
   -12> 2015-10-05 19:30:49.183612 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0
   -11> 2015-10-05 19:30:49.183645 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0 = 0
   -10> 2015-10-05 19:30:49.183654 7f8e7f200700 20 snap_mapper.remove_oid 0/c273053c/100010365d9.00000000/head
    -9> 2015-10-05 19:30:49.183658 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0
    -8> 2015-10-05 19:30:49.183690 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0 = 0
    -7> 2015-10-05 19:30:49.183699 7f8e7f200700 20 snap_mapper.remove_oid 0/a6f3053c/100005b91ae.00000000/head
    -6> 2015-10-05 19:30:49.183703 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0
    -5> 2015-10-05 19:30:49.183736 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0 = 0
    -4> 2015-10-05 19:30:49.183745 7f8e7f200700 20 snap_mapper.remove_oid 0/cef3053c/1000ec7a4fc.00000000/head
    -3> 2015-10-05 19:30:49.183749 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0
    -2> 2015-10-05 19:30:49.183782 7f8e7f200700 15 filestore(/var/lib/ceph/osd/ceph-40) omap_get_values meta/-1/a468ec03/snapmapper/0 = 0
    -1> 2015-10-05 19:30:49.183791 7f8e7f200700 20 snap_mapper.remove_oid 0/d2200e36/1000a89628c.00000000/head
     0> 2015-10-05 19:30:49.186168 7f8e7f200700 -1 osd/SnapMapper.cc: In function 'int SnapMapper::remove_oid(const hobject_t&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f8e7f200700 time 2015-10-05 19:30:49.183793
osd/SnapMapper.cc: 282: FAILED assert(check(oid))

 ceph version 9.0.3-2033-g3570ec6 (3570ec612a6e85169007e50533ea56c152a23f8e)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f8eae279c7b]
 2: (SnapMapper::remove_oid(hobject_t const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x1ed) [0x7f8eadd5eecd]
 3: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0x454) [0x7f8eadcc2dc4]
 4: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1f4) [0x7f8eadcc3834]
 5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x10a) [0x7f8eadd16eba]
 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7f8eae26b6f6]
 7: (ThreadPool::WorkThread::entry()+0x10) [0x7f8eae26c5c0]
 8: (()+0x8182) [0x7f8eac630182]
 9: (clone()+0x6d) [0x7f8eaa97747d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

this is mira093 osd.40 and osd.39, after upgrading from hammer -> infernalis.

pg is 0.53c


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #13862: pgs stuck inconsistent after infernalis upgrade | Resolved | David Zafman | 11/23/2015

Actions #1

Updated by Sage Weil over 8 years ago

looks like an object is in the wrong directory

root@mira093:/var/lib/ceph/osd/ceph-40/current/0.53c_head# ls -al | grep 1000a89628c
-rw-r--r--   1 root root      48 Aug 26 09:56 1000a89628c.00000000__head_D2200E36__0
root@mira093:/var/lib/ceph/osd/ceph-40/current/0.53c_head# attr -l 1000a89628c.00000000__head_D2200E36__0
Attribute "cephos.spill_out" has a 2 byte value for 1000a89628c.00000000__head_D2200E36__0
Attribute "ceph._" has a 255 byte value for 1000a89628c.00000000__head_D2200E36__0
Attribute "ceph._parent" has a 347 byte value for 1000a89628c.00000000__head_D2200E36__0
Attribute "ceph.snapset" has a 31 byte value for 1000a89628c.00000000__head_D2200E36__0
root@mira093:/var/lib/ceph/osd/ceph-40/current/0.53c_head# attr -q -h 1000a89628c.00000000__head_D2200E36__0
attr: invalid option -- 'h'
Unrecognized option: ?
Usage: attr [-LRSq] -s attrname [-V attrvalue] pathname  # set value
       attr [-LRSq] -g attrname pathname                 # get value
       attr [-LRSq] -r attrname pathname                 # remove attr
       attr [-LRq]  -l pathname                          # list attrs 
      -s reads a value from stdin and -g writes a value to stdout
root@mira093:/var/lib/ceph/osd/ceph-40/current/0.53c_head# attr -q -g ceph._ 1000a89628c.00000000__head_D2200E36__0 > /tmp/a
root@mira093:/var/lib/ceph/osd/ceph-40/current/0.53c_head# ceph-dencoder import /tmp/a type object_info_t decode dump_json
{
    "oid": {
        "oid": "1000a89628c.00000000",
        "key": "",
        "snapid": -2,
        "hash": 3525316150,
        "max": 0,
        "pool": 0,
        "namespace": "" 
    },
    "version": "469776'609414",
    "prior_version": "408114'595463",
    "last_reqid": "mds.0.122:96384116",
    "user_version": 609414,
    "size": 48,
    "mtime": "2015-04-29 03:23:53.397117",
    "local_mtime": "2015-04-29 03:23:54.774958",
    "lost": 0,
    "flags": 52,
    "snaps": [],
    "truncate_seq": 0,
    "truncate_size": 0,
    "data_digest": 3949395613,
    "omap_digest": 4294967295,
    "watchers": {}
}

root@mira093:/var/lib/ceph/osd/ceph-40/current/0.53c_head# printf "%x\n" 3525316150
d2200e36
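
A rough way to see why this object looks misplaced: every other hash in the remove_oid log lines above ends in 53c (the pg is 0.53c), while this one ends in e36. The sketch below repeats the printf conversion and compares low bits. The 12-bit mask is an assumption made for illustration (three hex digits); the real mask depends on the pool's pg_num, which is not shown in this ticket.

```python
# Hashes copied from the snap_mapper.remove_oid log lines in the description.
logged_hashes = ["3473053c", "c273053c", "a6f3053c", "cef3053c", "d2200e36"]

PG_SEED = 0x53C        # pg 0.53c
LOW_BITS_MASK = 0xFFF  # assumption: compare only the low 12 bits


def in_pg(hash_hex: str) -> bool:
    """Return True when the low bits of the hash equal the pg seed."""
    return (int(hash_hex, 16) & LOW_BITS_MASK) == PG_SEED


# Same conversion as the printf "%x\n" call above.
decoded_hash_hex = format(3525316150, "x")
print(decoded_hash_hex)

for h in logged_hashes:
    print(h, "matches 0.53c" if in_pg(h) else "does not match 0.53c")
```

Only the last hash fails the comparison, which fits the "wrong directory" reading.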

Actions #2

Updated by Sage Weil over 8 years ago

same assert was triggering several days before the infernalis upgrade, so this is nothing new.

Actions #3

Updated by Sage Weil over 8 years ago

  • Priority changed from Urgent to Normal

I think this is just cruft from something that happened ages ago.

Actions #4

Updated by Samuel Just over 8 years ago

  • Assignee set to Samuel Just
  • Priority changed from Normal to Urgent
Actions #5

Updated by Samuel Just over 8 years ago

  • Assignee deleted (Samuel Just)
  • Priority changed from Urgent to Normal
Actions #6

Updated by Sage Weil over 8 years ago

Sage Weil wrote:

I think this is just cruft from something that happened ages ago.

2015-04-29 03:23:54.774958, specifically

Actions #8

Updated by Sage Weil over 8 years ago

  • Subject changed from osd/SnapMapper.cc: 282: FAILED assert(check(oid)) to osd/SnapMapper.cc: 282: FAILED assert(check(oid)) on hammer->jewel upgrade
  • Priority changed from Normal to Urgent
  • Source changed from Development to Q/A
Actions #9

Updated by Yuri Weinstein over 8 years ago

Also on hammer -> infernalis
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-22_17:02:01-upgrade:hammer-x-infernalis-distro-basic-openstack/
Job: 48665
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2015-12-22_17:02:01-upgrade:hammer-x-infernalis-distro-basic-openstack/48665/teuthology.log

2015-12-22T19:27:53.872 INFO:tasks.thrashosds:joining thrashosds
2015-12-22T19:27:54.013 INFO:tasks.ceph.osd.1.target091046.stderr:osd/SnapMapper.cc: In function 'int SnapMapper::remove_oid(const hobject_t&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f64a689d700 time 2015-12-22 19:27:53.913624
2015-12-22T19:27:54.013 INFO:tasks.ceph.osd.1.target091046.stderr:osd/SnapMapper.cc: 282: FAILED assert(check(oid))
2015-12-22T19:27:54.020 INFO:tasks.ceph.osd.1.target091046.stderr: ceph version 9.2.0-25-gf480cea (f480cea217008fa7b1e476d30dcb13023e6431d1)
2015-12-22T19:27:54.020 INFO:tasks.ceph.osd.1.target091046.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f64cb45d27b]
2015-12-22T19:27:54.020 INFO:tasks.ceph.osd.1.target091046.stderr: 2: (SnapMapper::remove_oid(hobject_t const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x1ed) [0x7f64caf4ea1d]
2015-12-22T19:27:54.021 INFO:tasks.ceph.osd.1.target091046.stderr: 3: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0x454) [0x7f64caeb2084]
2015-12-22T19:27:54.021 INFO:tasks.ceph.osd.1.target091046.stderr: 4: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1f4) [0x7f64caeb2af4]
2015-12-22T19:27:54.021 INFO:tasks.ceph.osd.1.target091046.stderr: 5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x10a) [0x7f64caf062ea]
2015-12-22T19:27:54.021 INFO:tasks.ceph.osd.1.target091046.stderr: 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7f64cb44ecf6]
2015-12-22T19:27:54.022 INFO:tasks.ceph.osd.1.target091046.stderr: 7: (ThreadPool::WorkThread::entry()+0x10) [0x7f64cb44fbc0]
2015-12-22T19:27:54.022 INFO:tasks.ceph.osd.1.target091046.stderr: 8: (()+0x8182) [0x7f64c9a7b182]
2015-12-22T19:27:54.022 INFO:tasks.ceph.osd.1.target091046.stderr: 9: (clone()+0x6d) [0x7f64c7dc247d]
2015-12-22T19:27:54.022 INFO:tasks.ceph.osd.1.target091046.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #10

Updated by Yuri Weinstein over 8 years ago

  • Release set to infernalis
  • Release set to jewel
  • ceph-qa-suite upgrade/hammer-x added
Actions #11

Updated by Sage Weil over 8 years ago

2015-12-22 19:27:53.913613 7f64a689d700 20 snap_mapper.remove_oid -1/00000000/temp_4.40s0_0_34564_1/head

could this be David's bug where the temp object isn't getting cleaned up?

Actions #12

Updated by Sage Weil over 8 years ago

  • Related to Bug #13862: pgs stuck inconsistent after infernalis upgrade added
Actions #13

Updated by David Zafman over 8 years ago

This object has a hash and is in pool 0. Doesn't seem related to 13862. Don't know why check() failed.
1000a89628c.00000000__head_D2200E36__0

This looks like a Hammer temporary object with no hash and pool == -1. The 13862 fix will attempt to remove it.
-1/00000000/temp_4.40s0_0_34564_1/head

The fix for 13862 will remove this second kind of object with a simple remove transaction instead of going through recursive_remove_collection(), which calls mapper.remove_oid() (presumably to handle snapshots in the transaction) before issuing the object remove transaction. It is mapper.remove_oid() that runs into the assert(). So 13862 is at least a partial fix for this bug.
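
To make the distinction between the two objects concrete, here is a small sketch that splits the logged pool/hash/name/snap strings and flags the ones that look like Hammer temp objects (pool == -1 and no hash). The helper names are hypothetical and this is not how the OSD classifies objects internally; it only mirrors the description in this comment.

```python
from typing import NamedTuple


class LoggedObject(NamedTuple):
    pool: int
    hash: int
    name: str
    snap: str


def parse_logged_oid(text: str) -> LoggedObject:
    """Split a 'pool/hash/name/snap' string as printed by snap_mapper.remove_oid.

    Assumes the object name itself contains no '/' characters.
    """
    pool, hash_hex, name, snap = text.split("/")
    return LoggedObject(int(pool), int(hash_hex, 16), name, snap)


def looks_like_hammer_temp(obj: LoggedObject) -> bool:
    """Heuristic taken from this comment: pool == -1 and a zero hash."""
    return obj.pool == -1 and obj.hash == 0


for line in (
    "0/d2200e36/1000a89628c.00000000/head",
    "-1/00000000/temp_4.40s0_0_34564_1/head",
):
    obj = parse_logged_oid(line)
    print(obj.name, "temp" if looks_like_hammer_temp(obj) else "regular")
```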

Actions #14

Updated by Yuri Weinstein over 8 years ago

  • ceph-qa-suite upgrade/firefly-hammer-x added

Also in run:
http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-16_11:18:01-upgrade:firefly-hammer-x-infernalis-distro-basic-openstack/
Job: 3764
Log: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-01-16_11:18:01-upgrade:firefly-hammer-x-infernalis-distro-basic-openstack/3764/teuthology.log

2016-01-16T13:04:00.315 INFO:tasks.workunit.client.0.target078138.stdout:done waiting
2016-01-16T13:04:00.324 INFO:tasks.workunit.client.0.target078138.stdout:test/librados/TestCase.cc:336: Failure
2016-01-16T13:04:00.324 INFO:tasks.workunit.client.0.target078138.stdout:Value of: completion->get_return_value()
2016-01-16T13:04:00.325 INFO:tasks.workunit.client.0.target078138.stdout:  Actual: -2
2016-01-16T13:04:00.325 INFO:tasks.workunit.client.0.target078138.stdout:Expected: 0
2016-01-16T13:04:00.432 INFO:tasks.workunit.client.0.target078138.stdout:[  FAILED  ] LibRadosMiscPP.CopyScrubPP (88783 ms)
2016-01-16T13:04:00.500 INFO:tasks.ceph.mon.a.target078137.stderr:2016-01-16 13:04:00.460412 7f6223890700 -1 mon.a@1(peon).osd e1882 update_from_paxos full map CRC mismatch, resetting to canonical
2016-01-16T13:04:00.517 INFO:tasks.workunit.client.0.target078138.stdout:[----------] 18 tests from LibRadosMiscPP (111523 ms total)
2016-01-16T13:04:00.518 INFO:tasks.workunit.client.0.target078138.stdout:
2016-01-16T13:04:00.519 INFO:tasks.workunit.client.0.target078138.stdout:[----------] Global test environment tear-down
2016-01-16T13:04:00.519 INFO:tasks.workunit.client.0.target078138.stdout:[==========] 24 tests from 4 test cases ran. (115730 ms total)
2016-01-16T13:04:00.519 INFO:tasks.workunit.client.0.target078138.stdout:[  PASSED  ] 23 tests.
2016-01-16T13:04:00.519 INFO:tasks.workunit.client.0.target078138.stdout:[  FAILED  ] 1 test, listed below:
2016-01-16T13:04:00.520 INFO:tasks.workunit.client.0.target078138.stdout:[  FAILED  ] LibRadosMiscPP.CopyScrubPP
2016-01-16T13:04:00.520 INFO:tasks.workunit.client.0.target078138.stdout:
2016-01-16T13:04:00.520 INFO:tasks.workunit.client.0.target078138.stdout: 1 FAILED TEST
2016-01-16T13:04:00.530 INFO:tasks.workunit:Stopping ['rados/test-upgrade-v9.0.1.sh', 'cls'] on client.0...
2016-01-16T13:04:00.530 INFO:teuthology.orchestra.run.target078138:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/workunit.client.0 /home/ubuntu/cephtest/clone'
2016-01-16T13:04:00.537 INFO:tasks.ceph.osd.1.target078137.stderr:osd/SnapMapper.cc: In function 'int SnapMapper::remove_oid(const hobject_t&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f3611c9c700 time 2016-01-16 13:04:00.492339
2016-01-16T13:04:00.537 INFO:tasks.ceph.osd.1.target078137.stderr:osd/SnapMapper.cc: 282: FAILED assert(check(oid))
2016-01-16T13:04:00.537 INFO:tasks.ceph.osd.1.target078137.stderr: ceph version 9.2.0-39-g1296c2b (1296c2baef3412f462ee2124af747a892ea8b7a9)
2016-01-16T13:04:00.538 INFO:tasks.ceph.osd.1.target078137.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f363393a255]
2016-01-16T13:04:00.538 INFO:tasks.ceph.osd.1.target078137.stderr: 2: (SnapMapper::remove_oid(hobject_t const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x215) [0x7f363340ae15]
2016-01-16T13:04:00.538 INFO:tasks.ceph.osd.1.target078137.stderr: 3: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0x454) [0x7f363336a334]
2016-01-16T13:04:00.538 INFO:tasks.ceph.osd.1.target078137.stderr: 4: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x207) [0x7f363336adb7]
2016-01-16T13:04:00.539 INFO:tasks.ceph.osd.1.target078137.stderr: 5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x11a) [0x7f36333c042a]
2016-01-16T13:04:00.539 INFO:tasks.ceph.osd.1.target078137.stderr: 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0x7f363392b806]
2016-01-16T13:04:00.539 INFO:tasks.ceph.osd.1.target078137.stderr: 7: (ThreadPool::WorkThread::entry()+0x10) [0x7f363392c6d0]
2016-01-16T13:04:00.539 INFO:tasks.ceph.osd.1.target078137.stderr: 8: (()+0x7dc5) [0x7f36319cedc5]
2016-01-16T13:04:00.540 INFO:tasks.ceph.osd.1.target078137.stderr: 9: (clone()+0x6d) [0x7f363027521d]
2016-01-16T13:04:00.540 INFO:tasks.ceph.osd.1.target078137.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #16

Updated by Sage Weil about 8 years ago

That last failure included the 13862 partial fix (although it was an upgrade from 0.94.5).

Actions #17

Updated by Sage Weil about 8 years ago

grr, teuthology remote logs are gone.

Actions #18

Updated by David Zafman about 8 years ago

  • Status changed from 12 to Duplicate

Actually, the jewel upgrade was to sha1 e7107f87151e9674502088bbe2d183ff4528647f which didn't include the 13862 fix.

Please re-run upgrading to current Jewel (currently 4b97cd7bf515aabb4474b223486291029f6644ac) or v10.0.3 or later.

Actions #19

Updated by Dan Mick almost 8 years ago

This happened again on the lrc with the same pg. Sam helped me remove the offending object from all the OSDs that went down (about 12 of them) and then all the acting set OSDs.

Actions #20

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Duplicate to New

Reopening because this is now happening in jewel (10.2.8 integration branch). Something must have regressed.

http://pulpito.front.sepia.ceph.com/smithfarm-2017-06-27_15:24:41-upgrade:hammer-x-wip-jewel-backports-distro-basic-vps/1332796/

Actions #21

Updated by Sage Weil almost 7 years ago

  • Assignee set to Sage Weil
Actions #22

Updated by Sage Weil almost 7 years ago

analyzing /a/smithfarm-2017-06-27_15:24:41-upgrade:hammer-x-wip-jewel-backports-distro-basic-vps/1332796 ...

- the object is 3:5c2fc0f8:::vpm04318680-33...
- currently in 3.as2
- failed assert because mask_bits = 5 and match = 10 (3.as2), but oid.hash & 31 is 26 because it should be in 3.1as2.
- split happened here:

2017-06-27 15:56:11.521298 7f9d38d03700 15 filestore(/var/lib/ceph/osd/ceph-4) _split_collection 3.as2_head bits: 5
2017-06-27 15:56:11.521335 7f9d38d03700 15 filestore(/var/lib/ceph/osd/ceph-4) collection_stat /var/lib/ceph/osd/ceph-4/current/3.as2_head
2017-06-27 15:56:11.521352 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) collection_stat /var/lib/ceph/osd/ceph-4/current/3.as2_head = 0
2017-06-27 15:56:11.521356 7f9d38d03700 15 filestore(/var/lib/ceph/osd/ceph-4) collection_stat /var/lib/ceph/osd/ceph-4/current/3.1as2_head
2017-06-27 15:56:11.521361 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) collection_stat /var/lib/ceph/osd/ceph-4/current/3.1as2_head = 0
2017-06-27 15:56:11.521369 7f9d38d03700 20 filestore dbobjectmap: seq is 692
2017-06-27 15:56:11.528297 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _set_global_replay_guard: 18301.0.1 done
2017-06-27 15:56:11.528321 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _set_replay_guard 18301.0.1 START
2017-06-27 15:56:11.529441 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _set_replay_guard 18301.0.1 done
2017-06-27 15:56:11.529463 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _set_replay_guard 18301.0.1 START
2017-06-27 15:56:11.530010 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _set_replay_guard 18301.0.1 done
2017-06-27 15:56:11.546513 7f9d38d03700 20 LFNIndex(/var/lib/ceph/osd/ceph-4/current/3.as2_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-4/current/3.as2_head/vpm04318680-15 01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061_25f195e8d3715936d04d_0_long
2017-06-27 15:56:11.549356 7f9d38d03700 20 LFNIndex(/var/lib/ceph/osd/ceph-4/current/3.as2_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-4/current/3.as2_head/vpm04318680-15 01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061_4d64ff7e5f2c878d95d2_0_long
2017-06-27 15:56:11.551752 7f9d38d03700 20 LFNIndex(/var/lib/ceph/osd/ceph-4/current/3.as2_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-4/current/3.as2_head/vpm04318680-15 01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061_89fe26e467ca09df10b5_0_long
2017-06-27 15:56:11.561472 7f9d38d03700 20 LFNIndex(/var/lib/ceph/osd/ceph-4/current/3.as2_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-4/current/3.as2_head/vpm04318680-33 01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061_961acae792ffce686b7c_0_long
2017-06-27 15:56:11.563680 7f9d38d03700 20 LFNIndex(/var/lib/ceph/osd/ceph-4/current/3.as2_head) lfn_unlink removing alt attr from /var/lib/ceph/osd/ceph-4/current/3.as2_head/vpm04318680-33 01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061_a71554d0bbce848f9876_0_long
2017-06-27 15:56:11.564846 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _close_replay_guard 18301.0.1
2017-06-27 15:56:11.564860 7f9d38d03700 20 filestore dbobjectmap: seq is 693
2017-06-27 15:56:11.567148 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _close_replay_guard 18301.0.1 done
2017-06-27 15:56:11.567198 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _close_replay_guard 18301.0.1
2017-06-27 15:56:11.567221 7f9d38d03700 20 filestore dbobjectmap: seq is 693
2017-06-27 15:56:11.568748 7f9d38d03700 10 filestore(/var/lib/ceph/osd/ceph-4) _close_replay_guard 18301.0.1 done

but there is no useful debugging to see why the object didn't get moved over. :/

there is another pg splitting in another thread (3.1es1), but i don't see how it could interfere here.
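
The arithmetic behind the failed check can be reproduced with a short sketch. Two assumptions here: check() is modeled as a plain low-bits comparison against match, and the "5c2fc0f8" in the object name is treated as the bit-reversed form of the 32-bit hash. Neither is confirmed by this ticket, but under those assumptions the numbers come out the same as in the analysis above.

```python
def check(oid_hash: int, mask_bits: int, match: int) -> bool:
    """Return True when the low mask_bits bits of the hash equal match."""
    mask = (1 << mask_bits) - 1
    return (oid_hash & mask) == match


def reverse_bits32(value: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(format(value, "032b")[::-1], 2)


# Assumption: the hex field in 3:5c2fc0f8:::... is the bit-reversed hash.
oid_hash = reverse_bits32(0x5C2FC0F8)

print(hex(oid_hash), oid_hash & 31)  # low 5 bits of the hash
print(check(oid_hash, 5, 10))        # 3.as2: mask_bits = 5, match = 10 (0xa)
print(check(oid_hash, 5, 26))        # 3.1as2: match = 26 (0x1a)
```

The object fails the check for 3.as2 and passes it for 3.1as2, i.e. it belongs in the child pg that the split created.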

Actions #23

Updated by Sage Weil almost 7 years ago

I don't see any recent changes in jewel since 10.2.7, or anything between there and master that looks suspicious. I don't have the exact commit that was tested, though (ca7ab74ae7884f24983d94b729cc262108ff6aba is not in ceph-ci.git).

Actions #24

Updated by Nathan Cutler almost 7 years ago

@Sage Weil: Pushed ca7ab74ae7884f24983d94b729cc262108ff6aba to ceph-ci as "wip-13381"

Actions #26

Updated by Sage Weil almost 7 years ago

  • Priority changed from Urgent to High
Actions #27

Updated by Sage Weil almost 3 years ago

  • Status changed from New to Won't Fix