Bug #17916

osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)

Added by Samuel Just 7 months ago. Updated 2 months ago.

Status:
Pending Backport
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Source:
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
jewel
Needs Doc:
No

Related issues

Copied to Ceph - Backport #19701: jewel: osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first) Resolved
Copied to Ceph - Backport #19702: kraken: osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first) Testing

History

#1 Updated by Nick Fisk 7 months ago

Logs from wip-17916

http://app.sys-pro.co.uk/ceph-osd.49.log.bz2

Relevant section below

-16> 2016-11-17 12:16:27.818387 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e778f3:::rbd_data.5009b238e1f29.00000000000e5f7c:head# '_' = 274
-15> 2016-11-17 12:16:27.818393 7fc61ec388c0 15 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e459cf:::rbd_data.99560238e1f29.0000000000113263:head# '_'
-14> 2016-11-17 12:16:27.818409 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e459cf:::rbd_data.99560238e1f29.0000000000113263:head# '_' = 274
-13> 2016-11-17 12:16:27.818414 7fc61ec388c0 15 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96ef35e3:::rbd_data.5009b238e1f29.00000000000f496b:head# '_'
-12> 2016-11-17 12:16:27.818426 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96ef35e3:::rbd_data.5009b238e1f29.00000000000f496b:head# '_' = 274
-11> 2016-11-17 12:16:27.818431 7fc61ec388c0 15 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head# '_'
-10> 2016-11-17 12:16:27.818443 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) error opening file /var/lib/ceph/osd/ceph-49/current/1.769_head/DIR_9/DIR_6/DIR_7/DIR_A/rbd\udata.6fd18238e1f29.00000000002555c5__head_6B58A769__1 with flags=2: (2) No such file or directory
-9> 2016-11-17 12:16:27.818448 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head# '_' = -2
-8> 2016-11-17 12:16:27.818454 7fc61ec388c0 15 read_log missing 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
-7> 2016-11-17 12:16:27.818458 7fc61ec388c0 15 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c# '_'
-6> 2016-11-17 12:16:27.818469 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) error opening file /var/lib/ceph/osd/ceph-49/current/1.769_head/DIR_9/DIR_6/DIR_7/DIR_A/rbd\udata.6fd18238e1f29.00000000002555c5__6c_6B58A769__1 with flags=2: (2) No such file or directory
-5> 2016-11-17 12:16:27.818472 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c# '_' = -2
-4> 2016-11-17 12:16:27.818474 7fc61ec388c0 15 read_log missing 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
-3> 2016-11-17 12:16:27.818476 7fc61ec388c0 15 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96ead8c1:::rbd_data.4759a238e1f29.00000000000613de:head# '_'
-2> 2016-11-17 12:16:27.818491 7fc61ec388c0 10 filestore(/var/lib/ceph/osd/ceph-49) getattr 1.769_head/#1:96ead8c1:::rbd_data.4759a238e1f29.00000000000613de:head# '_' = 274
-1> 2016-11-17 12:16:27.818494 7fc61ec388c0 0 read_log: divergent prior version mismatch, oi is: 1:96ead8c1:::rbd_data.4759a238e1f29.00000000000613de:head(1500163'246032 client.2829311.1:12604882 dirty|omap_digest s 4194304 uv 246032 od ffffffff) and divergent prior entry is: 910666'73661,1:96ead8c1:::rbd_data.4759a238e1f29.00000000000613de:head
0> 2016-11-17 12:16:27.819922 7fc61ec388c0 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)' thread 7fc61ec388c0 time 2016-11-17 12:16:27.818503
osd/PGLog.cc: 1053: FAILED assert(oi.version == i->first)
ceph version 10.2.3-1-g69c3e47 (69c3e479474b254c8045e10a4121e307c2e0cc8c)

#2 Updated by Samuel Just 7 months ago

Hmm. The divergent prior entry seems to be bogus (and super old -- 170k/240k updates to the pg in the past) -- so we're looking for a case in PGLog where we should have cleared it but didn't. Note: this has been rewritten in master since jewel; there is no such thing as divergent priors anymore. Can you give me the full upgrade history of this OSD? When was it last restarted? (It would have crashed on any restart between now and when this fault was originally introduced 170k updates ago.)

#3 Updated by Nick Fisk 7 months ago

The snapshot removal would have deleted about 750k objects in the space of a few minutes; I'm not sure if that would skew what you would consider a reasonable age for the pglog. I think I identified that object deletion with the 4.7 kernel seems to make large numbers of OSDs flap, whereas nodes running the 4.4 kernel experienced no flapping; after rolling back to 4.4 we haven't had much flapping on any hosts.

Brief history of the OSD, as well as I can recall:

~August - OSD installed with 10.2.2
About 2 weeks ago - OSD upgraded to 10.2.3 and restarted
13/11 06:47 - fstrim runs on servers with mounted RBDs, deleting around 1M objects. Several OSDs flap, but everything seems to recover alright before 13:00.
13/11 ~13:00 - A backup job removes a large snapshot of around 750K objects. Lots of OSDs start flapping; 2 OSDs won't restart with the above error.

I have the logs from the OSD showing the last few thousand lines of history before the assert, if they would be useful, although only at standard debug levels.

#4 Updated by Nick Fisk 7 months ago

Actually, checking through the logs, the OSD last restarted on 9/11, with this error:

2016-11-09 17:47:15.835991 7f0c8953c700 -1 os/filestore/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7f0c8953c700 time 2016-11-09 17:47:15.834729
os/filestore/FileJournal.cc: 1541: FAILED assert(0 == "unexpected aio error")

Which seems to have been caused by os-prober in Ubuntu re-scanning partitions. This is something I need to look into to stop it happening again.
https://bugs.launchpad.net/charms/+source/ceph/+bug/1384062
http://tracker.ceph.com/issues/9860

Not sure if that would have been enough to cause this problem, as the OSD successfully restarted after it.

#5 Updated by Samuel Just 7 months ago

Hmm, that might be able to cause inconsistently applied transactions (though I don't think so...)

#6 Updated by Pawel Sadowski 7 months ago

Hi, for the past few days we have seen similar behaviour on Hammer (0.94.6) -- I don't see
the above "unexpected aio error" anywhere in our logs.

I uploaded some logs with debug turned on:
ceph-post-file: de236475-8ba5-4f7a-bdf4-b9a56604b304

#7 Updated by Bartosz Rabiega 6 months ago

Hi,

I've been trying to investigate the case from Pawel Sadowski. Using the same OSD data, the OSD crashes during startup with:
- Hammer 0.94.5
- Jewel 10.2.5
- Kraken 11.1.1

Trace from Kraken
2017-01-02 10:00:37.467614 7ffff7fe4980 -1 osd.180 1358606 PGs are upgrading
/build/ceph-11.1.1/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, PGLog::IndexedLog&, missing_type&, std::ostringstream&, bool*, const DoutPrefixProvider*, std::set<std::basic_string<char> >*, bool) [with missing_type = pg_missing_set<true>; std::ostringstream = std::basic_ostringstream<char>]' thread 7ffff7fe4980 time 2017-01-02 10:00:46.042992
/build/ceph-11.1.1/src/osd/PGLog.h: 1291: FAILED assert(oi.version == i->first)
 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x555556006adb]
 2: (void PGLog::read_log_and_missing<pg_missing_set<true> >(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, PGLog::IndexedLog&, pg_missing_set<true>&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool*, DoutPrefixProvider const*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*, bool)+0x1ad3) [0x555555a99b53]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x4c4) [0x555555a56124]
 4: (OSD::load_pgs()+0x9b4) [0x5555559ab2e4]
 5: (OSD::init()+0x2026) [0x5555559b9e76]
 6: (main()+0x2a7e) [0x5555558eed9e]
 7: (__libc_start_main()+0xf5) [0x7ffff5722ec5]
 8: (()+0x415496) [0x555555969496]

Some values from GDB
(gdb) p divergent_priors
$9 = std::map with 2 elements = {
  [{
    version = 98602379, 
    epoch = 710996, 
    __pad = 0
  }] = {
    oid = {
      name = "rbd_header.42880159238e1f29" 
    }, 
    snap = {
      val = 18446744073709551614
    }, 
    hash = 2816812651, 
    max = false, 
    nibblewise_key_cache = 3059900026, 
    hash_reverse_bits = 3596920805, 
    static POOL_META = -1, 
    static POOL_TEMP_START = -2, 
    pool = 3, 
    nspace = "", 
    key = "" 
  },
  [{
    version = 98828514, 
    epoch = 716064, 
    __pad = 0
  }] = {
    oid = {
      name = "rbd_header.3d82dc56238e1f29" 
    }, 
    snap = {
      val = 18446744073709551614
    }, 
    hash = 1281861227, 
    max = false, 
    nibblewise_key_cache = 3060430532, 
    hash_reverse_bits = 3597002290, 
    static POOL_META = -1, 
    static POOL_TEMP_START = -2, 
    pool = 3, 
    nspace = "", 
    key = "" 
  }
}
(gdb) p info.last_complete
$11 = {
  version = 0, 
  epoch = 0, 
  __pad = 0
}
(gdb) p r
$7 = 0
(gdb) p i
$8 = {
  <std::iterator<std::bidirectional_iterator_tag, std::pair<eversion_t const, hobject_t>, long, std::pair<eversion_t const, hobject_t>*, std::pair<eversion_t const, hobject_t>&>> = {<No data fields>}, 
  members of std::reverse_iterator<std::_Rb_tree_iterator<std::pair<eversion_t const, hobject_t> > >: 
  current = {
    first = {
      version = 98828514, 
      epoch = 716064, 
      __pad = 0
    }, 
    second = {
      oid = {
        name = "rbd_header.3d82dc56238e1f29" 
      }, 
      snap = {
        val = 18446744073709551614
      }, 
      hash = 1281861227, 
      max = false, 
      nibblewise_key_cache = 3060430532, 
      hash_reverse_bits = 3597002290, 
      static POOL_META = -1, 
      static POOL_TEMP_START = -2, 
      pool = 3, 
      nspace = "", 
      key = "" 
    }
  }
}
(gdb) p oi
$13 = {
  soid = {
    oid = {
      name = "rbd_header.42880159238e1f29" 
    }, 
    snap = {
      val = 18446744073709551614
    }, 
    hash = 2816812651, 
    max = false, 
    nibblewise_key_cache = 3059900026, 
    hash_reverse_bits = 3596920805, 
    static POOL_META = -1, 
    static POOL_TEMP_START = -2, 
    pool = 3, 
    nspace = "", 
    key = "" 
  }, 
  version = {
    version = 121761395, 
    epoch = 1347447, 
    __pad = 0
  }, 
  prior_version = {
    version = 121761383, 
    epoch = 1347413, 
    __pad = 0
  }, 
  user_version = 96520706, 
  last_reqid = {
    name = {
      _type = 8 '\b', 
      _num = 1310719628, 
      static TYPE_MON = 1, 
      static TYPE_MDS = 2, 
      static TYPE_OSD = 4, 
      static TYPE_CLIENT = 8, 
      static TYPE_MGR = 16, 
      static NEW = -1
    }, 
    tid = 846894, 
    inc = 0
  }, 
  size = 0, 
  mtime = {
    tv = {
      tv_sec = 1480456896, 
      tv_nsec = 567129969
    }
  }, 
  local_mtime = {
    tv = {
      tv_sec = 0, 
      tv_nsec = 0
    }
  }, 
  flags = (object_info_t::FLAG_DIRTY | object_info_t::FLAG_OMAP | object_info_t::FLAG_DATA_DIGEST | object_info_t::FLAG_OMAP_DIGEST), 
  snaps = std::vector of length 0, capacity 0, 
  truncate_seq = 0, 
  truncate_size = 0, 
  watchers = std::map with 1 element = {
    [{
      first = 140116940621936, 
      second = {
        _type = 8 '\b', 
        _num = 1310719628, 
        static TYPE_MON = 1, 
        static TYPE_MDS = 2, 
        static TYPE_OSD = 4, 
        static TYPE_CLIENT = 8, 
        static TYPE_MGR = 16, 
        static NEW = -1
      }
    }] = {
      cookie = 140116940621936, 
      timeout_seconds = 30, 
      addr = {
        static TYPE_DEFAULT = entity_addr_t::TYPE_LEGACY, 
        type = 1, 
        nonce = 3800260800, 
        u = {
          sa = {
            sa_family = 2, 
            sa_data = "\000\000\nc(A\000\000\000\000\000\000\000" 
          }, 
          sin = {
            sin_family = 2, 
            sin_port = 0, 
            sin_addr = {
              s_addr = 1093165834
            }, 
            sin_zero = "\000\000\000\000\000\000\000" 
          }, 
          sin6 = {
            sin6_family = 2, 
            sin6_port = 0, 
            sin6_flowinfo = 1093165834, 
            sin6_addr = {
              __in6_u = {
                __u6_addr8 = '\000' <repeats 15 times>, 
                __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, 
                __u6_addr32 = {0, 0, 0, 0}
              }
            }, 
            sin6_scope_id = 0
          }
        }
      }
    }
  }, 
  data_digest = 4294967295, 
  omap_digest = 1044645001, 
  expected_object_size = 0, 
  expected_write_size = 0, 
  alloc_hint_flags = 0
}

#8 Updated by Bartosz Rabiega 6 months ago

Again, this time the epochs of the objects from divergent priors are quite old (716064 compared to 1347447), so it looks like the log wasn't cleared and the OSD wasn't restarted for a long time. Right? Is there any better way to handle such cases?

#9 Updated by Bartosz Rabiega 5 months ago

Managed to find a workaround:

1. Use strace to find which PG was accessed right before the assertion
2. Move/Remove the PG out of the broken OSD data
3. Start OSD - it should launch :)
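
Step 1 above can be partly scripted. The helper below is hypothetical (not part of Ceph): it scans `strace -e trace=file` output for `open()` calls under the OSD's `current/` tree and reports the last object file touched before the crash, plus the PG directory it belongs to. The sample input is the strace line from comment #10.

```python
import re

# Hypothetical helper: find the last object file opened under current/
# in strace output -- the object touched right before the assert is the
# likely culprit, and its PG directory is the one to move aside.
def last_object_opened(strace_lines):
    path, pg = None, None
    pat = re.compile(r'open\("([^"]*/current/([0-9a-f]+\.[0-9a-f]+)_head/[^"]+)"')
    for line in strace_lines:
        m = pat.search(line)
        if m:
            path, pg = m.group(1), m.group(2)
    return path, pg

# Sample line taken from comment #10 below (strace escapes the \u in the
# on-disk filename as a double backslash).
sample = [
    'open("/var/lib/ceph/osd/ceph-180/current/3.266b_head/DIR_B/DIR_6/DIR_6/'
    'DIR_2/DIR_5/rbd\\\\uheader.42880159238e1f29__head_A7E5266B__3", O_RDWR) = 2374',
]
path, pg = last_object_opened(sample)
print(pg)  # the PG directory to move aside (or the object file to remove)
```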

#10 Updated by Bartosz Rabiega 5 months ago

Actually, in my case it was enough to delete the single faulty object found by strace.

Output from strace -e trace=file ceph-osd -i 180 -f

open("/var/lib/ceph/osd/ceph-180/current/3.266b_head/DIR_B/DIR_6/DIR_6/DIR_2/DIR_5/rbd\\uheader.42880159238e1f29__head_A7E5266B__3", O_RDWR) = 2374
osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, std::set<std::basic_string<char> >*)' thread 7f303db29900 time 2017-01-31 08:08:08.201051
osd/PGLog.cc: 979: FAILED assert(oi.version == i->first)

So I removed the file "/var/lib/ceph/osd/ceph-180/current/3.266b_head/DIR_B/DIR_6/DIR_6/DIR_2/DIR_5/rbd\\uheader.42880159238e1f29__head_A7E5266B__3" and the OSD managed to start. What's more, the file is empty.

#11 Updated by Samuel Just 4 months ago

  • Status changed from New to Can't reproduce

#12 Updated by Greg Farnum 3 months ago

  • Status changed from Can't reproduce to In Progress
  • Assignee changed from Samuel Just to Greg Farnum

So, it sure looks to me like the only way the divergent_priors omap set gets cleared is if you call PGLog::reset_backfill(), and that's only invoked if you're a stray PG.

So...the primary never gets its divergent_priors entries cleared? Unless it's guaranteed to transit through the Stray state, but I don't think it is.

This makes some sense: it's unlikely for a primary to have divergent entries. I think it could still happen if:
  • the primary generates an entry and crashes/partitions without propagating it to the replicas
  • other OSDs generate very few updates
  • the primary turns on and the monitor marks it up
  • there have not been enough updates for it to do a pg_temp
  • it does log-based recovery

A big snapshot delete is also a sensible time to detect this failure: the primary times out and crashes even after running smoothly beforehand. Then on startup it finds that there are divergent priors, and unlike on any previous restart, the objects don't exist in the pglog because the pglog is filled up with clone deletions rather than object data updates.
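
As an illustration only (a toy model, not Ceph code), the condition that trips the assert can be sketched like this: each divergent-prior entry records the version the on-disk object_info is expected to have, and a stale entry that no longer matches fails the check. The (epoch, version) pairs below are taken from the GDB dump in comment #7.

```python
# Toy model (not Ceph code) of the check behind
# FAILED assert(oi.version == i->first): every divergent-prior entry's
# recorded version must match the on-disk object_info version.
def stale_divergent_priors(divergent_priors, on_disk_versions):
    """Both dicts map oid -> (epoch, version). Returns the oids that
    would trip the assert because the recorded version no longer matches."""
    return [oid for oid, ver in divergent_priors.items()
            if oid in on_disk_versions and on_disk_versions[oid] != ver]

# From the GDB dump in comment #7: the divergent prior is ancient
# (epoch 710996) while the on-disk object_info is at epoch 1347447.
priors = {"rbd_header.42880159238e1f29": (710996, 98602379)}
on_disk = {"rbd_header.42880159238e1f29": (1347447, 121761395)}
print(stale_divergent_priors(priors, on_disk))
```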

#13 Updated by Greg Farnum 3 months ago

...although in both the GDB dump above and the log I'm looking at, the crashing PG also has a last_complete of 0'0, which might refute my previous guess. Not sure yet.

#14 Updated by Josh Durgin 3 months ago

Yes, it looks like only reset_backfill() sets dirty_divergent_priors = true, so that seems like a reasonable hypothesis to me.

The last_complete being 0'0 sounds like it may be a separate issue, possibly another effect of large-scale snap trimming: the update of info.last_complete is skipped in PG::add_log_entry() if info.last_update != info.last_complete, so if one OSD stayed primary and those got out of sync, last_complete would stay 0'0.
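
The stuck-at-0'0 behaviour can be shown with a simplified sketch (an illustrative model, not the real PG::add_log_entry): last_complete only advances while it still equals last_update, so once the two diverge it never catches up.

```python
# Simplified model (not the real Ceph code) of the skip described above:
# last_complete advances only while it equals last_update.
def add_log_entry(info, version):
    if info["last_update"] == info["last_complete"]:
        info["last_complete"] = version  # still in sync: advance both
    info["last_update"] = version        # last_update always advances

info = {"last_update": 0, "last_complete": 0}
add_log_entry(info, 1)      # in sync: both move to 1
info["last_complete"] = 0   # simulate the two getting out of sync
add_log_entry(info, 2)
add_log_entry(info, 3)
print(info)  # last_update keeps advancing; last_complete stays stuck
```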

#15 Updated by Greg Farnum 2 months ago

  • Status changed from In Progress to Need Review

#16 Updated by Nathan Cutler 2 months ago

  • Tracker changed from Bug to Backport
  • Description updated (diff)
  • Release jewel added

description

From mailing list thread: '[ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)'

#17 Updated by Greg Farnum 2 months ago

  • Tracker changed from Backport to Bug
  • Status changed from Need Review to Pending Backport
  • % Done set to 0
  • Regression set to No
  • Severity set to 3 - minor
  • Needs Doc set to No

If you've experienced this crash, and are sure you don't really have divergent objects, you can now set the osd_ignore_stale_divergent_priors config to true and it will bypass the crash (and clean them up as the OSD runs).
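For reference, a minimal sketch of setting that option in ceph.conf before restarting the affected OSD (the option name is from this comment; Ceph's config parsing treats spaces and underscores in option names interchangeably):

```ini
[osd]
# DANGER: only set this if you are sure the divergent-prior entries are
# stale. It bypasses the assert; stale entries are cleaned up as the OSD runs.
osd ignore stale divergent priors = true
```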
The code is merged to Jewel, though not yet released. Will identify and forward-port the few relevant bits to Kraken and master soon.

@Nathan, despite going into the Jewel branch, this is definitely its own bug and not a backport of something else. ;)

#18 Updated by Nathan Cutler 2 months ago

@Nathan, despite going into the Jewel branch, this is definitely its own bug and not a backport of something else. ;)

Having backport tracker issues for everything merged to a stable branch (including things that are technically not backports) and setting the "Target version" field after the corresponding PR is merged enables us to easily and reliably get Redmine to produce a list of issues resolved in a given stable release.

Having been moved out of the backport tracker, this will no longer show up in the standard Redmine search for fixes included in 10.2.8. (It will show up in the release notes, of course.)

#19 Updated by Greg Farnum 2 months ago

  • Copied to Backport #19701: jewel: osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first) added

#20 Updated by Greg Farnum 2 months ago

  • Copied to Backport #19702: kraken: osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first) added

#21 Updated by Greg Farnum 2 months ago

  • Subject changed from jewel: osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first) to osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
  • Status changed from Pending Backport to Resolved

#22 Updated by Nathan Cutler 2 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to jewel, kraken
