Activity
From 07/05/2018 to 08/03/2018
08/03/2018
- 11:45 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active
We need to wait to turn off data_digest once all OSDs are running bluestore AND we must disallow a filestore OSD to...
- 10:42 PM Bug #23492 (Resolved): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-e...
- 10:42 PM Backport #24864 (Resolved): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-code...
- 03:11 PM Backport #24864: luminous: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasu...
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/23025
merged
- 10:40 PM Feature #24949 (Resolved): luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
- 10:39 PM Backport #25128 (Resolved): mimic: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
- 10:39 PM Backport #26841 (Closed): mimic: luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_...
- 04:02 PM Backport #26841 (Closed): mimic: luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_...
- 10:35 PM Backport #25126 (Resolved): mimic: Allow repair of an object with a bad data_digest in object_inf...
- 10:35 PM Feature #25085 (Resolved): Allow repair of an object with a bad data_digest in object_info on all...
- 10:18 PM Bug #24875 (In Progress): OSD: still returning EIO instead of recovering objects on checksum errors
- 05:59 PM Backport #24888: luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_me...
- Radek, can you take a look at backporting this?
- 04:02 PM Backport #26840 (Resolved): luminous: librados application's symbol could conflict with the libce...
- https://github.com/ceph/ceph/pull/23483
- 04:02 PM Backport #26839 (Resolved): mimic: librados application's symbol could conflict with the libceph-...
- https://github.com/ceph/ceph/pull/24708
- 03:24 PM Backport #23772: luminous: ceph status shows wrong number of objects
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22680
merged
- 03:22 PM Backport #24471: luminous: Ceph-osd crash when activate SPDK
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22686
merged
- 03:15 PM Backport #24772: luminous: osd: may get empty info at recovery
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22862
merged
- 12:57 PM Bug #25154 (Pending Backport): librados application's symbol could conflict with the libceph-common
- 08:10 AM Bug #24835: osd daemon spontaneous segfault
- The problem still persists with Mimic 13.2.1 (on the same cluster as above). Errors in ceph::buffer::list appear to h...
- 12:09 AM Backport #25199 (In Progress): luminous: FAILED assert(trim_to <= info.last_complete) in PGLog::t...
- 12:08 AM Backport #25219 (In Progress): luminous: osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
- 12:07 AM Backport #25200 (In Progress): mimic: FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- 12:07 AM Backport #25220 (In Progress): mimic: osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
- 12:07 AM Backport #24989 (In Progress): mimic: Limit pg log length during recovery/backfill so that we don...
- 12:06 AM Bug #23352: osd: segfaults under normal operation
- I've created a test package here based on 12.2.7 and including the one line patch above.
https://shaman.ceph.com/r...
08/02/2018
- 11:57 PM Bug #23352 (In Progress): osd: segfaults under normal operation
- https://github.com/ceph/ceph/pull/23404
- 08:56 PM Bug #23352: osd: segfaults under normal operation
- Brad - you can just use kjetil@medallia.com
- 08:26 AM Bug #23352: osd: segfaults under normal operation
- Thanks Kjetil,
I think you are right, we should hold the lock in update_osd_health(). Not sure how we all missed tha...
- 08:24 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- /ceph/teuthology-archive/pdonnell-2018-08-02_13:06:29-multimds-wip-pdonnell-testing-20180802.044402-testing-basic-smi...
- 08:21 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- Run with cores/logs: /ceph/teuthology-archive/pdonnell-2018-08-02_13:06:29-multimds-wip-pdonnell-testing-20180802.044...
- 03:57 PM Bug #25182: Upmaps forgotten after restarting OSDs
- Hmm, I wasn't able to reproduce this...
- 03:30 PM Bug #25182: Upmaps forgotten after restarting OSDs
- It is expected that the upmaps may evaporate if the "raw" CRUSH mapping changes. This shouldn't happen for osd up/do...
- 03:59 AM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- *master PR*: https://github.com/ceph/ceph/pull/23377
- 03:57 AM Backport #25227 (In Progress): luminous: OSD: still returning EIO instead of recovering objects o...
- 03:56 AM Backport #25226 (In Progress): mimic: OSD: still returning EIO instead of recovering objects on c...
08/01/2018
- 11:37 PM Backport #25227 (Resolved): luminous: OSD: still returning EIO instead of recovering objects on c...
- https://github.com/ceph/ceph/pull/23379
- 11:32 PM Backport #25226 (Resolved): mimic: OSD: still returning EIO instead of recovering objects on chec...
- https://github.com/ceph/ceph/pull/23378
- 11:26 PM Bug #25211 (Fix Under Review): bug in PerfCounters
- 12:23 PM Bug #25211: bug in PerfCounters
- https://github.com/ceph/ceph/pull/23362
- 12:16 PM Bug #25211 (Resolved): bug in PerfCounters
- when we call PerfCounters::inc() and read_avg() at the same time, maybe the result is not what we want.
show the c...
- 10:58 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- 10:18 PM Bug #25146: "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-di...
- another option would be to only partially revert, and keep just the bits that ignore the older deleted log files.
- 02:03 PM Bug #25146: "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-di...
- an alternative option is to whip up a tool to rebuild the manifest to remove the dummy File4 with kDeletedLogNumberHa...
- 12:45 PM Bug #25146: "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-di...
- It's a regression in rocksdb. The rocksdb in mimic (eaee6d3beab3429232ceb188377a3f94e844fca7) is f4a857da0b720691effc...
- 06:28 AM Bug #25146: "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-di...
- I created a vstart.sh cluster using the mimic branch, and ceph-monstore-tool from master is able to open it just fine.
...
- 09:54 PM Feature #24949: luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
- mimic "backport" is actually a forward port from luminous
- 05:37 PM Feature #24949 (Pending Backport): luminous: Allow scrub to fix Luminous 12.2.6 corruption of dat...
- 09:50 PM Backport #25220 (Resolved): mimic: osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
- https://github.com/ceph/ceph/pull/23403
- 09:50 PM Backport #25219 (Resolved): luminous: osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
- https://github.com/ceph/ceph/pull/23211
- 09:47 PM Bug #24484 (Resolved): osdc: wrong offset in BufferHead
- 09:47 PM Backport #24584 (Resolved): luminous: osdc: wrong offset in BufferHead
- 03:37 PM Backport #24584: luminous: osdc: wrong offset in BufferHead
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22865
merged
- 05:56 PM Bug #23352: osd: segfaults under normal operation
- For MgrClient::update_osd_health, does the move-assignment compile into updating a pointer to a std::vector, or does ...
- 05:41 AM Bug #23352: osd: segfaults under normal operation
- Just adding another "me too" on this. I've hit this on Luminous 12.2.7 also under Ubuntu 16.04.4 with 4.15.0-24-gener...
- 02:08 AM Bug #23352: osd: segfaults under normal operation
- ...
- 05:43 PM Bug #25108: object errors found in be_select_auth_object() aren't logged the same
- Kefu:
my concern is that we don't reset object_error before moving to another ScrubMap, so once we identify an erro...
- 05:41 PM Bug #25108 (In Progress): object errors found in be_select_auth_object() aren't logged the same
- 05:38 PM Feature #25085 (Pending Backport): Allow repair of an object with a bad data_digest in object_inf...
- 05:38 PM Backport #25127 (Resolved): luminous: Allow repair of an object with a bad data_digest in object_...
- 03:44 PM Bug #25184 (Pending Backport): osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
- 02:07 PM Bug #25209 (Fix Under Review): cls/test_cls_numops.sh aborts
- -https://github.com/ceph/ceph/pull/23364-
I think https://github.com/ceph/ceph/pull/23432 is a better fix.
- 05:46 AM Bug #25209: cls/test_cls_numops.sh aborts
- I think we should revert https://github.com/ceph/ceph/pull/22990
- 05:44 AM Bug #25209: cls/test_cls_numops.sh aborts
- ...
- 05:28 AM Bug #25209 (Resolved): cls/test_cls_numops.sh aborts
- ...
- 01:06 PM Bug #25181 (Duplicate): /mon/OSDMonitor.cc: 1821: FAILED assert(osdmap_manifest.pinned.empty())
- 01:06 PM Bug #24612: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init()
- /a/sage-2018-07-31_21:57:28-rados-wip-sage-testing-2018-07-31-1436-distro-basic-smithi/2844443
/a/sage-2018-07-30_13...
07/31/2018
- 10:53 PM Bug #25174: osd: assert failure with FAILED assert(repop_queue.front() == repop) In function 'vo...
- Do we have logs for this failure somewhere?
- 10:47 PM Backport #25199: luminous: FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- This is dependent on a couple of other backports. Assigning it to myself.
- 10:45 PM Backport #25199 (Resolved): luminous: FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- https://github.com/ceph/ceph/pull/23211
- 10:45 PM Backport #24068 (Resolved): luminous: osd sends op_reply out of order
- 07:48 PM Backport #24068: luminous: osd sends op_reply out of order
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23137
merged
- 10:45 PM Backport #25204 (Resolved): mimic: rados python bindings use prval from stack
- https://github.com/ceph/ceph/pull/23863
- 10:45 PM Backport #25203 (Resolved): luminous: rados python bindings use prval from stack
- https://github.com/ceph/ceph/pull/23864
- 10:45 PM Backport #25200 (Resolved): mimic: FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- https://github.com/ceph/ceph/pull/23403
- 09:24 PM Bug #25198 (Pending Backport): FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- 06:27 PM Bug #25198 (Fix Under Review): FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- https://github.com/ceph/ceph/pull/23354
- 05:48 PM Bug #25198 (Resolved): FAILED assert(trim_to <= info.last_complete) in PGLog::trim()
- ...
- 08:02 PM Bug #23352: osd: segfaults under normal operation
- Latest crash just happened here, no messages in the OSD log, but a crash dump is generated and dmesg shows:
[Tue...
- 07:25 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/sage-2018-07-31_14:52:20-rados:thrash-wip-sage2-testing-2018-07-30-1049-distro-basic-smithi/2843268
- 07:08 PM Bug #25175 (Pending Backport): rados python bindings use prval from stack
- https://github.com/ceph/ceph/pull/23334
- 03:03 PM Bug #25194 (Can't reproduce): Negative stats found by deep-scrub
http://pulpito.ceph.com/dzafman-2018-07-30_12:09:07-rados-wip-zafman-testing-distro-basic-smithi/2839428
log_cha...
- 02:52 PM Feature #21710: add wildcard for namespaces
- Not at all.
- 12:43 AM Feature #21710: add wildcard for namespaces
- Hi Douglas, I started in on this and forgot to reassign the ticket! Mind if I take this one?
- 03:57 AM Tasks #25186 (In Progress): setup repo for building dependencies like boost, rocksdb, which are n...
- we need to build boost, spdk, dpdk, fio, rocksdb, gperftools, seastar for preparing the build dependencies for each P...
- 12:08 AM Bug #25184 (Fix Under Review): osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
07/30/2018
- 11:50 PM Bug #25184 (Resolved): osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
- https://github.com/ceph/ceph/pull/23340
- 09:36 PM Bug #25182 (Resolved): Upmaps forgotten after restarting OSDs
- Problem:
I have a small cluster at home and I noticed that during the upgrade from 12.2.5 -> 12.2.7 and the upgrade ...
- 08:47 PM Backport #25178 (In Progress): mimic: rados: not all exceptions accept keyargs
- 07:23 PM Backport #25178 (Resolved): mimic: rados: not all exceptions accept keyargs
- https://github.com/ceph/ceph/pull/23335
- 08:17 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/sage-2018-07-30_13:46:50-rados-wip-sage3-testing-2018-07-28-1512-distro-basic-smithi/2838971
- 08:16 PM Bug #25181 (Duplicate): /mon/OSDMonitor.cc: 1821: FAILED assert(osdmap_manifest.pinned.empty())
- ...
- 07:26 PM Bug #25112: osd,mon: increase mon_max_pg_per_osd to 250
- Please note that the value has been changed from 300->250 for this tracker. The PR reflects the correct value.
- 02:23 PM Bug #25112 (Pending Backport): osd,mon: increase mon_max_pg_per_osd to 250
- 07:23 PM Backport #25177 (Resolved): luminous: osd,mon: increase mon_max_pg_per_osd to 300
- https://github.com/ceph/ceph/pull/23862
- 07:23 PM Backport #25176 (Resolved): mimic: osd,mon: increase mon_max_pg_per_osd to 300
- https://github.com/ceph/ceph/pull/23861
- 07:15 PM Bug #25175 (Resolved): rados python bindings use prval from stack
- these methods include
- omap_get_vals
- omap_get_keys
- omap_get_vals_by_keys
- 06:53 PM Bug #24686 (Resolved): change default filestore_merge_threshold to -10
- 06:53 PM Backport #24748 (Resolved): luminous: change default filestore_merge_threshold to -10
- 04:45 PM Backport #24748: luminous: change default filestore_merge_threshold to -10
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22814
merged
- 06:51 PM Backport #24083 (Resolved): luminous: rados: not all exceptions accept keyargs
- 04:43 PM Backport #24083: luminous: rados: not all exceptions accept keyargs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22979
merged
- 06:35 PM Bug #25174 (Can't reproduce): osd: assert failure with FAILED assert(repop_queue.front() == repop...
- branch: luminous
description: rados:downstream:singleton/{all/ec-lost-unfound.yaml msgr-failures/many.yaml
...
- 05:07 PM Bug #25153 (Fix Under Review): output format is invalid of the crush tree json dumper
- 11:56 AM Bug #25153: output format is invalid of the crush tree json dumper
- Reference the pull request: https://github.com/ceph/ceph/pull/23319
- 11:50 AM Bug #25153 (Resolved): output format is invalid of the crush tree json dumper
- The output json string is invalid for "ceph osd crush tree --format=json" command. It contains an array of "nodes" an...
- 01:42 PM Bug #25155 (Can't reproduce): mon crash from 'ceph osd erasure-code-profile set lrcprofile name=l...
- ...
- 12:15 PM Bug #25154: librados application's symbol could conflict with the libceph-common
- https://github.com/ceph/ceph/pull/23320
- 12:15 PM Bug #25154 (Resolved): librados application's symbol could conflict with the libceph-common
- quoting from Zongyou Yao's mail from ceph-devel ML
> Internally, we have a program using librados C++ api to perio...
- 07:20 AM Bug #24785 (Resolved): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- 07:20 AM Backport #25143 (Resolved): luminous: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd ...
- Merged.
- 07:19 AM Backport #25142 (Resolved): mimic: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev...
- Merged.
07/28/2018
- 07:55 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-07-27_22:50:28-rados-wip-sage-testing-2018-07-27-0744-distro-basic-smithi/2826326
- 02:46 PM Bug #25146 (Resolved): "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:paralle...
- This is on the new mimic-x suite https://github.com/ceph/ceph/pull/23292
Run: http://pulpito.ceph.com/yuriw-2018-07-27_2...
- 09:12 AM Backport #25143 (In Progress): luminous: mimic selinux denials comm="tp_fstore_op / comm="ceph-o...
- 09:11 AM Backport #25143 (Resolved): luminous: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd ...
- https://github.com/ceph/ceph/pull/23296
- 09:11 AM Backport #25142 (In Progress): mimic: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd ...
- 09:10 AM Backport #25142 (Resolved): mimic: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev...
- https://github.com/ceph/ceph/pull/23295
- 09:11 AM Backport #25145 (Resolved): luminous: Automatically set expected_num_objects for new pools with >...
- https://github.com/ceph/ceph/pull/24395
- 09:11 AM Backport #25144 (Resolved): mimic: Automatically set expected_num_objects for new pools with >=10...
- https://github.com/ceph/ceph/pull/23860
07/27/2018
- 11:55 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- Mimic back-port:
https://github.com/ceph/ceph/pull/23295
Luminous back-port:
https://github.com/ceph/ceph/pu...
- 10:18 PM Bug #24785 (Pending Backport): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-...
- 07:01 AM Bug #24785 (Fix Under Review): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-...
- 07:00 AM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- The manual testing suggests this should fix this issue:
https://github.com/ceph/ceph/pull/23278
- 03:03 AM Bug #23352: osd: segfaults under normal operation
- Dan van der Ster wrote:
> Can we see that state from the coredump somehow? Basically none of our clusters should hav...
07/26/2018
- 07:35 PM Backport #23670 (Need More Info): luminous: auth: ceph auth add does not sanity-check caps
- non-trivial backport. One attempt was already made - https://github.com/ceph/ceph/pull/21361 - but it was implicated ...
- 07:31 PM Backport #23670 (Rejected): luminous: auth: ceph auth add does not sanity-check caps
- see discussion in https://github.com/ceph/ceph/pull/21361
- 07:17 PM Feature #24949: luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
- https://github.com/ceph/ceph/pull/23236
Includes backport from master of https://github.com/ceph/ceph/pull/23217
- 07:15 PM Backport #25128 (Resolved): mimic: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
- https://github.com/ceph/ceph/pull/23272
(includes backport of https://tracker.ceph.com/issues/25085 from master)
- 07:12 PM Backport #25127 (Resolved): luminous: Allow repair of an object with a bad data_digest in object_...
- https://github.com/ceph/ceph/pull/23236
- 07:11 PM Backport #25126 (Resolved): mimic: Allow repair of an object with a bad data_digest in object_inf...
- https://github.com/ceph/ceph/pull/23272
- 05:47 PM Bug #24687 (Pending Backport): Automatically set expected_num_objects for new pools with >=100 PG...
- 05:17 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- Removed pgcalc message while pgcalc updates are considered
- 05:19 PM Cleanup #25124 (New): Add message to consult pgcalc for expected_num_objects
- Currently we warn the user when attempting to create a filestore pool that appears to be intended to store a large nu...
- 03:58 PM Bug #25106: Ceph-osd coredumps on launch
- Either the patch here: https://github.com/ceph/ceph/pull/22954
doesn't fix the bug, or this is not a duplicate iss...
- 03:58 AM Bug #25108: object errors found in be_select_auth_object() aren't logged the same
I ran a subtest of osd-scrub-repair based on pull request https://github.com/ceph/ceph/pull/23217. I also added a ...
- 03:18 AM Bug #24664: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
- Need help with the luminous backport, which is needed to fix a failure in upgrade/luminous-x.
- 12:40 AM Bug #25112 (Fix Under Review): osd,mon: increase mon_max_pg_per_osd to 250
- https://github.com/ceph/ceph/pull/23251
- 12:19 AM Bug #25112 (Resolved): osd,mon: increase mon_max_pg_per_osd to 250
- Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1603615
- 12:21 AM Bug #25076: MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fuse task
- It appears the crash can be explained on the same basis as in the case of "bug #24664":https://tracker.ceph.com/issue...
07/25/2018
- 10:41 PM Bug #25106 (Duplicate): Ceph-osd coredumps on launch
- this will be fixed in 13.2.1
- 06:10 PM Bug #25106 (Duplicate): Ceph-osd coredumps on launch
- See https://tracker.ceph.com/issues/24993
The problem:
ceph-volume lvm create --bluestore --data /dev/sda <- wor...
- 10:39 PM Bug #24667: osd: SIGSEGV in MMgrReport::encode_payload
- downgrading due to lack of recurrence
- 07:59 PM Bug #25108 (Resolved): object errors found in be_select_auth_object() aren't logged the same
object errors found in be_select_auth_object() aren't logged the same as errors found in be_compare_scrub_objects()...
- 01:59 PM Backport #25101 (In Progress): mimic: jewel->luminous: osdmap crc mismatch
- 01:58 PM Backport #25101 (Resolved): mimic: jewel->luminous: osdmap crc mismatch
- 01:57 PM Backport #25101 (Resolved): mimic: jewel->luminous: osdmap crc mismatch
- https://github.com/ceph/ceph/pull/23226
- 01:58 PM Backport #25100 (Resolved): luminous: jewel->luminous: osdmap crc mismatch
- 01:57 PM Backport #25100 (In Progress): luminous: jewel->luminous: osdmap crc mismatch
- 01:56 PM Backport #25100 (Resolved): luminous: jewel->luminous: osdmap crc mismatch
- https://github.com/ceph/ceph/pull/23227
- 12:27 PM Bug #25057: jewel->luminous: osdmap crc mismatch
- luminous: https://github.com/ceph/ceph/pull/23227
mimic: https://github.com/ceph/ceph/pull/23226
- 12:07 PM Bug #25057: jewel->luminous: osdmap crc mismatch
- The problem was that CRUSH_TUNABLES5 was associated with kraken instead of jewel in 0ceb5c0, backported to luminous ...
- 12:00 PM Bug #25057 (Pending Backport): jewel->luminous: osdmap crc mismatch
- https://github.com/ceph/ceph/pull/23220
- 09:01 AM Bug #23352: osd: segfaults under normal operation
- Dan van der Ster wrote:
> * The OSD health metric changes sure are a juicy candidate to be the root cause -- but we ...
- 08:11 AM Bug #23352: osd: segfaults under normal operation
- Brad Hubbard wrote:
> I was also thinking that, since the OSDHealthMetric related code only triggers when there are ... - 02:37 AM Bug #23352: osd: segfaults under normal operation
- Thanks Roberto,
Your core, as well as the last uploaded by Alex show the now familiar corruption to the vtable of ...
07/24/2018
- 08:54 PM Backport #24988: luminous: Limit pg log length during recovery/backfill so that we don't run out ...
- https://github.com/ceph/ceph/pull/23211
- 06:34 PM Backport #24988 (In Progress): luminous: Limit pg log length during recovery/backfill so that we ...
- 08:52 PM Feature #25085 (In Progress): Allow repair of an object with a bad data_digest in object_info on ...
- 08:51 PM Feature #25085: Allow repair of an object with a bad data_digest in object_info on all replicas
- https://github.com/ceph/ceph/pull/23217
- 08:46 PM Feature #25085 (Resolved): Allow repair of an object with a bad data_digest in object_info on all...
We've seen this due to a bug in Luminous 12.2.6, but it may have been seen in other cases.
- 08:44 PM Bug #25084 (Resolved): Attempt to read object that can't be repaired loops forever
If all replicas of an object are bad, it causes a loop of continuous recovery and calls to rep_repair_primary_objec...
- 07:18 PM Bug #25057 (In Progress): jewel->luminous: osdmap crc mismatch
- 06:41 PM Bug #25057: jewel->luminous: osdmap crc mismatch
- /a/teuthology-2018-07-20_04:23:01-upgrade:jewel-x-luminous-distro-basic-smithi/2799173
is an instance where the mo...
- 11:07 AM Bug #25076 (Duplicate): MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fu...
- Teuthology log: http://qa-proxy.ceph.com/teuthology/smithfarm-2018-07-24_02:10:24-upgrade:luminous-x-mimic-distro-bas...
- 09:27 AM Bug #23352: osd: segfaults under normal operation
- Hi Brad,
We spotted again this issue in one of our clusters, just 2 hours after we upgraded from 12.2.5 -> 12.2.7... - 09:10 AM Bug #23352: osd: segfaults under normal operation
- Thanks Alex,
I'll check out the core tomorrow and let you know.
I have been working on instrumenting the ceph-o...
- 02:06 AM Documentation #4640 (Resolved): rados.8 should document import/export
- https://github.com/ceph/ceph/pull/23186
07/23/2018
- 09:52 PM Bug #24909: RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as an op)
- Jason Dillaman wrote:
> https://github.com/ceph/ceph/pull/23029
merged
- 09:40 PM Bug #21496 (Fix Under Review): doc: Manually editing a CRUSH map, Word 'type' missing.
- https://github.com/ceph/ceph/pull/23192
- 07:37 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
- Jos Collin wrote:
> Remy, Please create a PR.
Done.
- 05:16 PM Feature #1203: osd: priority or fairness osd operations
- https://github.com/ceph/dmclock
- 04:52 PM Support #24980: Pg Inconsistent - failed to pick suitable auth object
- Patrick Donnelly wrote:
> Please seek assistance for these kinds of issues on ceph-users mailing list.
Hi Patrick...
- 04:46 PM Support #24980 (Rejected): Pg Inconsistent - failed to pick suitable auth object
- Please seek assistance for these kinds of issues on ceph-users mailing list.
- 02:27 PM Bug #23352: osd: segfaults under normal operation
- After the upgrade to 12.2.7 I am still seeing crashes on OSDs. Please check and advise if a separate tracker should b...
- 10:28 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- Robert Sander wrote:
> On the production cluster the RBD pool is affected. Do I really need to stop the VMs and do... - 09:54 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- Brad Hubbard wrote:
> For the data_digest_mismatch_info error with client activity stopped, read the data from thi... - 09:18 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- Oops, my mistake, terribly sorry. I gave you the procedure for an omap_digest_mismatch_info error.
For the data_di...
- 08:10 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- Brad Hubbard wrote:
> 1. rados -p [name_of_pool_2] setomapval rbd_data.4048d8238e1f29.00000000000002e6 temporary-k... - 10:09 AM Bug #24835: osd daemon spontaneous segfault
- After spending a week trying to get Ubuntu/systemd to allow a core dump to be created, we finally have two different ...
07/22/2018
- 10:09 PM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- In the case of pg 2.34 above where the only error is "data_digest_mismatch_info" and all the data digests except the ...
- 12:55 PM Bug #25057 (Resolved): jewel->luminous: osdmap crc mismatch
- The upgrade/jewel-x runs for 12.2.6 and 12.2.7 threw osdmap crc mismatch errors.
07/21/2018
- 06:32 PM Backport #25055 (In Progress): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-st...
- 06:26 PM Backport #25055 (Resolved): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- https://github.com/ceph/ceph/pull/23163
- 04:12 PM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- I have the same issue
- 12:08 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
- Remy, Please create a PR.
- 11:56 AM Bug #24923 (Pending Backport): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- https://github.com/ceph/ceph/pull/21520
07/20/2018
- 11:42 PM Bug #24304 (Resolved): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- 04:43 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- Running here: http://pulpito.ceph.com/vasu-2018-07-20_16:43:09-ceph-deploy-mimic-distro-basic-ovh/
- 03:03 PM Bug #25017 (Duplicate): log [ERR] : 1.3 past_intervals [182,196) start interval does not contain ...
- 12:38 PM Bug #25017 (Duplicate): log [ERR] : 1.3 past_intervals [182,196) start interval does not contain ...
- ...
- 11:56 AM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
- This sounds familiar: http://tracker.ceph.com/issues/16211
We used the workaround to set and rm a dummy key/val and ...
- 08:34 AM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
- Sorry that was just a single example to keep it short. listomapkeys doesn't return any data for any bucket in this cl...
- 07:31 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- ...
- 01:39 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- Can you post the output of 'rados list-inconsistent-obj 2.53 --format=json-pretty' ?
- 02:14 AM Bug #25011 (New): competing scrubs stuck reserving local -> remote
- In this run: http://pulpito.ceph.com/yuriw-2018-07-18_20:14:43-rados-mimic-distro-basic-smithi/2794751/
osd.0 and ... - 01:06 AM Backport #24068 (In Progress): luminous: osd sends op_reply out of order
- 01:06 AM Backport #25010 (In Progress): mimic: osd sends op_reply out of order
- 12:59 AM Backport #25010 (Resolved): mimic: osd sends op_reply out of order
- https://github.com/ceph/ceph/pull/23136
07/19/2018
- 09:59 PM Bug #23827: osd sends op_reply out of order
- http://pulpito.ceph.com/yuriw-2018-07-18_21:37:13-powercycle-mimic-distro-basic-smithi/2796128/ indicates that this n...
- 12:57 PM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- I have now added "osd skip data digest = true" as per release notes and restarted all OSDs.
I still have inconsist...
- 08:36 AM Bug #24994 (New): active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
- Hi,
a deep scrub revealed 59 active+clean+inconsistent PGs at one customer's cluster and 50 active+clean+inconsist...
- 06:11 AM Backport #24989 (Need More Info): mimic: Limit pg log length during recovery/backfill so that we ...
- 06:10 AM Backport #24988 (Need More Info): luminous: Limit pg log length during recovery/backfill so that ...
- 06:10 AM Backport #24992 (Resolved): mimic: valgrind-leaks.yaml: expected valgrind issues and found none
- https://github.com/ceph/ceph/pull/23744
07/18/2018
- 09:42 PM Backport #24989: mimic: Limit pg log length during recovery/backfill so that we don't run out of ...
- We can hold off on this backport for now. Need to let this bake in master for a while.
- 08:00 PM Backport #24989 (Resolved): mimic: Limit pg log length during recovery/backfill so that we don't ...
- https://github.com/ceph/ceph/pull/23403
- 09:42 PM Backport #24988: luminous: Limit pg log length during recovery/backfill so that we don't run out ...
- We can hold off on this backport for now. Need to let this bake in master for a while.
Also, this backport is going ...
- 08:00 PM Backport #24988 (Resolved): luminous: Limit pg log length during recovery/backfill so that we don...
- https://github.com/ceph/ceph/pull/23211
- 09:38 PM Bug #24975 (Pending Backport): valgrind-leaks.yaml: expected valgrind issues and found none
- This issue has been fixed in master by https://github.com/ceph/ceph/pull/22261
Needs to be backported to mimic.
- 09:14 PM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
- This appears to be another instance of #23352.
- 09:12 PM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
- Did you check that this bucket actually has any entries? These commands are tested in our suite.
- 08:46 PM Bug #24990 (Resolved): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- ...
- 06:10 PM Feature #23979 (Pending Backport): Limit pg log length during recovery/backfill so that we don't ...
- 04:15 PM Support #24980: Pg Inconsistent - failed to pick suitable auth object
- Alon Avrahami wrote:
> Hi,
>
>
> We have ceph cluster installed with Luminous 12.2.2 using bluestore.
> All no... - 01:24 PM Support #24980 (Rejected): Pg Inconsistent - failed to pick suitable auth object
- Hi,
We have ceph cluster installed with Luminous 12.2.2 using bluestore.
All nodes are Intel servers with 1.6TB... - 03:42 PM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
- 02:32 PM Backport #24472: mimic: Ceph-osd crash when activate SPDK
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22684
merged - 03:36 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal
- 03:35 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
- 02:20 PM Backport #24865: mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-...
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/23024
merged - 03:14 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
- 02:24 PM Backport #24951: mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/23084
merged - 02:22 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
- https://github.com/ceph/ceph/pull/23096 merged
- 11:20 AM Documentation #20894 (Resolved): rados manpage does not document "cleanup"
- https://github.com/ceph/ceph/pull/16777
07/17/2018
- 10:48 PM Bug #24975 (Resolved): valgrind-leaks.yaml: expected valgrind issues and found none
- ...
- 10:43 PM Bug #24974 (New): Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
- ...
- 08:32 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
- 08:10 PM Backport #24583: mimic: osdc: wrong offset in BufferHead
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22869
merged - 06:21 PM Feature #23979 (Fix Under Review): Limit pg log length during recovery/backfill so that we don't ...
- https://github.com/ceph/ceph/pull/23098
- 05:39 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- 01:37 PM Bug #20645 (Closed): bluefs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- 09:58 AM Bug #24956 (Resolved): osd: parent process needs to restart log service after fork, or ceph-osd wi...
- The ceph-osd parent process needs to restart the log service after fork, or ceph-osd will not work correctly when the option l...
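The failure mode in #24956 is the classic fork-vs-logging interaction: after fork() the surviving process shares the pre-fork log state, so the log service must be torn down and re-created. A minimal Python sketch of that idea (the helper `restart_log_service` is illustrative, not ceph-osd's actual logging code):

```python
import logging

def restart_log_service(logger, path):
    # Illustrative: after fork() the process inherits the pre-fork log
    # handlers/descriptors; close them and re-create fresh ones so
    # logging keeps working in this process.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.addHandler(logging.FileHandler(path))
```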
07/16/2018
- 09:18 PM Bug #24950: Running osd_skip_data_digest in a mixed cluster is not ideal
- https://github.com/ceph/ceph/pull/23083
- 09:14 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal
Running osd_skip_data_digest in a mixed BlueStore/FileStore cluster is dangerous because we lose data_digest integrity ...- 09:17 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
- https://github.com/ceph/ceph/pull/23084
- 09:08 PM Feature #24949 (Resolved): luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
I'm thinking that while osd_distrust_data_digest=true we should automatically ignore data_digest errors and repair ...- 07:36 PM Bug #23352: osd: segfaults under normal operation
- We actually got one on July 15: Jul 14 23:54:42 roc04r-sc3a080 kernel: [6988357.283555] safe_timer[19917]: segfault a...
- 03:54 AM Bug #23352: osd: segfaults under normal operation
- The latest core uploaded by Dan in comment 66 is slightly different to the others we've seen so far.
Once again th... - 02:24 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- https://github.com/ceph/ceph/pull/23072
- 02:24 PM Bug #24687 (Fix Under Review): Automatically set expected_num_objects for new pools with >=100 PG...
- Because a value for expected_num_objects is too difficult to determine automatically, instead we print a suggestion t...
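The suggestion logic described above amounts to a simple PGs-per-OSD threshold check. A hypothetical sketch (the function name and the exact threshold semantics are assumptions, not the merged code):

```python
def needs_expected_num_objects(pg_num, num_osds, threshold=100):
    # Hypothetical check mirroring #24687: when a new pool lands at or
    # above `threshold` PGs per OSD, suggest setting expected_num_objects
    # so filestore can pre-split collection directories up front.
    return num_osds > 0 and pg_num / num_osds >= threshold
```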
- 11:16 AM Bug #24938 (New): luminous: rados listomapkeys & listomapvals don't return data.
- Hi,
rados listomapkeys & rados listomapvals don't return data when running Luminous, tested on 12.2.4 and 12.2.6:
... - 08:52 AM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
- My environment :
[root@gz-ceph-52-203 log]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@gz-... - 12:57 AM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- Noting the same issue, per ceph-users list post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028...
07/15/2018
- 05:46 AM Documentation #24924 (Resolved): doc: typo in crush-map docs
- Each time the OSD starts, it verifies it is in the correct location in the CRUSH map and, if it is not, it moved its...
07/14/2018
- 09:04 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- Undersized
The placement group fewer copies than the configured pool replication level.
Missing "has"
- 07:57 PM Bug #23871: luminous->mimic: missing primary copy of xxx, wil try copies on 3, then full-object r...
- For the luminous regression, this will reproduce the issue:...
07/13/2018
- 11:02 PM Feature #24917 (New): Gracefully deal with upgrades when bluestore skipping of data_digest become...
Once the data_digest is no longer being used, but is still set from an earlier version, we can get EIO from read bu...- 09:26 PM Backport #24083 (In Progress): luminous: rados: not all exceptions accept keyargs
- PR: https://github.com/ceph/ceph/pull/22979
- 03:52 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
- 05:09 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- Could cephfs trigger this issue? There have been two reports of cephfs_metadata pool crc errors on the users ML this ...
- 03:51 PM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
- 03:18 PM Backport #24891: mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22997
merged - 03:00 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- FTR, this crc issue is probably due to an incomplete backport to 12.2.6 of the skip_digest changes for bluestore:
... - 01:55 PM Bug #24909 (Fix Under Review): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
- https://github.com/ceph/ceph/pull/23029
- 01:47 PM Bug #24909 (In Progress): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints ...
- 01:47 PM Bug #24909 (Resolved): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as ...
- While running performance testing with Ceph metrics gathering statistics on the cluster, I noticed that while my RBD ...
- 12:58 PM Backport #24908 (In Progress): luminous: luminous->mimic: missing primary copy of xxx, wil try co...
- 12:57 PM Backport #24908 (Resolved): luminous: luminous->mimic: missing primary copy of xxx, wil try copie...
- https://github.com/ceph/ceph/pull/23028
- 12:26 PM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
- 12:26 PM Bug #23871: luminous->mimic: missing primary copy of xxx, wil try copies on 3, then full-object r...
- original fix is fe5038c7f9577327f82913b4565712c53903ee48
luminous backport https://github.com/ceph/ceph/pull/23028 - 12:06 PM Bug #23871 (Pending Backport): luminous->mimic: missing primary copy of xxx, wil try copies on 3,...
- 11:31 AM Backport #24888 (Need More Info): luminous: osd: crash in OpTracker::unregister_inflight_op via O...
- non-trivial backport. There are two conflicts. The first conflict can be resolved by cherry-picking 17a192ba5cdbe2129...
- 11:23 AM Backport #24889 (In Progress): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
- 11:22 AM Backport #24864 (In Progress): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-c...
- 11:20 AM Backport #24865 (In Progress): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code...
07/12/2018
- 11:56 PM Bug #24801 (In Progress): PG num_bytes becomes huge
- 07:38 PM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
- 07:38 PM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
- 04:36 PM Backport #24617: mimic: ValueError: too many values to unpack due to lack of subdir
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22888
merged - 02:05 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- Is this the relevant fix? https://github.com/ceph/ceph/commit/4667280f8afe6cd68dfffea61d7530581f3dd0eb
Alessandro'... - 12:27 PM Backport #24890 (In Progress): luminous: FAILED assert(0 == "ERROR: source must exist") in FileSt...
- 10:18 AM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
- https://github.com/ceph/ceph/pull/22976
- 11:03 AM Backport #24891 (In Progress): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore...
- 10:18 AM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
- https://github.com/ceph/ceph/pull/22997
- 10:50 AM Bug #24150 (Resolved): LibRadosMiscPool.PoolCreationRace segv
- 10:50 AM Backport #24204 (Resolved): mimic: LibRadosMiscPool.PoolCreationRace segv
- 12:06 AM Backport #24204: mimic: LibRadosMiscPool.PoolCreationRace segv
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22291
merged - 10:50 AM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
- 10:49 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
- 12:05 AM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22492
merged - 10:48 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
- 12:03 AM Backport #24747: mimic: change default filestore_merge_threshold to -10
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22813
merged - 10:48 AM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
- 10:47 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
- 12:03 AM Backport #24473: mimic: cosbench stuck at booting cosbench driver
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22887
merged - 10:46 AM Bug #24487 (Resolved): osd: choose_acting loop
- 10:46 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
- 12:02 AM Backport #24618: mimic: osd: choose_acting loop
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged - 10:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
- 10:46 AM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
- 12:02 AM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged - 10:45 AM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
- 12:00 AM Backport #24805: mimic: rgw workload makes osd memory explode
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22960
merged - 10:36 AM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
- 10:18 AM Backport #24889 (Resolved): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_h...
- https://github.com/ceph/ceph/pull/23026
- 10:18 AM Backport #24888 (Rejected): luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
- 03:03 AM Bug #24664 (Pending Backport): osd: crash in OpTracker::unregister_inflight_op via OSD::get_healt...
- 03:01 AM Bug #24597 (Pending Backport): FAILED assert(0 == "ERROR: source must exist") in FileStore::_coll...
07/11/2018
- 11:48 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- ...
- 11:47 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- Happened again in 12.2.4:...
- 11:33 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
- /a/nojha-2018-07-06_23:31:26-rados-wip-23979-2018-07-06-distro-basic-smithi/2744661/
- 11:24 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- Cool, I will pick up and run your test; atm the load on the workers is high, so I should have the results tomorrow EOD.
- 10:25 AM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- OK, it looks like we missed this in the previous tracker issue that mentioned it (it was actually a three part fix an...
- 11:23 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
- 11:21 PM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
- Does this show up in the monitor's log in /var/log/ceph/ ?
- 11:15 PM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails
- https://github.com/ceph/ceph/pull/22771
- 11:13 PM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED
- Looks the same as #24640
- 11:11 PM Bug #24835 (Need More Info): osd daemon spontaneous segfault
- Unfortunately there's not much to go on - if this happens again perhaps you can grab a core file or a crash dump will...
- 10:09 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- mimic backport: https://github.com/ceph/ceph/pull/22997
- 03:54 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- Factors leading to this:
- ec pool (e.g., rgw workload)
- rados ops that result in pg log 'error' entries (e.g., ... - 12:37 PM Bug #24597 (In Progress): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collectio...
- https://github.com/ceph/ceph/pull/22974
- 01:16 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- Aha, in that case wip-24192 should fix it. Running it through testing again...
- 12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- 12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- I believe this is caused by b50186bfe6c8981700e33c8a62850e21779d67d5, which does...
- 09:38 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- Ah, the error was reported on luminous, which doesn't do the repair, and I guess I missed it on master. Sorry for the...
- 09:01 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
The do_sparse_read() path doesn't attempt to repair a checksum error. Could that be the real issue?
The do_read...- 08:25 PM Bug #24875 (Resolved): OSD: still returning EIO instead of recovering objects on checksum errors
- A report came in on the mailing list of an MDS journal which couldn't be read and was throwing errors:...
- 08:31 PM Bug #24876 (New): snaptrim_error state cannot be cleared without a new snaptrim
- A user on the list reported they had PGs in state "active+clean+snaptrim_error". Investigating, I found that the only...
- 08:11 PM Backport #24771: mimic: osd: may get empty info at recovery
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22861
merged. Reviewed-by: Sage Weil <sage@redhat.com> - 07:27 PM Bug #24874 (New): ec fast reads can trigger read errors in log
- fast read finishes......
- 04:11 PM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
- This looks like #24597 for the 12.2.5 case, at least. I wonder if the original 12.2.3 is something else (time warp d...
- 03:51 PM Bug #24192 (Duplicate): cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-582...
07/10/2018
- 10:10 PM Bug #24866 (Resolved): FAILED assert(0 == "past_interval start interval mismatch") in check_past_...
- ...
- 08:30 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
- https://github.com/ceph/ceph/pull/23024
- 08:29 PM Backport #24864 (Resolved): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-code...
- https://github.com/ceph/ceph/pull/23025
- 04:51 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- This was a ceph-volume test with rbd workload, no upgrades, just fresh install, full logs at
http://pulpito.ceph.c... - 02:41 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- This points to a deeper issue. The target context seems to always be 'unlabeled_t'. That context means something like...
- 12:23 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- Filing under RADOS because it appears to be OSD specific.
- 01:42 PM Bug #23492 (Pending Backport): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-e...
- 12:46 PM Bug #24850 (New): IPv6 scoped address not parseable by entity_addr_t
- An IPv6 link-local scoped address is not currently parseable since it contains a "%<interface name>" suffix in the ad...
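The parse failure can be reproduced in miniature: standard address parsers reject the "%<interface>" zone suffix unless it is split off first. A small Python sketch of the workaround (the helper `parse_scoped_ipv6` is illustrative, not entity_addr_t's parser):

```python
import socket

def parse_scoped_ipv6(addr):
    # Illustrative workaround: split off the "%<interface>" zone suffix
    # before handing the address to a parser that rejects it.
    host, _, zone = addr.partition("%")
    socket.inet_pton(socket.AF_INET6, host)  # raises OSError if invalid
    return host, zone or None
```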
- 12:14 PM Bug #24835: osd daemon spontaneous segfault
- The log (attached) does not contain any information on the crash. It shows only the automatic restart of the crashed ...
- 09:54 AM Backport #24847 (In Progress): jewel: rgw workload makes osd memory explode
- 09:54 AM Backport #24847 (Resolved): jewel: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22959
- 09:48 AM Backport #24806 (In Progress): luminous: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22962
- 09:42 AM Backport #24805 (In Progress): mimic: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22960
- 09:41 AM Bug #23352: osd: segfaults under normal operation
- We see periodically with osd_enable_op_tracker = false
Last time ... - 12:55 AM Bug #23352: osd: segfaults under normal operation
- That is correct, Brad. No crashes for 7 days now.
- 09:33 AM Bug #24768: rgw workload makes osd memory explode
- jewel backport: https://github.com/ceph/ceph/pull/22959
I knew that jewel is (almost) EOL. Just in case anyone is ... - 04:01 AM Backport #24845 (Resolved): luminous: tools/ceph-objectstore-tool: split filestore directories of...
- https://github.com/ceph/ceph/pull/23418
07/09/2018
- 10:43 PM Bug #23352: osd: segfaults under normal operation
- Alex, So that's a week without issue when previously you were getting a crash every 3-4 days right?
- 01:36 PM Bug #23352: osd: segfaults under normal operation
- No issues so far since injecting osd_enable_op_tracker=false
- 08:40 PM Feature #21366 (Pending Backport): tools/ceph-objectstore-tool: split filestore directories offli...
- 06:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- https://github.com/ceph/ceph/pull/22954
- 06:02 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- The problem is that int global_init_shutdown_stderr(CephContext *cct) is not being run at a time in the process lifec...
- 05:02 PM Bug #24835: osd daemon spontaneous segfault
- Can you provide the backtrace out of the OSD log? Or even the whole log?
- 02:13 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
- We experience spontaneous segmentation faults of osd daemons in our mimic production cluster:...
- 04:36 PM Bug #24838 (Resolved): mon: auth checks not correct for pool ops
- The mon was not enforcing caps for pool ops correctly (which are used for managing unmanaged snapshots or even pool d...
- 04:32 PM Bug #24837 (Resolved): auth: cephx signature check is weak/broken
- The signature check code was validating only the first (32-byte) of two blocks, and thus did not cover all of the crc...
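Why covering only the first block is insufficient can be shown with a toy checksum model (this is a CRC illustration of the flaw's shape, not the actual cephx signature code):

```python
import zlib

def weak_sig(buf):
    # Flawed scheme from the report: only the first 32-byte block
    # contributes to the signature.
    return zlib.crc32(buf[:32])

def full_sig(buf):
    # Correct scheme: the signature covers the whole payload.
    return zlib.crc32(buf)

original = b"A" * 64
tampered = original[:32] + b"B" * 32  # modify only the second block
# weak_sig cannot see the tampering; full_sig can
```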
- 04:30 PM Bug #24836 (Resolved): auth: cephx authorizer subject to replay
- The cephx authorizer does not have any challenge or nonce, and thus (if sniffed) can be reused by another session.
... - 04:09 PM Bug #24368: osd: should not restart on permanent failures
- I don't think the issue has moved beyond the PR linked above to change the systemd settings. I sent this out to one o...
- 08:42 AM Bug #24368: osd: should not restart on permanent failures
- guotao Yao wrote:
> I've had a similar problem recently. One OSD crash and exit, and the OSD process starts up quick... - 08:12 AM Bug #24368: osd: should not restart on permanent failures
- I've had a similar problem recently. One OSD crash and exit, and the OSD process starts up quickly by systemd. It cau...
07/06/2018
- 09:55 PM Bug #24322 (Resolved): slow mon ops from osd_failure
- 09:55 PM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
- 09:50 PM Backport #24350: mimic: slow mon ops from osd_failure
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22297
merged - 09:54 PM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
- 09:54 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
- 09:49 PM Backport #24246: mimic: Manager daemon y is unresponsive during teuthology cluster teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22333
merged - 09:54 PM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
- 09:49 PM Backport #24375: mimic: mon: auto compaction on rocksdb should kick in more often
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22361
merged - 09:52 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
- 09:51 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
- 09:51 PM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
- 09:42 PM Backport #24468: mimic: tell ... config rm <foo> not idempotent
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22552
merged - 09:50 PM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
- 09:42 PM Backport #24332: mimic: local_reserver double-reservation of backfilled pg
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22559
merged - 09:49 PM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
- 09:49 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
- 09:40 PM Backport #24599: mimic: failed to load OSD map for epoch X, got 0 bytes
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22651
merged - 09:48 PM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
- 09:39 PM Backport #24494: mimic: osd: segv in Session::have_backoff
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22730
merged - 09:47 PM Bug #24199 (Resolved): common: JSON output from rados bench write has typo in max_latency key
- 09:47 PM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
- 09:45 PM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
- 09:44 PM Backport #24292: mimic: common: JSON output from rados bench write has typo in max_latency key
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22406
merged - 09:06 PM Backport #24806 (Resolved): luminous: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22962
- 09:06 PM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22960
- 06:44 PM Bug #24768 (Pending Backport): rgw workload makes osd memory explode
- 06:12 PM Bug #24801: PG num_bytes becomes huge
The OSD logs and this bug point to a slight flaw in https://github.com/ceph/ceph/pull/22797. I add the adjustment ...- 05:57 PM Bug #24801 (Resolved): PG num_bytes becomes huge
dzafman-2018-07-05_12:45:56-rados-wip-19753-distro-basic-smithi/2739140
description: rados/thrash/{0-size-min-si...- 04:45 PM Backport #23772 (In Progress): luminous: ceph status shows wrong number of objects
- 01:39 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- ...
- 01:28 AM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED
dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
2732821
2732693
2732523...- 01:01 AM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails
http://pulpito.ceph.com/dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
Multiple jobs
2732818
...
07/05/2018
- 10:40 PM Bug #24785 (Resolved): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- ...
- 09:33 PM Bug #24664 (In Progress): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_met...
- https://github.com/ceph/ceph/pull/22877
- 08:39 PM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
- Ganging up with another backport to prevent merge conflicts.
- 08:27 PM Backport #24618 (In Progress): mimic: osd: choose_acting loop
- 08:22 PM Backport #24617 (In Progress): mimic: ValueError: too many values to unpack due to lack of subdir
- 08:15 PM Backport #24473 (In Progress): mimic: cosbench stuck at booting cosbench driver
- 12:44 PM Bug #24768 (Fix Under Review): rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22858
- 09:17 AM Backport #24583 (In Progress): mimic: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22869
- 07:25 AM Backport #24584 (In Progress): luminous: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22865