Activity
From 06/05/2018 to 07/04/2018
07/04/2018
- 11:11 PM Backport #24772 (In Progress): luminous: osd: may get empty info at recovery
- 10:52 PM Backport #24772 (Resolved): luminous: osd: may get empty info at recovery
- https://github.com/ceph/ceph/pull/22862
- 11:03 PM Backport #24771 (In Progress): mimic: osd: may get empty info at recovery
- 10:52 PM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
- https://github.com/ceph/ceph/pull/22861
- 07:24 PM Bug #24588: osd: may get empty info at recovery
- https://github.com/ceph/ceph/pull/22704 is the fix
- 07:23 PM Bug #24588 (Pending Backport): osd: may get empty info at recovery
- 07:17 PM Bug #24768 (Resolved): rgw workload makes osd memory explode
- From ML,...
- 12:47 PM Bug #23352: osd: segfaults under normal operation
- Brad Hubbard wrote:
> Having reviewed the code in question again I was afraid that may be the case. If you can provi...
- 09:38 AM Bug #23352: osd: segfaults under normal operation
- Having reviewed the code in question again I was afraid that may be the case. If you can provide the crash dump Dan, ...
- 07:36 AM Bug #23352: osd: segfaults under normal operation
- I *injected* osd_enable_op_tracker=false yesterday ...
- 07:40 AM Bug #24123 (Resolved): "process (unknown)" in ceph logs
- 07:39 AM Backport #24215 (Resolved): mimic: "process (unknown)" in ceph logs
- 07:38 AM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
- 07:38 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
- 07:37 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
- 07:31 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Hi Martin,
Have you tried my workaround above?
Best regards,
- 06:18 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Hi everyone,
What’s the workaround for this issue? Not being able to add new osds is getting more and more urgent...
- 01:10 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- Final bisect results:
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
2e476...
07/03/2018
- 10:51 AM Backport #24748 (In Progress): luminous: change default filestore_merge_threshold to -10
- 07:55 AM Backport #24748 (Resolved): luminous: change default filestore_merge_threshold to -10
- https://github.com/ceph/ceph/pull/22814
- 10:47 AM Backport #24747 (In Progress): mimic: change default filestore_merge_threshold to -10
- 07:55 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
- https://github.com/ceph/ceph/pull/22813
- 10:47 AM Bug #24686: change default filestore_merge_threshold to -10
- *master PR*: https://github.com/ceph/ceph/pull/22761
- 12:36 AM Bug #24686 (Pending Backport): change default filestore_merge_threshold to -10
- 10:24 AM Feature #13507: scrub APIs to read replica
- Update backport field?
- 04:16 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Hi Sergey,
Have you tried that after "ceph osd require-osd-release mimic"?
My workaround is below.
1. Build pa...
- 03:08 AM Bug #23352: osd: segfaults under normal operation
- Thanks Alex!
- 01:50 AM Bug #23352: osd: segfaults under normal operation
- I set it, Brad, and am watching the status. We normally get one failure every 3-4 days.
- 01:03 AM Bug #23352: osd: segfaults under normal operation
- We are investigating the potential race between get_health_metrics and the op_tracker code.
In the mean time, for ...
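(A minimal sketch, not part of the comment above, of how the suggested mitigation is typically applied at runtime; the wildcard OSD target and the ceph.conf persistence are assumptions.)
    # apply to all running OSDs without a restart (assumes the default cluster name)
    ceph tell osd.* injectargs '--osd_enable_op_tracker=false'
    # to persist across restarts, add under [osd] in ceph.conf:
    #   osd enable op tracker = false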
07/02/2018
- 10:47 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I found a workaround for how to add a new osd to a "mimic" cluster:
1. Purge the osd from the cluster which is displayed as "dow...
- 04:52 PM Backport #24215: mimic: "process (unknown)" in ceph logs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22311
merged
- 04:52 PM Backport #24500: mimic: osd: eternal stuck PG in 'unfound_recovery'
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22545
merged
- 04:51 PM Backport #24355: mimic: osd: pg hard limit too easy to hit
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22621
merged
- 06:12 AM Bug #23352: osd: segfaults under normal operation
- Pretty sure this all revolves around the racy code highlighted in #24037 and, unfortunately, the PR does *not* fix al...
06/29/2018
- 11:27 PM Bug #23875 (In Progress): Removal of snapshot with corrupt replica crashes osd
- Tentative pull request https://github.com/ceph/ceph/pull/22476 is an improvement but doesn't address comment 3
- 11:25 PM Bug #19753 (In Progress): Deny reservation if expected backfill size would put us over backfill_f...
- 05:59 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- Also include >1024 PGs overall
- 09:59 AM Bug #23145: OSD crashes during recovery of EC pg
- @sage weil,
thanks; since the env no longer exists, I couldn't get the logs with debug_osd=20.
from the previou...
06/28/2018
- 05:53 PM Bug #24645: Upload to radosgw fails when there are degraded objects
- When the cluster is in recovery it is expected that we're waiting for the OSDs to respond
- 05:16 PM Bug #24676: FreeBSD/Linux integration - monitor map with wrong sa_family
- I discovered that commit 9099ca5 - "fix the dencoder of entity_addr_t" introduced this kind of interoperability which...
- 08:50 AM Bug #24676: FreeBSD/Linux integration - monitor map with wrong sa_family
- I investigated further with gdb. Lines 478-501 from msg/msg_types.h seem to be the culprit. Here sa_family is decoded...
- 05:14 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 05:08 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/nojha-2018-06-27_22:32:36-rados-wip-23979-distro-basic-smithi/2715571/
- 02:50 PM Bug #24686 (In Progress): change default filestore_merge_threshold to -10
- 02:18 PM Bug #24686 (Resolved): change default filestore_merge_threshold to -10
- Performance evaluations of medium to large size Ceph clusters have demonstrated negligible performance impact from un...
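(A minimal sketch of what the proposed default amounts to when pinned explicitly; this ceph.conf fragment is an illustration, not text from the ticket. A negative threshold disables FileStore subdirectory merging.)
    [osd]
    filestore merge threshold = -10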
- 02:49 PM Bug #24687 (Resolved): Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- Field experience has demonstrated significant performance impact from filestore split and merge activity. The expecte...
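(A minimal sketch of the manual step this ticket proposes to automate, passing expected_num_objects when creating a replicated pool; the pool name, PG counts, rule name, and object count below are placeholders.)
    ceph osd pool create mypool 128 128 replicated replicated_rule 1000000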
- 10:15 AM Bug #24685 (Resolved): config options: possible inconsistency between flag 'can_update_at_runtime...
- I'm wondering if there is an inconsistency between the 'can_update_at_runtime' flag and the 'flags' list for the confi...
- 08:47 AM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
- If I execute the same command that systemd uses, I get a great readable error message:...
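(A minimal sketch of the kind of foreground invocation meant above, assuming the stock ceph-mon@.service unit; the cluster name and mon id are placeholders.)
    /usr/bin/ceph-mon -f --cluster ceph --id mon-a --setuser ceph --setgroup ceph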
- 08:45 AM Bug #24683 (New): ceph-mon binary doesn't report to systemd why it dies
- Following the quick start guide I get to a point where the monitor is supposed to come up, but it doesn't. It doesn't ...
- 05:53 AM Bug #24587: librados api aio tests race condition
- Good news, this is just a bug in the tests. They're submitting a write and then a read without waiting for the write ...
- 03:04 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Thanks Sage. I'll try to get my hands on another environment and see if I can reproduce and get more details. Will up...
06/27/2018
- 09:17 PM Bug #24615: error message for 'unable to find any IP address' not shown
- Sounds like the log isn't being flushed before exiting
- 09:13 PM Bug #24652 (Won't Fix): OSD crashes when repairing pg
- This should be fixed in later versions - hammer is end of life.
The crash was:... - 09:06 PM Bug #24667: osd: SIGSEGV in MMgrReport::encode_payload
- Possibly related to a memory corruption we've been seeing related to mgr health reporting on the osd.
- 07:54 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- https://github.com/ceph/ceph/pull/22744 disabled build_past_intervals_parallel in luminous (by default; can be turned...
- 05:51 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Well, I can work around the issue. The build_past_intervals_parallel() is removed entirely in mimic and I can do t...
- 07:12 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- > Any chance you can gdb one of the core files for a crashing OSD to identify which PG is it asserting on? and perhap...
- 06:10 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Sage Weil wrote:
> Dexter, anyone: was there a PG split (pg_num increase) on the cluster before this happened? Or m...
- 05:19 PM Bug #24678 (Can't reproduce): ceph-mon segmentation fault after setting pool size to 1 on degrade...
- We have an issue with starting any of the 3 monitors after changing the pool size from 3 to 1. The cluster was in a degrade...
- 03:09 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I also have this issue on a newly installed mimic cluster.
Don't know if this is important, but the problem has appeared aft...
- 12:00 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
- We are using a ceph cluster in a mixed FreeBSD/Linux environment. The ceph cluster is based on FreeBSD. Linux clients...
- 09:16 AM Bug #23352: osd: segfaults under normal operation
- We're getting a few crashes like this per week here on 12.2.5.
Here's a FileStore OSD:...
- 04:10 AM Backport #24494 (In Progress): mimic: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22730
- 04:09 AM Backport #24495 (In Progress): luminous: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22729
- 12:41 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...
06/26/2018
- 11:32 PM Bug #23492 (In Progress): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasur...
- 11:29 PM Feature #13507 (New): scrub APIs to read replica
- 11:28 PM Bug #24366 (Resolved): omap_digest handling still not correct
- 11:27 PM Backport #24381 (Resolved): luminous: omap_digest handling still not correct
- 11:27 PM Backport #24380 (Resolved): mimic: omap_digest handling still not correct
- 09:08 PM Bug #23352: osd: segfaults under normal operation
- Matt,
Can you provide a coredump or full backtrace?
- 01:54 PM Bug #23352: osd: segfaults under normal operation
- Also confirmed on Ubuntu 18.04/Ceph 13.2.0:
ceph-mgr.log
> 2018-06-24 11:14:47.317 7ff17b0db700 -1 mgr.server s...
- 02:54 AM Bug #23352: osd: segfaults under normal operation
- confirmed
ceph-mgr.log
@2018-06-20 08:46:05.528656 7fb998ff2700 -1 mgr.server send_report send_report osd,215.0x5...
- 07:14 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Dexter, anyone: was there a PG split (pg_num increase) on the cluster before this happened? Or maybe a split combine...
- 07:10 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- ...
- 07:06 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- ...
- 06:42 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Dexter John Genterone wrote:
> Uploaded a few more logs (debug 20) here: https://storage.googleapis.com/ceph-logs/ce...
- 07:11 PM Bug #24667 (Can't reproduce): osd: SIGSEGV in MMgrReport::encode_payload
- ...
- 07:07 PM Bug #24666 (New): pybind: InvalidArgumentError is missing 'errno' argument
- Instead of being derived from 'Error', the 'InvalidArgumentError' should be derived from 'OSError' which will handle ...
- 01:49 PM Bug #24664 (Resolved): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
- ...
- 09:47 AM Bug #24660 (New): admin/build-doc fails during autodoc on rados module: "AttributeError: __next__"
- I'm trying to send a doc patch and am running @admin/build-doc@ in my local environment explained in [[http://docs.ce...
06/25/2018
- 09:50 PM Bug #23352: osd: segfaults under normal operation
- Same here
2018-06-24 19:42:41.348699 7f3e53a46700 -1 mgr.server send_report send_report osd,226.0x55678069c850 sen...
- 09:34 PM Bug #23352: osd: segfaults under normal operation
- Brad Hubbard wrote:
> Can anyone confirm seeing the "unknown health metric" messages in the mgr logs prior to the se...
- 02:50 PM Bug #24652 (Won't Fix): OSD crashes when repairing pg
- After a deep-scrub on the primary OSD for the pg we get:...
- 01:03 PM Bug #24650 (New): mark unfound lost revert: out of order trim
- OSD crashes in a few seconds after command 'ceph pg X.XX mark_unfound_lost revert'.
-10> 2018-06-25 15:52:14.49...
- 07:53 AM Bug #24645 (New): Upload to radosgw fails when there are degraded objects
- Hi,
we use Ceph RadosGW for storing and serving millions of small images. Everything is working well until recovery...
- 06:46 AM Backport #24471 (In Progress): luminous: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22686
- 05:01 AM Backport #24472 (In Progress): mimic: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22684
06/22/2018
- 11:25 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Uploaded a few more logs (debug 20) here: https://storage.googleapis.com/ceph-logs/ceph-osd-logs.tar.gz
After runn...
- 12:45 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Hi Sage,
We've experienced this again on a new environment we set up. Took a snippet of the logs, hope it's enough:...
- 04:46 PM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
- 04:45 PM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- 04:25 PM Backport #23675: luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21368
merged
- 04:44 PM Bug #23921 (Resolved): pg-upmap cannot balance in some case
- 04:43 PM Backport #24048 (Resolved): luminous: pg-upmap cannot balance in some case
- 04:25 PM Backport #24048: luminous: pg-upmap cannot balance in some case
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22115
merged
- 04:43 PM Bug #24025 (Resolved): RocksDB compression is not supported at least on Debian.
- 04:42 PM Backport #24279 (Resolved): luminous: RocksDB compression is not supported at least on Debian.
- 04:24 PM Backport #24279: luminous: RocksDB compression is not supported at least on Debian.
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22215
merged
- 04:40 PM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
- original mimic backport https://github.com/ceph/ceph/pull/22288 was merged, but deemed insufficient
- 04:38 PM Backport #24328 (Resolved): luminous: assert manager.get_num_active_clean() == pg_num on rados/si...
- 04:23 PM Bug #24321: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max-pg-per-osd...
- merged https://github.com/ceph/ceph/pull/22296
- 04:15 PM Bug #24635 (New): luminous: LibRadosTwoPoolsPP.SetRedirectRead failed
- Probably a race with the redirect code.
From http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-22_03:31:56-rados-w...
- 01:19 PM Bug #23352: osd: segfaults under normal operation
- Yeah. We got this in the mgr log before the ceph-osd segfault:
> mgr.server send_report send_report osd,74.0x560276d34ed8 sent me...
- 03:54 AM Bug #23352: osd: segfaults under normal operation
- In several of the crashes we are seeing lines like the following prior to the crash....
- 12:42 PM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- I see a slightly different effect on v12.2.5, but it may be related:
I have similar logs:...
- 08:44 AM Backport #24351 (Resolved): luminous: slow mon ops from osd_failure
- 12:23 AM Backport #24351: luminous: slow mon ops from osd_failure
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22568
merged - 08:44 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
- 08:43 AM Backport #24258 (Resolved): luminous: crush device class: Monitor Crash when moving Bucket into D...
- 12:21 AM Backport #24258: luminous: crush device class: Monitor Crash when moving Bucket into Default root
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22381
merged
Reviewed-by: Nathan Cutler <ncutler@suse.com>
- 08:43 AM Backport #24290 (Resolved): luminous: common: JSON output from rados bench write has typo in max_...
- 12:20 AM Backport #24290: luminous: common: JSON output from rados bench write has typo in max_latency key
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22391
merged
- 08:41 AM Backport #24356 (Resolved): luminous: osd: pg hard limit too easy to hit
- 12:18 AM Backport #24356: luminous: osd: pg hard limit too easy to hit
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22592
merged
- 08:41 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
- https://github.com/ceph/ceph/pull/22889
- 08:41 AM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
- https://github.com/ceph/ceph/pull/22888
- 12:32 AM Bug #24615 (Resolved): error message for 'unable to find any IP address' not shown
- Hi,
In my ceph.conf I have the option:...
06/21/2018
- 11:40 PM Bug #24613 (New): luminous: rest/test.py fails with expected 200, got 400
- ...
- 10:57 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- possibly related lumious run: http://pulpito.ceph.com/yuriw-2018-06-11_16:27:32-rados-wip-yuri3-testing-2018-06-11-14...
- 10:15 PM Bug #23352: osd: segfaults under normal operation
- Another instance: http://pulpito.ceph.com/yuriw-2018-06-19_21:29:48-rados-wip-yuri-testing-2018-06-19-1953-luminous-d...
- 09:01 PM Bug #24487 (Pending Backport): osd: choose_acting loop
- 05:58 PM Bug #24487 (Fix Under Review): osd: choose_acting loop
- https://github.com/ceph/ceph/pull/22664
- 06:48 PM Bug #24612 (Resolved): FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init()
- ...
- 04:51 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/nojha-2018-06-21_00:18:52-rados-wip-24487-distro-basic-smithi/2686362
- 09:19 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- We also hit this today and happen to have an osd log with --debug_osd = 20
FWIW, the cluster has an inconsistent PG and ...
- 01:34 AM Bug #24601 (Resolved): FAILED assert(is_up(osd)) in OSDMap::get_inst(int)
- ...
- 12:52 AM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
- ...
06/20/2018
- 10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- Sage Weil wrote:
> Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash...
- 10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash?
- 09:50 PM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 10:11 PM Bug #23145: OSD crashes during recovery of EC pg
- Two basic theories:
1. There is a bug that prematurely advances can_rollback_to
2. One of Peter's OSDs warped bac...
- 10:05 PM Bug #23145: OSD crashes during recovery of EC pg
- Sage Weil wrote:
> Zengran Zhang wrote:
> > osd in last peering stage will call pg_log.roll_forward(at last of PG:...
- 10:03 PM Bug #23145 (Need More Info): OSD crashes during recovery of EC pg
- Yong Wang, can you provide a full osd log with debug osd = 20 for the primary osd for the PG leading up to the crash...
- 09:22 PM Bug #23145: OSD crashes during recovery of EC pg
- Zengran Zhang wrote:
> osd in last peering stage will call pg_log.roll_forward(at last of PG::activate), is there p...
- 01:46 AM Bug #23145: OSD crashes during recovery of EC pg
- @Sage Weil @Zengran Zhang
could you share something about this bug recently?
- 01:44 AM Bug #23145: OSD crashes during recovery of EC pg
- hi all, are there any updates on this please?
- 10:02 PM Backport #24599 (In Progress): mimic: failed to load OSD map for epoch X, got 0 bytes
- 10:01 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
- https://github.com/ceph/ceph/pull/22651
- 09:47 PM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
- This is likely due to filestore becoming overloaded (hence waiting on throttles) and hitting the filestore op thread ...
- 09:38 PM Bug #24511 (Duplicate): osd crushed at thread_name:safe_timer
- 09:37 PM Bug #24515: "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c has slow...
- Kefu, can you take a look at this?
- 09:36 PM Bug #24531: Mimic MONs have slow/long running ops
- Joao, could you take a look at this?
- 09:34 PM Bug #24549 (Won't Fix): FileStore::read assert (ABRT report for package ceph has reached 1000 occ...
- As John described, this is not a bug in ceph but due to failing hardware or the filesystem below.
- 09:25 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
- re-open if it recurs
- 09:19 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- 09:12 PM Bug #22085 (Can't reproduce): jewel->luminous: "[ FAILED ] LibRadosAioEC.IsSafe" in upgrade:jew...
- assuming this is the mon crush testing timeout, logs are gone so can't be sure
- 08:10 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- backport for mimic: https://github.com/ceph/ceph/pull/22651
- 08:07 PM Bug #24423 (Pending Backport): failed to load OSD map for epoch X, got 0 bytes
- 07:46 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
- ...
- 06:32 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
- ...
- 03:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Now that I've looked at the code there is nothing surprising about the map handling. There is code in dequeue_op()...
- 12:37 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
I was able to reproduce by running a loop of a single test case in qa/standalone/erasure-code/test-erasure-eio.sh
...
- 01:00 PM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
- 12:52 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
- 12:52 PM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
- 12:21 PM Backport #24383 (In Progress): mimic: osd: stray osds in async_recovery_targets cause out of orde...
- https://github.com/ceph/ceph/pull/22642
- 08:42 AM Bug #24588 (Fix Under Review): osd: may get empty info at recovery
- -https://github.com/ceph/ceph/pull/22362-
- 01:42 AM Bug #24588 (Resolved): osd: may get empty info at recovery
- 2018-06-15 20:34:16.421720 7f89d2c24700 -1 /home/zzr/ceph.sf/src/osd/PG.cc: In function 'void PG::start_peering_inter...
- 08:40 AM Bug #24593: s390x: Ceph Monitor crashed with Caught signal (Aborted)
- I expect that only people in possession of s390x hardware will be able to debug this
I see that there is another t...
- 05:33 AM Bug #24593 (New): s390x: Ceph Monitor crashed with Caught signal (Aborted)
- We are trying to setup ceph cluster on s390x platform.
ceph-mon service crashed with an error: *** Caught signal ... - 05:50 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
- 03:22 AM Feature #24591: FileStore hasn't impl to get kv-db's statistics
- https://github.com/ceph/ceph/pull/22633
- 03:22 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
- In BlueStore, you can see kv-db's statistics by "ceph daemon osd.X dump_objectstore_kv_stats", but FileStore hasn't i...
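(A minimal sketch of the BlueStore admin-socket query the feature refers to; osd.0 is a placeholder id.)
    ceph daemon osd.0 dump_objectstore_kv_stats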
- 03:22 AM Feature #22147: Set multiple flags in a single command line
- I don’t think we should skip it entirely. Many of the places that implement a check like that are using a common flag...
06/19/2018
- 11:44 PM Bug #24487 (In Progress): osd: choose_acting loop
- This happens when an osd which is part of the acting set but not part of the up set gets chosen as an async_recovery_t...
- 10:51 PM Backport #23673: jewel: auth: ceph auth add does not sanity-check caps
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21367
merged - 10:50 PM Backport #23905: jewel: Deleting a pool with active watch/notify linger ops can result in seg fault
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21754
merged - 10:49 PM Feature #22147: Set multiple flags in a single command line
- It seems fair to assume that "unset" should support this also.
Question: should settings that require --yes-i-real... - 10:40 PM Bug #24587: librados api aio tests race condition
- http://pulpito.ceph.com/yuriw-2018-06-13_14:55:30-rados-wip-yuri4-testing-2018-06-12-2037-jewel-distro-basic-smithi/2...
- 10:38 PM Bug #24587 (Resolved): librados api aio tests race condition
- Seen in a jewel integration branch with no OSD changes:
http://pulpito.ceph.com/yuriw-2018-06-12_22:32:43-rados-wi...
- 09:58 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- I did a run based on d9284902e1b2e292595696caf11cdead18acec96 which is a branch off of master.
http://pulpito.ceph...
- 07:24 PM Backport #24584 (Resolved): luminous: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22865
- 07:24 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22869
- 06:02 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
- 06:01 PM Backport #22406 (Rejected): jewel: osd: deletes are performed inline during pg log processing
- This change was deemed too invasive at such a late stage in Jewel's life cycle.
- 06:01 PM Backport #22405 (Rejected): jewel: store longer dup op information
- This change was deemed too invasive at such a late stage in Jewel's life cycle.
- 06:00 PM Backport #22400 (Rejected): jewel: PR #16172 causing performance regression
- This change was deemed too invasive at such a late stage in Jewel's life cycle.
- 04:10 PM Bug #24484 (Pending Backport): osdc: wrong offset in BufferHead
- 11:54 AM Bug #24448: (Filestore) ABRT report for package ceph has reached 10 occurrences
- OSD killed by signal, something like OOM incidents perhaps?
- 11:53 AM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
- http://tracker.ceph.com/issues/24423
- 11:51 AM Bug #24559 (Fix Under Review): building error for QAT decompress
- 02:10 AM Bug #24559 (Fix Under Review): building error for QAT decompress
- The parameter of decompress changes from 'bufferlist::iterator' to 'bufferlist::const_iterator', but this change miss...
- 11:34 AM Bug #24549: FileStore::read assert (ABRT report for package ceph has reached 1000 occurrences)
- Presumably this is underlying FS failures tripping asserts rather than a bug (perhaps people using ZFS on centos, or ...
- 07:26 AM Backport #24355 (In Progress): mimic: osd: pg hard limit too easy to hit
- https://github.com/ceph/ceph/pull/22621
06/18/2018
- 05:51 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- 11:45 AM Bug #24549 (Won't Fix): FileStore::read assert (ABRT report for package ceph has reached 1000 occ...
- FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)
... - 07:11 AM Backport #24356 (In Progress): luminous: osd: pg hard limit too easy to hit
- https://github.com/ceph/ceph/pull/22592
06/16/2018
- 02:16 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- How to fix installed Mimic (upgraded from Luminous) with this fix? Is there any way to make startup OSD not requestin...
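(A minimal sketch of the workaround alluded to elsewhere in this issue's comments, marking the cluster as fully upgraded before new OSDs are created; treat it as an assumption rather than a confirmed fix.)
    ceph osd require-osd-release mimic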
06/15/2018
- 11:40 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I've fixed it here: https://github.com/ceph/ceph/pull/22585
- 01:36 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Not sure if this is related, but for a few days, I'm not able to modify crushmap (like adding or removing OSD) on a l...
- 09:23 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Seeing the same here with a new Mimic cluster.
I purged a few OSDs (deployment went wrong) and now they can't star...
- 03:56 PM Bug #24057: cbt fails to copy results to the archive dir
- 02:48 PM Bug #24531: Mimic MONs have slow/long running ops
- ...
- 02:41 PM Bug #24531: Mimic MONs have slow/long running ops
- What's the output of "ceph versions" on this cluster?
We had issues in the lab with OSD failure reports not gettin... - 02:20 PM Bug #24531 (Resolved): Mimic MONs have slow/long running ops
- When setting up a Mimic 13.2.0 cluster I saw a message like this:...
- 08:39 AM Bug #24529 (New): monitor report empty client io rate when clock not synchronized
- we ran rados bench while the cluster was in WARN state and the clock was not synchronized. On the other hand, we watched the io speed from resu...
- 05:08 AM Backport #24351 (In Progress): luminous: slow mon ops from osd_failure
- https://github.com/ceph/ceph/pull/22568
06/14/2018
- 10:21 PM Bug #21142 (Need More Info): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- 10:20 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Tim, Dexter, is this something that is reproducible in your environment? I haven't seen this one, which makes me ver...
- 07:41 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
This might be caused by 52dd99e3011bfc787042fe105e02c11b28867c4c which was included in https://github.com/ceph...
- 07:27 PM Bug #24526: Mimic OSDs do not start after deleting some pools with size=1
- I solved this issue by monkey-patching OSD code:...
- 03:48 PM Bug #24526: Mimic OSDs do not start after deleting some pools with size=1
- P.S: This happened just after deleting some pool with size=1 - several OSDs died immediately and the latest error mes...
- 03:24 PM Bug #24526 (New): Mimic OSDs do not start after deleting some pools with size=1
- After some amount of test actions involving creating pools with size=min_size=1 and then deleting them, most OSDs fai...
- 07:06 PM Feature #24527 (New): Need a pg query that doens't include invalid peer information
Some fields in the peer info remain unchanged after a peer transitions from being the primary. This information ma...
- 01:13 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I am getting the same issue.
I also upgraded from Luminous to Mimic.
I used: ceph osd purge
- 11:48 AM Backport #24198 (Resolved): luminous: mon: slow op on log message
- 11:47 AM Backport #24216 (Resolved): luminous: "process (unknown)" in ceph logs
- 11:46 AM Bug #24167 (Resolved): Module 'balancer' has failed: could not find bucket -14
- 11:46 AM Backport #24213 (Resolved): mimic: Module 'balancer' has failed: could not find bucket -14
- 11:45 AM Backport #24214 (Resolved): luminous: Module 'balancer' has failed: could not find bucket -14
- 05:54 AM Backport #24332 (In Progress): mimic: local_reserver double-reservation of backfilled pg
- https://github.com/ceph/ceph/pull/22559
06/13/2018
- 10:01 PM Backport #24198: luminous: mon: slow op on log message
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22109
merged - 10:00 PM Backport #24216: luminous: "process (unknown)" in ceph logs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22290
merged - 09:59 PM Backport #24214: luminous: Module 'balancer' has failed: could not find bucket -14
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22308
merged - 08:13 PM Bug #24515 (New): "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c ha...
- This seems to be rhel specific
Run: http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smith...
- 05:19 PM Bug #23966 (Resolved): Deleting a pool with active notify linger ops can result in seg fault
- 05:19 PM Backport #24059 (Resolved): luminous: Deleting a pool with active notify linger ops can result in...
- 04:46 PM Backport #24468 (In Progress): mimic: tell ... config rm <foo> not idempotent
- 04:35 PM Backport #24245 (Resolved): luminous: Manager daemon y is unresponsive during teuthology cluster ...
- 04:34 PM Backport #24374 (Resolved): luminous: mon: auto compaction on rocksdb should kick in more often
- 12:56 PM Bug #24511 (Duplicate): osd crushed at thread_name:safe_timer
- h1. ENV
*ceph version*... - 11:29 AM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
- hi,
which release is the fix expected in?
Thanks,
- 10:16 AM Backport #24501 (In Progress): luminous: osd: eternal stuck PG in 'unfound_recovery'
- 10:16 AM Backport #24500 (In Progress): mimic: osd: eternal stuck PG in 'unfound_recovery'
06/12/2018
- 08:01 AM Backport #24501 (Resolved): luminous: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22546
- 08:01 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22545
- 08:00 AM Backport #24495 (Resolved): luminous: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22729
- 08:00 AM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22730
- 03:22 AM Bug #24486 (Pending Backport): osd: segv in Session::have_backoff
06/11/2018
- 09:32 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I am going to add this test for upgrade as well, steps to recreate...
- 04:19 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I have also experienced this issue while continuing the Bluestore conversion of OSDs on my Ceph cluster, after carryi...
- 02:16 PM Backport #24059: luminous: Deleting a pool with active notify linger ops can result in seg fault
- Casey Bodley wrote:
> https://github.com/ceph/ceph/pull/22143
merged
- 02:33 AM Bug #24487: osd: choose_acting loop
- It looks like the "choose_async_recovery_ec candidates by cost are: 178,2(0)" line is different in the second case.. ...
- 01:45 AM Bug #24487 (Resolved): osd: choose_acting loop
- ec pg looping between [2,3,0,1] and [-,3,0,1].
osd.3 says...
06/10/2018
- 06:41 PM Bug #24486 (Fix Under Review): osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22497
- 06:34 PM Bug #24486 (Resolved): osd: segv in Session::have_backoff
- ...
- 04:41 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
- ...
- 03:30 PM Bug #24484 (Fix Under Review): osdc: wrong offset in BufferHead
- 03:15 PM Bug #24484: osdc: wrong offset in BufferHead
- this bug will lead to an exception "buffer::end_of_buffer" which is thrown in function "buffer::list::substr_of"
Thi...
- 03:08 PM Bug #24484: osdc: wrong offset in BufferHead
- PR: https://github.com/ceph/ceph/pull/22495
- 03:07 PM Bug #24484 (Resolved): osdc: wrong offset in BufferHead
- The offset of BufferHead should be "opos - bh->start()"
- 02:12 AM Backport #24329 (In Progress): mimic: assert manager.get_num_active_clean() == pg_num on rados/si...
06/09/2018
- 07:21 PM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
- 05:56 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
- https://github.com/ceph/ceph/pull/22485
- 06:50 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
- Maybe I have the same issue during the Jewel->Luminous upgrade: http://tracker.ceph.com/issues/24481?next_issue_id=24480&p...
- 02:23 PM Bug #24373 (Pending Backport): osd: eternal stuck PG in 'unfound_recovery'
- 11:20 AM Backport #24478 (Resolved): luminous: read object attrs failed at EC recovery
- https://github.com/ceph/ceph/pull/24327
- 11:18 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
- https://github.com/ceph/ceph/pull/22887
- 11:18 AM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22684
- 11:18 AM Backport #24471 (Resolved): luminous: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22686
- 11:18 AM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
- https://github.com/ceph/ceph/pull/22552
- 06:07 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic
06/08/2018
- 11:03 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I can't reproduce this on any new Mimic cluster, it only happens on clusters upgraded from Luminous (which is why we ...
- 09:04 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I'm trying to make new OSDs with ceph-volume osd create --dmcrypt --bluestore --data /dev/sdg and am getting the same...
- 07:05 PM Bug #24454 (Duplicate): failed to recover before timeout expired
- #24452
- 12:29 PM Bug #24454 (Duplicate): failed to recover before timeout expired
- tons of this on current master
http://pulpito.ceph.com/kchai-2018-06-06_04:56:43-rados-wip-kefu-testing-2018-06-06...
- 07:05 PM Bug #24452 (Fix Under Review): Backfill hangs in a test case in master not mimic
- https://github.com/ceph/ceph/pull/22478
- 02:48 PM Bug #24452: Backfill hangs in a test case in master not mimic
Final messages on primary during backfill about pg 1.0....
- 04:57 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic
../qa/run-standalone.sh "osd-backfill-stats.sh TEST_backfill_down_out" 2>&1 | tee obs.log
This test times out wa...
- 02:34 PM Backport #23912: luminous: mon: High MON cpu usage when cluster is changing
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21968
merged
- 02:33 PM Backport #24245: luminous: Manager daemon y is unresponsive during teuthology cluster teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22331
merged
- 02:31 PM Backport #24374: luminous: mon: auto compaction on rocksdb should kick in more often
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22360
merged
- 08:18 AM Bug #23352: osd: segfaults under normal operation
- Experiencing a safe_timer segfault with a freshly deployed cluster. No data on the cluster yet. Just an empty poo...
06/07/2018
- 03:20 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- We are also seeing this when creating OSDs with IDs that existed previously.
I verified that the old osd was delet...
- 01:21 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22456
- 01:14 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- Okay, I see the problem. Two fixes: first, reset every pg on down->up (simpler approach), but the bigger issue is th...
- 12:58 PM Bug #24450: OSD Caught signal (Aborted)
- I have the same problem.
http://tracker.ceph.com/issues/24423
- 12:03 PM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
- Hi,
I have done a rolling_upgrade to mimic with ceph-ansible. It works perfectly! Now I want to deploy new OSDs, bu...
- https://retrace.fedoraproject.org/faf/reports/bthash/fe768f98e5fff65f0c850668c4bdae8d4da7e086/
https://retrace.fedor...
06/06/2018
- 09:11 PM Bug #24264 (Closed): ssd-primary crush rule not working as intended
- I don't think there's a good way to express that requirement in the current crush language. The rule in the docs does...
- 09:06 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
- Seems like the way to fix this is to stop ceph-objectstore-tool from trying to use the crush location hook at all.
... - 07:15 AM Bug #23145: OSD crashes during recovery of EC pg
- -3> 2018-06-06 15:00:40.462930 7fffddb25700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2...
- 02:45 AM Bug #23145: OSD crashes during recovery of EC pg
- @Sage Weil
@Zengran Zhang
we hit the same issue, and the osd crash has not recovered until now.
env is 12.2.5 ec 2+1 b... - 06:02 AM Backport #24293 (In Progress): jewel: mon: slow op on log message
- https://github.com/ceph/ceph/pull/22431
- 02:34 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- Attached full log (download ceph-osd.3.log.gz).
Points are:... - 12:33 AM Bug #24371 (Pending Backport): Ceph-osd crash when activate SPDK
06/05/2018
- 05:34 PM Bug #24365 (Pending Backport): cosbench stuck at booting cosbench driver
- 01:33 AM Bug #24365 (Fix Under Review): cosbench stuck at booting cosbench driver
- https://github.com/ceph/ceph/pull/22405
- 04:04 PM Bug #24408 (Pending Backport): tell ... config rm <foo> not idempotent
- 11:00 AM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
- After upgrading to Mimic I deleted a non-lvm OSD and recreated it with 'ceph-volume lvm prepare --bluestore --data /d...
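(A minimal sketch of the re-creation step described above; the device path is a placeholder.)
    ceph-volume lvm prepare --bluestore --data /dev/sdX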
- 10:37 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- the same as https://tracker.ceph.com/issues/21475, and I already set bluestore_deferred_throttle_bytes = 0
bluest...
- 10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 2018-06-05T17:46:28.273183+08:00 node54 ceph-osd: /work/build/rpmbuild/BUILD/infinity-3.2.5/src/os/bluestore/BlueStor...
- 10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 鹏 张 wrote:
> ceph version: 12.2.5
> data pool use Ec module 2 + 1.
> When restart one osd,it case crash and restar...
- 10:26 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 1.-45> 2018-06-05 17:47:56.886142 7f8972974700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2)...
- 10:25 AM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- ceph version: 12.2.5
data pool uses EC 3 + 1.
When restarting one osd, it crashes and restarts more and more.
...
- 04:42 AM Bug #24419 (Won't Fix): ceph-objectstore-tool unable to open mon store
- Hi everyone,
I use luminous v12.2.5, and I tried to recover the monitor database from the osds.
I performed it step by step acc...
- 03:32 AM Backport #24291 (In Progress): jewel: common: JSON output from rados bench write has typo in max_...
- https://github.com/ceph/ceph/pull/22407
- 02:37 AM Bug #23875: Removal of snapshot with corrupt replica crashes osd
If update_snap_map() ignores the error from remove_oid() we still crash because an op from the primary related to...
- 02:20 AM Backport #24292 (In Progress): mimic: common: JSON output from rados bench write has typo in max_...
- https://github.com/ceph/ceph/pull/22406