Activity
From 07/01/2019 to 07/30/2019
07/30/2019
- 11:54 PM Bug #41016 (Resolved): Improve upmap change reporting in logs
- 1. do not silently skip mappings in _apply_upmap() or anywhere else, when they aren't going to be applied
2. maybe_r...
- 11:40 PM Backport #40744 (Resolved): nautilus: core: lazy omap stat collection
- 10:12 PM Backport #40744: nautilus: core: lazy omap stat collection
- Brad Hubbard wrote:
> https://github.com/ceph/ceph/pull/29188
merged
- 10:26 PM Backport #40652: nautilus: os/bluestore: fix >2GB writes
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28966
merged
- 10:25 PM Backport #40652: nautilus: os/bluestore: fix >2GB writes
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28966
merged
- 10:21 PM Backport #40655: nautilus: Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29173
merged
- 10:17 PM Backport #40667: nautilus: PG scrub stamps reset to 0.000000
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/28869
merged
- 10:16 PM Backport #40730: nautilus: mon: auth mon isn't loading full KeyServerData after restart
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28993
merged
- 10:16 PM Backport #39693: nautilus: _txc_add_transaction error (39) Directory not empty not handled on ope...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29115
merged - 06:50 PM Feature #40640 (Fix Under Review): Network ping monitoring
- 04:09 PM Backport #39692: mimic: _txc_add_transaction error (39) Directory not empty not handled on operat...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29217
merged
- 11:13 AM Tasks #40937: Problem "open vSwitch" networkbond set_numa_affinity
- During the reboot the following messages are in the log:...
- 07:59 AM Tasks #40937: Problem "open vSwitch" networkbond set_numa_affinity
- It's all working so far. The question is, does the cpu assignment work? To do this I manually set the following comma...
- 11:06 AM Bug #23402: objecter: does not resend op on split interval
- duplicated by: https://tracker.ceph.com/issues/22544
- 07:26 AM Documentation #41004 (In Progress): doc: pg_num should always be a power of two
- 06:21 AM Documentation #41004: doc: pg_num should always be a power of two
- https://github.com/ceph/ceph/pull/29364
- 06:20 AM Documentation #41004 (Resolved): doc: pg_num should always be a power of two
- Hi,
I updated the pg_num section in the docs just a little to be more strict.
I think we should make it crystal c...
- 06:37 AM Backport #40625 (In Progress): nautilus: OSDs get killed by OOM due to a broken switch
- https://github.com/ceph/ceph/pull/29391
07/29/2019
- 09:09 PM Tasks #40937: Problem "open vSwitch" networkbond set_numa_affinity
- Does this visibly break anything or is it just a message in the logs?
- 07:33 PM Bug #40998 (Can't reproduce): ceph-objectstore-tool remove broken
- After rebuilding the problem went away.
- 05:57 PM Bug #40998 (Can't reproduce): ceph-objectstore-tool remove broken
It seems like the remove function no longer works properly with snaps. Test errors are detected with OSDs down the...
- 12:23 PM Bug #40791 (Closed): high variance in pg size
- https://github.com/ceph/ceph/pull/29364
closing this.
- 10:01 AM Bug #40994 (New): unittest_erasure_code_shec_all Failed with Timeout
- https://jenkins.ceph.com/job/ceph-pull-requests/30238/console...
- 09:45 AM Backport #40993 (Rejected): mimic: Ceph status in some cases does not report slow ops
- We had 2 instances when running 13.2.6 where the slow ops of failing disks were not reported.
This is from 1 cluster:
<p...
- 06:38 AM Backport #40537 (In Progress): nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
- https://github.com/ceph/ceph/pull/29372
07/28/2019
07/26/2019
- 10:18 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- I dumped out the log (500 lines, attached); it's almost exclusively this sequence, so we're not communicating with osd.3.
...
- 06:58 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- Looking at a live process it doesn't seem to be a deadlock or "hang" but more like some sort of livelock where the 'm...
- 06:29 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- Still working on this but steps to reproduce......
- 08:32 AM Bug #40637 (Resolved): osd: report omap/data/metadata usage
- 08:32 AM Backport #40638 (Resolved): luminous: osd: report omap/data/metadata usage
- 08:30 AM Backport #40940 (Need More Info): nautilus: Update rocksdb to v6.1.2
- 06:43 AM Feature #40955 (Fix Under Review): Extend the scrub sleep time when the period is outside [osd_sc...
- 04:03 AM Feature #40955: Extend the scrub sleep time when the period is outside [osd_scrub_begin_hour, osd...
- PR: https://github.com/ceph/ceph/pull/29342
- 01:30 AM Bug #40969 (Resolved): rocksdb: enable rocksdb_rmrange=true by default and make delete range opti...
07/25/2019
- 09:45 PM Backport #40638: luminous: osd: report omap/data/metadata usage
- Josh Durgin wrote:
> https://github.com/ceph/ceph/pull/28851
merged
- 07:44 PM Backport #40940: nautilus: Update rocksdb to v6.1.2
- We want this to bake in master for a while.
- 08:56 AM Backport #40940 (Resolved): nautilus: Update rocksdb to v6.1.2
- https://github.com/ceph/ceph/pull/29440
- 04:24 PM Bug #40963 (Resolved): mimic: MQuery during Deleting state
- ...
- 12:29 PM Feature #40955 (Resolved): Extend the scrub sleep time when the period is outside [osd_scrub_begi...
- We already have osd_scrub_begin_week_day, osd_scrub_end_week_day, osd_scrub_begin_hour and osd_scrub_end_hour to tell...
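A minimal sketch of the idea behind this feature request, not the actual OSD scheduler code (the begin/end parameters mirror osd_scrub_begin_hour/osd_scrub_end_hour; the extended-sleep value is a made-up placeholder): when the current hour falls outside the allowed window, back off for much longer between scrub work items.

```python
# Hedged illustration only; option handling is simplified.
def in_scrub_window(hour, begin_hour, end_hour):
    # Allow windows that wrap past midnight, e.g. begin=22, end=6.
    if begin_hour <= end_hour:
        return begin_hour <= hour < end_hour
    return hour >= begin_hour or hour < end_hour

def scrub_sleep(hour, begin_hour, end_hour, base_sleep=0.1, extended_sleep=120.0):
    # Outside the window, sleep much longer so in-flight scrubs barely progress.
    return base_sleep if in_scrub_window(hour, begin_hour, end_hour) else extended_sleep
```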
- 12:10 PM Backport #40654 (Resolved): mimic: Lower the default value of osd_deep_scrub_large_omap_object_ke...
- 12:09 PM Backport #38552 (Resolved): mimic: core: lazy omap stat collection
- 12:01 PM Bug #36739 (Resolved): ENOENT in collection_move_rename on EC backfill target
- 12:01 PM Backport #38880 (Resolved): luminous: ENOENT in collection_move_rename on EC backfill target
- 12:00 PM Bug #39006 (Resolved): ceph tell osd.xx bench help : gives wrong help
- 11:59 AM Backport #39373 (Resolved): luminous: ceph tell osd.xx bench help : gives wrong help
- 10:32 AM Feature #39339: prioritize backfill of metadata pools, automatically
- since this is only going to be backported to nautilus and since there are two PRs involved, and since one of those PR...
- 08:57 AM Backport #40949 (Resolved): mimic: Better default value for osd_snap_trim_sleep
- https://github.com/ceph/ceph/pull/29732
- 08:57 AM Backport #40948 (Resolved): nautilus: Better default value for osd_snap_trim_sleep
- https://github.com/ceph/ceph/pull/29678
- 08:57 AM Backport #40947 (Resolved): luminous: Better default value for osd_snap_trim_sleep
- https://github.com/ceph/ceph/pull/31857
- 08:56 AM Backport #40943 (Resolved): mimic: mon/OSDMonitor.cc: better error message about min_size
- https://github.com/ceph/ceph/pull/29618
- 08:56 AM Backport #40942 (Resolved): nautilus: mon/OSDMonitor.cc: better error message about min_size
- https://github.com/ceph/ceph/pull/29617
- 08:56 AM Backport #40941 (Rejected): luminous: mon/OSDMonitor.cc: better error message about min_size
- 07:28 AM Tasks #40937 (New): Problem "open vSwitch" networkbond set_numa_affinity
- Hello,
after installing ceph 14.2.1 (Proxmox 6.0-4-6.0) I have the following example message in syslog when starti...
07/24/2019
- 11:01 PM Backport #40654: mimic: Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29174
merged
- 10:59 PM Backport #38552: mimic: core: lazy omap stat collection
- Brad Hubbard wrote:
> https://github.com/ceph/ceph/pull/29189
merged
- 10:43 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28110
merged
- 10:43 PM Backport #39373: luminous: ceph tell osd.xx bench help : gives wrong help
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28112
merged
- 10:06 PM Feature #39339 (In Progress): prioritize backfill of metadata pools, automatically
- Sorry, https://github.com/ceph/ceph/pull/29181 is yet to merge.
- 10:03 PM Feature #39339 (Pending Backport): prioritize backfill of metadata pools, automatically
- One backport for nautilus: https://github.com/ceph/ceph/pull/29275
- 09:16 PM Bug #40785 (Need More Info): In case of osd full scenario 100% pgs went to unknown state, when ad...
- Which ceph version are you running? Can you provide the "ceph -s" output?
- 08:27 PM Bug #40791: high variance in pg size
- Jan Fajerski wrote:
> Greg Farnum wrote:
> > PGs split by splitting their hash range in half. So if you have not-a-...
- 11:07 AM Bug #40791: high variance in pg size
- Greg Farnum wrote:
> PGs split by splitting their hash range in half. So if you have not-a-power-of-two, some of the...
- 01:34 PM Bug #40825: test_osd_came_back (tasks.mgr.test_progress.TestProgress) ... FAIL
- ...
- 11:12 AM Backport #40502 (Need More Info): luminous: osd: rollforward may need to mark pglog dirty
- Is this a follow-on fix for https://github.com/ceph/ceph/pull/27015 which is only in master? Please clarify.
Marki...
- 11:12 AM Backport #40503 (Need More Info): mimic: osd: rollforward may need to mark pglog dirty
- Is this a follow-on fix for https://github.com/ceph/ceph/pull/27015 which is only in master? Please clarify.
Marki...
- 11:11 AM Backport #40504 (Need More Info): nautilus: osd: rollforward may need to mark pglog dirty
- Is this a follow-on fix for https://github.com/ceph/ceph/pull/27015 which is only in master? Please clarify.
Marki...
- 11:11 AM Bug #40403: osd: rollforward may need to mark pglog dirty
- Is this a follow-on fix for https://github.com/ceph/ceph/pull/27015 which is only in master? Please clarify.
Marki...
- 11:05 AM Backport #40465 (In Progress): nautilus: osd beacon sometimes has empty pg list
- 11:04 AM Backport #40464 (In Progress): mimic: osd beacon sometimes has empty pg list
- 11:03 AM Backport #40180 (In Progress): nautilus: qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails
- 11:02 AM Backport #40179 (In Progress): mimic: qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails
- 10:59 AM Backport #38856 (In Progress): mimic: should set EPOLLET flag on del_event()
- 10:59 AM Backport #38852 (In Progress): mimic: .mgrstat failed to decode mgrstat state; luminous dev version?
- 10:56 AM Backport #38436 (In Progress): luminous: crc cache should be invalidated when posting preallocate...
- 10:54 AM Backport #38437 (In Progress): mimic: crc cache should be invalidated when posting preallocated r...
- 10:50 AM Backport #38351 (In Progress): mimic: Limit loops waiting for force-backfill/force-recovery to ha...
- 10:49 AM Backport #40274 (In Progress): nautilus: librados 'buffer::create' and related functions are not ...
- https://github.com/ceph/ceph/pull/29244
- 10:30 AM Documentation #38896 (Resolved): Minor rados related documentation fixes
- 10:29 AM Backport #38902 (Resolved): luminous: Minor rados related documentation fixes
- 10:27 AM Backport #38610 (In Progress): luminous: mon: osdmap prune
- 09:11 AM Backport #38277 (In Progress): mimic: osd_map_message_max default is too high?
- 09:04 AM Backport #38206 (In Progress): mimic: osds allows to partially start more than N+2
- 08:56 AM Backport #38163 (Need More Info): mimic: maybe_remove_pg_upmaps incorrectly cancels valid pending...
- part of a complicated, interrelated set of PRs - assigning to the author of the luminous backport https://github.com/...
- 08:17 AM Bug #40835: OSDCap.PoolClassRNS test aborts
- 01:24 AM Feature #40528 (Pending Backport): Better default value for osd_snap_trim_sleep
- 12:12 AM Bug #40915 (Pending Backport): Update rocksdb to v6.1.2
07/23/2019
- 09:16 PM Bug #40915 (Fix Under Review): Update rocksdb to v6.1.2
- 09:16 PM Bug #40915 (Resolved): Update rocksdb to v6.1.2
- 08:10 PM Bug #40910 (Resolved): mon/OSDMonitor.cc: better error message about min_size
- 07:49 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- It seems this will be in 14.2.3. When the fix comes out, will my crashed OSDs work again, or should I just purge and ...
- 03:10 PM Backport #39694 (Need More Info): luminous: _txc_add_transaction error (39) Directory not empty n...
- non-trivial backport
- 03:01 PM Backport #39692 (In Progress): mimic: _txc_add_transaction error (39) Directory not empty not han...
- 09:42 AM Backport #38450 (Need More Info): mimic: src/osd/OSDMap.h: 1065: FAILED assert(__null != pool)
- A naive backport - https://github.com/ceph/ceph/pull/26594 - was closed because it resulted in the following build fa...
- 09:06 AM Bug #40198 (Resolved): Setting noscrub causing extraneous deep scrubs
- 09:06 AM Backport #40265 (Resolved): nautilus: Setting noscrub causing extraneous deep scrubs
- 08:24 AM Backport #40891 (Resolved): nautilus: Pool settings aren't populated to OSD after restart.
- https://github.com/ceph/ceph/pull/32123
- 08:24 AM Backport #40890 (Resolved): mimic: Pool settings aren't populated to OSD after restart.
- https://github.com/ceph/ceph/pull/32125
- 08:24 AM Backport #40889 (Rejected): luminous: Pool settings aren't populated to OSD after restart.
- 08:23 AM Backport #40885 (Resolved): nautilus: ceph mgr module ls -f plain crashes mon
- https://github.com/ceph/ceph/pull/29566
- 08:23 AM Backport #40884 (Resolved): mimic: ceph mgr module ls -f plain crashes mon
- https://github.com/ceph/ceph/pull/29593
- 08:23 AM Backport #40883 (Rejected): luminous: ceph mgr module ls -f plain crashes mon
- 08:22 AM Backport #39475 (Resolved): mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- 08:18 AM Backport #40650 (Resolved): luminous: os/bluestore: fix >2GB writes
- 08:18 AM Backport #40651 (Resolved): mimic: os/bluestore: fix >2GB writes
- 08:16 AM Bug #40720 (In Progress): mimic, nautilus: make bitmap allocator the default allocator for bluestore
- Changing status to "In Progress" so the backport-create-issue script doesn't create backport issues.
- 08:15 AM Bug #40720: mimic, nautilus: make bitmap allocator the default allocator for bluestore
- master/nautilus PR containing the commit to be backported: https://github.com/ceph/ceph/pull/21825
mimic backport PR...
- 08:10 AM Bug #38682 (Resolved): should report EINVAL in ErasureCode::parse() if m<=0
- 08:10 AM Backport #38751 (Resolved): mimic: should report EINVAL in ErasureCode::parse() if m<=0
- 04:37 AM Backport #38551 (In Progress): luminous: core: lazy omap stat collection
- https://github.com/ceph/ceph/pull/29190
- 03:44 AM Backport #38552 (In Progress): mimic: core: lazy omap stat collection
- https://github.com/ceph/ceph/pull/29189
- 03:14 AM Backport #40744 (In Progress): nautilus: core: lazy omap stat collection
- Nautilus only requires https://github.com/ceph/ceph/pull/28070 as it already has https://github.com/ceph/ceph/pull/26...
- 12:15 AM Feature #39339 (Fix Under Review): prioritize backfill of metadata pools, automatically
- https://github.com/ceph/ceph/pull/29180
https://github.com/ceph/ceph/pull/29181
07/22/2019
- 09:12 PM Backport #40265: nautilus: Setting noscrub causing extraneous deep scrubs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28768
merged
- 07:01 PM Bug #40720 (Pending Backport): mimic, nautilus: make bitmap allocator the default allocator for b...
- Luminous PR: https://github.com/ceph/ceph/pull/28972
- 06:40 PM Bug #40804 (Pending Backport): ceph mgr module ls -f plain crashes mon
- 06:40 PM Bug #40483 (Pending Backport): Pool settings aren't populated to OSD after restart.
- 06:38 PM Bug #40635 (Resolved): IndexError: list index out of range in thrash_pg_upmap
- 05:37 PM Feature #40870 (Resolved): Implement mon_memory_target
- Use the priority cache tuner for mon caches. Also implement a config observer to handle changes to mon cache sizes.
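A rough sketch of the config-observer idea described above, not the Ceph C++ implementation (only the option name mon_memory_target is taken from the entry; the class, callback name, and the 70/30 split are invented for illustration): watch the option and re-derive per-cache budgets whenever it changes.

```python
# Hedged illustration only.
class MonCacheTuner:
    WATCHED = {"mon_memory_target"}

    def __init__(self, conf):
        self.conf = conf
        self.apply()

    def handle_conf_change(self, changed_keys):
        # The config machinery would call this with the names of changed options.
        if self.WATCHED & set(changed_keys):
            self.apply()

    def apply(self):
        target = self.conf["mon_memory_target"]
        # Split the memory budget between caches; the ratio here is arbitrary.
        self.rocksdb_cache_bytes = int(target * 0.7)
        self.osdmap_cache_bytes = int(target * 0.3)
```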
- 05:28 PM Backport #40653 (In Progress): luminous: Lower the default value of osd_deep_scrub_large_omap_obj...
- https://github.com/ceph/ceph/pull/29175
- 05:25 PM Backport #40654 (In Progress): mimic: Lower the default value of osd_deep_scrub_large_omap_object...
- https://github.com/ceph/ceph/pull/29174
- 05:21 PM Backport #40655 (In Progress): nautilus: Lower the default value of osd_deep_scrub_large_omap_obj...
- https://github.com/ceph/ceph/pull/29173
- 03:31 PM Bug #40868 (New): src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
- qa/workunits/rbd/test_librbd_python.sh failure...
- 02:01 PM Backport #39513 (Resolved): mimic: osd: segv in _preboot -> heartbeat
- 01:58 PM Backport #39311 (Resolved): mimic: crushtool crash on Fedora 28 and newer
- 01:56 PM Backport #39374 (Resolved): mimic: ceph tell osd.xx bench help : gives wrong help
- 01:56 PM Bug #39154 (Resolved): Don't mark removed osds in when running "ceph osd in any|all|*"
- 01:56 PM Backport #39422 (Resolved): mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
- 01:55 PM Bug #38034 (Resolved): pg stuck in backfill_wait with plenty of disk space
- 01:55 PM Backport #38341 (Resolved): mimic: pg stuck in backfill_wait with plenty of disk space
- 10:39 AM Backport #40840 (Need More Info): nautilus: Explicitly requested repair of an inconsistent PG can...
- non-trivial because it depends on d938b28565c801b1a6de8e8ce585f2389595311b which itself does not apply to nautilus cl...
- 08:20 AM Backport #40840 (Resolved): nautilus: Explicitly requested repair of an inconsistent PG cannot be...
- https://github.com/ceph/ceph/pull/29748
- 10:30 AM Backport #40639 (Resolved): mimic: osd: report omap/data/metadata usage
- 09:52 AM Backport #40744 (Need More Info): nautilus: core: lazy omap stat collection
- Requires backport of https://github.com/ceph/ceph/pull/26614 and https://github.com/ceph/ceph/pull/28070 - the latter...
- 09:52 AM Backport #38552 (Need More Info): mimic: core: lazy omap stat collection
- -not sure of the status here?-
https://github.com/ceph/ceph/pull/28070 is non-trivial, hence assigned to the devel...
- 09:51 AM Backport #38551 (Need More Info): luminous: core: lazy omap stat collection
- -not sure of the status here?-
https://github.com/ceph/ceph/pull/28070 is non-trivial, hence assigned to the devel...
- 04:03 AM Bug #40835 (New): OSDCap.PoolClassRNS test aborts
- 02:59 AM Bug #40835 (Can't reproduce): OSDCap.PoolClassRNS test aborts
- 12:01 AM Bug #40835: OSDCap.PoolClassRNS test aborts
- ...
- 12:00 AM Bug #40835 (Resolved): OSDCap.PoolClassRNS test aborts
- ...
07/19/2019
- 07:50 PM Bug #40635: IndexError: list index out of range in thrash_pg_upmap
- https://github.com/ceph/ceph/pull/29144 too
- 07:35 PM Bug #40777: hit assert in AuthMonitor::update_from_paxos
- Ah, ENOENT might be a code bug. Unless you have debug logs of the monitor from when it was writing that data to disk ...
- 07:31 PM Bug #40791: high variance in pg size
- Lars Marowsky-Brée wrote:
> This is Luminous, 12.2.12 by now.
>
> Balancing on bytes (reweight-by-utilization) wa...
- 12:30 PM Bug #40791: high variance in pg size
- Jan Fajerski wrote:
> Greg Farnum wrote:
> > It sure looks like the PG count isn't a power of two, so some of them ...
- 06:44 AM Bug #40791: high variance in pg size
- Greg Farnum wrote:
> It sure looks like the PG count isn't a power of two, so some of them are simply half size comp...
- 06:24 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- Edward Kalk wrote:
> Sometimes when this happens, the OSDs repeatedly crash and Linux system prevents them from bein...
- 11:18 AM Bug #40831 (New): compression segfaults with zstd 1.3.8 and incompatibilities with zstd 1.4.0
- Hey,
I'm currently working on packaging ceph 14.2.1 for Arch Linux (still some kinks to work out, once that is don...
- 10:46 AM Bug #24835: osd daemon spontaneous segfault
- Thanks for the response Soenke. If we haven't seen it again in a few months time I guess we can close this.
- 10:40 AM Bug #24835: osd daemon spontaneous segfault
- We haven't seen this bug for 6 weeks now after updating to Nautilus (and changing configuration from ceph.conf to cep...
- 02:42 AM Bug #24835: osd daemon spontaneous segfault
- Soenke or Christian,
Are you still seeing this issue?
- 08:15 AM Bug #24419: ceph-objectstore-tool unable to open mon store
- I have the same problem.
- 02:38 AM Bug #36250 (Can't reproduce): ceph-osd process crashing
- 02:36 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
07/18/2019
- 10:01 PM Bug #38841: Objects degraded higher than 100%
The number of degraded objects is based on object replicas, not the number of objects. So let's say every pool is h...
- 09:51 PM Bug #40825: test_osd_came_back (tasks.mgr.test_progress.TestProgress) ... FAIL
- 09:36 PM Bug #40825 (Duplicate): test_osd_came_back (tasks.mgr.test_progress.TestProgress) ... FAIL
- the test marks osd out, verifies a progress event is there, then marks it in, and asserts that there are no progress ...
- 07:52 PM Backport #39475: mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28206
merged
- 07:49 PM Backport #40651: mimic: os/bluestore: fix >2GB writes
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28967
merged
- 07:48 PM Bug #40720: mimic, nautilus: make bitmap allocator the default allocator for bluestore
- merged https://github.com/ceph/ceph/pull/28970
- 07:47 PM Backport #38751: mimic: should report EINVAL in ErasureCode::parse() if m<=0
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28995
merged
- 07:45 PM Backport #39693 (In Progress): nautilus: _txc_add_transaction error (39) Directory not empty not ...
- https://github.com/ceph/ceph/pull/29115
- 04:45 PM Bug #40820: standalone/scrub/osd-scrub-test.sh +3 day failed assert
The prior osdmap was issuing these messages for 7 seconds....
- 04:42 PM Bug #40820: standalone/scrub/osd-scrub-test.sh +3 day failed assert
This test gives the mon 2 seconds to propagate changes. A scrub_min_interval change to a pool probably didn't reac...
- 04:01 PM Bug #40820 (Closed): standalone/scrub/osd-scrub-test.sh +3 day failed assert
- ...
- 02:07 PM Bug #40755 (Resolved): _txc_add_transaction error (2) No such file or directory not handled on op...
- 04:24 AM Bug #40777: hit assert in AuthMonitor::update_from_paxos
- Greg Farnum wrote:
> That assert means there was a read error when the monitor tried to get data off of disk. Check ...
- 04:03 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- I understand, thanks Han.
- 12:16 AM Bug #39304 (Resolved): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_a...
- 12:16 AM Backport #39720 (Resolved): mimic: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3...
07/17/2019
- 11:01 PM Backport #39513: mimic: osd: segv in _preboot -> heartbeat
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28220
merged
- 10:44 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- once this is backported and released (#39693) we should confirm this fixes the problematic osd
- 10:29 PM Bug #40809 (New): qa: "Failed to send signal 1: None" in rados
- Run: http://pulpito.ceph.com/yuriw-2019-07-15_19:24:27-rados-wip-yuri4-testing-2019-07-15-1517-mimic-distro-basic-smi...
- 10:18 PM Backport #39311: mimic: crushtool crash on Fedora 28 and newer
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27986
merged
- 10:17 PM Backport #39720: mimic: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_...
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/28089
merged
- 10:17 PM Bug #40791: high variance in pg size
- This is Luminous, 12.2.12 by now.
Balancing on bytes (reweight-by-utilization) was unable to resolve the issue pre...
- 09:16 PM Bug #40791: high variance in pg size
- It sure looks like the PG count isn't a power of two, so some of them are simply half size compared to the others. (S...
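To make the half-size effect concrete, here is a hedged sketch of the stable-modulo placement (modeled on Ceph's ceph_stable_mod; the pg_num of 12 and the sample count are arbitrary). With a non-power-of-two pg_num, the PGs that receive the folded-back hash values cover twice the hash range, so the remaining PGs end up roughly half their size.

```python
import random

def stable_mod(x, b, bmask):
    # Modeled on ceph_stable_mod(): bmask is the next power of two minus one.
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

pg_num = 12      # not a power of two
bmask = 15       # 2**ceil(log2(12)) - 1
counts = [0] * pg_num
for _ in range(120_000):
    counts[stable_mod(random.getrandbits(32), pg_num, bmask)] += 1
print(counts)    # PGs 4-7 collect roughly twice as many objects as PGs 0-3 and 8-11
```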
- 09:15 PM Bug #40791 (Need More Info): high variance in pg size
- Which ceph version are you using?
- 10:17 PM Backport #39374: mimic: ceph tell osd.xx bench help : gives wrong help
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28097
merged - 10:16 PM Backport #39422: mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28142
merged - 10:13 PM Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28201
merged
- 09:33 PM Bug #23879: test_mon_osdmap_prune.sh fails
Another time on mimic so I assume Nautilus needs a fix too.
http://qa-proxy.ceph.com/teuthology/yuriw-2019-07-09_1...
- 09:26 PM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- ...
- 09:19 PM Bug #40726: "OSD::osd_op_tp thread 0x7f6dafcf0700' had timed out after 15"
- This happens occasionally on Mira nodes; but if it pops up repeatedly on the same node or test suite that may be evid...
- 09:15 PM Bug #40774 (Resolved): mon: interval_set.h: 490: FAILED ceph_assert(p->first > start+len)
- 09:12 PM Bug #40777 (Closed): hit assert in AuthMonitor::update_from_paxos
- That assert means there was a read error when the monitor tried to get data off of disk. Check your disk!
- 08:56 PM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
http://qa-proxy.ceph.com/teuthology/yuriw-2019-07-09_15:21:18-rados-wip-yuri-testing-2019-07-08-2007-mimic-distro-b...
- 08:56 PM Bug #40070 (Rejected): mon/OSDMonitor: target_size_bytes integer overflow
- this is by design. the target_size is new in nautilus, so we don't encode it in the map until require_osd_release >=...
- 08:54 PM Bug #40081: mon: luminous crash attempting to decode maps after nautilus quorum has been formed
- https://github.com/ceph/ceph/pull/28672 (nautilus backport PR)
- 08:43 PM Bug #40000: osds do not bound xattrs and/or aggregate xattr data in pg log
- from the ML,...
- 08:42 PM Bug #40000 (Need More Info): osds do not bound xattrs and/or aggregate xattr data in pg log
- The message dump is 260M (once de-hexified), but the decode of the pg_log_t in the message indicates it is 2484154195...
- 05:50 PM Bug #40483 (Fix Under Review): Pool settings aren't populated to OSD after restart.
- https://github.com/ceph/ceph/pull/29093
- 04:57 PM Bug #40755 (Fix Under Review): _txc_add_transaction error (2) No such file or directory not handl...
- https://github.com/ceph/ceph/pull/29092
- 03:45 PM Bug #40793 (Rejected): mgr mon commands pile up
- This was a side-effect of #40792. A targeted mon command was queued for a down mon, which forced the MonClient to keep...
- 03:38 PM Bug #40792 (Fix Under Review): monc: send_command to specific down mon breaks other mon msgs
- https://github.com/ceph/ceph/pull/29090
- 02:45 PM Bug #40804 (Fix Under Review): ceph mgr module ls -f plain crashes mon
- https://github.com/ceph/ceph/pull/29089
- 02:28 PM Bug #40804 (Resolved): ceph mgr module ls -f plain crashes mon
- 11:05 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- Hi Brad Hubbard
We manually generated some buckets after deploying the cluster. In order to avoid id repetition, we...
- 04:49 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- qingbo han wrote:
> hi Brad Hubbard:
> I think your theory is correct. I run ceph pg query correctly when I uli...
07/16/2019
- 05:27 PM Bug #40793 (Rejected): mgr mon commands pile up
- on lab cluster, a mon was down for a few days. on restart,...
- 05:24 PM Bug #40792 (Resolved): monc: send_command to specific down mon breaks other mon msgs
- On lab cluster, mgr regularly sends mgrbeacons. All is fine.
But, if one mon is down, *and* we send the smart scra...
- 05:00 PM Bug #40791 (Closed): high variance in pg size
- We're seeing a cluster that has a history of being very unbalanced in terms of OSD utilisation. The balancer in upmap...
- 03:08 PM Bug #40620 (Pending Backport): Explicitly requested repair of an inconsistent PG cannot be schedu...
- 03:06 PM Bug #40635 (Fix Under Review): IndexError: list index out of range in thrash_pg_upmap
- https://github.com/ceph/ceph/pull/29069
- 03:02 PM Bug #40635: IndexError: list index out of range in thrash_pg_upmap
- Looks like this triggers when there are no pools, and the pg dump pg_stats is thus empty.
- 02:45 PM Bug #40635: IndexError: list index out of range in thrash_pg_upmap
- /a/sage-2019-07-15_19:52:54-rados-wip-sage-testing-2019-07-15-0918-distro-basic-smithi/4121793
- 09:18 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- hi Brad Hubbard:
I think your theory is correct. I run ceph pg query correctly when I ulimit -s 16384. You said s...
- 04:28 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- Hello Han,
Many thanks to Radoslaw Zarzynski for the fruitful discussion we had regarding this issue last night. I... - 08:14 AM Bug #40785 (Need More Info): In case of osd full scenario 100% pgs went to unknown state, when ad...
- After populating more data, osds were becoming nearfull and full. When more storage was added in this situation, all pgs...
- 02:14 AM Bug #40777: hit assert in AuthMonitor::update_from_paxos
- ...
07/15/2019
- 07:41 PM Backport #40639: mimic: osd: report omap/data/metadata usage
- Josh Durgin wrote:
> https://github.com/ceph/ceph/pull/28852
merged
- 05:49 PM Bug #40774 (Fix Under Review): mon: interval_set.h: 490: FAILED ceph_assert(p->first > start+len)
- https://github.com/ceph/ceph/pull/29051
- 04:23 PM Bug #40777: hit assert in AuthMonitor::update_from_paxos
- Is this reproducible? If so, can you add mon logs (ideally both for peons and leader), at 'debug mon = 10', 'debug pa...
- 09:17 AM Bug #40777 (New): hit assert in AuthMonitor::update_from_paxos
- I created the ceph cluster with rook (https://github.com/rook/rook), and the ceph version is 12.2.7 stable.
After I reb...
- 01:29 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- Still looking into this. The issue in the new core is the same as the original coredump.
07/13/2019
- 04:27 PM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- Seems on mimic as well
http://pulpito.ceph.com/teuthology-2019-07-13_06:00:03-smoke-mimic-testing-basic-smithi/
...
07/12/2019
- 11:24 PM Bug #40774: mon: interval_set.h: 490: FAILED ceph_assert(p->first > start+len)
- similar failure: /ceph/teuthology-archive/pdonnell-2019-07-11_22:52:33-fs-wip-pdonnell-testing-20190711.203149-distro...
- 11:23 PM Bug #40774 (Resolved): mon: interval_set.h: 490: FAILED ceph_assert(p->first > start+len)
- While removing snapshots:...
- 11:07 PM Bug #40772: mon: pg size change delayed 1 minute because osdmap 35 delay
- Kefu can you take a look? See the attached monitor logs.
- 10:48 PM Bug #40772: mon: pg size change delayed 1 minute because osdmap 35 delay
This looks to be a monitor issue. We see that osdmap 35 may be getting hung up during the critical period 00:29:49 ...
- 09:21 PM Bug #40772 (New): mon: pg size change delayed 1 minute because osdmap 35 delay
osd-recovery-prio.sh TEST_recovery_pool_priority fails intermittently due to a delay in recovery starting on a pg. ...
- 10:21 PM Bug #40725 (Resolved): osd-scrub-snaps.sh fails
- 02:19 AM Bug #40725 (Fix Under Review): osd-scrub-snaps.sh fails
- 09:11 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
- ...
- 03:24 PM Bug #40765 (Duplicate): mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
- Run: http://pulpito.ceph.com/teuthology-2019-07-12_06:00:03-smoke-mimic-testing-basic-smithi/
Jobs: '4113997', '4113...
- 02:08 PM Bug #40755 (Resolved): _txc_add_transaction error (2) No such file or directory not handled on op...
- ...
- 01:58 PM Bug #40635: IndexError: list index out of range in thrash_pg_upmap
- /a/sage-2019-07-11_17:46:52-rados-wip-sage-testing-2019-07-11-1048-distro-basic-smithi/4111022
- 12:33 PM Bug #38124 (Resolved): OSD down on snaptrim.
- 12:33 PM Backport #39698 (Resolved): mimic: OSD down on snaptrim.
07/11/2019
- 10:35 PM Backport #40638 (In Progress): luminous: osd: report omap/data/metadata usage
- 10:34 PM Backport #40638 (Duplicate): luminous: osd: report omap/data/metadata usage
- 02:00 PM Backport #40638 (In Progress): luminous: osd: report omap/data/metadata usage
- 10:34 PM Feature #38550 (Duplicate): osd: Implement lazy omap usage statistics per osd
- 10:28 PM Backport #40744 (Resolved): nautilus: core: lazy omap stat collection
- https://github.com/ceph/ceph/pull/29188
- 10:22 PM Feature #38136: core: lazy omap stat collection
- Requires backport of https://github.com/ceph/ceph/pull/26614 and https://github.com/ceph/ceph/pull/28070
- 10:22 PM Backport #38552 (In Progress): mimic: core: lazy omap stat collection
- Requires backport of https://github.com/ceph/ceph/pull/26614 and https://github.com/ceph/ceph/pull/28070
- 10:21 PM Backport #38551 (In Progress): luminous: core: lazy omap stat collection
- Requires backport of https://github.com/ceph/ceph/pull/26614 and https://github.com/ceph/ceph/pull/28070
- 07:24 PM Backport #40650: luminous: os/bluestore: fix >2GB writes
- Neha Ojha wrote:
> https://github.com/ceph/ceph/pull/28965
merged
- 04:39 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- Rebooted node 4; on nodes 1 and 2, 2 OSDs each crashed and will not start.
The logs are similar, seems to be the BUG ... - 02:36 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- ^^this results in the Production VMs becoming unresponsive as their disks are unavailable when we have multiple OSDs ...
- 02:33 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- Sometimes when this happens, the OSDs repeatedly crash and Linux system prevents them from being started. it takes 10...
- 02:28 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- Was Bug 38724:
```ceph-osd.9.log: -3> 2019-07-11 09:15:13.569 7fc7b8243700 -1 bluestore(/var/lib/ceph/osd/ceph-9...
- 02:28 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- OSD 9, 15, 10, 13 crashed this AM.
```ceph.log:2019-07-11 09:15:15.501601 mon.synergy0 (mon.0) 4248 : cluster [IN... - 04:28 PM Bug #40740 (New): "Error: finished tid 3 when last_acked_tid was 5" in upgrade:luminous-x-mimic
- Run: http://pulpito.ceph.com/teuthology-2019-07-11_02:25:02-upgrade:luminous-x-mimic-distro-basic-smithi/
Job: 41101...
- 03:18 PM Backport #39693: nautilus: _txc_add_transaction error (39) Directory not empty not handled on ope...
- Edward Kalk wrote:
> found a few things that seem like fixes for this on github... : https://github.com/ceph/ceph/pu... - 02:50 PM Backport #38276 (Resolved): luminous: osd_map_message_max default is too high?
- 02:36 PM Backport #38751 (In Progress): mimic: should report EINVAL in ErasureCode::parse() if m<=0
- 02:34 PM Backport #38750 (Resolved): luminous: should report EINVAL in ErasureCode::parse() if m<=0
- 02:01 PM Backport #40639 (In Progress): mimic: osd: report omap/data/metadata usage
- 01:59 PM Backport #40730 (In Progress): nautilus: mon: auth mon isn't loading full KeyServerData after res...
- 01:58 PM Backport #40730 (Resolved): nautilus: mon: auth mon isn't loading full KeyServerData after restart
- https://github.com/ceph/ceph/pull/28993
- 01:58 PM Backport #40732 (Resolved): mimic: mon: auth mon isn't loading full KeyServerData after restart
- https://github.com/ceph/ceph/pull/30181
- 01:58 PM Backport #40731 (Rejected): luminous: mon: auth mon isn't loading full KeyServerData after restart
- 01:13 PM Backport #39537 (In Progress): luminous: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent...
- 01:12 PM Backport #39538 (Resolved): mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->ge...
- 01:12 PM Bug #39582 (Resolved): Binary data in OSD log from "CRC header" message
- 01:12 PM Backport #39737 (Resolved): mimic: Binary data in OSD log from "CRC header" message
- 01:11 PM Backport #39744 (Resolved): mimic: mon: "FAILED assert(pending_finishers.empty())" when paxos res...
- 10:21 AM Bug #40726 (New): "OSD::osd_op_tp thread 0x7f6dafcf0700' had timed out after 15"
- osd.7 was marked down by itself because of unhealthy heartbeat....
- 04:37 AM Bug #40725: osd-scrub-snaps.sh fails
- David, mind taking a look?
- 04:37 AM Bug #40725 (Resolved): osd-scrub-snaps.sh fails
- ...
- 01:54 AM Feature #40420: Introduce an ceph.conf option to disable HEALTH_WARN when nodeep-scrub/scrub flag...
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-June/035406.html
https://pad.ceph.com/p/health-mute
- 01:18 AM Bug #40641: OSD failure after PGInfo back to previous versions, resulting in PGLog error rollback
- Neha Ojha wrote:
> How did you find out there were unrecoverable objects? Was there any indication in the logs?
T...
07/10/2019
- 09:10 PM Bug #40641: OSD failure after PGInfo back to previous versions, resulting in PGLog error rollback
- How did you find out there were unrecoverable objects? Was there any indication in the logs?
- 09:06 PM Bug #40674 (Resolved): TEST_corrupt_snapset_scrub_rep fails
- 09:04 PM Bug #40718 (Duplicate): touch in txn on (old) nautilus osd
- 02:44 PM Bug #40718 (Duplicate): touch in txn on (old) nautilus osd
- ...
- 07:51 PM Bug #40722: "IOError: [Errno 2] No such file or directory: '/tmp/pip-build-o9ggCd/unknown/setup.p...
- @Alfredo can you pls take a look?
- 07:00 PM Bug #40722 (New): "IOError: [Errno 2] No such file or directory: '/tmp/pip-build-o9ggCd/unknown/s...
- Run: http://pulpito.ceph.com/teuthology-2019-07-10_05:10:03-ceph-disk-mimic-distro-basic-mira/
Jobs: '4108064', '410...
- 05:39 PM Bug #40721: backfill caught in loop from block
- original blocked request is...
- 04:42 PM Bug #40721: backfill caught in loop from block
- actually, this retry is triggered on every osdmap.
- 04:42 PM Bug #40721 (Can't reproduce): backfill caught in loop from block
- ...
- 05:08 PM Bug #40720 (Fix Under Review): mimic, nautilus: make bitmap allocator the default allocator for b...
- 04:29 PM Bug #40720 (Resolved): mimic, nautilus: make bitmap allocator the default allocator for bluestore
- The default for nautilus is already bitmap allocator.
We might just need to cherry-pick 231b7dd9c5dc1d22e93a8f81d07e... - 03:49 PM Backport #40650 (In Progress): luminous: os/bluestore: fix >2GB writes
- https://github.com/ceph/ceph/pull/28965
- 03:49 PM Backport #40651 (In Progress): mimic: os/bluestore: fix >2GB writes
- https://github.com/ceph/ceph/pull/28967
- 03:48 PM Backport #40652 (In Progress): nautilus: os/bluestore: fix >2GB writes
- https://github.com/ceph/ceph/pull/28966
- 03:11 PM Bug #40712: ceph-mon crash with assert(err == 0) after rocksdb->get
- I also opened an issue in rocksdb: https://github.com/facebook/rocksdb/issues/5558, and I attached the db file in thi...
- 12:18 PM Bug #40712 (New): ceph-mon crash with assert(err == 0) after rocksdb->get
- (1) I found a very strange problem in our environment: the ceph-mon crashed with the below error in the log:...
- 03:02 PM Backport #39693: nautilus: _txc_add_transaction error (39) Directory not empty not handled on ope...
- found a few things that seem like fixes for this on github... : https://github.com/ceph/ceph/pull/27929/commits
- 02:06 PM Backport #39693: nautilus: _txc_add_transaction error (39) Directory not empty not handled on ope...
- Will this fix be included in : https://tracker.ceph.com/projects/ceph/roadmap#v14.2.2 ?
- 03:01 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- found a few things that seem like fixes for this on github... : https://github.com/ceph/ceph/pull/27929/commits
- 02:49 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- We hit this bug again : "2019-07-10 09:16:27.728 7f73b844c700 -1 bluestore(/var/lib/ceph/osd/ceph-5) _txc_add_transac...
- 02:06 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- will this be included in : https://tracker.ceph.com/projects/ceph/roadmap#v14.2.2 . ?
- 11:56 AM Bug #39555 (In Progress): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- oops, reverting - I had not seen Joao's question
- 11:54 AM Bug #39555 (Pending Backport): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- 06:11 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- hi Brad Hubbard
I failed to reproduce the segfault in python several times. I have uploaded the coredump in ceph, the id ...
07/09/2019
- 08:53 PM Backport #39693: nautilus: _txc_add_transaction error (39) Directory not empty not handled on ope...
- I dropped notes in "http://tracker.ceph.com/issues/38724". Not sure I understand the status. "pending backport" says ...
- 04:30 PM Backport #38276: luminous: osd_map_message_max default is too high?
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28640
merged
- 04:26 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- I am confused by the "Copied to RADOS - Backport #39693: nautilus" status. "Pending Backport 07/03/2019"
Does this m...
- 04:15 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- We have hit this as well, it was triggered when I rebooted a node. A few OSD on other hosts crashed. Here's some log:...
- 04:04 PM Backport #38750: luminous: should report EINVAL in ErasureCode::parse() if m<=0
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28111
merged
- 02:21 PM Bug #40634 (Pending Backport): mon: auth mon isn't loading full KeyServerData after restart
- 10:50 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- The pull request provided with the fix has been merged (https://github.com/ceph/ceph/pull/28204). Does anyone still s...
- 01:58 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- Hello Han,
I don't see any glaring differences in the binaries so far but I did notice this in the dmesg output.
...
07/08/2019
- 02:05 PM Bug #40692 (New): Ceph daemons failing to start when large unix groups exist
- While tracking down this [1] error I found where the error came from in the [2] code and looked into the getgrnam_r f...
07/07/2019
- 02:48 AM Bug #38827 (Resolved): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandl...
07/05/2019
- 02:54 PM Bug #40674 (Fix Under Review): TEST_corrupt_snapset_scrub_rep fails
- https://github.com/ceph/ceph/pull/28901
- 03:38 AM Bug #40674 (Resolved): TEST_corrupt_snapset_scrub_rep fails
- ...
07/04/2019
- 05:59 PM Bug #40649: set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option 'clust...
- Anyway, I generated the ceph.conf file used on client machines using the command "ceph config generate-minimal-conf"....
- 05:49 PM Bug #40649: set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option 'clust...
- As described by Manuel Rios in https://tracker.ceph.com/issues/40282, the workaround is to include the configs:
public_n...
- 02:36 PM Bug #20973: src/osdc/ Objecter.cc: 3106: FAILED assert(check_latest_map_ops.find(op->tid) == chec...
- ...
- 12:58 AM Bug #40641: OSD failure after PGInfo back to previous versions, resulting in PGLog error rollback
- Neha Ojha wrote:
> Did you see a crash in the logs somewhere? Can you tell us which osd failed and why and also atta...
- 12:53 AM Backport #40667 (In Progress): nautilus: PG scrub stamps reset to 0.000000
07/03/2019
- 11:43 PM Bug #40668 (Resolved): mon_osd_report_timeout should not be allowed to be less than 2x the value ...
- We should have a safety built in that will not allow the mon_osd_report_timeout to be set less than a value that is 2...
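A minimal sketch of the kind of guard being asked for here, purely illustrative and not Ceph code (the option the timeout must stay above is truncated in the title, so the second parameter name below is an assumption):

```python
def validate_mon_osd_report_timeout(mon_osd_report_timeout, report_interval):
    # Hypothetical check: reject a timeout shorter than twice the reporting interval.
    minimum = 2 * report_interval
    if mon_osd_report_timeout < minimum:
        raise ValueError(
            f"mon_osd_report_timeout={mon_osd_report_timeout}s is below the "
            f"safe minimum of {minimum}s (2x the reporting interval)")
    return mon_osd_report_timeout
```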
- 10:48 PM Feature #40640: Network ping monitoring
See also https://pad.ceph.com/p/Network_ping_monitoring
Examples, with warning threshold set to 1 microsecond.
...
- 12:52 AM Feature #40640 (Resolved): Network ping monitoring
The simplest version of this would be to see warnings if heartbeat ping response time exceeds certain thresholds.
- 10:35 PM Backport #40667 (Resolved): nautilus: PG scrub stamps reset to 0.000000
- https://github.com/ceph/ceph/pull/28869
- 10:29 PM Bug #40073 (Pending Backport): PG scrub stamps reset to 0.000000
- 09:51 PM Bug #40666 (New): osd fails to get latest map
- ...
- 09:35 PM Bug #40483: Pool settings aren't populated to OSD after restart.
- 09:34 PM Fix #40564: Objecter does not have perfcounters for op latency
- The TMAP* operations are obsolete and deprecated/removed. Adding some latency stats would be useful, though. Update...
- 09:30 PM Bug #40622 (Resolved): PG stuck in active+clean+remapped
- This looks like crush is just failing to find a good replica because 50% of the osds in a rack are down. Try using t...
- 09:29 PM Bug #40620 (Fix Under Review): Explicitly requested repair of an inconsistent PG cannot be schedu...
- 01:45 AM Bug #40620: Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD...
- PR: https://github.com/ceph/ceph/pull/28839
- 09:28 PM Bug #40641 (Need More Info): OSD failure after PGInfo back to previous versions, resulting in PGL...
- Did you see a crash in the logs somewhere? Can you tell us which osd failed and why and also attach the osd logs?
- 03:50 AM Bug #40641 (Need More Info): OSD failure after PGInfo back to previous versions, resulting in PGL...
- Ceph Version 12.2.7...
- 09:24 PM Bug #40635: IndexError: list index out of range in thrash_pg_upmap
- side note: using random.choice(seq) would do the same thing
- 06:35 PM Bug #40662 (Rejected): Too many deep scrubs with noscrub set and nodeep-scrub unset
- 06:06 PM Bug #40662 (Rejected): Too many deep scrubs with noscrub set and nodeep-scrub unset
We intended to add a 1 hour backoff to scrub handling when noscrub is set. This will result in too many deep scrub...
- 05:18 PM Bug #38403 (Duplicate): osd: leaked from OSDMap::apply_incremental
- #20491
- 01:53 PM Backport #40655 (Resolved): nautilus: Lower the default value of osd_deep_scrub_large_omap_object...
- https://github.com/ceph/ceph/pull/29173
- 01:53 PM Backport #40654 (Resolved): mimic: Lower the default value of osd_deep_scrub_large_omap_object_ke...
- https://github.com/ceph/ceph/pull/29174
- 01:53 PM Backport #40653 (Resolved): luminous: Lower the default value of osd_deep_scrub_large_omap_object...
- https://github.com/ceph/ceph/pull/29175
- 01:52 PM Backport #40652 (Resolved): nautilus: os/bluestore: fix >2GB writes
- https://github.com/ceph/ceph/pull/28966
- 01:52 PM Backport #40651 (Resolved): mimic: os/bluestore: fix >2GB writes
- https://github.com/ceph/ceph/pull/28967
- 01:52 PM Backport #40650 (Resolved): luminous: os/bluestore: fix >2GB writes
- https://github.com/ceph/ceph/pull/28965
- 01:45 PM Bug #40577 (Resolved): vstart.sh can't work.
- 01:31 PM Bug #40649 (New): set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option ...
- When using any rbd command on a client machine, for example, "rbd ls poolname", those messages are always displayed:
...
- 01:28 PM Bug #40583 (Pending Backport): Lower the default value of osd_deep_scrub_large_omap_object_key_th...
- 10:37 AM Bug #40646: FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstdc++-docs-8....
- temporary workaround posted at https://github.com/ceph/ceph/pull/28859
- 10:28 AM Bug #40646: FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstdc++-docs-8....
- https://bugzilla.redhat.com/show_bug.cgi?id=1726630
alternatively, we can pin on the previous version: devtoolset...
- 10:04 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
- ...
- 09:41 AM Bug #40642 (Duplicate): Bluestore crash due to mass activation on another pool
- looks like a duplicate for https://tracker.ceph.com/issues/38724
- 04:55 AM Bug #40642 (Duplicate): Bluestore crash due to mass activation on another pool
- Newly deployed Nautilus cluster with SSD and HDD pools on Ubuntu 18.04.2 with kernel 4.15.0-54.
When adding a doz... - 07:57 AM Documentation #40643 (New): clearify begin hour + end hour
- documentation doesn't mention if it's allowed to have a scrubbing window across midnight, and if it is, it should spe...
- 12:20 AM Backport #40639 (Resolved): mimic: osd: report omap/data/metadata usage
- https://github.com/ceph/ceph/pull/28852
- 12:19 AM Backport #40638 (Resolved): luminous: osd: report omap/data/metadata usage
- https://github.com/ceph/ceph/pull/28851
- 12:17 AM Bug #40637 (Resolved): osd: report omap/data/metadata usage
- This is to track the backport of https://github.com/ceph/ceph/pull/18096. This is helpful to tell when a given OSD's ...
07/02/2019
- 11:30 PM Bug #39175 (Resolved): RGW DELETE calls partially missed shortly after OSD startup
- That's great. Marking this bug as "Resolved" since migrating to BlueStore fixed the issue.
- 11:12 PM Bug #40636 (Resolved): os/bluestore: fix >2GB writes
- This is related to https://tracker.ceph.com/issues/23527#note-6
- 11:09 PM Bug #40635 (Resolved): IndexError: list index out of range in thrash_pg_upmap
- ...
- 11:06 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/sage-2019-07-02_17:58:21-rados-wip-sage-testing-2019-07-02-1056-distro-basic-smithi/4087740
- 11:05 PM Bug #40634: mon: auth mon isn't loading full KeyServerData after restart
- https://github.com/ceph/ceph/pull/28850
- 11:05 PM Bug #40634 (Fix Under Review): mon: auth mon isn't loading full KeyServerData after restart
- https://github.com/ceph/ceph/pull/28850
- 10:59 PM Bug #40634 (Resolved): mon: auth mon isn't loading full KeyServerData after restart
- /a/sage-2019-07-02_17:58:21-rados-wip-sage-testing-2019-07-02-1056-distro-basic-smithi/4087648
symptom is a failed...
- 08:29 PM Backport #40625 (Resolved): nautilus: OSDs get killed by OOM due to a broken switch
- https://github.com/ceph/ceph/pull/29391
- 02:46 PM Bug #40622 (Resolved): PG stuck in active+clean+remapped
- A cluster has 6 servers in 3 racks, 2 servers per rack.
A replication rule distributes replicas to the 3 racks: ...
- 01:33 PM Bug #40620 (Resolved): Explicitly requested repair of an inconsistent PG cannot be scheduled time...
- Since osd_scrub_during_recovery=false is used as the default, when an OSD has some recovering PG, it will not schedule any...
- 01:11 PM Bug #23117: PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been exceeded once
- Ceph version was 13.2.5 on the reinstalled host and 13.2.4 on the other hosts.
- 01:09 PM Bug #23117: PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been exceeded once
- We also hit this problem with a cluster which had replicated pools with a replication factor of 3 and a CRUSH rule wi...
- 09:54 AM Bug #40586 (Pending Backport): OSDs get killed by OOM due to a broken switch
- 12:27 AM Bug #40586: OSDs get killed by OOM due to a broken switch
- Greg Farnum wrote:
> Is this something you're working on, Xie?
Ah, sorry, forgot to link the pr, should be all se...
- 09:45 AM Bug #40533 (Resolved): thrashosds/test_pool_min_size races with radosbench tests
- 09:33 AM Bug #40410: ceph pg query Segmentation fault in 12.2.10
- Brad Hubbard wrote:
> Interesting, thanks Han.
>
> Would you mind uploading an sosreport from one node where the ...
- 02:00 AM Documentation #40488 (Resolved): Describe in documentation that EC can't recover below min_size p...
07/01/2019
- 09:44 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- After switching both of these over to using BlueStore for the SSDs, the problem has gone away! Thanks!
- 09:17 PM Bug #40586: OSDs get killed by OOM due to a broken switch
- Is this something you're working on, Xie?
- 09:17 PM Documentation #40568 (Fix Under Review): monmaptool: document the new --addv argument
- 08:03 PM Backport #39698: mimic: OSD down on snaptrim.
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28202
merged
- 08:00 PM Backport #39538: mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28259
merged
- 07:57 PM Backport #39737: mimic: Binary data in OSD log from "CRC header" message
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28503
merged
- 07:57 PM Backport #39744: mimic: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28540
merged
- 07:25 PM Feature #40610 (New): ceph-objectstore-tool option to "add-clone-metadata"
For a scenario where for some reason snapset information gets corrupt and there is a clone in the objectstore, reco...
- 05:41 AM Bug #40576: src/osd/PrimaryLogPG.cc: 10513: FAILED assert(head_obc)
- I removed the object from osd.12 and osd.16 and the cluster was able to return to HEALTH_OK. I appreciate the help i...
- 02:02 AM Bug #40601 (Fix Under Review): osd: osd being wrongly reported down because of getloadavg taking ...