Activity
From 03/27/2019 to 04/25/2019
04/25/2019
- 11:58 PM Bug #38930 (Duplicate): ceph osd safe-to-destroy wrongly approves any out osd
We can backport pull request https://github.com/ceph/ceph/pull/27503 for http://tracker.ceph.com/issues/39099 which...
- 11:55 PM Bug #38930 (Pending Backport): ceph osd safe-to-destroy wrongly approves any out osd
- 11:54 PM Bug #39099 (Pending Backport): Give recovery for inactive PGs a higher priority
- 09:39 PM Bug #39490 (In Progress): osd: failed to encode map e26 with expected crc
- should be fixed by https://github.com/ceph/ceph/pull/27623
- 08:28 PM Bug #39490 (Resolved): osd: failed to encode map e26 with expected crc
upgrade:nautilus-x/parallel/{0-cluster/{openstack.yaml start.yaml} 1-ceph-install/nautilus.yaml 1.1-pg-log-override...
- 08:34 PM Bug #36748: ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
- ...
- 08:13 PM Bug #38483: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- /a/nojha-2019-04-25_05:43:35-rados-wip-39441-distro-basic-smithi/3892156/
- 08:10 PM Bug #37797: radosbench tests hit ENOSPC
- This one appeared again.
/a/nojha-2019-04-25_05:43:35-rados-wip-39441-distro-basic-smithi/3892141/
- 12:27 PM Bug #39484 (Resolved): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- We are running ceph 13.2.5 on Centos Linux 7.5.1804, and the ceph cluster consists of 5 ceph-mon. Every 30 seconds, w...
- 08:07 AM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
- ...
- 07:46 AM Backport #39476 (Resolved): nautilus: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/28141
- 07:46 AM Backport #39475 (Resolved): mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/28206
- 07:46 AM Backport #39474 (Resolved): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/32349
- 07:45 AM Backport #39419 (In Progress): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).el...
- 06:13 AM Bug #39443: "ceph daemon" does not support ceph args
- Sure, 'debug_ms' was just an example to illustrate the problem. Yes, most (if not all) ceph args do not make sense...
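For context, a sketch of the per-daemon forms that do take effect (osd.0 and the value are arbitrary examples):
$ ceph daemon osd.0 config set debug_ms 5      # via the local admin socket
$ ceph daemon osd.0 config get debug_ms
$ ceph tell osd.0 injectargs '--debug_ms 5'    # via the cluster, no local socket needed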
- 03:53 AM Bug #39333 (Fix Under Review): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
04/24/2019
- 09:31 PM Bug #35808: ceph osd ok-to-stop result dosen't match the real situation
- This may be fixed by https://github.com/ceph/ceph/pull/27503
- 07:40 PM Bug #39441: osd acting cycle
- https://github.com/ceph/ceph/pull/24004 was not backported to mimic, which might explain why the octopus osd is calcu...
- 06:51 PM Bug #39443: "ceph daemon" does not support ceph args
- I'm not sure this is a problem — "ceph daemon" is just for talking to a local Unix socket; it doesn't engage in any o...
- 10:32 AM Bug #39443 (New): "ceph daemon" does not support ceph args
- This works:...
- 06:49 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- We're also seeing Bus Errors instead of segfaults in the OpHistory cleanup at #24664 so these may be related...
- 06:47 PM Bug #39336 (Duplicate): "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic
- 06:41 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Bryan Stillwell wrote:
> I could grab you the debug logs, but that could take a while. Which knobs do you want me t...
- 01:26 PM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
- PRs:
* https://github.com/ceph/teuthology/pull/1274
* https://github.com/ceph/ceph/pull/27265
Gist:
* https:...
- 01:24 PM Bug #39449 (Resolved): Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxH...
- ...
- 01:18 PM Backport #39431 (In Progress): luminous: Degraded PG does not discover remapped data on originati...
- 12:35 PM Backport #39433 (In Progress): mimic: Degraded PG does not discover remapped data on originating OSD
- 12:33 PM Backport #39432 (In Progress): nautilus: Degraded PG does not discover remapped data on originati...
04/23/2019
- 10:09 PM Bug #39441 (Resolved): osd acting cycle
- osd.9 (mimic)...
- 06:07 PM Bug #26958 (Pending Backport): osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_lo...
- 01:50 AM Bug #26958 (Fix Under Review): osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_lo...
- 06:05 PM Bug #38296 (Pending Backport): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- 06:04 PM Bug #39439 (Fix Under Review): osd: segv in _preboot -> heartbeat
- https://github.com/ceph/ceph/pull/27729
- 06:01 PM Bug #39439 (Resolved): osd: segv in _preboot -> heartbeat
- ...
- 05:47 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /ceph/teuthology-archive/pdonnell-2019-04-17_06:12:56-kcephfs-wip-pdonnell-testing-20190417.032809-distro-basic-smith...
- 01:07 PM Backport #39433 (Resolved): mimic: Degraded PG does not discover remapped data on originating OSD
- https://github.com/ceph/ceph/pull/27745
- 01:07 PM Backport #39432 (Resolved): nautilus: Degraded PG does not discover remapped data on originating OSD
- https://github.com/ceph/ceph/pull/27744
- 01:07 PM Backport #39431 (Resolved): luminous: Degraded PG does not discover remapped data on originating OSD
- https://github.com/ceph/ceph/pull/27751
- 01:05 PM Backport #39422 (Resolved): mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
- https://github.com/ceph/ceph/pull/28142
- 01:05 PM Backport #39421 (Resolved): nautilus: Don't mark removed osds in when running "ceph osd in any|al...
- https://github.com/ceph/ceph/pull/28072
- 01:05 PM Backport #39420 (Resolved): luminous: Don't mark removed osds in when running "ceph osd in any|al...
- https://github.com/ceph/ceph/pull/27728
- 01:04 PM Backport #39419 (Resolved): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elect...
- https://github.com/ceph/ceph/pull/27771
- 11:03 AM Bug #24419: ceph-objectstore-tool unable to open mon store
- Were you able to figure out why?
- 10:52 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
- 2019-04-23 13:36:20.668791 osd.2 [WRN] Monitor daemon marked osd.2 down, but it is still running
2019-04-23 13:40:36...
- 10:51 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
- I added a "debug ms = 1" line in [osd] and viewed the monitor log in /var/log/ceph
...
mon.greend02-n02ceph02@1(peon) e3 ms...
- 06:41 AM Backport #39042 (In Progress): luminous: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27715
04/22/2019
- 11:32 PM Bug #38296 (In Progress): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- 05:52 PM Bug #38296: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/27707
(looks like the buffer is only 100 chars, and /proc/cpuinfo frequently exc...
- 05:52 PM Bug #38296 (Fix Under Review): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/27707
(looks like the buffer is only 100 chars, and /proc/cpuinfo frequently exc...
- 05:48 PM Bug #38296: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- saw this again: ...
- 09:17 PM Bug #39402 (New): Can't remove ghost PGs
- This is on the downstream long-running cluster. I can grant SSH access to whomever needs it.
This bug is similar ...
- 06:26 PM Bug #39263 (Pending Backport): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) ...
- Only this commit needs to be backported to nautilus https://github.com/ceph/ceph/pull/27622/commits/ccb86682361cf20bd...
- 05:08 PM Bug #39398 (Fix Under Review): osd: fast_info need update when pglog rewind
- 08:43 AM Bug #39398 (Duplicate): osd: fast_info need update when pglog rewind
- When the pglog needs to rewind, info.last_update will need to change to an
older value; the current implementation of PG::_prepare_wr...
- 01:50 PM Bug #37679 (Fix Under Review): osd: pull object from the shard who missing it
- 07:50 AM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
- http://qa-proxy.ceph.com/teuthology/xxg-2019-04-19_03:19:09-rados-wip-yanj-testing-fixpeerings-190418-distro-basic-sm...
- 02:28 AM Bug #37439 (Pending Backport): Degraded PG does not discover remapped data on originating OSD
04/20/2019
- 01:47 PM Bug #39154 (Pending Backport): Don't mark removed osds in when running "ceph osd in any|all|*"
04/19/2019
- 04:07 AM Bug #39390: filestore pre-split may not split enough directories
- https://github.com/ceph/ceph/pull/27689
- 03:50 AM Bug #39390 (Resolved): filestore pre-split may not split enough directories
- Current HashIndex::pre_split_folder() use the following snippet to figure the number of levels for split....
04/18/2019
- 10:04 PM Backport #39389 (In Progress): nautilus: Too much log output generated from PrimaryLogPG::do_bac...
- 10:02 PM Backport #39389 (Resolved): nautilus: Too much log output generated from PrimaryLogPG::do_backfi...
- https://github.com/ceph/ceph/pull/27687
- 09:53 PM Bug #39383 (Pending Backport): Too much log output generated from PrimaryLogPG::do_backfill()
- 09:06 PM Bug #39383 (In Progress): Too much log output generated from PrimaryLogPG::do_backfill()
- 02:55 PM Bug #39383 (Resolved): Too much log output generated from PrimaryLogPG::do_backfill()
Caused by 834d3c19a77
- 12:30 PM Bug #39054: osd push failed because local copy is 4394'133607637
- Greg Farnum wrote:
> As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll ha...
- 12:24 PM Bug #39054: osd push failed because local copy is 4394'133607637
- thank you
- 09:44 AM Backport #39381 (Rejected): luminous: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- 09:38 AM Feature #39066 (Pending Backport): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- 09:36 AM Backport #38873 (In Progress): luminous: Rados.get_fsid() returning bytes in python3
- 09:35 AM Backport #38872 (Resolved): mimic: Rados.get_fsid() returning bytes in python3
- 09:26 AM Bug #38992 (Resolved): unable to link rocksdb library if use system rocksdb
- 09:26 AM Backport #38993 (Resolved): nautilus: unable to link rocksdb library if use system rocksdb
- 09:24 AM Backport #39325 (Resolved): nautilus: ceph-objectstore-tool rename dump-import to dump-export
- 09:24 AM Backport #39310 (Resolved): nautilus: crushtool crash on Fedora 28 and newer
- 09:19 AM Backport #39375 (Resolved): nautilus: ceph tell osd.xx bench help : gives wrong help
- https://github.com/ceph/ceph/pull/28035
- 09:19 AM Backport #39374 (Resolved): mimic: ceph tell osd.xx bench help : gives wrong help
- https://github.com/ceph/ceph/pull/28097
- 09:19 AM Backport #39373 (Resolved): luminous: ceph tell osd.xx bench help : gives wrong help
- https://github.com/ceph/ceph/pull/28112
- 06:41 AM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
- would save a lot of diskspace if you could fix that :)
- 05:38 AM Bug #39282: EIO from process_copy_chunk_manifest
- https://github.com/ceph/ceph/pull/27667
- 04:46 AM Bug #38846 (Fix Under Review): dump_pgstate_history doesn't really produce useful json output, ne...
- 03:27 AM Bug #39154 (In Progress): Don't mark removed osds in when running "ceph osd in any|all|*"
- 02:08 AM Bug #39154 (Fix Under Review): Don't mark removed osds in when running "ceph osd in any|all|*"
- 02:46 AM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
The fix checks for down OSD when all PGs aren't active+clean and doesn't trust num_pgs which is 0 after marking a d...
- 12:47 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
- Turn up debug_ms to 5 maybe. It's very likely you need to look more closely at your network.
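A sketch of what turning up debug_ms could look like while reproducing (the osd id is an example):
$ ceph tell osd.2 injectargs '--debug_ms 5'
$ ceph tell osd.2 injectargs '--debug_ms 0'    # revert once the logs are captured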
04/17/2019
- 11:33 PM Feature #39066: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- merged https://github.com/ceph/ceph/pull/27228
- 11:32 PM Backport #38872: mimic: Rados.get_fsid() returning bytes in python3
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27259
merged
- 10:25 PM Bug #39006 (Pending Backport): ceph tell osd.xx bench help : gives wrong help
- 10:24 PM Bug #39306 (Rejected): ceph config: impossible to set osd_scrub_chunk_max
- OK, can you open two new trackers then please. One for each specific problem?
- 12:24 PM Bug #39306: ceph config: impossible to set osd_scrub_chunk_max
- Yes! I have discovered TWO problems:
1. Problem with _min_: ceph config set osd osd_scrub_chunk_min = 1 WORKS (!) ...
- 10:06 PM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
- A couple of dout(0) should be dout(20) or some dout(10) for some less repetitive ones.
- 08:26 PM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
- during backfilling after a failed disk the log files get spammed with do_backfill messages. log files easily grow bey...
- 10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- (not surprisingly, MON_DOWN is in the ceph.log too, and the run would have failed with that had it not failed for som...
- 09:53 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- mon.c is failing to connect to mon.a:...
- 10:00 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- I could grab you the debug logs, but that could take a while. Which knobs do you want me to turn up?
This is what...
- 09:47 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- The reason the OSDs are rebooting is that we're applying the latest OS updates for CentOS, so it should be a proper s...
- 09:28 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Or maybe I misread that; is the claim that an OSD reboots, *then* a delete happens, and then later on you discover th...
- 09:27 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Is there any chance of getting good debug logs of the event *while* it happens (ie, not just after scrub detects the ...
- 09:42 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
- Yeah, changing the default ec profile also works
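A minimal sketch of inspecting and overriding min_size on a 2+1 pool (profile and pool names are hypothetical; the override value is only an example of the trade-off being discussed):
$ ceph osd erasure-code-profile set ec21 k=2 m=1
$ ceph osd pool create ecpool 32 32 erasure ec21
$ ceph osd pool get ecpool min_size      # inspect the default that was chosen
$ ceph osd pool set ecpool min_size 3    # override it if a different policy is wanted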
- 09:19 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
- see https://github.com/ceph/ceph/pull/27656 ?
- 09:15 PM Bug #39307 (Won't Fix): EC pools with m=1 are created with an unsafe min_size by default
- This was a deliberate choice. https://github.com/ceph/ceph/pull/26894 made the change, based on a discussion on anot...
- 09:14 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
- Hmm, is this a default EC mode or just something we let users set?
The change was deliberate in PR https://github....
- 09:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
- This looks like CRUSH's fault. Can you check which tunables you are running? (ceph osd crush show-tunables)
Using ...
- 09:08 PM Bug #39263 (Fix Under Review): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) ...
- https://github.com/ceph/ceph/pull/27622
- 09:01 PM Bug #39286 (Fix Under Review): primary recovery local missing object did not update obc
- https://github.com/ceph/ceph/pull/27575
- 08:04 PM Backport #38993: nautilus: unable to link rocksdb library if use system rocksdb
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/27601
merged
- 08:02 PM Backport #39325: nautilus: ceph-objectstore-tool rename dump-import to dump-export
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27610
merged
- 08:01 PM Backport #39310: nautilus: crushtool crash on Fedora 28 and newer
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27620
merged
- 07:55 PM Bug #39366 (Can't reproduce): ClsLock.TestRenew failure
- ...
- 07:49 PM Feature #39339: prioritize backfill of metadata pools, automatically
I forgot that it is possible that backfill/recovery could be moving data around for several reasons. In those case...
- 06:50 PM Feature #39339: prioritize backfill of metadata pools, automatically
Recovery is also about restoring objects to the right level of replication. Because the log is known to represent a...
- 03:54 PM Feature #39339: prioritize backfill of metadata pools, automatically
- Also, this ceph command requires the operator to do it, the point of the tracker is that this should be default behav...
- 03:38 PM Feature #39339: prioritize backfill of metadata pools, automatically
- is backfill any different than recovery priority? If not, should it be? By "backfill" I mean the emergency situatio...
- 02:06 PM Feature #39339: prioritize backfill of metadata pools, automatically
- ceph osd pool set <pool> recovery_priority <value>
I think a value of 1 or 2 makes sense (default if unset is 0).
- 01:59 AM Feature #39339 (In Progress): prioritize backfill of metadata pools, automatically
- Neha Ojna suggested filing this feature request.
One relatively easy way to minimize damage in a double-failure sc...
- 05:18 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
- This backport does not require the third commit https://github.com/ceph/ceph/pull/26996/commits/71996da6be171cd310f8c...
- 05:17 PM Backport #38881 (In Progress): nautilus: ENOENT in collection_move_rename on EC backfill target
- https://github.com/ceph/ceph/pull/27654
- 03:13 PM Feature #39362 (New): ignore osd_max_scrubs for forced repair
- On clusters with quite full PGs, it is common (i.e. ~100% sure) that a `ceph pg repair <pgid>` does not start immedia...
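A possible stop-gap sketch while scrub slots are saturated (the pgid and values are hypothetical examples, not a recommendation):
$ ceph tell 'osd.*' injectargs '--osd_max_scrubs 2'
$ ceph pg repair 2.7f
$ ceph tell 'osd.*' injectargs '--osd_max_scrubs 1'    # revert once the repair has been scheduled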
- 12:56 PM Bug #39353 (Fix Under Review): Error message displayed when mon_osd_max_split_count would be exce...
- 12:36 PM Bug #39353 (Resolved): Error message displayed when mon_osd_max_split_count would be exceeded is ...
- Under certain circumstances, an attempt to increase the PG count of a pool can fail like this:...
- 06:58 AM Bug #24531: Mimic MONs have slow/long running ops
- We had this happen twice this week on a v13.2.5 cluster. (The cluster was recently upgraded from v12.2.11, where this...
- 06:19 AM Backport #39343 (In Progress): luminous: ceph-objectstore-tool rename dump-import to dump-export
- 06:13 AM Backport #39343: luminous: ceph-objectstore-tool rename dump-import to dump-export
- Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only
- 06:07 AM Backport #39343 (Resolved): luminous: ceph-objectstore-tool rename dump-import to dump-export
- https://github.com/ceph/ceph/pull/27636
- 06:16 AM Backport #39342 (In Progress): mimic: ceph-objectstore-tool rename dump-import to dump-export
- 06:13 AM Backport #39342: mimic: ceph-objectstore-tool rename dump-import to dump-export
- Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only
- 06:07 AM Backport #39342 (Resolved): mimic: ceph-objectstore-tool rename dump-import to dump-export
- https://github.com/ceph/ceph/pull/27635
- 04:45 AM Backport #39043 (In Progress): nautilus: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27632
- 03:21 AM Backport #39044 (In Progress): mimic: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27629
04/16/2019
- 11:31 PM Backport #38566 (Resolved): mimic: osd_recovery_priority is not documented (but osd_recovery_op_p...
- 11:10 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
- 11:08 PM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
- 02:41 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
- 02:41 PM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
- 10:32 AM Bug #39281 (Fix Under Review): object_stat_sum_t decode broken if given older version
- 10:32 AM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
- 10:58 PM Bug #39306 (Need More Info): ceph config: impossible to set osd_scrub_chunk_max
- ...
- 05:25 AM Bug #39306 (Rejected): ceph config: impossible to set osd_scrub_chunk_max
- ...
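For reference, a sketch of the centralized-config syntax in question (values are examples; per this report the osd_scrub_chunk_max case fails):
$ ceph config set osd osd_scrub_chunk_max 25
$ ceph config get osd.0 osd_scrub_chunk_max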
- 08:42 PM Bug #39336 (Duplicate): "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic
- Run: http://pulpito.ceph.com/teuthology-2019-04-16_02:25:02-upgrade:luminous-x-mimic-distro-basic-smithi/
Job: 38528...
- 07:28 PM Backport #39310 (In Progress): nautilus: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27620
- 08:00 AM Backport #39310 (Resolved): nautilus: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27620
- 06:18 PM Bug #39333 (Resolved): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
sage-2019-04-16_13:58:36-rados-wip-sage-testing-2019-04-15-0844-distro-basic-smithi/3853774
The final PGs looked...
- 04:53 PM Bug #39330 (New): recovery transfer rate not correct
- When running all OSDs inside a QEMU VM (with real disks attached through virtio-scsi), the gathered recovery statisti...
- 03:29 PM Bug #39249: Some PGs stuck in active+remapped state
- I've not tried changing reweights to 1, though last week I ran "ceph osd reweight-by-utilization 110"
Cluster is ...
- 02:58 PM Backport #39325 (In Progress): nautilus: ceph-objectstore-tool rename dump-import to dump-export
- 02:40 PM Backport #39325 (Resolved): nautilus: ceph-objectstore-tool rename dump-import to dump-export
- https://github.com/ceph/ceph/pull/27610
- 02:40 PM Bug #39284 (Pending Backport): ceph-objectstore-tool rename dump-import to dump-export
- 10:34 AM Bug #39284: ceph-objectstore-tool rename dump-import to dump-export
- Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only (the PR#27564 includes another commit tha...
- 10:53 AM Bug #38786 (Resolved): autoscale down can lead to max_pg_per_osd limit
- 10:53 AM Backport #39271 (Resolved): nautilus: autoscale down can lead to max_pg_per_osd limit
- 10:52 AM Backport #39275 (Resolved): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 10:40 AM Bug #39055: OSD's crash when specific PG is trying to backfill
- Hi Greg,
Thanks for getting back.
After a while, I resorted to creating a new pool and migrated all the data of...
- 10:33 AM Backport #39320 (Resolved): nautilus: object_stat_sum_t decode broken if given older version
- 10:32 AM Backport #39320 (Resolved): nautilus: object_stat_sum_t decode broken if given older version
- https://github.com/ceph/ceph/pull/27555
- 10:23 AM Support #39319 (New): Every 15 min - Monitor daemon marked osd.x down, but it is still running
- 1. Install Ceph (ceph version 13.2.5 mimic (stable)) in 4 node (CentOS7, in test environment VmWare ESXI 5.5)
f...
- 08:00 AM Backport #39311 (Resolved): mimic: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27986
- 08:00 AM Backport #39309 (Rejected): luminous: crushtool crash on Fedora 28 and newer
- 07:45 AM Bug #39307 (Won't Fix): EC pools with m=1 are created with an unsafe min_size by default
- Creating an EC pool with m=1 on 14.2.0 defaults to a min_size of k, e.g. min_size of 2 for a 2+1 pool.
Older version...
- 02:10 AM Backport #38993 (In Progress): nautilus: unable to link rocksdb library if use system rocksdb
- https://github.com/ceph/ceph/pull/27601
- 01:52 AM Bug #39006 (Fix Under Review): ceph tell osd.xx bench help : gives wrong help
04/15/2019
- 09:19 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
The message below outputs too many PGs. It counts active + up from pg_count as if the actingset and upset are disj...
- 08:28 PM Bug #38930 (In Progress): ceph osd safe-to-destroy wrongly approves any out osd
- Okay, reproduced this with vstart. When I mark an OSD out, I get...
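A minimal reproduce sketch on a vstart or test cluster, following the comment above (the osd id is an example):
$ ceph osd out osd.1
$ ceph osd safe-to-destroy osd.1    # per this report, wrongly reports the out osd as safe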
- 09:15 PM Bug #39055: OSD's crash when specific PG is trying to backfill
- You'll need to gather full debug logs of the crash and as much as possible about the object(s) which the PG is workin...
- 09:14 PM Bug #39056: localize-reads does not increment pg stats read count
- Yeah, localize_reads has some issues. This is the least of them and would be hard to fix in the current architecture ...
- 08:01 PM Backport #39271: nautilus: autoscale down can lead to max_pg_per_osd limit
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27547
merged
- 07:59 PM Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27550
merged
- 07:40 PM Bug #39304 (Resolved): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_a...
- Run: http://pulpito.ceph.com/yuriw-2019-04-13_15:18:33-upgrade:nautilus-p2p-wip-yuri6-testing-2019-04-12-1636-nautilu...
- 06:40 PM Feature #39302 (New): `ceph df` reports misleading information when no ceph-mgr running
- When there is no ceph-mgr running, the `ceph df` command reports incorrect (misleading) information. For example, in ...
- 03:17 PM Backport #39239 (New): luminous: "sudo yum -y install python34-cephfs" fails on mimic
- 03:16 PM Backport #39239 (In Progress): luminous: "sudo yum -y install python34-cephfs" fails on mimic
- 01:21 PM Bug #39249: Some PGs stuck in active+remapped state
- exactly the same. In order to heal that I have changed all my reweights to 1. This helped. But anyway, I don't unders...
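A sketch of resetting an override reweight back to 1.0 as described above (the osd id is an example):
$ ceph osd reweight 11 1.0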
- 01:18 PM Bug #39249: Some PGs stuck in active+remapped state
- We have a Mimic 13.2.5 cluster with a similar looking problem:
After replacing a failing OSD, the cluster mostly ...
04/14/2019
- 08:26 PM Bug #39174 (Pending Backport): crushtool crash on Fedora 28 and newer
- 08:24 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- ...
04/13/2019
- 07:11 PM Backport #38904 (Resolved): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rol...
- 04:03 PM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
- 12:40 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
- If not, the snapset in the local obc may be inconsistent, and then make_writeable()
will make mistakes.
04/12/2019
- 09:48 PM Bug #39263: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting down becau...
- /a/nojha-2019-04-11_19:53:24-rados-wip-parial-recovery-2019-04-11-distro-basic-smithi/3834700/
- 08:23 PM Backport #38904: mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27284
merged
- 08:11 PM Bug #39284 (In Progress): ceph-objectstore-tool rename dump-import to dump-export
- 07:01 PM Bug #39284 (Resolved): ceph-objectstore-tool rename dump-import to dump-export
dump-import is a stupid name for this command.
Treat dump-import as an undocumented synonym for dump-export.
- 06:40 PM Bug #39281 (In Progress): object_stat_sum_t decode broken if given older version
- 04:58 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
When the encode/decode for object_stat_sum_t went from version 19 to 20 the fast path wasn't updated....
- 05:46 PM Bug #39282 (Resolved): EIO from process_copy_chunk_manifest
- ...
- 03:14 PM Backport #38901 (Resolved): mimic: Minor rados related documentation fixes
- 03:00 PM Backport #39237: mimic: "sudo yum -y install python34-cephfs" fails on mimic
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/27476
merged
- 01:10 PM Bug #39249: Some PGs stuck in active+remapped state
- @Mark: Which version of Mimic are you running?
- 01:04 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 01:04 PM Bug #39249: Some PGs stuck in active+remapped state
- #3747 ?
- 12:26 PM Backport #38442 (In Progress): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 12:21 PM Backport #39275 (In Progress): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 12:04 PM Backport #39275 (Resolved): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- https://github.com/ceph/ceph/pull/27550
- 12:09 PM Backport #39271 (In Progress): nautilus: autoscale down can lead to max_pg_per_osd limit
- 12:03 PM Backport #39271 (Resolved): nautilus: autoscale down can lead to max_pg_per_osd limit
- https://github.com/ceph/ceph/pull/27547
- 11:57 AM Bug #38786 (Pending Backport): autoscale down can lead to max_pg_per_osd limit
- 11:55 AM Bug #38359 (Pending Backport): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:53 AM Bug #39159 (Fix Under Review): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
- 04:32 AM Bug #39099: Give recovery for inactive PGs a higher priority
- Checking acting.size() < pool.info.min_size is wrong. During recovery acting == up. So if active.size() < pool.info...
04/11/2019
- 08:40 PM Bug #38840 (In Progress): snaps missing in mapper, should be: ca was r -2...repaired
- 07:14 PM Bug #39263 (Resolved): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting...
- ...
- 04:50 PM Bug #21388 (Duplicate): inconsistent pg but repair does nothing reporting head data_digest != dat...
- This was merged to master Jul 31, 2018 in https://github.com/ceph/ceph/pull/23217 for a different tracker.
- 04:32 PM Bug #39099 (In Progress): Give recovery for inactive PGs a higher priority
- 12:36 PM Bug #39249: Some PGs stuck in active+remapped state
- OSD.11 previously took part in this PG. I don't know whether it was primary or not. The bug happened after I made `ceph os...
- 12:35 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 12:23 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 12:22 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 12:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
- Sometimes my PGs stuck in this state. When I stop primary OSD containig this PG, it becomes `active+undersized+degrad...
- 12:14 PM Feature #39248 (New): Add ability to limit number of simultaneously backfilling PGs
- I want to reduce the effect of `ceph osd out osd.xxx`. I already set
--osd-recovery-max-active 1
--osd-max-backfills ...
- 11:46 AM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
- Injecting into mgr has solved the issue, thanks!
- 11:07 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
- note to myself or anyone who wants to backport this change to luminous, you need to blacklist the python36 package wh...
- 10:59 AM Backport #39239 (Resolved): luminous: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/28493
- 10:59 AM Bug #39164 (Pending Backport): "sudo yum -y install python34-cephfs" fails on mimic
- 10:54 AM Backport #39236 (In Progress): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
- 02:46 AM Backport #39236: nautilus: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/27505
- 02:44 AM Backport #39236 (Resolved): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/27505
- 07:56 AM Bug #39174 (In Progress): crushtool crash on Fedora 28 and newer
- 07:10 AM Bug #39174 (Fix Under Review): crushtool crash on Fedora 28 and newer
- 06:02 AM Bug #39174: crushtool crash on Fedora 28 and newer
- https://bugzilla.redhat.com/show_bug.cgi?id=1515858
- 04:36 AM Bug #39174: crushtool crash on Fedora 28 and newer
- Turning up verbosity gives clues to what might be the problem....
- 02:31 AM Bug #39174: crushtool crash on Fedora 28 and newer
- 02:30 AM Bug #39174: crushtool crash on Fedora 28 and newer
- Vasu Kulkarni wrote:
> very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad... - 02:47 AM Backport #39237 (In Progress): mimic: "sudo yum -y install python34-cephfs" fails on mimic
- 02:47 AM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/27476
04/10/2019
- 11:51 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
- Ah, that's because a jewel osd does not know how to deal with this REJECT in the Started/ReplicaActive/RepNotRecoveri...
- 02:22 AM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
- Fails in 1 out of 20 runs http://pulpito.ceph.com/nojha-2019-04-09_17:54:07-rados:upgrade:jewel-x-singleton-luminous-...
- 11:46 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- mon.c timeline:
2019-04-06 08:58:28.846 hits a lease timeout and triggers the election process
2019-04-06 08:58:28....
- 10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- Greg Farnum wrote:
> The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client c...
- 09:59 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client connections or else not...
- 10:19 PM Backport #38720 (Resolved): mimic: crush: choose_args array size mis-sized when weight-sets are e...
- 10:18 PM Bug #38826 (Resolved): upmap broken the crush rule
- 10:18 PM Backport #38858 (Resolved): mimic: upmap broken the crush rule
- 09:48 PM Bug #39085 (Resolved): monmap created timestamp may be blank
- 09:12 PM Bug #39085 (Pending Backport): monmap created timestamp may be blank
- 09:45 PM Bug #38359 (Fix Under Review): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:45 PM Bug #38359: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3...
- 09:36 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
- Hmm, maybe the pg_map is purged of any OSD marked out? Although you can have up OSDs that are out so that shouldn't b...
- 09:30 PM Bug #39174: crushtool crash on Fedora 28 and newer
- very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad brought this up long ti...
- 08:30 PM Bug #39174 (Resolved): crushtool crash on Fedora 28 and newer
- On Fedora 29, Fedora 30, and RHEL 8, /usr/bin/crushtool crashes when trying to compile the map that Rook uses.
<pr...
- 09:28 PM Bug #39054 (Closed): osd push failed because local copy is 4394'133607637
- As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll have better luck taking ...
- 09:16 PM Backport #38904 (In Progress): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
- 09:16 PM Backport #38906 (Resolved): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
- 09:14 PM Bug #39039: mon connection reset, command not resent
- So it's not the command specifically but that the client doesn't reconnect to a working monitor, right?
- 09:10 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:07 PM Backport #39220 (Resolved): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_miss...
- https://github.com/ceph/ceph/pull/27940
- 09:07 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
- This has not shown up recently, so maybe this got resolved as a result of http://tracker.ceph.com/issues/36739 being ...
- 09:07 PM Backport #39219 (Resolved): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
- https://github.com/ceph/ceph/pull/27839
- 09:07 PM Backport #39218 (Resolved): luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
- https://github.com/ceph/ceph/pull/27878
- 09:05 PM Backport #39206 (Resolved): mimic: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27938
- 09:05 PM Backport #39205 (Resolved): nautilus: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27803
- 09:05 PM Backport #39204 (Resolved): luminous: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27810
- 09:01 PM Bug #39175 (Resolved): RGW DELETE calls partially missed shortly after OSD startup
- We have two separate clusters (physically 2,000+ miles apart) that are seeing
PGs going inconsistent while doing reb... - 04:06 PM Feature #39162 (In Progress): Improvements to standalone tests.
- 05:58 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
- See https://github.com/ceph/ceph/pull/27479 for a viable workaround. Note that this is a bug in gcc7 [1] and the pref...
- 04:46 AM Backport #38567 (In Progress): luminous: osd_recovery_priority is not documented (but osd_recover...
- 04:16 AM Bug #39164: "sudo yum -y install python34-cephfs" fails on mimic
- note to myself or anyone who wants to backport this change to luminous, you need to blacklist the python36 package wh...
- 04:13 AM Bug #39164 (Fix Under Review): "sudo yum -y install python34-cephfs" fails on mimic
- 03:24 AM Bug #39164 (Resolved): "sudo yum -y install python34-cephfs" fails on mimic
- see http://pulpito.ceph.com/yuriw-2019-04-09_19:20:36-multimds-wip-yuri3-testing-2019-04-08-2038-mimic-testing-basic-...
- 03:56 AM Bug #38582: Pool storage MAX AVAIL reduction seems higher when single OSD reweight is done
- Correction in the description.
It looks like the pools MAX AVAIL value had dropped after there was a hard disk fail...
04/09/2019
- 10:22 PM Bug #38724 (Need More Info): _txc_add_transaction error (39) Directory not empty not handled on o...
- logging level isn't high enough to tell what data is in this pg. :(
- 10:17 PM Bug #38786 (Fix Under Review): autoscale down can lead to max_pg_per_osd limit
- https://github.com/ceph/ceph/pull/27473
- 09:21 PM Feature #39162 (Resolved): Improvements to standalone tests.
Now that OSDs default to bluestore, we need to fix the use of run_osd(). We should replace run_osd_bluestore() with r...
- 08:29 PM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
- https://github.com/ceph/ceph/pull/27471
- 02:54 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
- From the osd log including the thread before the crash....
- 02:36 PM Bug #38219 (Fix Under Review): rebuild-mondb hangs
- 12:25 PM Bug #39159 (Resolved): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
- Both store_thrash and thrash_store names are used for the same thing in mon_thrash.py. 'thrash_store' is used here: h...
- 08:13 AM Bug #39154 (Resolved): Don't mark removed osds in when running "ceph osd in any|all|*"
- To reproduce....
- 01:47 AM Bug #23030 (Fix Under Review): osd: crash during recovery with assert(p != recovery_info.ss.clone...
- https://github.com/ceph/ceph/pull/27273
- 01:04 AM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- OSD continuously crashed
-1> 2019-04-08 17:47:06.615 7f3f3ef62700 -1 /build/ceph-14.2.0/src/os/bluestore/Bl...
04/08/2019
- 11:00 PM Bug #37264 (Resolved): scrub warning check incorrectly uses mon scrub interval
- 10:49 PM Bug #26971 (Duplicate): failed to become clean before timeout expired
- 10:18 PM Bug #26971: failed to become clean before timeout expired
- see http://tracker.ceph.com/issues/39149
- 10:15 PM Bug #26971: failed to become clean before timeout expired
- oh, it's because there's also 1/10th the probability of choosing the second host:...
- 07:59 PM Bug #26971: failed to become clean before timeout expired
- This is just CRUSH failing. I extracted the osdmap from the data/mon.a.tgz and verified with osdmaptool that it's ju...
- 10:37 PM Bug #39150 (Resolved): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- ...
- 08:42 PM Bug #39148 (New): luminous: powercycle: reached maximum tries (500) after waiting for 3000 seconds
- ...
- 07:02 PM Bug #39145 (New): luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine eve...
- ...
- 05:14 PM Bug #37775: some pg_created messages not sent to mon
- /a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/
04/05/2019
- 08:56 PM Bug #26971: failed to become clean before timeout expired
- The up set seems to be the problem here.
This is the point when we find out that osd.5 is down...
- 08:50 PM Backport #38720: mimic: crush: choose_args array size mis-sized when weight-sets are e...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27082
merged
- 08:50 PM Backport #38858: mimic: upmap broken the crush rule
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27257
merged
- 08:30 PM Bug #39087: ec_lost_unfound: a EC shard has missing object after `osd lost`
- /a/yuriw-2019-04-02_20:09:55-rados-wip-yuri3-testing-2019-04-02-1623-mimic-distro-basic-smithi/3801955/ - looks like ...
- 08:06 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
- /a/yuriw-2019-04-02_20:09:55-rados-wip-yuri3-testing-2019-04-02-1623-mimic-distro-basic-smithi/3801823/
- 08:04 PM Backport #38906: nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27302
merged
- 05:45 PM Feature #38940: Allow marking noout by failure domain for maintainance and planned downtime.
- Also related: https://github.com/rook/rook/issues/2825
- 05:43 PM Feature #38940: Allow marking noout by failure domain for maintainance and planned downtime.
- Relevant discussion as this relates to Rook https://github.com/rook/rook/issues/2253
- 04:16 PM Bug #37509: require past_interval bounds mismatch due to osd oldest_map
- /a/yuriw-2019-04-05_00:28:05-rados-wip-yuri2-testing-2019-04-04-1953-nautilus-distro-basic-smithi/3811215/
- 04:12 PM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
- /a/yuriw-2019-04-05_00:28:05-rados-wip-yuri2-testing-2019-04-04-1953-nautilus-distro-basic-smithi/3811205/
- 12:00 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Yes, most likely the issue was triggered by a power outage, the 2x OSD FAILED assert and the cluster is unable to rec...
- 11:07 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
- The fix so far is switching the osd back to filestore.
- 08:36 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
- Another PG....
- 07:40 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
- ...
- 06:48 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
- David Zafman wrote:
> Please find a stack trace in the osd log. Is there an assert that would look like this?
> ...
- 08:23 AM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
- Another PG, where the missing is reported on osd.0/filestore (not osd.9/bluestore in the previous)....
- 08:08 AM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
- ...
- 07:08 AM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
- David Zafman wrote:
> It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output...
- 07:14 AM Bug #39120 (New): rados: Segmentation fault in thread 7f0aebfff700 thread_name:fn_anonymous
- ...
04/04/2019
- 09:29 PM Bug #38931: osd does not proactively remove leftover PGs
- Greg Farnum wrote:
> So should we backport part of that PR, Neha?
>
> To answer your question more directly, Dan:...
- 08:37 PM Bug #38931: osd does not proactively remove leftover PGs
- So should we backport part of that PR, Neha?
To answer your question more directly, Dan: OSDs don't delete PGs the... - 08:51 PM Bug #38900: EC pools don't self repair on client read error
- Yes, client IO is served. The PG is degraded, but the PG state won't necessarily reflect that.
- 08:33 PM Bug #38900: EC pools don't self repair on client read error
- Just to be clear, this means the object remains degraded, but client IO continues to be served?
- 08:32 PM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
- This is super weird; the only other recent reference I see to min_mon_release is https://github.com/ceph/ceph/pull/27...
- 04:52 PM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
- Please find a stack trace in the osd log. Is there an assert that would look like this?
/build/ceph-13.2.5-g###... - 03:24 PM Bug #39116 (New): Draining filestore osd, removing, and adding new bluestore osd causes OSDs to c...
- ...
- 04:44 PM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
- It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output of rados list-inconsis...
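A sketch of the requested sequence (the pgid is a hypothetical example):
$ ceph pg deep-scrub 2.7f
# wait for the deep-scrub to complete, then:
$ rados list-inconsistent-obj 2.7f --format=json-pretty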
- 03:15 PM Bug #39115 (Duplicate): ceph pg repair doesn't fix itself if osd is bluestore
- Running ceph pg repair on an inconsistent PG with missing data, I usually notice that the OSD is marked as down/up be...
- 01:48 PM Bug #39111 (New): "ceph config set" accepts osd ID with letters
- ...
- 09:44 AM Bug #38219: rebuild-mondb hangs
- ...
- 04:19 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Here is what the bad log looks like that caused one of the crashes. Clearly _head_ is bad because the log ends wit...
- 02:40 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Maybe we could check each log in load_pgs(). If it is corrupt (head != head entry's version), move PG aside and igno...
04/03/2019
- 11:28 PM Bug #39099 (Resolved): Give recovery for inactive PGs a higher priority
Backfill inactive gets priority 220 and we should make sure that if we can have inactive that needs recovery only i...
- 07:23 AM Bug #39087: ec_lost_unfound: a EC shard has missing object after `osd lost`
- Is this `scrub error` expected? What we should do is find out why ceph doesn't recover PG 2.4s0.
- 07:16 AM Bug #39087 (New): ec_lost_unfound: a EC shard has missing object after `osd lost`
- http://pulpito.ceph.com/kchai-2019-04-01_10:38:29-rados-wip-kefu-testing-2019-04-01-1531-distro-basic-mira/3797065/
... - 04:37 AM Feature #38616 (Resolved): Improvements to auto repair
04/02/2019
- 10:22 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Per request on irc.
pg log:
1.cas2 on osd.2: ceph-post-file: d74a0006-c0e9-41b1-a904-7bfe41617253
1.96s3 on osd....
- 07:51 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Output from: ceph-objectstore-tool --no-mon-config --data-path /var/lib/ceph/osd/ceph-0 --op log --pgid 1.cas0
1.c...
- 06:29 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Hi Grant, is there a way you could dump the pg log by using a command like this "ceph-objectstore-tool --no-mon-confi...
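For reference, the full invocation quoted in the later comment above; ceph-objectstore-tool needs exclusive access to the OSD's data store, so the OSD must be stopped while it runs:
$ ceph-objectstore-tool --no-mon-config --data-path /var/lib/ceph/osd/ceph-0 --op log --pgid 1.cas0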
- 09:51 PM Bug #39085 (Fix Under Review): monmap created timestamp may be blank
- 09:51 PM Bug #39085: monmap created timestamp may be blank
- https://github.com/ceph/ceph/pull/27327
- 07:13 PM Bug #39085 (Resolved): monmap created timestamp may be blank
- On at least one old cluster, monmap created timestamp is empty. lab cluster:...
- 07:27 PM Bug #38219: rebuild-mondb hangs
- I reproduced this again on master, http://pulpito.ceph.com/nojha-2019-04-02_17:39:35-rados:singleton-master-distro-ba...
- 11:46 AM Bug #38219: rebuild-mondb hangs
- http://pulpito.ceph.com/kchai-2019-04-02_08:04:13-rados-wip-kefu-testing-2019-04-01-1531-distro-basic-smithi/
- 01:02 PM Bug #38124: OSD down on snaptrim.
- Hello, it's been two months now; is there any update on this bug?
- 12:19 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
- 12:18 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
- It's an mgr option. You should instead inject it to the mgr daemon.
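A sketch of pointing the option at the mgr instead, assuming injectargs to the mgr or the centralized config (mimic and later) is available; the value is an example:
$ ceph tell mgr injectargs '--mon_pg_warn_max_object_skew 20'
$ ceph config set mgr mon_pg_warn_max_object_skew 20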
- 05:56 AM Backport #38905 (In Progress): luminous: osd/PGLog.h: print olog_can_rollback_to before deciding ...
- https://github.com/ceph/ceph/pull/27715
- 01:52 AM Backport #38983 (Resolved): nautilus: Improvements to auto repair
- 12:12 AM Backport #38906 (In Progress): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding ...
- https://github.com/ceph/ceph/pull/27302
04/01/2019
- 10:59 PM Bug #37439: Degraded PG does not discover remapped data on originating OSD
- My proposal to fix this bug is to call @discover_all_missing@ not only if there are missing objects, but also when th...
- 09:11 PM Bug #37439: Degraded PG does not discover remapped data on originating OSD
- Hi Jonas, thanks for creating a fix for this bug. Could you please upload the latest logs from nautilus, that you hav...
- 08:58 PM Bug #37439 (Fix Under Review): Degraded PG does not discover remapped data on originating OSD
- 01:07 AM Bug #37439: Degraded PG does not discover remapped data on originating OSD
- More findings, now on Nautilus 14.2.0:
OSD.62 once was part of pg 6.65, but content on it got remapped. A restart ...
- 10:46 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Grant: I notice that the initial event outlined above is from October. Is that the very first anomalous behavior exh...
- 10:45 PM Feature #3362 (Resolved): Warn users before allowing pools to be created with more than N*<num_os...
- 09:22 PM Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26616
merged
- 04:10 PM Fix #39071 (New): monclient: initial probe is non-optimal with v2+v1
- When we are probing both v2 and v1 addrs for mons, we treat them as separate mons, which means we might be probing N ...
- 02:25 PM Feature #39066 (Fix Under Review): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- 02:21 PM Feature #39066 (Resolved): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- Currently it's only possible to run `...make; make tests -j8; ctest ...` on the same machine.
Please consider chan...
- 01:48 PM Bug #38945 (Pending Backport): osd: leaked pg refs on shutdown
- 01:09 PM Bug #38219: rebuild-mondb hangs
- I am using the following script to reproduce this issue locally; so far no luck...
- 11:51 AM Bug #39059 (Can't reproduce): assert in ceph::net::SocketMessenger::unregister_conn()
- ...
- 03:22 AM Bug #39056: localize-reads does not increment pg stats read count
- When the '--localize-reads' flag is set, the peer_pg may complete the read task, but the peer_pg will not count read_num....
- 03:09 AM Bug #39056 (New): localize-reads does not increment pg stats read count
- When I mounted ceph-fuse, I set the '--localize-reads' flag. I found during the test that the read_num count was In...
- 12:52 AM Backport #38904: mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
- https://github.com/ceph/ceph/pull/27284
03/31/2019
- 07:02 PM Bug #39055 (New): OSD's crash when specific PG is trying to backfill
- Hi,
I've got a peculiar issue whereby a specific PG is trying to backfill its objects to the other peers, but th...
- 12:08 PM Bug #39054 (Closed): osd push failed because local copy is 4394'133607637
- ceph-osd.1.log:7085:2019-02-27 13:07:21.336004 7f666b5bb700 -1 log_channel(cluster) log [ERR] : 3.33 push 3:ccb8da9c:...
03/30/2019
- 07:14 PM Bug #38931: osd does not proactively remove leftover PGs
- https://github.com/ceph/ceph/pull/27205/commits/f7c5b01e181630bb15e8b923b0334eb6adfdf50a
- 06:15 PM Bug #39053 (New): changing pool crush rule may lead to IO stop
How to reproduce:
1. create some OSDs
2. change their class to, say, "xxx"
3. create replicated crush rule ref... (a command sketch follows the next entry)
- 01:37 PM Backport #38860 (Resolved): nautilus: upmap broken the crush rule
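A command sketch for the #39053 reproduce steps above (the class, rule, and pool names are hypothetical):
$ ceph osd crush rm-device-class osd.0 osd.1 osd.2           # clear any existing class first
$ ceph osd crush set-device-class xxx osd.0 osd.1 osd.2
$ ceph osd crush rule create-replicated rule-xxx default host xxx
$ ceph osd pool set mypool crush_rule rule-xxx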
- 08:46 AM Bug #38784 (Pending Backport): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(...
- 08:21 AM Backport #38854 (Resolved): luminous: .mgrstat failed to decode mgrstat state; luminous dev version?
- 08:21 AM Backport #38859 (Resolved): luminous: upmap broken the crush rule
- 08:20 AM Backport #38857 (Resolved): luminous: should set EPOLLET flag on del_event()
- 08:18 AM Backport #39044 (Resolved): mimic: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27629
- 08:18 AM Backport #39043 (Resolved): nautilus: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27632
- 08:18 AM Backport #39042 (Resolved): luminous: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27715
03/29/2019
- 11:04 PM Bug #39039 (Duplicate): mon connection reset, command not resent
- ...
- 07:45 PM Backport #38854: luminous: .mgrstat failed to decode mgrstat state; luminous dev version?
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27207
merged
- 07:45 PM Backport #38859: luminous: upmap broken the crush rule
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27224
merged
- 07:44 PM Backport #38857: luminous: should set EPOLLET flag on del_event()
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27226
merged
- 07:12 AM Backport #38872 (In Progress): mimic: Rados.get_fsid() returning bytes in python3
- https://github.com/ceph/ceph/pull/27259
- 04:47 AM Backport #38858 (In Progress): mimic: upmap broken the crush rule
- https://github.com/ceph/ceph/pull/27257
- 03:04 AM Backport #38860 (In Progress): nautilus: upmap broken the crush rule
03/28/2019
- 10:23 PM Bug #39023 (Resolved): osd/PGLog: preserve original_crt to check rollbackability
- Related to the issue discovered in https://tracker.ceph.com/issues/21174#note-11.
- 07:12 PM Feature #39012 (Resolved): osd: distinguish unfound + impossible to find, vs start some down OSDs...
This may be a command that gets information from the primary of a pg listing unfound objects and where they may be ...
- 06:59 PM Documentation #39011 (Resolved): Document how get_recovery_priority() and get_backfill_priority()...
Describe get_recovery_priority() and get_backfill_priority() as they relate to these constants:...
- 06:57 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- Hi Grant,
Thanks for applying the patch and updating the logs. Looks like the earlier crash on osd.2(ENOENT on cl...
- 05:22 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- I am still seeing crashes with https://github.com/ceph/ceph/pull/27200 backported.
Attached are logs.
osd.2 cep...
- 02:23 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- https://github.com/ceph/ceph/pull/27200 attempts to resolve the failure seen on osd.2
- 04:03 PM Bug #39006: ceph tell osd.xx bench help : gives wrong help
- Moreover, it says that the first number is a count of blocks, but actually it is the count of bytes for the whole operation:
...
- 04:01 PM Bug #39006 (Resolved): ceph tell osd.xx bench help : gives wrong help
- ```
$ ceph tell osd.11 bench help
help not valid: help doesn't represent an int
Invalid command: unused arguments...
- 12:34 PM Backport #38859 (In Progress): luminous: upmap broken the crush rule
- 01:39 AM Backport #38859: luminous: upmap broken the crush rule
- https://github.com/ceph/ceph/pull/27224
- 11:10 AM Backport #38510 (Resolved): luminous: ceph CLI ability to change file ownership
- 11:09 AM Backport #38562 (Resolved): luminous: mgr deadlock
- 11:06 AM Backport #38903 (Resolved): nautilus: Minor rados related documentation fixes
- 07:50 AM Bug #38945: osd: leaked pg refs on shutdown
- Please note, in luminous we also need to stop @snap_sleep_timer@ and @scrub_sleep_timer@ in @OSDService::shutdown(...
- 07:43 AM Bug #38945 (Fix Under Review): osd: leaked pg refs on shutdown
- 06:12 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
- per Brad
> If we see this again we could try temporarily adding "--param ggc-min-expand=1 --param ggc-min-heapsize... - 03:22 AM Backport #38993 (Resolved): nautilus: unable to link rocksdb library if use system rocksdb
- https://github.com/ceph/ceph/pull/27601
- 03:04 AM Bug #38992 (Resolved): unable to link rocksdb library if use system rocksdb
- 02:33 AM Backport #38750 (New): luminous: should report EINVAL in ErasureCode::parse() if m<=0
- 02:31 AM Backport #38750 (In Progress): luminous: should report EINVAL in ErasureCode::parse() if m<=0
- 02:21 AM Backport #38857 (In Progress): luminous: should set EPOLLET flag on del_event()
- https://github.com/ceph/ceph/pull/27226
- 02:00 AM Backport #38860: nautilus: upmap broken the crush rule
- https://github.com/ceph/ceph/pull/27225
03/27/2019
- 10:56 PM Bug #38839: .mgrstat failed to decode mgrstat state; luminous dev version?
- Sage, Could this have something to do with #38941 ? The timing is right.
- 05:00 PM Backport #38983 (In Progress): nautilus: Improvements to auto repair
- 04:24 PM Backport #38983 (Resolved): nautilus: Improvements to auto repair
- https://github.com/ceph/ceph/pull/27220
- 04:38 PM Bug #38784 (Fix Under Review): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(...
- 04:01 AM Bug #26971: failed to become clean before timeout expired
dzafman-2019-03-26_16:39:54-rados:thrash-wip-zafman-26971-diag-distro-basic-smithi/3776762
Another run with diag...
- 03:44 AM Backport #38854 (In Progress): luminous: .mgrstat failed to decode mgrstat state; luminous dev ve...
- https://github.com/ceph/ceph/pull/27207
- 01:54 AM Bug #38945 (Resolved): osd: leaked pg refs on shutdown
- recovery_request_timer may hold some QueuePeeringEvts, which hold a PGRef;
if we don't shut it down earlier, it potentially ca...
- 01:37 AM Feature #38616: Improvements to auto repair
- Also need to backport 0fb951963ff9d03a592bad0d4442049603195e25 with this.