Activity

From 11/04/2020 to 12/03/2020

12/03/2020

10:14 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
epoch 100 [0, 1, 2, 3, 4, 5] (100'950, 100'1000]
epoch 110 [0, 1, 2, 3, 4, N] 0,1,2,3 (100'960, 110'1020] 4 (100...
Samuel Just
09:41 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
proc_master_log -> merge_log -> log.roll_forward_to -> advance_can_rollback_to
I think recovery below min_size (pe...
Samuel Just
12:14 AM Bug #48417: unfound EC objects in sepia's LRC after upgrade
-Ok, with osd.31 we have 4 copies of the correct version of this object. However, min_size is 5, so one would assume... Samuel Just
06:12 PM Bug #48452 (New): pg merge explodes osdmap mempool size
We have one cluster with several osds having >500MB osdmap mempools.
Here is one example from today:...
Dan van der Ster
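For reference, per-OSD mempool usage, including the osdmap pool, can be inspected via the admin socket on the OSD's host. A minimal sketch; osd.0 is a placeholder id and the jq paths assume the Nautilus-era JSON layout:
    ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osdmap'   # items/bytes held by osdmaps
    ceph daemon osd.0 dump_mempools | jq '.mempool.total'            # overall mempool consumption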
03:10 PM Bug #46847: Loss of placement information on OSD reboot
Ok, that's the same state I see our PGs in when they become degraded due to remappings (might_have_unfound: already_p... Jonas Jelten
01:44 AM Backport #48444 (In Progress): nautilus: octopus: setting noscrub crashed osd process
David Zafman
01:39 AM Backport #48444 (Resolved): nautilus: octopus: setting noscrub crashed osd process
https://github.com/ceph/ceph/pull/38411 David Zafman
01:38 AM Bug #47767 (Pending Backport): octopus: setting noscrub crashed osd process
David Zafman
12:35 AM Bug #48033 (Closed): mon: after unrelated crash: handle_auth_request failed to assign global_id; ...
Neha Ojha

12/02/2020

11:09 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
Yeah, osd.31 has a copy:
root@mira093:/var/lib/ceph/osd/ceph-31# ceph-objectstore-tool --data-path . '["119.385s4"...
Samuel Just
10:56 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
If this guess is correct one or more of the following 8 osds should have a copy of {"oid":"10006fc22f8.00000000","key... Samuel Just
10:40 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
My current working assumption is that PeeringState::activate is willing to backfill a non-contiguous replica which ha... Samuel Just
09:46 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
122 doesn't seem to have a 119.385 instance. Samuel Just
09:36 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
[Wrong, used the wrong shard field] Samuel Just
08:57 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
Unfortunately, it's present but an older version... Samuel Just
08:49 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
I spot checked an object in 119.385
sjust@reesi002:~$ sudo ceph pg 119.385 list_unfound | head -n 30...
Samuel Just
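As general background, the read-only commands usually used to locate unfound objects (pg 119.385 taken from the comments above; output fields vary by release):
    ceph health detail                                # lists PGs reporting unfound objects
    ceph pg 119.385 query | jq '.recovery_state'      # peering info, including might_have_unfound
    ceph pg 119.385 list_unfound | head -n 30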
08:39 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
[NVM, used the command wrong] Samuel Just
08:03 PM Bug #48440 (Need More Info): log [ERR] : scrub mismatch
... Patrick Donnelly
12:27 PM Bug #48432 (Resolved): test/lazy-omap-stats fails to compile on alpine linux
lazy_omap_stats_test.h uses uint, which Alpine Linux (musl) lacks
Duncan Bellamy
11:30 AM Feature #48430 (New): Add memory consumption of nodes to health checks
During some tests using a (very small) virtual cluster I noticed that Ceph doesn't seem to 'notice' when a node runs ... Gunther Heinrich

12/01/2020

11:07 PM Documentation #48420 (In Progress): Add guidelines regarding introducing new dependencies to Ceph
This should help developers better understand the consequences of introducing new dependencies to the project. Yaarit Hatuka
10:41 PM Feature #48419 (In Progress): Add support for balance by space utilization and evenly spread prim...

Replace pg-count balancing with utilization-based balancing (deviation %),
followed by primary balancing for replicated pools that onl...
David Zafman
09:53 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
shard 122(5) looks suspicious; it seems not to have been queried, judging from the pg query output, but all_unfound_are_queried... Neha Ojha
08:59 PM Bug #48417 (Duplicate): unfound EC objects in sepia's LRC after upgrade
... Josh Durgin
07:42 PM Backport #48227 (Resolved): nautilus: Log "ceph health detail" periodically in cluster log
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38118
m...
Nathan Cutler
05:53 PM Feature #39362: ignore osd_max_scrubs for forced repair
We should probably only do this if #41363 has been implemented alongside or beforehand, since too many repairs could ... David Zafman
05:08 PM Bug #47380 (Fix Under Review): mon: slow ops due to osd_failure
Neha Ojha

11/30/2020

11:14 PM Bug #48320 (Resolved): mon/mon-last-epoch-clean.sh fails
Kefu Chai
07:55 PM Bug #48030 (Fix Under Review): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_o...
Patrick Donnelly
07:46 PM Bug #48030 (In Progress): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_tim...
Patrick Donnelly
06:32 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Looking at the constructor of the RadosClient class, the "add_observer()" call is missing, which is why
the config option...
Sridhar Seshasayee
06:51 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
The logs did not show any clues as to why the "pg dump" command hung forever.
The suspicion is that the "rados_m...
Sridhar Seshasayee
06:06 PM Bug #35808 (Rejected): ceph osd ok-to-stop result doesn't match the real situation

Marking rejected because the reporter hasn't responded to the request.
David Zafman
06:03 PM Bug #23875 (Resolved): Removal of snapshot with corrupt replica crashes osd

Marking resolved since no issue has been seen since the supposed partial fix was added.
David Zafman
06:00 PM Bug #27988 (Rejected): Warn if queue of scrubs ready to run exceeds some threshold

This was already handled in a different but reasonable way by https://github.com/ceph/ceph/pull/15643 and refined b...
David Zafman
05:43 PM Bug #46264 (Resolved): mon: check for mismatched daemon versions
David Zafman
05:40 PM Backport #48227: nautilus: Log "ceph health detail" periodically in cluster log
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38118
merged
Yuri Weinstein
03:15 PM Bug #48385 (Fix Under Review): nautilus: statfs: a cluster with any up but out osd will report by...
Igor Fedotov
02:52 PM Bug #48385 (In Progress): nautilus: statfs: a cluster with any up but out osd will report bytes_u...
I can reproduce the issue on a vstart cluster with both the latest Nautilus and Octopus, but not on master.
Looks like th...
Igor Fedotov
12:24 PM Backport #48378 (In Progress): octopus: invalid values of crush-failure-domain should not be allo...
Nathan Cutler
12:22 PM Backport #48228 (In Progress): octopus: Log "ceph health detail" periodically in cluster log
Nathan Cutler

11/29/2020

03:01 PM Feature #48392 (New): ceph ignores --keyring?
I'm trying to set up a new OSD. I'm having some issues with the rollback not performing properly.
When "ceph-volume ...
Arkadiy K

11/27/2020

06:13 PM Bug #48033: mon: after unrelated crash: handle_auth_request failed to assign global_id; probing, ...
It seems that this can be closed: there is something deeply cursed on one of the machines in that cluster and seeming... Peter Gervai
09:39 AM Bug #42341: OSD PGs are not being purged
Has long been resolved, don't even remember the details anymore. Anonymous
04:26 AM Bug #48386 (Fix Under Review): Paxos::restart() and Paxos::shutdown() can race leading to use-aft...
Brad Hubbard
12:15 AM Bug #48386 (Resolved): Paxos::restart() and Paxos::shutdown() can race leading to use-after-free ...
... Brad Hubbard

11/26/2020

07:52 PM Bug #48385: nautilus: statfs: a cluster with any up but out osd will report bytes_used == stored
My best guess is it's related to this in PGMap.h:... Dan van der Ster
07:18 PM Bug #48385 (Resolved): nautilus: statfs: a cluster with any up but out osd will report bytes_used...
The pool df stats are supposed to present user bytes as "stored" and raw used as "bytes_used".
But if any osd is up...
Dan van der Ster
11:18 AM Bug #46816 (Resolved): mon stat prints plain text with -f json
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
11:17 AM Backport #48379 (Resolved): nautilus: invalid values of crush-failure-domain should not be allowe...
https://github.com/ceph/ceph/pull/39124 Nathan Cutler
11:17 AM Backport #48378 (Resolved): octopus: invalid values of crush-failure-domain should not be allowed...
https://github.com/ceph/ceph/pull/38347 Nathan Cutler
10:34 AM Bug #48060: data loss in EC pool
We ended up decommissioning Ceph. For unlucky 'incomplete PG' googlers, here are the steps to migrate OpenStack VM s... Hannes Tamme
10:27 AM Bug #42341: OSD PGs are not being purged
Does the workaround mentioned in #43948 help? Dan van der Ster
08:18 AM Bug #43948: Remapped PGs are sometimes not deleted from previous OSDs
I can report the same in 14.2.11.
We set some osds to crush weight 0 and they were draining, but due to #47044 some of ...
Dan van der Ster

11/25/2020

10:42 PM Bug #47452 (Pending Backport): invalid values of crush-failure-domain should not be allowed while...
Prashant D
07:50 PM Backport #47899 (Resolved): nautilus: mon stat prints plain text with -f json
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37706
m...
Nathan Cutler
04:48 PM Feature #48361 (New): Parallel querying of buckets.index listomapkeys
"Querying listomapkeys for rgw.buckets.index takes 45 min in our test env (5 buckets * 521 shards). Each non-zero ind... Neha Ojha
01:09 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Updating the findings so far from logs under https://pulpito.ceph.com/nojha-2020-11-10_20:16:13-rados:monthrash-maste... Sridhar Seshasayee

11/24/2020

11:43 PM Bug #47767 (In Progress): octopus: setting noscrub crashed osd process
David Zafman
08:47 PM Documentation #40579: doc: POOL_NEAR_FULL on OSD_NEAR_FULL
This should be set to "resolved".
Anthony D'Atri fixed this one.
Zac Dover
05:47 PM Backport #47899: nautilus: mon stat prints plain text with -f json
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37706
merged
Yuri Weinstein
05:22 PM Bug #47419: make check: src/test/smoke.sh: TEST_multimon: timeout 8 rados -p foo bench 4 write -b...
https://jenkins.ceph.com/job/ceph-pull-requests/64337/consoleFull#10356408840526d21-3511-427d-909c-dd086c0d1034 Neha Ojha
03:15 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
I am assigning this to myself. Looking into the logs. Sridhar Seshasayee
08:53 AM Bug #48065 (New): "ceph osd crush set|reweight-subtree" commands do not set weight on device clas...
Mykola Golub
08:53 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
I eventually got approval from the customer to publish their data.
I have attached a tarball that includes `ce...
Mykola Golub
06:06 AM Bug #48336 (Fix Under Review): monmaptool --create --add nodeA --clobber monmap aborts in entity_...
Brad Hubbard
05:01 AM Bug #48336 (Resolved): monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_t::...
It's incorrect usage of the command since an IP address is required but we should not abort IMHO.... Brad Hubbard
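For comparison, the documented form supplies an address for each added monitor; a sketch with placeholder names, addresses, and output path:
    monmaptool --create --add nodeA 192.168.0.10:6789 --add nodeB 192.168.0.11:6789 --clobber /tmp/monmap
    monmaptool --print /tmp/monmap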
03:10 AM Bug #48334 (Rejected): Check of osd_scrub_auto_repair_num_errors happens before PG stat error is ...
David Zafman
03:06 AM Bug #48334 (Rejected): Check of osd_scrub_auto_repair_num_errors happens before PG stat error is ...
David Zafman

11/23/2020

01:06 PM Backport #48244 (Resolved): nautilus: collection_list_legacy: pg inconsistent
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38100
m...
Nathan Cutler
08:02 AM Bug #48320 (Fix Under Review): mon/mon-last-epoch-clean.sh fails
Kefu Chai
07:39 AM Bug #48323: "size 0 != clone_size 10" in clog - clone size mismatch when deduped object is evicted
https://github.com/ceph/ceph/pull/38237 Myoungwon Oh
07:38 AM Bug #48323 (Resolved): "size 0 != clone_size 10" in clog - clone size mismatch when deduped objec...
When evicting a deduped object that is cloned,
we try to shrink its size to zero if all chunks are in chunk_map.
How...
Myoungwon Oh

11/22/2020

02:44 PM Bug #48320 (Resolved): mon/mon-last-epoch-clean.sh fails
... Kefu Chai
06:10 AM Documentation #40579 (Fix Under Review): doc: POOL_NEAR_FULL on OSD_NEAR_FULL
Kefu Chai
05:47 AM Bug #45761 (Fix Under Review): mon_thrasher: "Error ENXIO: mon unavailable" during sync_force com...
Kefu Chai

11/20/2020

10:07 PM Bug #48219 (Resolved): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: return 1
Neha Ojha
10:07 PM Bug #48220 (Resolved): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: return 1
Neha Ojha
09:31 PM Backport #48244: nautilus: collection_list_legacy: pg inconsistent
Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/38100
merged
Yuri Weinstein
12:45 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
Now, about 18 hours later, the @num_pg@ has already dropped quite a bit. These are the exact same OSDs. The balancer ... Jonas Jelten
12:38 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
aah, was working alongside the octopus batch, might have gotten confused, sorry Deepika Upadhyay

11/19/2020

08:41 PM Documentation #7386: librados: document rados_osd_op_timeout and rados_mon_op_timeout options
Josh and @zdover23: Both of these are marked in @options.cc@ as @LEVEL_ADVANCED@. There are >1500 options now, too ... Anthony D'Atri
06:53 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
Another observation: @num_pgs@ is highest for the OSD that was created first on the same host. Later-created devices (hi... Jonas Jelten
06:37 PM Bug #48298 (New): hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
I just added OSDs to my cluster running 14.2.13.... Jonas Jelten
05:33 PM Documentation #22843: [doc][luminous] the configuration guide still contains osd_op_threads and d...
@zdover23 given Nathan's observation that upstream != RHCS, that some of the new options are in that document now, an... Anthony D'Atri
05:23 PM Documentation #23354: doc: osd_op_queue & osd_op_queue_cut_off
The default value for `osd_op_queue_cut_off` changed to `high` with Octopus and was documented as such.
https://g...
Anthony D'Atri
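A quick way to confirm the effective value on a running cluster; a sketch where osd.0 is a placeholder id and `ceph config get` assumes Mimic or later:
    ceph config get osd osd_op_queue_cut_off             # cluster-wide setting
    ceph daemon osd.0 config get osd_op_queue_cut_off    # value in effect on one daemon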
03:51 PM Bug #48297 (New): OSD process using up complete available memory after pg_num change / autoscaler on
We made the following change on our cluster (cephadm octopus 15.2.5):
ceph osd pool set one pg_num 512
after some ti...
Tobias Fischer
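For anyone triaging something similar, the pool and autoscaler state referenced above can be checked with the following (a sketch; the pool name "one" comes from the report):
    ceph osd pool get one pg_num
    ceph osd pool get one pgp_num
    ceph osd pool autoscale-status
    ceph osd pool set one pg_autoscale_mode off    # optionally pin pg_num while investigating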
06:01 AM Documentation #23612: doc: add description of new auth profiles
@zdover23 I think #23442 is a superset of this Anthony D'Atri
04:24 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
@zdover23 I believe that https://github.com/ceph/ceph/pull/31588 fixed this already.
Anthony D'Atri
04:13 AM Documentation #35968: [doc][jewel] sync documentation "OSD Config Reference" default values with ...
@zdover23 This is another one that I fear is moot at this late date. We won't see another Jewel release.
Anthony D'Atri
04:09 AM Documentation #35967: [doc] sync documentation "OSD Config Reference" default values with code de...
@zdover The @options.cc@ values listed here appear to be current in _master_, and as with #38429 the @mimic@ and @lum... Anthony D'Atri
12:37 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Deepika Upadhyay wrote:
> seeing on octopus as well:
> https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip...
Neha Ojha

11/18/2020

09:55 PM Bug #46224 (Resolved): Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:58 PM Documentation #38558: doc: osd [test-]reweight-by-utilization is not properly documented in ceph cli
@zdover23 I think https://github.com/ceph/ceph/pull/37268 fulfills this and thus it can be marked as completed. Anthony D'Atri
12:39 PM Bug #48274 (New): mon_osd_adjust_heartbeat_grace blocks OSDs from being marked as down
I encountered a situation which I also posted on the users list: https://lists.ceph.io/hyperkitty/list/ceph-users@cep... Wido den Hollander
05:11 AM Documentation #40579: doc: POOL_NEAR_FULL on OSD_NEAR_FULL
This should be addressed by https://github.com/ceph/ceph/pull/38145
Anthony D'Atri

11/17/2020

07:20 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
/a/ksirivad-2020-11-16_07:16:50-rados-wip-mgr-progress-turn-off-option-distro-basic-smithi/5630402 - no logs Neha Ojha
02:58 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
... Deepika Upadhyay
12:31 PM Backport #48233 (Resolved): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38069
m...
Nathan Cutler
03:26 AM Bug #47767: octopus: setting noscrub crashed osd process

The function PG::abort_scrub() probably races with messages in flight. It might help if we called scrub_unreser...
David Zafman

11/16/2020

09:21 PM Backport #48227 (In Progress): nautilus: Log "ceph health detail" periodically in cluster log
Neha Ojha
07:51 PM Bug #48230 (Resolved): nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
Neha Ojha
06:53 PM Bug #47440: nautilus: valgrind caught leak in Messenger::ms_deliver_verify_authorizer
http://qa-proxy.ceph.com/teuthology/yuriw-2020-11-11_16:17:30-rados-wip-yuri-testing-2020-11-09-0849-nautilus-distro-... Deepika Upadhyay
06:49 PM Bug #38219: rebuild-mondb hangs

description: ...
Deepika Upadhyay
12:09 PM Bug #48172: Nautilus 14.2.13 osdmap not trimming on clean cluster
So, we managed to find the reason, and it's weird.
The cluster is not trimming osdmaps because it thinks that all PGs...
Marcin Śliwiński
07:44 AM Backport #48244 (In Progress): nautilus: collection_list_legacy: pg inconsistent
Mykola Golub
07:41 AM Backport #48244 (Resolved): nautilus: collection_list_legacy: pg inconsistent
https://github.com/ceph/ceph/pull/38100 Mykola Golub
07:42 AM Backport #48243 (In Progress): octopus: collection_list_legacy: pg inconsistent
Mykola Golub
07:41 AM Backport #48243 (Resolved): octopus: collection_list_legacy: pg inconsistent
https://github.com/ceph/ceph/pull/38098 Mykola Golub
07:38 AM Bug #48153 (Pending Backport): collection_list_legacy: pg inconsistent
Mykola Golub
12:08 AM Documentation #47163: document the difference between disk commit and apply time
Any suggestions as to which document this should go in?
Anthony D'Atri
12:07 AM Documentation #47176: creating pool doc is very out-of-date
FWIW, some admins (including me) have found the PG autoscaler to decrease `pgp_num` far too aggressively, and have se... Anthony D'Atri

11/15/2020

05:01 PM Bug #47930 (Resolved): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
Kefu Chai

11/14/2020

11:27 PM Backport #48233: nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38069
merged
Yuri Weinstein
05:10 AM Bug #38846 (Resolved): dump_pgstate_history doesn't really produce useful json output, needs an a...
Brad Hubbard
01:43 AM Bug #38846 (Pending Backport): dump_pgstate_history doesn't really produce useful json output, ne...
Brad Hubbard

11/13/2020

10:22 PM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
Neha Ojha
10:03 PM Bug #47930 (Fix Under Review): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background:...
Neha Ojha
09:48 PM Backport #48233 (In Progress): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODU...
Nathan Cutler
09:45 PM Backport #48233 (Resolved): nautilus: Health check failed: 4 mgr modules have failed (MGR_MODULE_...
https://github.com/ceph/ceph/pull/38069 Nathan Cutler
09:38 PM Bug #46224 (Pending Backport): Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)
Neha Ojha
09:30 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
Could be related to https://github.com/ceph/ceph/pull/37844. Also appeared in its test run https://trello.com/c/Nwckv... Neha Ojha
08:56 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
This seems to be due to those 3 modules not being present in "modules" when get_health_checks() is called.... Neha Ojha
08:24 PM Bug #48230: nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
It's odd, because the mgr log for the job cited above shows a lot of what look like normal status messages from rbd_s... Dan Mick
07:53 PM Bug #48230 (Resolved): nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
... Neha Ojha
07:39 PM Bug #47617: rebuild_mondb: daemon-helper: command failed with exit status 1
Deepika:
/a/yuriw-2020-09-16_23:57:37-rados-wip-yuri8-testing-2020-09-16-2220-octopus-distro-basic-smithi/5441511 a...
Neha Ojha
05:35 PM Backport #48228 (Resolved): octopus: Log "ceph health detail" periodically in cluster log
https://github.com/ceph/ceph/pull/38345 Nathan Cutler
05:35 PM Backport #48227 (Resolved): nautilus: Log "ceph health detail" periodically in cluster log
https://github.com/ceph/ceph/pull/38118 Nathan Cutler
05:26 PM Bug #48219 (Fix Under Review): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: ...
Neha Ojha
05:25 PM Bug #48220 (Fix Under Review): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: re...
Neha Ojha

11/12/2020

11:18 PM Bug #48220 (Resolved): qa/standalone/misc/ver-health.sh: TEST_check_version_health_1: return 1
... Neha Ojha
11:17 PM Bug #48219 (Resolved): qa/standalone/scrub/osd-scrub-test.sh: TEST_scrub_extented_sleep: return 1
... Neha Ojha
11:03 PM Bug #48042 (Pending Backport): Log "ceph health detail" periodically in cluster log
Neha Ojha
10:40 AM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
I made a copy-and-paste mistake -- the ceph report had
"min_last_epoch_clean": 163735,
Dan van der Ster
10:38 AM Bug #48212 (Resolved): pool last_epoch_clean floor is stuck after pg merging
We just merged a pool (id 36) from 1024 to 64 PGs, and after this was done the cluster osdmaps were no longer trimmed... Dan van der Ster
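As background, the osdmap trimming floor can be read from the monitor's report; a sketch using field names as they appear on Nautilus (layout may differ on other releases):
    ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'
    ceph report 2>/dev/null | jq '.osdmap_clean_epochs'    # includes min_last_epoch_clean on 14.2.x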

11/11/2020

08:46 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
Agreed, it looks related. Let's leave it here for now. Brad Hubbard
08:24 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
... Deepika Upadhyay
04:06 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...

https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip-yuri4-testing-2020-11-10-0959-distro-basic-smithi/
ht...
Deepika Upadhyay
01:27 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Fails deterministically: https://pulpito.ceph.com/nojha-2020-11-09_22:09:31-rados:monthrash-master-distro-basic-smith... Neha Ojha
03:59 PM Documentation #47523 (Resolved): ceph df documentation is outdated
Zac Dover
02:20 PM Bug #47697 (Resolved): mon: set session_timeout when adding to session_map
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:19 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:18 PM Backport #47748 (Resolved): nautilus: mon: set session_timeout when adding to session_map
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37554
m...
Nathan Cutler
01:05 AM Backport #47748: nautilus: mon: set session_timeout when adding to session_map
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37554
merged
Yuri Weinstein
02:04 PM Bug #48183 (New): monmap::build_initial returns different error val on FreeBSD
3/3 Test #132: unittest_mon_monmap ..............***Failed 0.14 sec
Running main() from gmock_main.cc
[=========...
Willem Jan Withagen
01:56 PM Backport #47747 (Resolved): octopus: mon: set session_timeout when adding to session_map
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37553
m...
Nathan Cutler
11:22 AM Bug #47044: PG::_delete_some isn't optimal iterating objects
Can confirm this makes replacing hw for an S3 cluster quite intrusive, due to the block-db devs getting overloaded (i... Dan van der Ster
10:14 AM Feature #48182 (Resolved): osd: allow remote read by calling cls method from within cls context
Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
Ken Iizawa
06:37 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
Please close this issue. wencong wan
06:36 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
Perf dump shows the crashed osd has too many active connections. Finally, we found some ceph-fuse clients on other ho... wencong wan
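For reference, the messenger connection counters mentioned here can be pulled from the admin socket; a sketch with osd.0 as a placeholder id:
    ceph daemon osd.0 perf dump | grep msgr_active_connections    # one counter per AsyncMessenger worker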

11/10/2020

10:34 PM Bug #48077 (Resolved): Allowing scrub configs begin_day/end_day to include 7 and begin_hour/end_h...
David Zafman
10:04 PM Bug #48173 (New): Additional test cases for osd-recovery-scrub.sh

3. Remote reservation non-overlapping PGs start recovery on PG that has a replica
4. failed local (need sleep in O...
David Zafman
08:13 PM Bug #48172 (New): Nautilus 14.2.13 osdmap not trimming on clean cluster
We have a cluster running on 14.2.13 (some osds are on 14.2.9; they are on Debian). The cluster is in active+clean state, ... Marcin Śliwiński
07:32 PM Backport #47747: octopus: mon: set session_timeout when adding to session_map
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37553
merged
Yuri Weinstein
06:57 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/ has ... Greg Farnum
06:57 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/5597065 Greg Farnum
11:25 AM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
More than 90 osds crashed in my cluster; the crash info is as follows:
"os_version_id": "7",
"assert_condition...
wencong wan
09:52 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
Actually, the problem with the weight not being updated on the class subtree is easily reproducible on a vstart cluster (se... Mykola Golub
12:23 AM Bug #18445 (Won't Fix): ceph: ping <mon.id> doesn't connect to cluster
Long since fixed differently Dan Mick

11/09/2020

11:57 PM Bug #48065 (Need More Info): "ceph osd crush set|reweight-subtree" commands do not set weight on ...
This does sound like a bug. Can you please share the osdmap? Neha Ojha
10:11 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Fails on every run
https://pulpito.ceph.com/teuthology-2020-11-08_07:01:02-rados-master-distro-basic-smithi/
http...
Neha Ojha
09:52 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
rados/singleton/{all/max-pg-per-osd.from-replica mon_election/connectivity msgr-failures/many msgr/async-v2only objec... Neha Ojha
08:35 PM Bug #48153 (Fix Under Review): collection_list_legacy: pg inconsistent
Mykola Golub
06:35 PM Bug #48153: collection_list_legacy: pg inconsistent
And just as a note: the problem is observed only when the scrub is run for a pg that has both old and new versions o... Mykola Golub
06:23 PM Bug #48153 (In Progress): collection_list_legacy: pg inconsistent
> And if you see "collection_list_legacy" in the log, then I am interested to see more details about object names you... Mykola Golub
05:08 PM Bug #48153: collection_list_legacy: pg inconsistent
> ceph tell osd.430 injectargs '--debug-osd=10'
sorry, it should be '--debug-bluestore=10'.
And if you see "c...
Mykola Golub
04:17 PM Bug #48153: collection_list_legacy: pg inconsistent
Alexander, I am building a cluster to reproduce the issue.
In the meantime, could you please increase the debug_osd level on ...
Mykola Golub
03:38 PM Bug #48153: collection_list_legacy: pg inconsistent
Mykola Golub wrote:
> Alexander, did you have osds of different versions in your acting set? If you did, what exactl...
Alexander Kazansky
03:31 PM Bug #48153: collection_list_legacy: pg inconsistent
> missing replica happens only on osd with 14.2.12 or 14.2.13, on 14.2.11 all ok.
Can I assume that in [206,430,41...
Mykola Golub
03:26 PM Bug #48153: collection_list_legacy: pg inconsistent
Alexander, did you have osds of different versions in your acting set? If you did, what exactly they were?
Also is i...
Mykola Golub
03:07 PM Bug #48153 (Resolved): collection_list_legacy: pg inconsistent
Hello ppl.
I have a problem with osds on Nautilus 14.2.12 and 14.2.13.
After the update, some osds on 14.2.12 or 14.2.13 sta...
Alexander Kazansky
02:09 PM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
@Alexander -
This issue is closed and it is only by chance that I saw your comment on it and decided to respond.
...
Nathan Cutler
11:20 AM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
The bug happens on 14.2.13:
ceph pg 21.4c1 query | jq .state...
Alexander Kazansky
12:50 PM Feature #48151: osd: allow remote read by calling cls method from within cls context
I should have created this ticket under the RADOS project, not the Ceph project. Ken Iizawa
12:43 PM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
Ken Iizawa

11/08/2020

04:25 PM Bug #46318: mon_recovery: quorum_status times out
/a/kchai-2020-11-08_14:53:34-rados-wip-kefu-testing-2020-11-07-2116-distro-basic-smithi/5602229/ Kefu Chai

11/07/2020

02:34 AM Bug #45706: Memory usage in buffer_anon showing unbounded growth in osds on EC pool. (14.2.9)
There's this buffer::list::rebuild buffer_anon leak fix in the master branch that may solve the issue:
https://git...
Zac Medico

11/06/2020

11:03 PM Bug #48033 (Need More Info): mon: after unrelated crash: handle_auth_request failed to assign glo...
https://tracker.ceph.com/issues/47654#note-7 and https://tracker.ceph.com/issues/47654#note-8 may help to understand ... Neha Ojha

11/05/2020

09:09 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Greg, I am assigning this bug to you, let me know if you need anything from me. Neha Ojha
05:59 PM Backport #47993 (Resolved): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37818
m...
Nathan Cutler
05:17 PM Backport #47993: nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37818
merged
Yuri Weinstein
05:59 PM Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37815
m...
Nathan Cutler
05:28 PM Backport #47825 (Resolved): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
David Zafman
05:16 PM Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37815
merged
Yuri Weinstein
05:57 PM Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37853
m...
Nathan Cutler
05:28 PM Backport #47826 (Resolved): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
David Zafman
04:27 PM Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37853
merged
Yuri Weinstein
05:56 PM Backport #47994 (Resolved): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37819
m...
Nathan Cutler
04:22 PM Backport #47994: octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37819
merged
Yuri Weinstein
05:56 PM Backport #47987 (Resolved): octopus: MonClient: mon_host with DNS Round Robin results in 'unable ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37817
m...
Nathan Cutler
04:22 PM Backport #47987: octopus: MonClient: mon_host with DNS Round Robin results in 'unable to parse ad...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37817
merged
Yuri Weinstein
05:29 PM Bug #46405 (Resolved): osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
David Zafman
04:15 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Deepika Upadhyay
12:46 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590078 Neha Ojha
12:44 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590040 looks similar Neha Ojha
12:41 AM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590019 Neha Ojha
12:39 AM Bug #48029: Exiting scrub checking -- not all pgs scrubbed.
rados/singleton-nomsgr/{all/osd_stale_reads mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}... Neha Ojha

11/04/2020

11:39 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Neha Ojha wrote:
> Greg Farnum wrote:
> > Oh the bug does occur while executing commands to change the strategy. Bu...
Greg Farnum
11:10 PM Bug #47654 (Triaged): test_mon_pg: mon fails to join quorum to due election strategy mismatch
Greg Farnum wrote:
> Oh the bug does occur while executing commands to change the strategy. But this all still looks...
Neha Ojha
07:46 PM Bug #46264: mon: check for mismatched daemon versions

Delay the health warning/error by 7 days by default (configurable via a config value) after a mismatch is detected.
David Zafman
07:46 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
Enhancement:
Possible test cases to replace existing test
1. Simple test for "not scheduling scrubs due to active...
David Zafman
 
