Activity
From 08/31/2021 to 09/29/2021
09/29/2021
- 06:48 PM Feature #52609: New PG states for pending scrubs / repairs
- While I agree the format is readable, it's a bit narrow in application.
Would it be a significant undertaking to:
... - 07:11 AM Feature #52609: New PG states for pending scrubs / repairs
- That schedule element seems like a pretty reasonable human-readable summary.
- 05:18 PM Bug #51527: Ceph osd crashed due to segfault
- I've attached the shell script "load-bi.sh". It requires that a cluster be brought up with RGW. It requires that a bu...
- 04:00 PM Bug #52756: ceph-kvstore-tool repair segmentfault without bluestore-kv
- ...
- 03:51 PM Bug #52756: ceph-kvstore-tool repair segmentfault without bluestore-kv
- huang jun wrote:
> [...]
The backtrace looks like this:...
- 03:09 PM Bug #52756 (Fix Under Review): ceph-kvstore-tool repair segmentfault without bluestore-kv
- 07:26 AM Bug #52756 (Pending Backport): ceph-kvstore-tool repair segmentfault without bluestore-kv
- ...
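As context for the entries above, a hedged sketch of how ceph-kvstore-tool is typically invoked (the backend keywords come from the tool's usage string; the store paths are illustrative, not taken from this ticket):
<pre>
# General usage: a backend type is given before the store path
#   ceph-kvstore-tool <leveldb|rocksdb|bluestore-kv> <store-path> <command> [args...]
# This ticket is about the repair command segfaulting when the bluestore-kv
# backend is not used, e.g. (paths are hypothetical):
ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-0/db repair      # reported to segfault
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 repair
</pre>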
- 03:00 PM Backport #52771 (Rejected): nautilus: pg scrub stat mismatch with special objects that have hash ...
- 03:00 PM Backport #52770 (Resolved): pacific: pg scrub stat mismatch with special objects that have hash '...
- https://github.com/ceph/ceph/pull/43512
- 03:00 PM Backport #52769 (Resolved): octopus: pg scrub stat mismatch with special objects that have hash '...
- 12:25 PM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- In some cases it requires several daemon restarts until it gets to the right configuration.
I don't know if the wr...
- 10:15 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Restarting the daemons seems to get the correct configuration but it is unclear why this did not happen when they wer...
- 10:00 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Just as statistics, there are now:
- 51 cases where there is an error in the front_addr or hb_front_addr configura...
- 09:52 AM Bug #52761 (New): OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Ceph cluster configured with a public and cluster network:
>> ceph config dump|grep network
global advanced cl...
- 09:19 AM Bug #52760 (Need More Info): Monitor unable to rejoin the cluster
- Our cluster has three monitors.
After a restart one of our monitors failed to join the cluster with:
Sep 24 07:52...
09/28/2021
- 11:16 PM Cleanup #52754 (New): windows warnings
- ...
- 11:12 PM Cleanup #52753 (Rejected): rbd cls : centos 8 warning
- ...
- 11:11 PM Cleanup #52752 (New): fix warnings
There are warnings in the Ceph codebase that need updating with respect to modern C++,
e.g. one of them:
<pre...
- 02:23 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Neha Ojha wrote:
> > Patrick Donnelly wrote:
> > > Patrick Donnelly wrote:
> > > > Neha...
- 01:24 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Neha Ojha wrote:
> Patrick Donnelly wrote:
> > Patrick Donnelly wrote:
> > > Neha Ojha wrote:
> > > > [...]
> > ...
- 07:06 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
- 07:05 AM Fix #52329 (Resolved): src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" ...
- 07:04 AM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
- 07:03 AM Fix #52025 (Resolved): osd: Add config option to skip running the OSD benchmark on init.
- 07:03 AM Backport #51988 (Resolved): pacific: osd: Add mechanism to avoid running osd benchmark on osd ini...
- 07:01 AM Fix #51464 (Resolved): osd: Add mechanism to avoid running osd benchmark on osd init when using m...
- 07:00 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
- 06:59 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
- 05:08 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- > Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)?
...
09/27/2021
- 09:15 PM Backport #52322 (In Progress): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- 08:12 PM Backport #51555: octopus: mon: return -EINVAL when handling unknown option in 'ceph osd pool get'
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43266
merged
- 08:09 PM Backport #51967: octopus: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42616
merged
- 07:47 PM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- lei cao wrote:
> global params like mon_max_pg_per_osd can be updated at runtime? It seems that mon_max_pg_per_osd is ...
- 07:43 PM Bug #52509: PG merge: PG stuck in premerge+peered state
- Konstantin Shalygin wrote:
> We can plan and spend time to set up a staging cluster for this and try to reproduce it, i...
- 07:40 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Patrick Donnelly wrote:
> > Neha Ojha wrote:
> > > [...]
> > >
> > > Looks like peeri...
- 07:31 PM Backport #52747 (In Progress): pacific: MON_DOWN during mon_join process
- https://github.com/ceph/ceph/pull/48558
- 07:31 PM Backport #52746 (Rejected): octopus: MON_DOWN during mon_join process
- 07:28 PM Bug #52724: octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- Seems to be the same issue as https://tracker.ceph.com/issues/43584, marked the original ticket for backport.
- 07:27 PM Bug #43584 (Pending Backport): MON_DOWN during mon_join process
- 07:21 PM Bug #51527 (Need More Info): Ceph osd crashed due to segfault
- 07:21 PM Bug #51527: Ceph osd crashed due to segfault
- Hi Eric,
Could you please share the test that reproduces this crash on a vstart cluster?
- 07:18 PM Bug #52741: pg inconsistent state is lost after the primary osd restart
- Can you please verify this behavior?
- 09:17 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
- Just a note. I see that to re-detect inconsistency just running scrub (not deep-scrub) is enough, which is supposed t...
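A minimal sketch of re-triggering that detection, assuming PG 2.1a is the affected placement group (the id is illustrative):
<pre>
# re-run a (shallow) scrub on the affected PG and re-check the cluster state
ceph pg scrub 2.1a
ceph health detail | grep -i inconsistent
rados list-inconsistent-obj 2.1a --format=json-pretty
</pre>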
- 08:49 AM Bug #52741 (New): pg inconsistent state is lost after the primary osd restart
- Steps to reproduce:
- Create a pool (either replicated or erasure)
- Introduce an inconsistency (e.g. put an obje... - 02:13 PM Bug #52739 (Fix Under Review): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- 06:42 AM Bug #52739: msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- https://github.com/ceph/ceph/pull/43307
- 06:09 AM Bug #52739 (Resolved): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- ProtocalV2 sets the recv_stamp after the message is throttled and received completely.
This is wrong because it wa...
- 09:30 AM Bug #43174 (Resolved): pgs inconsistent, union_shard_errors=missing
- 09:29 AM Backport #47365 (Resolved): mimic: pgs inconsistent, union_shard_errors=missing
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37053
m...
- 07:15 AM Bug #52486 (Resolved): test tracker: please ignore
09/26/2021
- 10:30 AM Bug #52737 (Duplicate): osd/tests: stat mismatch
- Test fails with:...
- 05:00 AM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Neha Ojha wrote:
> Can you provide a link to the failed run?
Trying to reproduce.
09/25/2021
- 11:24 AM Feature #52609: New PG states for pending scrubs / repairs
- Is the following a good enough solution?
Neha Ojha, Josh Durgin, Sam Just - what do you think?
I have drafted a...
09/24/2021
- 06:29 PM Bug #52731 (New): FAILED ceph_assert(!slot->waiting_for_split.empty())
- ...
- 03:21 PM Backport #51117: pacific: osd: Run osd bench test to override default max osd capacity for mclock.
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41731
merged
- 03:16 PM Backport #51604: octopus: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42975
merged
- 01:59 PM Bug #51527: Ceph osd crashed due to segfault
- J. Eric Ivancich wrote:
> I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15....
- 10:01 AM Bug #52724 (Duplicate): octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- ...
09/23/2021
- 05:56 PM Bug #51527: Ceph osd crashed due to segfault
- ...
- 05:50 PM Bug #51527: Ceph osd crashed due to segfault
- Attached is an example log with backtrace that I get.
- 03:13 PM Bug #52707: mixed pool types reported via telemetry
- So far there are 6 clusters reporting this:
http://telemetry.front.sepia.ceph.com:4000/d/1rDWH5H7k/replicated-pool-w...
09/22/2021
- 10:45 PM Backport #52710 (Resolved): octopus: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/44165
- 10:35 PM Backport #43623 (Rejected): nautilus: pg: fastinfo incorrect when last_update moves backward in time
- Nautilus is EOL.
- 10:33 PM Backport #44835 (Rejected): nautilus: librados mon_command (mgr) command hang
- Nautilus is EOL
- 10:32 PM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
- 06:18 PM Backport #51555 (In Progress): octopus: mon: return -EINVAL when handling unknown option in 'ceph...
- 06:02 PM Backport #51552 (In Progress): octopus: rebuild-mondb hangs
- 05:39 PM Bug #51527: Ceph osd crashed due to segfault
- I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15.2.13.
I can reproduce it...
- 05:24 PM Bug #52707 (New): mixed pool types reported via telemetry
- In telemetry reports there are clusters with pools of type `replicated` with erasure_code_profile defined, for exampl...
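For operators who want to check their own cluster for this pattern, a hedged sketch (not taken from the telemetry data; the key names are assumed to match the JSON pool dump):
<pre>
# dump pool details as JSON and look for pools whose type is replicated (1)
# yet still carry a non-empty erasure_code_profile
ceph osd pool ls detail -f json-pretty | grep -E '"pool_name"|"type"|"erasure_code_profile"'
</pre>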
- 09:15 AM Support #52700 (New): OSDs wont start
We have a three-node Ceph cluster (bare metal), with each node running 4 OSDs (12 OSDs in total). Recently, my collegu...
09/21/2021
- 11:51 PM Bug #52694 (Duplicate): src/messages/MOSDPGLog.h: virtual void MOSDPGLog::encode_payload(uint64_t...
- ...
- 08:47 PM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
- 11:23 AM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
- Following an old change that made operator-initiated scrubs into a type of 'scheduled scrubs':
the operator command ...
- 03:30 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- I think I found an issue in `ECBackend::get_hash_info` that might be responsible for introducing the inconsistency in...
- 12:13 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
Just for the record. In our customer case it was a mix of bluestore and filestore osds. The primary osd was the fails...
- 11:31 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- 11:31 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- That's awesome, thanks. The behaviour you suggest sounds sensible to me.
Since it's been a while, I should probabl...
- 10:44 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
We have a customer who experienced the same issue. In our case the hash info was corrupted only on two shards. I have...
- 10:38 AM Bug #48959 (Fix Under Review): Primary OSD crash caused corrupted object and further crashes duri...
- 10:38 AM Bug #48959 (Fix Under Review): Primary OSD crash caused corrupted object and further crashes duri...
- 01:45 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
- ...
- 11:43 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- ...
- 09:00 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Thank you for your reply.
My cluster has another problem right now, so I'll collect these logs after resolving it.
09/20/2021
- 04:29 PM Bug #38931 (Resolved): osd does not proactively remove leftover PGs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:24 PM Bug #52335 (Resolved): ceph df detail reports dirty objects without a cache tier
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:23 PM Backport #51966 (Resolved): nautilus: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42617
m...
- 04:23 PM Backport #51583 (Resolved): nautilus: osd does not proactively remove leftover PGs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42240
m... - 04:21 PM Backport #52337 (Resolved): octopus: ceph df detail reports dirty objects without a cache tier
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42862
m... - 04:19 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Neha Ojha wrote:
> > [...]
> >
> > Looks like peering induced by mapping change by the...
- 07:17 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/p...
09/19/2021
09/18/2021
- 12:23 PM Bug #52657 (In Progress): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, ...
- Test: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overr...
- 11:21 AM Backport #51497 (In Progress): nautilus: mgr spamming with repeated set pgp_num_actual while merging
- 11:07 AM Bug #52509: PG merge: PG stuck in premerge+peered state
We can plan and spend time to set up a staging cluster for this and try to reproduce it, if this is a bug. With debug_mgr "...
- 11:04 AM Bug #52509: PG merge: PG stuck in premerge+peered state
- Neha, sorry, logs already rotated by logrotate.
When I removed all upmaps from this pool - all PG merges passed ...
- 01:08 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
global params like mon_max_pg_per_osd can be updated at runtime? It seems that mon_max_pg_per_osd is not observed by ...
- 12:38 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Neha Ojha wrote:
> [...]
>
> Looks like peering induced by mapping change by the balancer. How often does this ha... - 12:03 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- Chris Dunlop wrote:
> Neha Ojha wrote:
> > This is expected when mons don't form quorum, here it was caused by http...
09/17/2021
- 11:29 PM Bug #52489: Adding a Pacific MON to an Octopus cluster: All PGs inactive
- Neha Ojha wrote:
> This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issu...
- 10:26 PM Bug #52489 (Duplicate): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use ...
- 10:52 PM Bug #52509 (Need More Info): PG merge: PG stuck in premerge+peered state
- This is similar to https://tracker.ceph.com/issues/44684, which has already been fixed. It seems like the pgs are in ...
- 10:30 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Most likely caused by failure injections, might be worth looking at what was going in the osd logs when we started se...
- 10:13 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)?
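A minimal sketch of the requested upload, assuming the OSD logs live under the default /var/log/ceph path (the description string is illustrative):
<pre>
# upload the relevant logs to the Ceph developers' drop point
ceph-post-file -d "tracker 52385: recovery_unfound after restarting all nodes" /var/log/ceph/ceph-osd.*.log
</pre>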
- 09:58 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- Is this reproducible on octopus or pacific, which are not EOL?
- 09:55 PM Bug #52445 (New): OSD asserts on starting too many pushes
- Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-po...
- 09:47 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- Please re-open if you happen to see the same issue on a recent release.
- 09:46 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- ...
- 09:35 PM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
We can work around this by temporarily increasing mon_max_pg_per_osd, right?
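A minimal sketch of that workaround, assuming a temporary runtime config change is acceptable (the value 400 and the pool placeholder are illustrative):
<pre>
# temporarily raise the per-OSD PG limit, apply the pool change, then revert
ceph config set global mon_max_pg_per_osd 400
ceph osd pool set <pool> size 2        # the change that previously returned ERANGE
ceph config rm global mon_max_pg_per_osd
</pre>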
- 03:41 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- https://github.com/ceph/ceph/pull/43201
- 03:15 AM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
At first, my cluster had 6 osds and 3 pools whose pg_num is 256 and size is 2, with mon_max_pg_per_osd set to 300. So, we need ...
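For context, a rough back-of-the-envelope reading of those numbers (the "OSDs out" line is my own illustration, not taken from the report):
<pre>
3 pools x 256 PGs x size 2 = 1536 PG replicas
1536 replicas / 6 OSDs     = 256 PGs per OSD   (below mon_max_pg_per_osd = 300)
1536 replicas / 4 in OSDs  = 384 PGs per OSD   (above the limit once two OSDs are out)
</pre>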
- 09:30 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
- Deepika Upadhyay wrote:
> [...]
> /ceph/teuthology-archive/yuriw-2021-09-13_19:12:32-rados-wip-yuri6-testing-2021-0...
- 12:32 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
- ...
- 09:24 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
The attached log has sha1 ca906d0d7a65c8a598d397b764dd262cce645fe3; is this the first time you encountered this iss...
- 05:56 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- /a/sage-2021-09-16_18:04:19-rados-wip-sage-testing-2021-09-16-1020-distro-basic-smithi/6393058
note that this is m...
- 04:27 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- /a/yuriw-2021-09-16_18:23:18-rados-wip-yuri2-testing-2021-09-16-0923-distro-basic-smithi/6393474
- 03:46 PM Documentation #35968 (Won't Fix): [doc][jewel] sync documentation "OSD Config Reference" default ...
- 09:41 AM Bug #50351 (Resolved): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restar...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:35 AM Backport #51605 (Resolved): pacific: bufferlist::splice() may cause stack corruption in bufferlis...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42976
m...
- 08:35 AM Backport #52644 (In Progress): nautilus: pool last_epoch_clean floor is stuck after pg merging
- 08:34 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
- https://github.com/ceph/ceph/pull/43204
- 06:09 AM Bug #48508 (Resolved): Donot roundoff the bucket weight while decompiling crush map to source
09/16/2021
- 10:14 PM Backport #51966: nautilus: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42617
merged
- 10:13 PM Backport #51583: nautilus: osd does not proactively remove leftover PGs
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42240
merged
- 07:01 PM Bug #49697 (Resolved): prime pg temp: unexpected optimization
- fan chen wrote:
> Recently, I found that the patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af6...
- 11:16 AM Bug #52408 (Can't reproduce): osds not peering correctly after startup
- I rebuilt my cluster yesterday using a container image based on commit d906f946e845, and I'm not able to reproduce th...
- 02:08 AM Bug #52624 (New): qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABIL...
- /ceph/teuthology-archive/pdonnell-2021-09-14_01:17:08-fs-wip-pdonnell-testing-20210910.181451-distro-basic-smithi/638...
09/15/2021
- 09:23 PM Feature #52605 (Fix Under Review): osd: add scrub duration to pg dump
- 06:39 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-...
- 06:05 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
- ...
- 03:20 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/43513
- 03:18 PM Bug #52583 (Pending Backport): partial recovery become whole object recovery after restart osd
- 02:50 PM Backport #52337: octopus: ceph df detail reports dirty objects without a cache tier
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42862
merged
- 02:43 PM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
- https://github.com/ceph/ceph/pull/42498 merged
- 02:18 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- 2021-09-02 14:25:37.173453 7f2235baf700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/...
09/14/2021
- 05:53 PM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
Request to add new PG states to provide feedback to the admin when a PG scrub/repair is scheduled (via command line,...
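As a hedged illustration of the operator-facing side of this request (PG id 2.1a is made up): these commands queue work today, but no distinct "pending" state is reported until the scrub or repair actually starts:
<pre>
ceph pg deep-scrub 2.1a
ceph pg repair 2.1a
# the reported state only changes once the scrub/repair is actually running:
ceph pg dump pgs | grep -E 'scrub|repair'
</pre>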
- 01:14 PM Feature #52605 (Pending Backport): osd: add scrub duration to pg dump
- We would like to add a new column to the pg dump which would give us the time it took for a pg to get scrubbed.
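A hedged sketch of how such a column might be consumed once it lands; the scrub_duration field name is hypothetical here, inferred only from the feature description:
<pre>
# pull a per-PG scrub duration out of the JSON pg dump (field name assumed)
ceph pg dump -f json-pretty | jq '.pg_map.pg_stats[] | {pgid, scrub_duration}'
</pre>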
09/13/2021
- 10:48 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
- Looking at the osd and mon logs
Here's when the osd was restarted in revive_osd()...
- 09:05 PM Backport #52596 (Rejected): octopus: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:05 PM Backport #52595 (New): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:03 PM Feature #51725 (Pending Backport): make bufferlist::c_str() skip rebuild when it isn't necessary
- This spares a heap allocation on every received message. Marking for backporting to both octopus and pacific as a lo...
- 02:21 PM Bug #45871: Incorrect (0) number of slow requests in health check
- On a...
- 10:35 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
- https://github.com/ceph/ceph/pull/41731
- 10:24 AM Bug #52583: partial recovery become whole object recovery after restart osd
- FIX URL:
https://github.com/ceph/ceph/pull/43146
https://github.com/ceph/ceph/pull/42904
- 09:43 AM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
- Problem: After the osd that is undergoing partial recovery is restarted, the data recovery is rolled back from the pa...
- 05:32 AM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
- CLI - osd pool rm --help message is wrong or misleading
Version-Release number of selected component (if applicabl...
- 04:19 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
09/09/2021
- 11:09 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- I got the same assert for 14.2.22 when the scenario is replayed
1. make rbd pool write-back cache layer
2. restar...
- 10:00 PM Bug #52140 (Duplicate): crash: OpTracker::~OpTracker(): assert((sharded_in_flight_list.back())->o...
- 09:57 PM Bug #52141 (Need More Info): crash: void OSD::load_pgs(): abort
- 09:53 PM Bug #52142 (Duplicate): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:52 PM Bug #52145 (Duplicate): crash: OSDMapRef OSDService::get_map(epoch_t): assert(ret)
- 09:49 PM Bug #52147 (Duplicate): crash: rocksdb::InstrumentedMutex::Lock()
- 09:48 PM Bug #52148 (Duplicate): crash: pthread_getname_np()
- 09:47 PM Bug #52149 (Duplicate): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_...
- 09:46 PM Bug #52150 (Won't Fix): crash: bool HealthMonitor::check_member_health(): assert(store_size > 0)
- 09:37 PM Bug #52152 (Duplicate): crash: pthread_getname_np()
- 09:37 PM Bug #52154 (Won't Fix): crash: Infiniband::MemoryManager::Chunk::write(char*, unsigned int)
- RDMA is not being actively worked on.
- 09:32 PM Bug #52155 (Need More Info): crash: pthread_rwlock_rdlock() in queue_want_up_thru
- 09:30 PM Bug #52156 (Duplicate): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(err == 0)
- 09:28 PM Bug #52158 (Need More Info): crash: ceph::common::PerfCounters::set(int, unsigned long)
- 09:25 PM Bug #52159 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:25 PM Bug #52160 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 09:24 PM Bug #52153 (Won't Fix): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:22 PM Bug #52161 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:22 PM Bug #52163 (Rejected): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRe...
- Not a ceph bug, most likely failed to write to rocksdb.
- 09:20 PM Bug #52165 (Rejected): crash: void MonitorDBStore::clear(std::set<std::__cxx11::basic_string<char...
A non-zero return value could possibly be due to a rocksdb corruption, and there are just 2 clusters reporting this.
- 09:17 PM Bug #52166 (Won't Fix): crash: void Device::binding_port(ceph::common::CephContext*, int): assert...
- RDMA is not being actively worked on, this is one cluster reporting all the crashes.
- 09:15 PM Bug #52167 (Won't Fix): crash: RDMAConnectedSocketImpl::RDMAConnectedSocketImpl(ceph::common::Cep...
- RDMA is not being actively worked on, this is one cluster reporting all the crashes.
- 09:14 PM Bug #52162 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:14 PM Bug #52164 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:13 PM Bug #52168 (Duplicate): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionR...
- 09:06 PM Bug #52170 (Duplicate): crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: ass...
- 09:05 PM Bug #52171 (Triaged): crash: virtual int RocksDBStore::get(const string&, const string&, ceph::bu...
- Seen on 2 clusters, could be related to some sort of rocksdb corruption.
- 09:01 PM Bug #52173 (Need More Info): crash in ProtocolV2::send_message()
- Seen on 2 octopus clusters.
- 08:45 PM Bug #52189 (Need More Info): crash in AsyncConnection::maybe_start_delay_thread()
- We'll need more information to debug a crash like this.
- 05:25 PM Backport #51605: pacific: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42976
merged
- 05:20 PM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
- https://github.com/ceph/ceph/pull/41731
- 05:19 PM Fix #52025 (Pending Backport): osd: Add config option to skip running the OSD benchmark on init.
- Merged https://github.com/ceph/ceph/pull/42604
- 05:17 PM Fix #52329 (Pending Backport): src/vstart: The command "set config key osd_mclock_max_capacity_io...
- 05:10 PM Fix #52329: src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" fails with ...
- https://github.com/ceph/ceph/pull/42853 merged
- 04:17 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
- /a/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379886
As part of the th...
- 02:24 PM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- 10:15 AM Backport #52557 (In Progress): pacific: pybind: rados.RadosStateError raised when closed watch ob...
- https://github.com/ceph/ceph/pull/51259
- 10:15 AM Backport #52556 (Rejected): octopus: pybind: rados.RadosStateError raised when closed watch objec...
- 10:11 AM Bug #52553 (Pending Backport): pybind: rados.RadosStateError raised when closed watch object goes...
- 07:10 AM Bug #52553 (Fix Under Review): pybind: rados.RadosStateError raised when closed watch object goes...
- 07:06 AM Bug #52553 (Pending Backport): pybind: rados.RadosStateError raised when closed watch object goes...
- This one is easiest to demonstrate by example. Here's some code:...
09/08/2021
- 09:57 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
I got the logs of osd.{3,11,13} during their boot. This data was collected with the log level tuned up.
https://... - 06:57 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- The ticket can be closed from our side - and it may be a duplicate, but I'm not able to say this for sure. But I have...
- 05:30 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- Roland Sommer wrote:
> The cluster is running without any problems since we rolled out the latest dev release from t...
- 01:17 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- The cluster is running without any problems since we rolled out the latest dev release from the pacific branch to all...
- 09:21 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- We started rolling out 16.2.5-522-gde2ff323-1bionic from the dev repos on the osd nodes, as there is no release/tag v...
- 05:35 AM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- This could be related to https://tracker.ceph.com/issues/52089
@Roland, could you please upgrade to 16.2.6 and update...
- 04:42 PM Bug #52408: osds not peering correctly after startup
- Erm, in fact, right after doing cephadm bootstrap, before rebooting anything:...
- 04:17 PM Bug #52408: osds not peering correctly after startup
- Odd. The hosts in question are all KVM nodes on the same physical host, so I wouldn't expect networking issues.
I ...
- 03:42 PM Backport #51952 (In Progress): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log()....
- 09:11 AM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- Increasing priority, as this happens pretty often in the ceph-volume jenkins jobs recently
- 09:02 AM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
- seeing failures in ceph-volume CI because of monitor crashing after an OSD gets destroyed....
09/07/2021
- 02:34 PM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
- Nautilus is EOL and the backport is too intrusive.
- 01:23 PM Bug #52523: Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- I attached another graph showing the increased amount of written data.
- 10:01 AM Bug #52523 (Duplicate): Latency spikes causing timeouts after upgrade to pacific (16.2.5)
- After having run pacific in our low volume staging system for 2 months, yesterday we upgraded our production cluster ...
- 10:55 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- PG was actually inconsistent...
09/06/2021
- 06:42 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Hey Ilya, nope, since the issue was seen in pacific I thought it might be something we backported to recent versions....
- 06:34 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- On osd0 and osd1:...
- 04:20 PM Bug #51799 (Resolved): osd: snaptrim logs to derr at every tick
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:18 PM Bug #52421 (Resolved): test tracker
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:04 PM Backport #52336 (Resolved): pacific: ceph df detail reports dirty objects without a cache tier
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42860
m... - 04:03 PM Backport #51830 (Resolved): pacific: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42615
m... - 04:02 PM Backport #51290: pacific: mon: stretch mode clusters do not sanely set default crush rules
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42909
m... - 09:20 AM Bug #52513 (New): BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
We got a crash of two OSDs simultaneously serving 17.7ff [684,768,760] ...
09/05/2021
- 02:34 PM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
Hi, we got a couple of outages with two PGs stuck in the premerge+peered state:...
09/03/2021
- 05:25 PM Bug #52503 (New): cli_generic.sh: slow ops when trying rand write on cache pools
- failing: leads to slow ops: http://qa-proxy.ceph.com/teuthology/yuriw-2021-09-01_19:04:25-rbd-wip-yuri-testing-2021-0...
- 02:13 PM Bug #52445: OSD asserts on starting too many pushes
- Hi,
I have managed to set the debug log levels using the ceph config set command and captured the log output.
# options before changi...
- 06:44 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
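A minimal sketch of gathering what is being asked for here, assuming the centralized config store is in use and osd.3 is the asserting daemon (the id and log path are illustrative):
<pre>
# raise the debug levels, reproduce the assert, then collect and revert
ceph config set osd.3 debug_osd 20
ceph config set osd.3 debug_ms 1
# ... reproduce the crash, then grab /var/log/ceph/ceph-osd.3.log ...
ceph config rm osd.3 debug_osd
ceph config rm osd.3 debug_ms
ceph -s > ceph-status.txt
</pre>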
09/02/2021
- 10:12 PM Bug #44715 (Fix Under Review): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_li...
- 10:10 PM Bug #52408: osds not peering correctly after startup
- Jeff Layton wrote:
> Ok. I wasn't clear on whether I needed to run "ceph config set debug_osd 20" on all the hosts o...
- 09:53 PM Backport #52498 (Rejected): nautilus: test tracker: please ignore
- 09:50 PM Bug #52445 (Need More Info): OSD asserts on starting too many pushes
- Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of ceph -s?
Is this crash...
- 09:42 PM Backport #52497 (Rejected): octopus: test tracker: please ignore
- 08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Ronen Friedman wrote:
> Some possibly helpful hints:
> 1. In "my" specific instance, the pg address handed over to ... - 08:34 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- more useful debug logging being added in https://github.com/ceph/ceph/pull/42965
- 05:48 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I dug into this more today and I am wondering if it has something to do with `_conf->cluster` not being set right (to...
- 01:26 PM Backport #52495 (Rejected): pacific: test tracker: please ignore
- 01:26 PM Bug #52486 (Pending Backport): test tracker: please ignore
- 06:25 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2021-08-27_21:20:08-rados-wip-yuri2-testing-2021-08-27-1207-distro-basic-smithi/6363835
- 01:22 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- I'm in the midst of an upgrade from Octopus to Pacific. Due to issues during the upgrade, rather than simply upgradin...
- 12:42 AM Bug #52488 (New): Pacific mon won't join Octopus mons
- I'm in the midst of an upgrade from Octopus to Pacific. Due to issues with the version of docker available on Debian ...
09/01/2021
- 11:55 PM Bug #52421 (Pending Backport): test tracker
- 07:24 PM Bug #52486 (Closed): test tracker: please ignore
- please ignore
- 05:18 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-08-31_22:30:47-rados-wip-yuri8-testing-2021-08-30-0930-pacific-distro-basic-smithi/6369129/remote/smith...
- 03:55 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Some possibly helpful hints:
1. In "my" specific instance, the pg address handed over to register_and_wake_split_chi...
08/31/2021
- 09:57 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
- 09:41 PM Bug #52421 (Resolved): test tracker
- 09:41 PM Backport #52475 (Resolved): octopus: test tracker
- 09:20 PM Backport #52475 (Resolved): octopus: test tracker
- 09:40 PM Backport #52474 (Resolved): nautilus: test tracker
- 08:52 PM Backport #52474 (Resolved): nautilus: test tracker
- 09:40 PM Backport #52466 (Resolved): pacific: test tracker
- 03:44 PM Backport #52466 (Resolved): pacific: test tracker
- 08:59 AM Bug #49697: prime pg temp: unexpected optimization
Recently, I found that the patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af69e" can solve my is...
- 03:44 AM Bug #52255: The pgs state are degraded, but all the osds is up and there is no recovering and bac...
- Neha Ojha wrote:
> can you share your osdmap? are all your osds up and in? the crushmap looks fine.
wish to get y...