Activity

From 12/15/2021 to 01/13/2022

01/13/2022

11:15 PM Backport #53877 (Resolved): octopus: pgs wait for read lease after osd start
https://github.com/ceph/ceph/pull/44585 Backport Bot
11:15 PM Backport #53876 (Resolved): pacific: pgs wait for read lease after osd start
https://github.com/ceph/ceph/pull/44584 Backport Bot
11:11 PM Bug #53326 (Pending Backport): pgs wait for read lease after osd start
Neha Ojha
10:54 PM Bug #53729: ceph-osd takes all memory before oom on boot
Neha Ojha wrote:
> Gonzalo Aguilar Delgado wrote:
> > Neha Ojha wrote:
> > > Like the other case reported in the m...
Gonzalo Aguilar Delgado
10:52 PM Bug #53729: ceph-osd takes all memory before oom on boot
Igor Fedotov wrote:
> One more case:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/FQXV452YLHBJ...
Gonzalo Aguilar Delgado
12:23 PM Bug #53729: ceph-osd takes all memory before oom on boot
One more case:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/FQXV452YLHBJW6Y2UK7WUZP7HO5PVIA5/
Igor Fedotov
10:13 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
Same failed test, and same Traceback message as reported above. Pasted here is another relevant part of the log that ... Laura Flores
09:06 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608450 Laura Flores
09:01 PM Bug #53875 (Duplicate): AssertionError: wait_for_recovery: failed before timeout expired due to d...
Description: rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/... Laura Flores
08:57 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
/a/yuriw-2022-01-12_21:37:22-rados-wip-yuri6-testing-2022-01-12-1131-distro-default-smithi/6611439
last pg map bef...
Laura Flores

01/12/2022

11:04 PM Bug #53729: ceph-osd takes all memory before oom on boot
Gonzalo Aguilar Delgado wrote:
> Neha Ojha wrote:
> > Like the other case reported in the mailing list ([ceph-users...
Neha Ojha
09:50 PM Bug #53729: ceph-osd takes all memory before oom on boot
Gonzalo Aguilar Delgado wrote:
> Hi,
>
> The logs I've already provided had:
> --debug_osd 90 --debug_mon 2 --d...
Neha Ojha
08:40 PM Bug #53729: ceph-osd takes all memory before oom on boot
Neha Ojha wrote:
> Like the other case reported in the mailing list ([ceph-users] OSDs use 200GB RAM and crash) and ...
Gonzalo Aguilar Delgado
08:38 PM Bug #53729: ceph-osd takes all memory before oom on boot
Neha Ojha wrote:
> Like the other case reported in the mailing list ([ceph-users] OSDs use 200GB RAM and crash) and ...
Gonzalo Aguilar Delgado
08:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
Hi,
The logs I've already provided had:
--debug_osd 90 --debug_mon 2 --debug_filestore 7 --debug_monc 99 --debug...
Gonzalo Aguilar Delgado
06:24 PM Bug #53729: ceph-osd takes all memory before oom on boot
Like the other case reported in the mailing list ([ceph-users] OSDs use 200GB RAM and crash) and https://tracker.ceph... Neha Ojha
07:12 PM Bug #53855 (Resolved): rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v... Laura Flores
07:08 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
Later on in the example Neha originally posted (/a/yuriw-2021-11-15_19:24:05-rados-wip-yuri8-testing-2021-11-15-0845-... Laura Flores
06:55 PM Support #51609: OSD refuses to start (OOMK) due to pg split
Tor Martin Ølberg wrote:
> Tor Martin Ølberg wrote:
> > After an upgrade to 15.2.13 from 15.2.4 my small home lab c...
Neha Ojha
06:19 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608445/ Laura Flores
10:03 AM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
This is present in 16.2.7. Any reason why the linked PR wasn't merged into that release? Janek Bevendorff
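
For reference, the kind of custom crush location hook this report refers to is just an executable named by the crush_location_hook option; a minimal sketch with a hypothetical path and example location keys:

  #!/bin/sh
  # Hypothetical hook at /usr/local/bin/custom-crush-location (must be executable).
  # It prints this node's CRUSH location on stdout; the rack/root values are only examples.
  echo "host=$(hostname -s) rack=rack2 root=default"

  # ceph.conf on the OSD host, pointing at the hook:
  [osd]
  crush location hook = /usr/local/bin/custom-crush-location
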

01/11/2022

08:47 PM Backport #53719 (In Progress): octopus: mon: frequent cpu_tp had timed out messages
Cory Snyder
08:33 PM Backport #53718 (In Progress): pacific: mon: frequent cpu_tp had timed out messages
Cory Snyder
08:31 PM Backport #53507 (Duplicate): pacific: ceph -s mon quorum age negative number
Backport was handled along with https://github.com/ceph/ceph/pull/43698 in PR: https://github.com/ceph/ceph/pull/43698 Cory Snyder
08:29 PM Backport #53660 (In Progress): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" w...
Cory Snyder
08:29 PM Backport #53659 (In Progress): pacific: mon: "FAILED ceph_assert(session_map.sessions.empty())" w...
Cory Snyder
08:27 PM Backport #53721 (Resolved): octopus: common: admin socket compiler warning
The relevant code has already made it into Octopus, no further backport required. Cory Snyder
08:27 PM Backport #53720 (Resolved): pacific: common: admin socket compiler warning
The relevant code has already made it to Pacific, no further backport necessary. Cory Snyder
08:14 PM Backport #53769 (In Progress): pacific: [ceph osd set noautoscale] Global on/off flag for PG auto...
Kamoltat (Junior) Sirivadhna
08:14 PM Backport #53769: pacific: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
https://github.com/ceph/ceph/pull/44540 Kamoltat (Junior) Sirivadhna
01:55 PM Bug #53824 (Fix Under Review): Stretch mode: peering can livelock with acting set changes swappin...
Greg Farnum
12:14 AM Bug #53824: Stretch mode: peering can livelock with acting set changes swapping primary back and ...
So, why is it accepting the non-acting-set member each time, when they seem to have the same data? There's a clue in ... Greg Farnum
12:14 AM Bug #53824 (Pending Backport): Stretch mode: peering can livelock with acting set changes swappin...
From https://bugzilla.redhat.com/show_bug.cgi?id=2025800
We're getting repeated swaps in the acting set, with logg...
Greg Farnum
06:42 AM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
/a/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599471... Sridhar Seshasayee
05:29 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599449... Sridhar Seshasayee

01/10/2022

10:20 PM Bug #53729: ceph-osd takes all memory before oom on boot
Forget about the previous comment.
The stack trace is just the opposite; it seems that the call to encode in PGLog::_writ...
Gonzalo Aguilar Delgado
10:02 PM Bug #53729: ceph-osd takes all memory before oom on boot
I was taking a look at:
3,1 GiB: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (in /usr/bin/ce...
Gonzalo Aguilar Delgado
09:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
I did something better. I added a new OSD with bluestore to see if it's a problem with the filestore backend.
Then ...
Gonzalo Aguilar Delgado
09:43 AM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-01-08_17:57:43-rados-wip-yuri8-testing-2022-01-07-1541-distro-default-smithi/6603232 Sridhar Seshasayee
02:22 AM Bug #53740: mon: all mon daemon always crash after rm pool
Neha Ojha wrote:
> Do you happen to have a coredump from this crash?
No
Taizeng Wu

01/07/2022

10:25 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6598917 Laura Flores
10:09 PM Bug #48468: ceph-osd crash before being up again
Igor Fedotov wrote:
> @neha, @Gonsalo - to avoid the mess let's use https://tracker.ceph.com/issues/53729 for furthe...
Neha Ojha
09:56 PM Bug #48468: ceph-osd crash before being up again
@neha, @Gonsalo - to avoid the mess let's use https://tracker.ceph.com/issues/53729 for further communication on the ... Igor Fedotov
06:22 PM Bug #48468: ceph-osd crash before being up again
Gonzalo Aguilar Delgado wrote:
> Hi I'm having the same problem.
>
> -7> 2021-12-25T12:05:37.491+0100 7fd15c9...
Neha Ojha
09:48 PM Bug #53729: ceph-osd takes all memory before oom on boot
Looks relevant as well:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YHR3P7N5EXCKNHK45L7FRF4XNBOC...
Igor Fedotov
09:20 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599338 Laura Flores
06:41 PM Bug #53806 (Resolved): unnecessarily long laggy PG state
The first `pg_lease_ack_t` after becoming laggy would not trigger `recheck_readable`. However, every other ack would ... 玮文 胡
06:34 PM Bug #53740: mon: all mon daemon always crash after rm pool
Do you happen to have a coredump from this crash? Neha Ojha

01/06/2022

09:57 PM Bug #53789 (Pending Backport): CommandFailedError (rados/test_python.sh): "RADOS object not found...
Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v... Laura Flores

01/05/2022

09:56 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
/a/yuriw-2022-01-04_21:52:15-rados-wip-yuri7-testing-2022-01-04-1159-distro-default-smithi/6595525 Laura Flores
09:51 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
/a/yuriw-2022-01-04_21:52:15-rados-wip-yuri7-testing-2022-01-04-1159-distro-default-smithi/6595522 Laura Flores

01/04/2022

11:31 PM Backport #53769 (Resolved): pacific: [ceph osd set noautoscale] Global on/off flag for PG autosca...
Backport Bot
11:26 PM Feature #51213 (Pending Backport): [ceph osd set noautoscale] Global on/off flag for PG autoscale...
Vikhyat Umrao
09:49 PM Bug #53768 (New): timed out waiting for admin_socket to appear after osd.2 restart in thrasher/de...
Error snippet:
2022-01-02T01:37:09.296 DEBUG:teuthology.orchestra.run.smithi086:> sudo adjust-ulimits ceph-coverag...
Joseph Sawaya
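
If reproducing this by hand, the admin socket the test is waiting for can be checked directly; a rough sketch assuming default paths and osd.2:

  ls /var/run/ceph/                                         # socket normally appears as ceph-osd.2.asok
  ceph daemon osd.2 version                                 # resolves the socket by daemon name
  ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok status  # query the socket directly
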
09:35 PM Bug #53767 (Duplicate): qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing c...
Description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-healt... Laura Flores
06:14 PM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
/a/yuriw-2021-12-23_16:50:03-rados-wip-yuri6-testing-2021-12-22-1410-distro-default-smithi/6582413... Laura Flores
12:37 PM Bug #23827: osd sends op_reply out of order
Here is the log information from my environment.
op1 arrives and sends shards to the peer osds:
2021-12-01 18:27:...
Ivan Guan
03:11 AM Bug #53757: I have a rados object that data size is 0, and this object have a large amount of oma...
pr:https://github.com/ceph/ceph/pull/44450 xingyu wang
02:59 AM Bug #53757 (Fix Under Review): I have a rados object that data size is 0, and this object have a ...
Env: ceph version is 10.2.9, OS is rhel7.8, and kernel version is '3.13.0-86-generic'
1. create some rados objects ...
xingyu wang
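
A rough way to reproduce the object shape described here (zero-byte data, lots of omap), assuming a hypothetical pool named testpool:

  rados -p testpool create obj0                        # object with no data payload
  for i in $(seq 1 100000); do
    rados -p testpool setomapval obj0 "key.$i" "v$i"   # pile up omap keys
  done
  rados -p testpool stat obj0                          # size stays 0
  rados -p testpool listomapkeys obj0 | wc -l          # count the omap keys
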

01/02/2022

06:58 PM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
Another thing I don't understand from the docs:
https://docs.ceph.com/en/pacific/rados/configuration/msgr2/#transi...
Niklas Hambuechen
06:34 PM Bug #53751 (Need More Info): "N monitors have not enabled msgr2" is always shown for new clusters
I am seeing that for new clusters (currently Ceph 16.2.7), `ceph status` always shows e.g.:
3 monitors h...
Niklas Hambuechen
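
For context, the warning can be cross-checked against the monitor address vectors; a minimal sketch (no cluster-specific values assumed):

  ceph health detail     # shows which monitors have reportedly not enabled msgr2
  ceph mon dump          # each mon should list both a v2 (3300) and a v1 (6789) address
  ceph mon enable-msgr2  # asks mons bound only to the legacy v1 port to also bind msgr2
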

12/31/2021

01:11 PM Bug #48468: ceph-osd crash before being up again
Hey Gonzalo,
It was some time ago, but from memory I created a huge swapfile (~50G) and restarted the os...
Clément Hampaï
04:15 AM Bug #53749 (New): ceph device scrape-health-metrics truncates smartctl/nvme output to 100 KiB
* https://github.com/ceph/ceph/blob/ae17c0a0c319c42d822e4618fd0d1c52c9b07ed1/src/common/blkdev.cc#L729
* https://git...
Niklas Hambuechen
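
A quick way to see whether a device's raw SMART output exceeds the limit described here, assuming a hypothetical /dev/nvme0n1:

  smartctl -x /dev/nvme0n1 | wc -c          # raw output size in bytes, compare against 100 KiB
  ceph device ls                            # look up the device id Ceph assigned to it
  ceph device get-health-metrics <dev-id>   # inspect what was actually scraped and stored
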

12/30/2021

11:42 PM Bug #51626: OSD uses all host memory (80g) on startup due to pg_split
Gonzalo Aguilar Delgado wrote:
> Any update? I have the same trouble...
>
> I downgraded kernel to 4.XX because w...
Tor Martin Ølberg
01:17 PM Bug #51626: OSD uses all host memory (80g) on startup due to pg_split
Any update? I have the same trouble...
I downgraded the kernel to 4.XX because with a newer kernel I cannot even get thi...
Gonzalo Aguilar Delgado

12/29/2021

08:00 AM Bug #53740 (Resolved): mon: all mon daemon always crash after rm pool
We have an openstack cluster. Last week we started clearing all openstack instances and deleting all ceph pools. All mo... Taizeng Wu

12/28/2021

04:25 AM Bug #50775: mds and osd unable to obtain rotating service keys
We ran into this issue, too. Our environment is a multi-host cluster (v15.2.9). Sometimes, we can observe that "unabl... Jerry Pu

12/27/2021

12:34 AM Bug #53729: ceph-osd takes all memory before oom on boot
Could you please set debug-osd to 5/20 and share relevant OSD startup log? Igor Fedotov
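
Since the OSD dies during startup, the debug level needs to be in place before the daemon starts; a minimal sketch for a hypothetical osd.7 on a package-based (non-cephadm) host:

  ceph config set osd.7 debug_osd 5/20   # stored in the mon config db, picked up at startup
  # or equivalently in ceph.conf on that host:
  #   [osd.7]
  #   debug osd = 5/20
  systemctl restart ceph-osd@7
  less /var/log/ceph/ceph-osd.7.log      # startup log to attach to the tracker
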

12/25/2021

09:30 PM Bug #53729 (Resolved): ceph-osd takes all memory before oom on boot
Hi, I cannot boot half of my OSDs; all of them are OOM-killed.
It seems they are taking all the memory. Everythi...
Gonzalo Aguilar Delgado
09:03 PM Bug #48468: ceph-osd crash before being up again
Clément Hampaï wrote:
> Hi Sage,
>
> Hum I've finally managed to recover my cluster after an uncounted osd resta...
Gonzalo Aguilar Delgado
09:03 PM Bug #48468: ceph-osd crash before being up again
Hi, I'm having the same problem.
-7> 2021-12-25T12:05:37.491+0100 7fd15c920640 1 heartbeat_map reset_timeout '...
Gonzalo Aguilar Delgado

12/23/2021

07:28 PM Bug #52925 (Closed): pg peering alway after trigger async recovery
As per https://github.com/ceph/ceph/pull/43534#issuecomment-984252587 Neha Ojha
07:27 PM Backport #53721 (Resolved): octopus: common: admin socket compiler warning
Backport Bot
07:27 PM Backport #53720 (Resolved): pacific: common: admin socket compiler warning
Backport Bot
07:25 PM Backport #53719 (Resolved): octopus: mon: frequent cpu_tp had timed out messages
https://github.com/ceph/ceph/pull/44546 Backport Bot
07:25 PM Backport #53718 (Resolved): pacific: mon: frequent cpu_tp had timed out messages
https://github.com/ceph/ceph/pull/44545 Backport Bot
07:24 PM Bug #43266 (Pending Backport): common: admin socket compiler warning
Neha Ojha
07:21 PM Bug #53506 (Pending Backport): mon: frequent cpu_tp had timed out messages
Neha Ojha
07:17 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2021-12-22_22:11:35-rados-wip-yuri3-testing-2021-12-22-1047-distro-default-smithi/6580187 Laura Flores
07:14 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2021-12-22_22:11:35-rados-wip-yuri3-testing-2021-12-22-1047-distro-default-smithi/6580436 Laura Flores
02:07 PM Bug #52509: PG merge: PG stuck in premerge+peered state
Neha Ojha wrote:
> Konstantin Shalygin wrote:
> > We can plan and spent time to setup staging cluster for this and ...
Konstantin Shalygin
10:17 AM Bug #52509: PG merge: PG stuck in premerge+peered state
@Markus just for the record, what is your Ceph version? And what is your hardware for OSDs? The actual issue was on ... Konstantin Shalygin
10:45 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
Hi Sage,
is there any update?
Manuel Lausch
09:32 AM Backport #53701 (In Progress): octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in ...
PR: https://github.com/ceph/ceph/pull/43438 Mykola Golub
09:25 AM Backport #53702 (In Progress): pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in ...
Mykola Golub
03:05 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Anonymous wrote:
> Neha Ojha wrote:
> > [...]
> >
> > looks like this pg had the same pi when it got created
> >
>...
Shu Yu

12/22/2021

05:25 PM Backport #53702 (Resolved): pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
https://github.com/ceph/ceph/pull/44387 Backport Bot
05:25 PM Backport #53701 (Resolved): octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
https://github.com/ceph/ceph/pull/43438 Backport Bot
05:23 PM Bug #53677 (Pending Backport): qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
Neha Ojha
04:06 PM Bug #53677 (Fix Under Review): qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
Mykola Golub
03:44 PM Bug #47589: radosbench times out "reached maximum tries (800) after waiting for 4800 seconds"
/a/yuriw-2021-12-21_18:01:07-rados-wip-yuri3-testing-2021-12-21-0749-distro-default-smithi/6576331/ Kamoltat (Junior) Sirivadhna
11:53 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
I asked on the ML about this issue - see thread here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/messag... Christian Rohmann
03:40 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Neha Ojha wrote:
> [...]
>
> looks like this pg had the same pi when it got created
>
> [...]
>
> but get_r...
Anonymous

12/21/2021

05:37 PM Bug #53677 (In Progress): qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
This looks like a false test failure due to the test setting backfillfull ratio too low.
We see that the test calc...
Mykola Golub
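
For anyone checking these ratios on a live cluster, the values and the knob involved look roughly like this (0.85/0.90/0.95 are just the usual defaults):

  ceph osd dump | grep -E 'nearfull_ratio|backfillfull_ratio|full_ratio'
  ceph osd df                           # per-OSD utilization the ratios are compared against
  ceph osd set-backfillfull-ratio 0.90  # adjust the backfillfull threshold
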
11:22 AM Bug #53685 (New): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS)' failed.
Test "rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_... Adam Kupczyk

12/20/2021

05:30 PM Bug #53677 (Resolved): qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
... Neha Ojha
11:51 AM Bug #23827: osd sends op_reply out of order
This bug occurred in my online environment (Nautilus 14.2.5) a few days ago and my application exited because the client's ... Ivan Guan
09:10 AM Bug #53667: osd cannot be started after being set to stop
fix in https://github.com/ceph/ceph/pull/44363 changzhi tan
08:55 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
After setting the osd to stop, the osd cannot be brought up again.
[root@controller-2 ~]# ceph osd status
ID HOST US...
changzhi tan
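
The reported sequence, sketched with a hypothetical osd.2 (the start command depends on how the daemon is managed):

  ceph osd stop 2                     # mark the OSD stopped and ask it to shut down
  ceph osd tree | grep osd.2          # confirm it is reported down
  systemctl start ceph-osd@2          # package-based install
  # or: ceph orch daemon start osd.2  # cephadm-managed cluster
  ceph osd status                     # per the report, the OSD does not come back up
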

12/19/2021

07:56 PM Bug #44286: Cache tiering shows unfound objects after OSD reboots
The problem still exists on 15.2.15.
I've also got replicated size 3, min_size 2.
The problem occurs only when one O...
marek czardybon
12:50 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
The only "special" settings I can think of are... Christian Rohmann

12/18/2021

11:16 PM Bug #53663 (Duplicate): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
On a 4 node Octopus cluster I am randomly seeing batches of scrub errors, as in:... Christian Rohmann
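
The usual way to inspect scrub errors like these, with a placeholder pool name and PG id:

  ceph health detail                                     # lists the inconsistent PGs
  rados list-inconsistent-pg default.rgw.meta            # PGs with recorded inconsistencies in that pool
  rados list-inconsistent-obj 7.2a --format=json-pretty  # per-object detail, including omap_digest_mismatch
  ceph pg repair 7.2a                                    # only after reviewing which copy is authoritative
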

12/17/2021

04:28 PM Bug #53485: monstore: logm entries are not garbage collected
fix is in progress Daniel Poelzleithner
03:07 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
https://github.com/ceph/ceph/pull/44544 Backport Bot
03:07 PM Backport #53659 (Resolved): pacific: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
https://github.com/ceph/ceph/pull/44543 Backport Bot
03:00 PM Bug #39150 (Pending Backport): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out o...
Sage Weil

12/16/2021

11:24 PM Bug #53600 (Rejected): Crash in MOSDPGLog::encode_payload
Brad Hubbard
11:12 PM Bug #53600: Crash in MOSDPGLog::encode_payload
It should be noted there were a whole lot of oom-kill events on this node during the times these crashes occurred. Gi... Brad Hubbard
03:11 AM Bug #53600: Crash in MOSDPGLog::encode_payload
The binaries running when these crashes were seen are actually from this wip branch in the ceph-ci repo.
https://s...
Brad Hubbard
05:55 PM Bug #53485: monstore: logm entries are not garbage collected
I changed the paxos debug level to 20 and found this in the mon store log:... Daniel Poelzleithner
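
For the record, the debug bump mentioned above can be applied at runtime and the store growth watched from the filesystem; a rough sketch assuming mon.a and default paths:

  ceph tell mon.a injectargs '--debug_paxos 20'  # runtime only, reverts on restart
  ceph config set mon debug_paxos 20             # persistent alternative
  du -sh /var/lib/ceph/mon/*/store.db            # watch the mon store size over time
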
03:36 PM Bug #53485: monstore: logm entries are not garbage collected
We just grew to a whopping 80 GB metadata store. I'm out of ideas here and don't know how to stop the growth.
Somebody ad...
Daniel Poelzleithner
04:35 PM Backport #53644 (Resolved): pacific: Disable health warning when autoscaler is on
https://github.com/ceph/ceph/pull/45152 Backport Bot
04:33 PM Bug #53516 (Pending Backport): Disable health warning when autoscaler is on
Neha Ojha
03:56 PM Bug #52189: crash in AsyncConnection::maybe_start_delay_thread()
We observed a few more of those crashes. Six of them were just seconds or minutes apart on different osds / hosts eve... Christian Rohmann
03:45 PM Bug #39150 (Fix Under Review): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out o...
Sage Weil

12/15/2021

08:04 AM Bug #52488: Pacific mon won't join Octopus mons
There is the same problem when migrating to Pacific from Nautilus. Michael Uleysky
 
