Activity

From 12/14/2020 to 01/12/2021

01/12/2021

05:06 PM Bug #48729 (Triaged): Bluestore memory leak on srub operations
It looks like high RAM usage is caused by improper onode cache trimming inside BlueStore, which in turn might be caus... Igor Fedotov
10:55 AM Bug #48729: Bluestore memory leak on srub operations
@Igor
here you are:
https://cf2.cloudferro.com:8080/swift/v1/AUTH_5b9ea421deb745bfb4dab930cebe153f/ceph-sharings/...
Rafal Wadolowski
02:02 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
Thank you thank you. They are attached.
Best,
Will
William Law
11:43 AM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
@Will - to extract the range from block.db, just use:
dd if=block.db ibs=1 skip=15589376 count=32768 of=dump.out
Igor Fedotov
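For readers reproducing this step: with ibs=1, dd's skip and count are byte counts, so the command copies 32768 bytes (0x8000) starting at byte 15589376 (0xede000), the same range requested in the 01/11 comment. The Python sketch below performs the same seek-and-read in one call; the function names are hypothetical, and it assumes the block.db path is readable by the current user.

```python
# Hypothetical helpers mirroring: dd if=block.db ibs=1 skip=15589376 count=32768 of=dump.out
# With ibs=1, dd's skip and count are measured in bytes.
OFFSET = 15589376  # == 0xede000
LENGTH = 32768     # == 0x8000


def extract_range(src_path: str, dst_path: str,
                  offset: int = OFFSET, length: int = LENGTH) -> int:
    """Copy `length` bytes starting at `offset` from src_path into dst_path.

    Returns the number of bytes written (fewer than `length` if the source ends early).
    """
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read(length)
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return len(data)


def hexdump(data: bytes, base: int = OFFSET, width: int = 16) -> str:
    """Render bytes as '<absolute offset>  <hex bytes>' lines, one per `width` bytes."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{base + i:08x}  {hexpart}")
    return "\n".join(lines)
```

A single read avoids dd's byte-at-a-time I/O when ibs=1. Since 0xede000 is 4 KiB aligned (3806 × 4096), an equivalent dd invocation could also use larger block sizes.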
01:19 PM Bug #48849: BlueStore.cc: 11380: FAILED ceph_assert(r == 0)
Wondering whether you had experienced any recent OSD crashes prior to this failure?
You might also want to check for HW...
Igor Fedotov
12:43 PM Bug #48849: BlueStore.cc: 11380: FAILED ceph_assert(r == 0)
BTW, I looked through other reported issues and found https://tracker.ceph.com/issues/48002 or https://tracker.ceph.c... Christian Rohmann
12:41 PM Bug #48849 (Need More Info): BlueStore.cc: 11380: FAILED ceph_assert(r == 0)
We experienced a few OSD crashes, all with the same signature in the logs:
--- cut ---
2021-01-08 06:13:54.946 7f3...
Christian Rohmann
11:43 AM Backport #48194 (Resolved): octopus: bufferlist c_str() sometimes clears assignment to mempool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38429
m...
Nathan Cutler
11:43 AM Backport #48094 (Resolved): octopus: Hybrid allocator might segfault when fallback allocator is p...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38428
m...
Nathan Cutler
11:41 AM Backport #48093: nautilus: Hybrid allocator might segfault when fallback allocator is present
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38637
m...
Nathan Cutler
11:40 AM Backport #47672: nautilus: Hybrid allocator might cause duplicate admin socket command registrati...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37793
m...
Nathan Cutler
10:23 AM Bug #42928: ceph-bluestore-tool bluefs-bdev-new-db does not update lv tags
To answer my own question: head -n 2 /dev/vg/lv will give the block device UUID. Glen Baars
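A minimal Python sketch of the same check, assuming (as the head -n 2 trick implies) that a BlueStore label starts with two newline-terminated text lines: the string "bluestore block device" followed by a UUID. The magic string and helper names are assumptions for illustration, not a ceph-bluestore-tool API; ceph-bluestore-tool show-label remains the supported way to read a label.

```python
import uuid

# Assumed first line of a BlueStore device label (what `head -n 2` shows first).
LABEL_MAGIC = b"bluestore block device"


def parse_label_head(head: bytes) -> uuid.UUID:
    """Return the UUID on the second text line of a label header (sketch)."""
    # Split off the first two newline-terminated lines; the rest is binary.
    lines = head.split(b"\n", 2)
    if len(lines) < 3 or lines[0] != LABEL_MAGIC:
        raise ValueError("no BlueStore label text found at start of device")
    # uuid.UUID raises ValueError if the second line is not a well-formed UUID.
    return uuid.UUID(lines[1].decode("ascii"))


def read_label_uuid(path: str, max_bytes: int = 4096) -> uuid.UUID:
    """Read the start of a device or LV (e.g. /dev/vg/lv) and parse its label."""
    with open(path, "rb") as dev:
        head = dev.read(max_bytes)
    return parse_label_head(head)
```

Comparing the UUIDs read from each DB LV and each block LV this way is one possible way to pair them up again, under the stated assumption about the label layout.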
09:44 AM Bug #42928: ceph-bluestore-tool bluefs-bdev-new-db does not update lv tags
Is there any way to determine the correct DB->Block arrangement after they are lost? I have a host that has hit this bug and a... Glen Baars
01:19 AM Bug #48776: ObjectStore/StoreTest hangs
... Neha Ojha

01/11/2021

09:19 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
Hi Igor -
I feel like I did something wrong, as hexdump returned nothing... My apologies, we haven't slept much.
@ro...
William Law
08:33 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
@Will, would you please share the hex dump of the block.db file starting at offset 0xede000, length 0x8000.
Latest startup...
Igor Fedotov
05:00 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
Igor, thank you! It's attached.
Will
William Law
04:24 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
@William - would you please share the OSD startup log with debug-bluefs set to 20? Igor Fedotov
04:03 PM Bug #48827 (Duplicate): Ceph Bluestore OSDs fail to start on WAL corruption
Hi -
I posted a note to the Ceph user list also, but we've run into this bug and it unfortunately hit 5 OSDs at th...
William Law
07:59 PM Bug #48729: Bluestore memory leak on srub operations
Presuming memory utilization is still that high, could you please temporarily set debug_bluestore to 20 for the osd in ques... Igor Fedotov
10:25 AM Bug #48729: Bluestore memory leak on srub operations
Unfortunately, that's not the case. After 4 days some of the OSDs took >10GB of RAM.
For example:...
Rafal Wadolowski
07:55 PM Backport #48194: octopus: bufferlist c_str() sometimes clears assignment to mempool
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38429
merged
Yuri Weinstein
07:55 PM Backport #48094: octopus: Hybrid allocator might segfault when fallback allocator is present
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38428
merged
Yuri Weinstein
04:50 PM Bug #47443 (Resolved): Hybrid allocator might cause duplicate admin socket command registration.
Igor Fedotov
04:49 PM Backport #47672 (Resolved): nautilus: Hybrid allocator might cause duplicate admin socket command...
Igor Fedotov
04:43 PM Backport #47672: nautilus: Hybrid allocator might cause duplicate admin socket command registrati...
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37793
merged
Yuri Weinstein
04:48 PM Backport #48093 (Resolved): nautilus: Hybrid allocator might segfault when fallback allocator is ...
Igor Fedotov
04:44 PM Backport #48093: nautilus: Hybrid allocator might segfault when fallback allocator is present
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38637
merged
Yuri Weinstein
04:13 AM Bug #48819 (New): fsck error: found stray (per-pg) omap data on omap_head
/a/kchai-2021-01-10_13:20:22-rados-master-distro-basic-smithi/ Kefu Chai

01/08/2021

01:08 PM Bug #48781: crash in BlueStore::Onode::put()
and on the last host:
Jan 7 07:34:17 ceph2 kernel: [107054.315343] tp_osd_tp[20519]: segfault at 0 ip 00007efd3db...
Tom Myny
01:04 PM Bug #48781: crash in BlueStore::Onode::put()
On another system we see the following too:
Jan 7 10:02:32 ceph1 kernel: [114774.759038] tp_osd_tp[17449]: segfaul...
Tom Myny
01:02 PM Bug #48781: crash in BlueStore::Onode::put()
We also see the following in our OS logs:
[119268.259883] tp_osd_tp[32332]: segfault at 0 ip 00007f8ccce40733 sp 0...
Tom Myny

01/07/2021

09:34 PM Bug #48776: ObjectStore/StoreTest hangs
... Neha Ojha
12:38 AM Bug #48776: ObjectStore/StoreTest hangs
/a/teuthology-2021-01-05_07:01:02-rados-master-distro-basic-smithi/5755704 Neha Ojha
12:38 AM Bug #48776 (Resolved): ObjectStore/StoreTest hangs
... Neha Ojha
02:45 PM Bug #48781: crash in BlueStore::Onode::put()
Please download the attached file with extra logs. Tom Myny
02:21 PM Bug #48781: crash in BlueStore::Onode::put()
Here is some extra information regarding this problem:
{
"backtrace": [
"(()+0x12b20) [0x7f0afc7a8b2...
Tom Myny
09:26 AM Bug #48781 (Resolved): crash in BlueStore::Onode::put()
Following the earlier issue reported in #48778, I now see frequent OSD crashes. I'm not sure whether the two are related.
<pr...
Gerry D

01/06/2021

11:14 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
No problem, and thanks for confirming! Joshua Baergen
11:12 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Joshua Baergen wrote:
> Interesting, thanks. Is that 14.2.17 change this one: https://tracker.ceph.com/issues/47044 ...
Dan van der Ster
11:10 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Interesting, thanks. Is that 14.2.17 change this one: https://tracker.ceph.com/issues/47044 ?
FWIW, what I'm seein...
Joshua Baergen
11:07 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Joshua Baergen wrote:
> Hey Dan/Eric, did either of you see a big increase in the number of writes hitting your disk...
Dan van der Ster
11:00 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Hey Dan/Eric, did either of you see a big increase in the number of writes hitting your disks when buffered mode was ... Joshua Baergen

01/05/2021

03:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
Fixing this error:... Nathan Cutler
03:02 PM Bug #46124 (Resolved): Potential race condition regression around new OSD flock()s
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
10:41 AM Bug #46490: osds crashing during deep-scrub
We seem to hit the same behaviour after upgrading our ceph cluster from 12.2.12 to 14.2.11.
Since then we have quite...
Maximilian Stinsky
02:48 AM Support #48747: which version support spdk perfect?
... hg liu
02:45 AM Support #48747: which version support spdk perfect?
With 13.2.13 I encounter this failure when ceph-osd reads 2048 LBAs of data... hg liu
02:42 AM Support #48747 (Closed): which version support spdk perfect?
If I try 12.2.12, 12.2.13 and 13.2.10, none of them runs stably because of
crashes on write or read? which ver...
hg liu

01/01/2021

03:52 AM Bug #48696 (Resolved): osd assert because of aios will be truncated.
Kefu Chai

12/30/2020

03:26 PM Bug #48729: Bluestore memory leak on srub operations
I will put this on a test environment. We'll see. Rafal Wadolowski
02:39 PM Bug #48729: Bluestore memory leak on srub operations
Wondering if you can try a patch from https://tracker.ceph.com/issues/46027 and check whether it's helpful in your ca... Igor Fedotov
01:39 PM Bug #48729 (Resolved): Bluestore memory leak on srub operations
We observed unbounded RAM growth in OSD processes.
During our investigation (valgrind), we gathered informatio...
Rafal Wadolowski

12/29/2020

03:44 PM Bug #45519: OSD asserts during block allocation for BlueFS
Interestingly enough, I was able to get the OSD to start back up with the stupid allocator. Does that help? Mohammed Naser
03:35 PM Bug #45519: OSD asserts during block allocation for BlueFS
This is what it looks like:
http://sprunge.us/MypV0b
And the high fragmentation is using the score. Sorry, we ...
Mohammed Naser

12/28/2020

10:42 PM Bug #48726 (Rejected): /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_asser...
Igor Fedotov
10:39 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
Just beat me to it :-) Chris Dunlop
10:36 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
Sorry, false alarm. The issue would have been caused by this disk error: ... Chris Dunlop
10:36 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
There are the following lines prior to the assertion:
-3> 2020-12-28T18:27:59.444+1100 7f7477b4e700 -1 bdev(0x561e...
Igor Fedotov
10:02 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
See also: https://tracker.ceph.com/issues/19984 Chris Dunlop
09:58 PM Bug #48726 (Rejected): /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_asser...
Per attached log... Chris Dunlop

12/24/2020

02:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Dan van der Ster wrote:
> @Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hd...
Igor Fedotov
12:25 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
@Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hdd clusters. (we left ssd-on... Dan van der Ster
04:29 AM Bug #48696 (Fix Under Review): osd assert because of aios will be truncated.
Kefu Chai

12/23/2020

08:41 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Konstantin Shalygin wrote:
> > > > Seena,...
Igor Fedotov
08:35 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Dan van der Ster wrote:
> Igor Fedotov wrote:
> > Dan van der Ster wrote:
> > > Igor or others, do you have any in...
Igor Fedotov
08:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Seena Fallah wrote:
> > Konstantin Shalygin wrote:
> > > Seena, Igor already push fixes for ...
Seena Fallah
07:16 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Dan van der Ster wrote:
> > Igor or others, do you have any insight into which exact conditio...
Dan van der Ster
03:29 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Konstantin Shalygin wrote:
> > Seena, Igor already push fixes for hybrid allocator to review ...
Igor Fedotov
03:25 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Igor Fedotov wrote:
> > So generally the issue is that hybrid allocator might return out-of-b...
Igor Fedotov
03:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Dan van der Ster wrote:
> Igor or others, do you have any insight into which exact conditions can trigger the alloca...
Igor Fedotov
02:54 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Bastian Mäuser wrote:
> How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is w...
Igor Fedotov
02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Konstantin Shalygin wrote:
> Seena, Igor already push fixes for hybrid allocator to review for next Nautilus release...
Seena Fallah
01:33 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena, Igor has already pushed fixes for the hybrid allocator to review for the next Nautilus release. Konstantin Shalygin
12:27 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Isn't it better to change the default allocator to bitmap until the bug is fixed? I have various heartbeat_map timeout i... Seena Fallah
10:26 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Konstantin Shalygin wrote:
> > (I'm just checking all bases -- we have been lucky so far to not see a single instanc...
Dan van der Ster
06:18 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
> (I'm just checking all bases -- we have been lucky so far to not see a single instance or this crash on 5000 osds)
...
Konstantin Shalygin
01:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
> For (2), Do you mean if we enable for example bluefs_buffered_io?
Deferred writes have nothing to do with buffer...
Dan van der Ster
12:59 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is why my errors allowed all ... Bastian Mäuser

12/22/2020

10:50 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> So generally the issue is that hybrid allocator might return out-of-bound extent while process...
Seena Fallah
06:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor or others, do you have any insight into which exact conditions can trigger the allocation bug? Any particular us... Dan van der Ster
05:52 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
So generally the issue is that the hybrid allocator might return an out-of-bounds extent while processing a write request.
Dep...
Igor Fedotov
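To illustrate why an out-of-bounds extent trips ceph_assert(is_valid_io(off, len)), here is a Python sketch of an alignment-and-bounds check of that kind. It is modeled on the assert's name, not copied from Ceph's C++ source; the 4096-byte block size and 1 GiB device size are hypothetical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    size: int               # device size in bytes (hypothetical)
    block_size: int = 4096  # I/O alignment unit (hypothetical)

    def is_valid_io(self, off: int, length: int) -> bool:
        """Sketch of an alignment + bounds check on an I/O extent."""
        return (
            off % self.block_size == 0             # offset block-aligned
            and length % self.block_size == 0      # length block-aligned
            and length > 0                         # non-empty I/O
            and off < self.size                    # starts inside the device
            and off + length <= self.size          # ends inside the device
        )


dev = Device(size=1 << 30)
# An extent that ends exactly at the device end passes...
in_bounds = dev.is_valid_io((1 << 30) - 4096, 4096)
# ...while one an allocator hands out past the end fails, which is what
# turns into a failed assertion on the write path.
out_of_bounds = dev.is_valid_io((1 << 30) - 4096, 8192)
```

In this sketch the check only detects a bad extent; it cannot tell which allocator produced it, which is consistent with the thread's focus on the hybrid allocator's fallback path as the source.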
01:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Interesting. In my case (I am on .11; the OSDs were initially created on .8) I don't need to recreate the OSDs. Maybe ... Bastian Mäuser
12:42 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Yes, dead as in permanently dead until I recreate the OSD. The drive itself is fine; only the OSD data gets corrupted. The star... Jonas Jelten
12:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
So is the proposed workaround to set bluefs_allocator to "bitmap" or what? Can I do that on a running cluster?
@Ig...
Bastian Mäuser
12:16 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
@Jonas:
Dead OSDs? Permanently? In my case "just" the OSD process died and restarted itself.
Bastian Mäuser
08:23 PM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
The page of the user who raised this issue shows the following:
Registered on: 05/10/2018
Last connection...
Zac Dover
03:29 AM Bug #48696 (Resolved): osd assert because of aios will be truncated.
* 1. Anomalies
The OSD asserts after its reboot, just like the following:...
hongsong wu

12/21/2020

07:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
It's set by @bluefs_allocator@ at bluestore @mkfs@ time: https://github.com/ceph/ceph/blob/b1d0fa70590c23e80a09638df9... Jonas Jelten
11:37 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
It would be good if you could publish a workaround how-to. Bastian Mäuser

12/18/2020

02:39 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37794
m...
Nathan Cutler
08:27 AM Backport #47671 (Resolved): octopus: Hybrid allocator might cause duplicate admin socket command ...
Igor Fedotov
02:39 PM Backport #47708 (Resolved): octopus: Potential race condition regression around new OSD flock()s
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37860
m...
Nathan Cutler
01:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
Igor wrote in the mimic backport issue: "We don't have hybrid allocator in mimic and there are no related (claim_free... Nathan Cutler

12/17/2020

11:32 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37794
merged
Yuri Weinstein
10:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Things keep occurring to me after I press <enter>. :)
When this issue occurs on our spinners, the read rate is ver...
Joshua Baergen
10:05 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Actually, I should be careful - we have _definitely_ seen the symptom of high read rate on Luminous (https://tracker.... Joshua Baergen
09:38 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
I doubt it helps, but I just wanted to add a "me too" here on 14.2.11. We're augmenting a cluster and had moved a few... Joshua Baergen
08:53 PM Backport #47708: octopus: Potential race condition regression around new OSD flock()s
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37860
merged
Yuri Weinstein
05:36 PM Backport #47892 (Resolved): octopus: Compressed blobs lack checksums
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37861
m...
Nathan Cutler
05:22 PM Backport #47892: octopus: Compressed blobs lack checksums
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37861
merged
Yuri Weinstein
01:14 PM Bug #48276 (Triaged): OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov
01:14 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
After some analysis, IMO the root cause is very likely the same as for https://tracker.ceph.com/issues/47751
Under ...
Igor Fedotov
01:05 PM Backport #48093 (In Progress): nautilus: Hybrid allocator might segfault when fallback allocator ...
https://github.com/ceph/ceph/pull/38637 Igor Fedotov
01:02 PM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
We don't have the hybrid allocator in mimic, and there are no related (*claim_free_to_*) methods in the bitmap one. Hence no n... Igor Fedotov
01:37 AM Bug #48036: bluefs corrupted in a OSD
Igor Fedotov wrote:
> On the other hand log directory is shared among containers as we could see output from multip...
Satoru Takeuchi

12/16/2020

05:39 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
here ya go, fsck crashes in @BlueStore::_fsck_check_extents@ with @ceph_assert(pos < bs.size())@, so fsck also seeks ... Jonas Jelten
05:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Jonas Jelten wrote:
> here's the two dumps. first i launched the the osd itself and it crashed. then the bluestore f...
Igor Fedotov
04:05 PM Backport #48478 (In Progress): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ce...
Igor Fedotov
04:05 PM Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
https://github.com/ceph/ceph/pull/38474 Igor Fedotov
04:04 PM Backport #48479 (In Progress): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause c...
https://github.com/ceph/ceph/pull/38475 Igor Fedotov
02:55 PM Bug #48036: bluefs corrupted in a OSD
Satoru Takeuchi wrote:
>
> > It's still unclear to me why multiple OSD instances are able to bypass exclusive lock...
Igor Fedotov
06:41 AM Bug #48036: bluefs corrupted in a OSD
> it seems to me that this is OSD main device (or corresponding symlink in OSD fulder) not fsid file which is locked ... Satoru Takeuchi
12:39 AM Bug #48389: _do_read bdev-read failed
Igor Fedotov wrote:
> Seena,
> mind this to be closed as invalid?
I've changed my disk and it seems it was because o...
Seena Fallah

12/15/2020

02:05 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38310
m...
Nathan Cutler
12:08 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Here are the two dumps. First I launched the OSD itself and it crashed. Then the bluestore free-dump (which also cr... Jonas Jelten
11:20 AM Bug #48389: _do_read bdev-read failed
Seena,
mind if this is closed as invalid?
Igor Fedotov

12/14/2020

09:18 PM Backport #47669 (Resolved): nautilus: Some structs aren't bound to mempools properly
Igor Fedotov
06:55 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38310
merged
Yuri Weinstein
12:40 PM Bug #48036: bluefs corrupted in a OSD
Satoru Takeuchi wrote:
> @Igor
>
> This problem can be fixed by providing an option to move fsid file to other pl...
Igor Fedotov
 
