Activity
From 12/14/2020 to 01/12/2021
01/12/2021
- 05:06 PM Bug #48729 (Triaged): Bluestore memory leak on srub operations
- It looks like high RAM usage is caused by improper onode cache trimming inside BlueStore. Which in turn might be caus...
- 10:55 AM Bug #48729: Bluestore memory leak on srub operations
- @Igor
here you are:
https://cf2.cloudferro.com:8080/swift/v1/AUTH_5b9ea421deb745bfb4dab930cebe153f/ceph-sharings/...
- 02:02 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
- Thank you thank you. They are attached.
Best,
Will
- 11:43 AM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
- @Will - to extract the requested range from block.db, just use:
dd if=block.db ibs=1 skip=15589376 count=32768 of=dump.out
- 01:19 PM Bug #48849: BlueStore.cc: 11380: FAILED ceph_assert(r == 0)
- Wondering if you had experienced any recent OSD crashes prior to this failure?
You might also want to check for HW...
- 12:43 PM Bug #48849: BlueStore.cc: 11380: FAILED ceph_assert(r == 0)
- BTW, I looked through other reported issues and found https://tracker.ceph.com/issues/48002 or https://tracker.ceph.c...
- 12:41 PM Bug #48849 (Need More Info): BlueStore.cc: 11380: FAILED ceph_assert(r == 0)
- We experienced a few OSD crashes all with the same signature in the logs:
--- cut ---
2021-01-08 06:13:54.946 7f3...
- 11:43 AM Backport #48194 (Resolved): octopus: bufferlist c_str() sometimes clears assignment to mempool
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38429
m...
- 11:43 AM Backport #48094 (Resolved): octopus: Hybrid allocator might segfault when fallback allocator is p...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38428
m...
- 11:41 AM Backport #48093: nautilus: Hybrid allocator might segfault when fallback allocator is present
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38637
m...
- 11:40 AM Backport #47672: nautilus: Hybrid allocator might cause duplicate admin socket command registrati...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37793
m...
- 10:23 AM Bug #42928: ceph-bluestore-tool bluefs-bdev-new-db does not update lv tags
- to answer my question - head -n 2 /dev/vg/lv will give the block device uuid
- 09:44 AM Bug #42928: ceph-bluestore-tool bluefs-bdev-new-db does not update lv tags
- Any way to determine the correct DB->Block arrangement after they are lost? I have a host that has hit this bug and a...
- 01:19 AM Bug #48776: ObjectStore/StoreTest hangs
- ...
01/11/2021
- 09:19 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
- Hi Igor -
I feel like I did something wrong as hexdump returned nothing... My apologies we haven't slept much
@ro...
- 08:33 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
- @Will, would you please share the hex dump of the block.db file starting at offset 0xede000, length 0x8000.
Latest startup...
- 05:00 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
- Igor, thank you! It's attached.
Will
- 04:24 PM Bug #48827: Ceph Bluestore OSDs fail to start on WAL corruption
- @William - would you please share the OSD startup log with debug-bluefs set to 20?
- 04:03 PM Bug #48827 (Duplicate): Ceph Bluestore OSDs fail to start on WAL corruption
- Hi -
I posted a note to the Ceph user list also, but we've run into this bug and it unfortunately hit 5 OSDs at th...
- 07:59 PM Bug #48729: Bluestore memory leak on srub operations
- Presuming mem utilization is still that high, could you please temporarily set debug_bluestore to 20 for the osd in ques...
- 10:25 AM Bug #48729: Bluestore memory leak on srub operations
- Unfortunately, that's not the case. After 4 days some of the OSDs took >10GB of RAM.
For example:...
- 07:55 PM Backport #48194: octopus: bufferlist c_str() sometimes clears assignment to mempool
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38429
merged
- 07:55 PM Backport #48094: octopus: Hybrid allocator might segfault when fallback allocator is present
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38428
merged
- 04:50 PM Bug #47443 (Resolved): Hybrid allocator might cause duplicate admin socket command registration.
- 04:49 PM Backport #47672 (Resolved): nautilus: Hybrid allocator might cause duplicate admin socket command...
- 04:43 PM Backport #47672: nautilus: Hybrid allocator might cause duplicate admin socket command registrati...
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37793
merged
- 04:48 PM Backport #48093 (Resolved): nautilus: Hybrid allocator might segfault when fallback allocator is ...
- 04:44 PM Backport #48093: nautilus: Hybrid allocator might segfault when fallback allocator is present
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38637
merged
- 04:13 AM Bug #48819 (New): fsck error: found stray (per-pg) omap data on omap_head
- /a/kchai-2021-01-10_13:20:22-rados-master-distro-basic-smithi/
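Note on the Bug #48827 exchange above: the `dd` arguments `skip=15589376 count=32768` (with `ibs=1`, so both are in bytes) are the decimal forms of the requested offset 0xede000 and length 0x8000. The following is a minimal Python sketch of the same extraction plus a simple hex listing. The helper names `extract_range` and `hexdump_lines` are made up for illustration, the file name `block.db` is simply taken from the `dd` command, and none of this is part of any Ceph tool.

```python
import os

# Values from the Bug #48827 thread: hex offset/length from the request,
# which match the decimal skip/count used in the suggested dd command.
OFFSET = 0xede000   # 15589376 bytes
LENGTH = 0x8000     # 32768 bytes


def extract_range(path, offset, length):
    """Return up to `length` bytes of `path`, starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def hexdump_lines(data, base=0, width=16):
    """Yield one text line per `width` bytes, prefixed with the absolute offset."""
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        yield f"{base + i:08x}  {chunk.hex(' ')}"


if __name__ == "__main__":
    src = "block.db"  # assumed location, mirroring the dd command in the thread
    if os.path.exists(src):
        data = extract_range(src, OFFSET, LENGTH)
        # Print only the first 64 bytes to keep the output short.
        for line in hexdump_lines(data[:64], base=OFFSET):
            print(line)
```

Reading the range directly avoids the byte-at-a-time copy that `ibs=1` implies, at the cost of holding the 32 KiB extract in memory.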
01/08/2021
- 01:08 PM Bug #48781: crash in BlueStore::Onode::put()
- and on the last host:
Jan 7 07:34:17 ceph2 kernel: [107054.315343] tp_osd_tp[20519]: segfault at 0 ip 00007efd3db...
- 01:04 PM Bug #48781: crash in BlueStore::Onode::put()
- On another system we see the following too:
Jan 7 10:02:32 ceph1 kernel: [114774.759038] tp_osd_tp[17449]: segfaul...
- 01:02 PM Bug #48781: crash in BlueStore::Onode::put()
- We also see the following in our OS logs:
[119268.259883] tp_osd_tp[32332]: segfault at 0 ip 00007f8ccce40733 sp 0...
01/07/2021
- 09:34 PM Bug #48776: ObjectStore/StoreTest hangs
- ...
- 12:38 AM Bug #48776: ObjectStore/StoreTest hangs
- /a/teuthology-2021-01-05_07:01:02-rados-master-distro-basic-smithi/5755704
- 12:38 AM Bug #48776 (Resolved): ObjectStore/StoreTest hangs
- ...
- 02:45 PM Bug #48781: crash in BlueStore::Onode::put()
- Download file in attachment with extra logs
- 02:21 PM Bug #48781: crash in BlueStore::Onode::put()
- Here is some extra information regarding this problem:
{
"backtrace": [
"(()+0x12b20) [0x7f0afc7a8b2...
- 09:26 AM Bug #48781 (Resolved): crash in BlueStore::Onode::put()
- Following the earlier issue reported in #48778, I now see frequent OSD crashes. I'm not sure both are related.
<pr...
01/06/2021
- 11:14 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- No problem, and thanks for confirming!
- 11:12 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Joshua Baergen wrote:
> Interesting, thanks. Is that 14.2.17 change this one: https://tracker.ceph.com/issues/47044 ...
- 11:10 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Interesting, thanks. Is that 14.2.17 change this one: https://tracker.ceph.com/issues/47044 ?
FWIW, what I'm seein...
- 11:07 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Joshua Baergen wrote:
> Hey Dan/Eric, did either of you see a big increase in the number of writes hitting your disk...
- 11:00 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Hey Dan/Eric, did either of you see a big increase in the number of writes hitting your disks when buffered mode was ...
01/05/2021
- 03:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
- Fixing this error:...
- 03:02 PM Bug #46124 (Resolved): Potential race condition regression around new OSD flock()s
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:41 AM Bug #46490: osds crashing during deep-scrub
- We seem to hit the same behaviour after upgrading our ceph cluster from 12.2.12 to 14.2.11.
Since then we have quite...
- 02:48 AM Support #48747: which version support spdk perfect?
- ...
- 02:45 AM Support #48747: which version support spdk perfect?
- On 13.2.13 I encounter this failure when ceph-osd reads data spanning 2048 LBAs...
- 02:42 AM Support #48747 (Closed): which version support spdk perfect?
- I tried 12.2.12, 12.2.13 and 13.2.10; none of them runs stably because of
crashes on write or read. Which ver...
01/01/2021
12/30/2020
- 03:26 PM Bug #48729: Bluestore memory leak on srub operations
- I will put this on a test environment. We will see.
- 02:39 PM Bug #48729: Bluestore memory leak on srub operations
- Wondering if you can try a patch from https://tracker.ceph.com/issues/46027 and check whether it's helpful in your ca...
- 01:39 PM Bug #48729 (Resolved): Bluestore memory leak on srub operations
- We observed unbounded RAM growth in OSD processes.
During our investigation (valgrind), we gathered informatio...
12/29/2020
- 03:44 PM Bug #45519: OSD asserts during block allocation for BlueFS
- Interestingly enough, I was able to get the OSD to start back up with the stupid allocator.. does that help?
- 03:35 PM Bug #45519: OSD asserts during block allocation for BlueFS
- This is what it looks like:
http://sprunge.us/MypV0b
And the high fragmentation is using the score. Sorry, we ...
12/28/2020
- 10:42 PM Bug #48726 (Rejected): /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_asser...
- 10:39 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
- Just beat me to it :-)
- 10:36 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
- Sorry, false alarm. The issue would have been caused by this disk error: ...
- 10:36 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
- There are the following lines prior to the assertion:
-3> 2020-12-28T18:27:59.444+1100 7f7477b4e700 -1 bdev(0x561e...
- 10:02 PM Bug #48726: /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 &&...
- See also: https://tracker.ceph.com/issues/19984
- 09:58 PM Bug #48726 (Rejected): /build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_asser...
- Per attached log...
12/24/2020
- 02:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Dan van der Ster wrote:
> @Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hd...
- 12:25 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- @Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hdd clusters. (we left ssd-on...
- 04:29 AM Bug #48696 (Fix Under Review): osd assert because of aios will be truncated.
12/23/2020
- 08:41 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Konstantin Shalygin wrote:
> > > > Seena,...
- 08:35 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Dan van der Ster wrote:
> Igor Fedotov wrote:
> > Dan van der Ster wrote:
> > > Igor or others, do you have any in...
- 08:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Seena Fallah wrote:
> > Konstantin Shalygin wrote:
> > > Seena, Igor already pushed fixes for ...
- 07:16 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Dan van der Ster wrote:
> > Igor or others, do you have any insight into which exact conditio...
- 03:29 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Konstantin Shalygin wrote:
> > Seena, Igor already pushed fixes for hybrid allocator to review ...
- 03:25 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > So generally the issue is that hybrid allocator might return out-of-b...
- 03:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Dan van der Ster wrote:
> Igor or others, do you have any insight into which exact conditions can trigger the alloca...
- 02:54 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Bastian Mäuser wrote:
> How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is w...
- 02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Konstantin Shalygin wrote:
> Seena, Igor already pushed fixes for hybrid allocator to review for next Nautilus release...
- 01:33 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena, Igor already pushed fixes for hybrid allocator to review for next Nautilus release.
- 12:27 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Isn't it better to change the default allocator to bitmap while the bug is being fixed? I have various heartbeat_map timeout i...
- 10:26 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Konstantin Shalygin wrote:
> > (I'm just checking all bases -- we have been lucky so far to not see a single instanc...
- 06:18 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- > (I'm just checking all bases -- we have been lucky so far to not see a single instance of this crash on 5000 osds)
...
- 01:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- > For (2), Do you mean if we enable for example bluefs_buffered_io?
Deferred writes have nothing to do with buffer...
- 12:59 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is why my errors allowed all ...
12/22/2020
- 10:50 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> So generally the issue is that hybrid allocator might return out-of-bound extent while process...
- 06:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor or others, do you have any insight into which exact conditions can trigger the allocation bug? Any particular us...
- 05:52 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- So generally the issue is that hybrid allocator might return out-of-bound extent while processing write request.
Dep...
- 01:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Interesting. In my case (I am on .11, OSDs were initially created on .8) I don't need to recreate the OSDs. Maybe ...
- 12:42 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Yes, dead as in permanently dead until I recreate the OSD. The drive itself is fine, just the OSD data gets corrupted. The star...
- 12:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- So is the proposed workaround to set bluefs_allocator to "bitmap" or what? Can I do that on a running cluster?
@Ig...
- 12:16 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- @Jonas:
Dead OSDs? Permanently? In my case "just" the OSD process died and restarted itself.
- 08:23 PM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
- The page of the user who raised this issue shows the following:
Registered on: 05/10/2018
Last connection...
- 03:29 AM Bug #48696 (Resolved): osd assert because of aios will be truncated.
- * 1. anomalies
OSD asserts after its reboot, just like the following:...
12/21/2020
- 07:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It's set by @bluefs_allocator@ at bluestore @mkfs@ time: https://github.com/ceph/ceph/blob/b1d0fa70590c23e80a09638df9...
- 11:37 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It would be good if you could issue a workaround howto.
12/18/2020
- 02:39 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37794
m...
- 08:27 AM Backport #47671 (Resolved): octopus: Hybrid allocator might cause duplicate admin socket command ...
- 02:39 PM Backport #47708 (Resolved): octopus: Potential race condition regression around new OSD flock()s
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37860
m...
- 01:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
- Igor wrote in the mimic backport issue: "We don't have hybrid allocator in mimic and there are no related (claim_free...
12/17/2020
- 11:32 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37794
merged
- 10:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Things keep occurring to me after I press <enter>. :)
When this issue occurs on our spinners, the read rate is ver...
- 10:05 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Actually, I should be careful - we have _definitely_ seen the symptom of high read rate on Luminous (https://tracker....
- 09:38 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- I doubt it helps, but I just wanted to add a "me too" here on 14.2.11. We're augmenting a cluster and had moved a few...
- 08:53 PM Backport #47708: octopus: Potential race condition regression around new OSD flock()s
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37860
merged
- 05:36 PM Backport #47892 (Resolved): octopus: Compressed blobs lack checksums
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37861
m...
- 05:22 PM Backport #47892: octopus: Compressed blobs lack checksums
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37861
merged
- 01:14 PM Bug #48276 (Triaged): OSD Crash with ceph_assert(is_valid_io(off, len))
- 01:14 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- After some analysis IMO the root cause is highly likely the same as for https://tracker.ceph.com/issues/47751
Under ...
- 01:05 PM Backport #48093 (In Progress): nautilus: Hybrid allocator might segfault when fallback allocator ...
- https://github.com/ceph/ceph/pull/38637
- 01:02 PM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
- We don't have hybrid allocator in mimic and there are no related (*claim_free_to_*) methods in bitmap one. Hence no n...
- 01:37 AM Bug #48036: bluefs corrupted in a OSD
- Igor Fedotov wrote:
> On the other hand log directory is shared among containers as we could see output from multip...
12/16/2020
- 05:39 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- here ya go, fsck crashes in @BlueStore::_fsck_check_extents@ with @ceph_assert(pos < bs.size())@, so fsck also seeks ...
- 05:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Jonas Jelten wrote:
> here's the two dumps. first i launched the osd itself and it crashed. then the bluestore f...
- 04:05 PM Backport #48478 (In Progress): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ce...
- 04:05 PM Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- https://github.com/ceph/ceph/pull/38474
- 04:04 PM Backport #48479 (In Progress): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause c...
- https://github.com/ceph/ceph/pull/38475
- 02:55 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
>
> > It's still unclear to me why multiple OSD instances are able to bypass exclusive lock...
- 06:41 AM Bug #48036: bluefs corrupted in a OSD
- > it seems to me that this is OSD main device (or corresponding symlink in OSD folder) not fsid file which is locked ...
- 12:39 AM Bug #48389: _do_read bdev-read failed
- Igor Fedotov wrote:
> Seena,
> mind this to be closed as invalid?
I’ve changed my disk and it seems it was because o...
12/15/2020
- 02:05 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38310
m...
- 12:08 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- here's the two dumps. first i launched the osd itself and it crashed. then the bluestore free-dump (which also cr...
- 11:20 AM Bug #48389: _do_read bdev-read failed
- Seena,
mind this to be closed as invalid?
12/14/2020
- 09:18 PM Backport #47669 (Resolved): nautilus: Some structs aren't bound to mempools properly
- 06:55 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38310
merged
- 12:40 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
> @Igor
>
> This problem can be fixed by providing an option to move fsid file to other pl...