Activity

From 11/25/2020 to 12/24/2020

12/24/2020

02:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Dan van der Ster wrote:
> @Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hd...
Igor Fedotov
12:25 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
@Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hdd clusters. (we left ssd-on...
Dan van der Ster
04:29 AM Bug #48696 (Fix Under Review): osd assert because of aios will be truncated.
Kefu Chai

12/23/2020

08:41 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Konstantin Shalygin wrote:
> > > > Seena,...
Igor Fedotov
08:35 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Dan van der Ster wrote:
> Igor Fedotov wrote:
> > Dan van der Ster wrote:
> > > Igor or others, do you have any in...
Igor Fedotov
08:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Seena Fallah wrote:
> > Konstantin Shalygin wrote:
> > > Seena, Igor has already pushed fixes for ...
Seena Fallah
07:16 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Dan van der Ster wrote:
> > Igor or others, do you have any insight into which exact conditio...
Dan van der Ster
03:29 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Konstantin Shalygin wrote:
> > Seena, Igor has already pushed fixes for the hybrid allocator for review ...
Igor Fedotov
03:25 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Igor Fedotov wrote:
> > So generally the issue is that the hybrid allocator might return an out-of-b...
Igor Fedotov
03:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Dan van der Ster wrote:
> Igor or others, do you have any insight into which exact conditions can trigger the alloca...
Igor Fedotov
02:54 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Bastian Mäuser wrote:
> How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is w...
Igor Fedotov
02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Konstantin Shalygin wrote:
> Seena, Igor has already pushed fixes for the hybrid allocator for review for the next Nautilus release...
Seena Fallah
01:33 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena, Igor has already pushed fixes for the hybrid allocator for review for the next Nautilus release.
Konstantin Shalygin
12:27 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Wouldn't it be better to change the default allocator to bitmap until the bug is fixed? I have various heartbeat_map timeout i...
Seena Fallah
10:26 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Konstantin Shalygin wrote:
> > (I'm just checking all bases -- we have been lucky so far to not see a single instanc...
Dan van der Ster
06:18 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
> (I'm just checking all bases -- we have been lucky so far to not see a single instance of this crash on 5000 osds)
...
Konstantin Shalygin
01:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
> For (2), Do you mean if we enable for example bluefs_buffered_io?
Deferred writes have nothing to do with buffer...
Dan van der Ster
12:59 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is why my errors allowed all ...
Bastian Mäuser

12/22/2020

10:50 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> So generally the issue is that the hybrid allocator might return an out-of-bounds extent while process...
Seena Fallah
06:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor or others, do you have any insight into which exact conditions can trigger the allocation bug? Any particular us...
Dan van der Ster
05:52 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
So generally the issue is that the hybrid allocator might return an out-of-bounds extent while processing a write request.
Dep...
Igor Fedotov
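
To make "out-of-bounds extent" concrete, here is a minimal Python sketch of the kind of check that an assertion such as is_valid_io(off, len) performs. It is an illustration only and not the Ceph implementation: the function name is_valid_extent, the 4096-byte block size and the 1 GiB device size are all assumptions.

```python
def is_valid_extent(off: int, length: int, dev_size: int, block_size: int = 4096) -> bool:
    """Return True if [off, off + length) is block-aligned, non-empty and inside the device.

    Hypothetical helper for illustration; the parameter names and the default
    block size are assumptions, not values taken from the tracker.
    """
    return (
        off % block_size == 0          # offset must be block-aligned
        and length % block_size == 0   # length must be a whole number of blocks
        and length > 0                 # empty I/O is rejected
        and off < dev_size             # start must lie inside the device
        and off + length <= dev_size   # end must not run past the device
    )


dev_size = 1 << 30  # assumed 1 GiB device

# An extent fully inside the device passes the check.
print(is_valid_extent(0, 4096, dev_size))

# An extent that starts in range but ends past the device end fails it.
print(is_valid_extent(dev_size - 4096, 8192, dev_size))
```

An allocator that hands out the second kind of extent would make a write fail a check of this shape, which matches the crash named in the issue title.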
01:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Interesting. In my case (I am on .11, OSDs were initially created on .8) I don't need to recreate the OSDs. Maybe ...
Bastian Mäuser
12:42 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Yes, dead as in permanently dead until I recreate the OSD. The drive itself is fine; just the OSD data gets corrupted. The star...
Jonas Jelten
12:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
So is the proposed workaround to set bluefs_allocator to "bitmap" or what? Can I do that on a running cluster?
@Ig...
Bastian Mäuser
12:16 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
@Jonas:
Dead OSDs? Permanently? In my case "just" the OSD process died and restarted itself.
Bastian Mäuser
08:23 PM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
The page of the user who raised this issue shows the following:
Registered on: 05/10/2018
Last connection...
Zac Dover
03:29 AM Bug #48696 (Resolved): osd assert because of aios will be truncated.
* 1. Anomalies
The OSD asserts after its reboot, just like the following:...
hongsong wu

12/21/2020

07:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
It's set by @bluefs_allocator@ at bluestore @mkfs@ time: https://github.com/ceph/ceph/blob/b1d0fa70590c23e80a09638df9...
Jonas Jelten
11:37 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
It would be good if you could issue a workaround howto.
Bastian Mäuser

12/18/2020

02:39 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37794
m...
Nathan Cutler
08:27 AM Backport #47671 (Resolved): octopus: Hybrid allocator might cause duplicate admin socket command ...
Igor Fedotov
02:39 PM Backport #47708 (Resolved): octopus: Potential race condition regression around new OSD flock()s
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37860
m...
Nathan Cutler
01:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
Igor wrote in the mimic backport issue: "We don't have hybrid allocator in mimic and there are no related (claim_free...
Nathan Cutler

12/17/2020

11:32 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37794
merged
Yuri Weinstein
10:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Things keep occurring to me after I press <enter>. :)
When this issue occurs on our spinners, the read rate is ver...
Joshua Baergen
10:05 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Actually, I should be careful - we have _definitely_ seen the symptom of high read rate on Luminous (https://tracker....
Joshua Baergen
09:38 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
I doubt it helps, but I just wanted to add a "me too" here on 14.2.11. We're augmenting a cluster and had moved a few...
Joshua Baergen
08:53 PM Backport #47708: octopus: Potential race condition regression around new OSD flock()s
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37860
merged
Yuri Weinstein
05:36 PM Backport #47892 (Resolved): octopus: Compressed blobs lack checksums
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37861
m...
Nathan Cutler
05:22 PM Backport #47892: octopus: Compressed blobs lack checksums
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37861
merged
Yuri Weinstein
01:14 PM Bug #48276 (Triaged): OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov
01:14 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
After some analysis IMO the root cause is highly likely the same as for https://tracker.ceph.com/issues/47751
Under ...
Igor Fedotov
01:05 PM Backport #48093 (In Progress): nautilus: Hybrid allocator might segfault when fallback allocator ...
https://github.com/ceph/ceph/pull/38637
Igor Fedotov
01:02 PM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
We don't have hybrid allocator in mimic and there are no related (*claim_free_to_*) methods in bitmap one. Hence no n...
Igor Fedotov
01:37 AM Bug #48036: bluefs corrupted in a OSD
Igor Fedotov wrote:
> On the other hand log directory is shared among containers as we could see output from multip...
Satoru Takeuchi

12/16/2020

05:39 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
here ya go, fsck crashes in @BlueStore::_fsck_check_extents@ with @ceph_assert(pos < bs.size())@, so fsck also seeks ...
Jonas Jelten
05:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Jonas Jelten wrote:
> Here are the two dumps. First I launched the OSD itself and it crashed. Then the bluestore f...
Igor Fedotov
04:05 PM Backport #48478 (In Progress): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ce...
Igor Fedotov
04:05 PM Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
https://github.com/ceph/ceph/pull/38474
Igor Fedotov
04:04 PM Backport #48479 (In Progress): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause c...
https://github.com/ceph/ceph/pull/38475
Igor Fedotov
02:55 PM Bug #48036: bluefs corrupted in a OSD
Satoru Takeuchi wrote:
>
> > It's still unclear to me why multiple OSD instances are able to bypass exclusive lock...
Igor Fedotov
06:41 AM Bug #48036: bluefs corrupted in a OSD
> it seems to me that it is the OSD main device (or the corresponding symlink in the OSD folder), not the fsid file, that is locked ...
Satoru Takeuchi
12:39 AM Bug #48389: _do_read bdev-read failed
Igor Fedotov wrote:
> Seena,
> mind this to be closed as invalid?
I've changed my disk and it seems it was because o...
Seena Fallah

12/15/2020

02:05 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38310
m...
Nathan Cutler
12:08 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Here are the two dumps. First I launched the OSD itself and it crashed. Then the bluestore free-dump (which also cr...
Jonas Jelten
11:20 AM Bug #48389: _do_read bdev-read failed
Seena,
mind this to be closed as invalid?
Igor Fedotov

12/14/2020

09:18 PM Backport #47669 (Resolved): nautilus: Some structs aren't bound to mempools properly
Igor Fedotov
06:55 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38310
merged
Yuri Weinstein
12:40 PM Bug #48036: bluefs corrupted in a OSD
Satoru Takeuchi wrote:
> @Igor
>
> This problem can be fixed by providing an option to move the fsid file to another pl...
Igor Fedotov

12/12/2020

08:28 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Jonas Jelten wrote:
> Another OSD died, this time on a different host. Also 1.1TiB 10k HDD. I can dump and analyze t...
Igor Fedotov
02:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Another OSD died, this time on a different host. Also 1.1TiB 10k HDD. I can dump and analyze things there if you like.
Jonas Jelten

12/11/2020

06:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Is there any progress on tracking this down? Is octopus also affected (#46800)?
I've lost 2 different OSDs runn...
Jonas Jelten

12/07/2020

01:33 PM Backport #48479 (Resolved): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph...
https://github.com/ceph/ceph/pull/38475
Igor Fedotov
01:33 PM Backport #48478 (Resolved): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_...
https://github.com/ceph/ceph/pull/38474
Igor Fedotov
01:31 PM Backport #48477 (Rejected): octopus: osd: fix bluestore avl allocator
Igor Fedotov
01:27 PM Backport #48477 (Resolved): octopus: osd: fix bluestore avl allocator
https://github.com/ceph/ceph/pull/43747
Igor Fedotov
01:31 PM Backport #48476 (Rejected): nautilus: osd: fix bluestore avl allocator
Igor Fedotov
01:27 PM Backport #48476 (Rejected): nautilus: osd: fix bluestore avl allocator
Igor Fedotov
01:31 PM Fix #48272 (Resolved): osd: fix bluestore avl allocator
Igor Fedotov
01:26 PM Fix #48272 (Pending Backport): osd: fix bluestore avl allocator
Igor Fedotov

12/06/2020

04:54 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
OK, sorry, I didn't know that. Yes, then it changed while installing the .12 release.
Martin Verges
10:00 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
The default allocator was changed from bitmap to hybrid starting v14.2.11.
Hence unless you had made any custom sett...
Igor Fedotov
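
As a rough aid for checking which default applies to a given Nautilus point release, the statement above can be expressed as a short Python sketch. It covers only the 14.2.x series named in that comment; the helper names parse_version and nautilus_default_allocator are hypothetical, and any custom allocator setting on the cluster overrides the default.

```python
def parse_version(version: str) -> tuple:
    """Turn a string such as 'v14.2.11' or '14.2.9' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def nautilus_default_allocator(version: str) -> str:
    """Default allocator for a Nautilus (14.x) release, per the comment above.

    Hypothetical helper: it encodes only the one fact stated in the thread,
    namely that the default changed from bitmap to hybrid starting v14.2.11.
    """
    ver = parse_version(version)
    if ver[0] != 14:
        raise ValueError("this sketch only covers Nautilus (14.x) releases")
    # Tuples compare numerically, so (14, 2, 9) < (14, 2, 11) as intended;
    # a plain string comparison would order "14.2.9" after "14.2.11".
    return "hybrid" if ver >= (14, 2, 11) else "bitmap"


for release in ("14.2.9", "14.2.11", "v14.2.14"):
    print(release, nautilus_default_allocator(release))
```

This mirrors the exchange in the thread: an OSD that had been running with the default on an earlier point release would pick up the hybrid allocator after an upgrade to .11 or later unless the allocator was set explicitly.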
08:21 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Igor Fedotov wrote:
> Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then...
Martin Verges

12/05/2020

01:51 PM Bug #47174 (Resolved): [BlueStore] Pool/PG deletion(space reclamation) is very slow
Kefu Chai
01:49 PM Bug #47883 (Pending Backport): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert...
Kefu Chai
12:11 AM Bug #48036: bluefs corrupted in a OSD
@Igor
This problem can be fixed by providing an option to move the fsid file to another place.
Then Rook, and possibly ...
Satoru Takeuchi

12/04/2020

02:10 PM Bug #48389 (Triaged): _do_read bdev-read failed
Igor Fedotov
02:04 PM Bug #48036 (Need More Info): bluefs corrupted in a OSD
Igor Fedotov
02:03 PM Bug #48036 (Triaged): bluefs corrupted in a OSD
Igor Fedotov
02:01 PM Fix #48288 (Need More Info): test/objectstore: allocate function may return -ENOSPC
Could you please provide more information on the issue?
Which test case is failing, what's the output etc...
Igor Fedotov
10:58 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Martin Verges wrote:
> Just as additional information, the allocator is already bitmap.
Do you mean it WAS bit...
Igor Fedotov

12/03/2020

10:09 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Just as additional information, the allocator is already bitmap.
Martin Verges
07:31 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Well, my previous comment is valid for Martin's issue. It's still unclear to me whether it applies to the original r...
Igor Fedotov
07:24 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
The issue is caused by a bug in the avl/hybrid allocators. The workaround would be switching back to the bitmap allocator.
Igor Fedotov
07:23 PM Bug #47883 (Fix Under Review): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert...
Igor Fedotov
12:25 PM Backport #48281 (In Progress): octopus: osd: fix bluestore bitmap allocator
Nathan Cutler
12:24 PM Backport #48194 (In Progress): octopus: bufferlist c_str() sometimes clears assignment to mempool
Nathan Cutler
12:23 PM Backport #48094 (In Progress): octopus: Hybrid allocator might segfault when fallback allocator i...
Nathan Cutler
01:03 AM Bug #48443 (New): rocksdb: Corruption: missing start of fragmented record(2)
Hi, Guys!
This happened after a power failure.
It seems that a simple rocksdb corruption, unfortunately, throw...
Gabriel Goes

12/02/2020

11:20 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
We triggered the bug on a production cluster having a lot of small files and objects as their workload.
The affected...
Martin Verges

11/30/2020

01:59 PM Bug #40434: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD
This is old, but I'm experiencing this today on the newest 15.2.6. I hope this can be fixed.
I'm trying to remove t...
玮文 胡

11/29/2020

11:48 PM Bug #48036: bluefs corrupted in a OSD
Thank you for your input!
> I'll continue to investigate this issue with other Rook developers.
I found there i...
Satoru Takeuchi

11/28/2020

12:00 AM Bug #48389: _do_read bdev-read failed
Seena Fallah wrote:
> You are right. It seems the disk itself has read errors; this occurred 3 times today and I'...
Igor Fedotov

11/27/2020

11:51 PM Bug #48389: _do_read bdev-read failed
You are right. It seems the disk itself has read errors; this occurred 3 times today and I'm wondering why Ceph do...
Seena Fallah
11:29 PM Bug #48389: _do_read bdev-read failed
Thanks for sharing!
Unfortunately the debug level for bdev is too low, hence not much useful info.
Wondering if you're ab...
Igor Fedotov
10:58 PM Bug #48389: _do_read bdev-read failed
Thanks for your review. Here you go.
Seena Fallah
09:47 PM Bug #48389: _do_read bdev-read failed
I think this is another form of https://tracker.ceph.com/issues/48276
And the root cause is presumably pretty much the sa...
Igor Fedotov
05:29 PM Bug #48389 (Rejected): _do_read bdev-read failed
I think it happens because of deep scrubbing as I see the one here https://tracker.ceph.com/issues/36455#note-11
<...
Seena Fallah
11:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
It would be good to have confirmation of which version fixes this regression. It must have been introduced after 14.2.9.
Bastian Mäuser
04:04 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Did QA and QE run on bitmap allocator too ...
Igor Fedotov
03:38 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Seena Fallah wrote:
> > Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
>
>...
Seena Fallah
03:34 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
Sorry I'm not getting the qu...
Igor Fedotov
03:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
Seena Fallah
02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> Just to prioritize this issue another OSD from my SSD tier fails :(
Mind switching to bitma...
Igor Fedotov
01:07 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Just to prioritize this issue another OSD from my SSD tier fails :(
Seena Fallah

11/26/2020

07:31 PM Backport #47669 (In Progress): nautilus: Some structs aren't bound to mempools properly
https://github.com/ceph/ceph/pull/38310
Igor Fedotov
07:19 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Seena Fallah wrote:
> > I faced this issue again in nautilus 14.2.14 and there is a log about...
Seena Fallah
06:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Seena Fallah wrote:
> I faced this issue again in nautilus 14.2.14 and there is a log about the HybridAllocator
> [...
Igor Fedotov
03:01 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
I faced this issue again in nautilus 14.2.14 and there is a log about the HybridAllocator...
Seena Fallah
06:32 PM Bug #48036: bluefs corrupted in a OSD
To troubleshoot 2) one might try the following:
- Create two containers that access a single shared folder from a ho...
Igor Fedotov
06:24 AM Bug #48036: bluefs corrupted in a OSD
> You can double check the above by trying to run multiple OSD-0 instances in parallel manually. Highly likely they wi...
Satoru Takeuchi
10:24 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
@Igor another dimension to this that I haven't seen discussed yet -- AFAIU, PG deletion happens concurrently, for exa...
Dan van der Ster

11/25/2020

12:11 PM Bug #48036: bluefs corrupted in a OSD
Maybe multiple containers are attached to the same volume by some chance?
Igor Fedotov
12:04 PM Bug #48036: bluefs corrupted in a OSD
You can double check the above by trying to run multiple OSD-0 instances in parallel manually. Highly likely they will...
Igor Fedotov
12:02 PM Bug #48036: bluefs corrupted in a OSD
Hence presumably we have multiple ceph-osd instances using the same bluefs.
I can see at least two issues here. Both...
Igor Fedotov
11:58 AM Bug #48036: bluefs corrupted in a OSD
So my hypothesis about multiple kv_sync_thread-s is confirmed. Here is the log snippet from OSD log:
Thread 7faf0e...
Igor Fedotov
05:56 AM Bug #48036: bluefs corrupted in a OSD
> > could you please reproduce the issue once again, now with both debug_bluefs set to 20 and debug_bluestore set to ...
Satoru Takeuchi
01:47 AM Bug #48036: bluefs corrupted in a OSD
@Igor
> You mentioned that osd_tail.log was truncated. Now I believe I need the full one, so could you please send...
Satoru Takeuchi
 
