Activity
From 11/25/2020 to 12/24/2020
12/24/2020
- 02:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Dan van der Ster wrote:
> @Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hd...
- 12:25 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- @Igor: Thanks for all the clear info. We've changed to the bitmap allocators on all our hdd clusters. (we left ssd-on...
- 04:29 AM Bug #48696 (Fix Under Review): osd assert because of aios will be truncated.
12/23/2020
- 08:41 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Konstantin Shalygin wrote:
> > > > Seena,...
- 08:35 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Dan van der Ster wrote:
> Igor Fedotov wrote:
> > Dan van der Ster wrote:
> > > Igor or others, do you have any in...
- 08:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Seena Fallah wrote:
> > Konstantin Shalygin wrote:
> > > Seena, Igor already pushed fixes for ...
- 07:16 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Dan van der Ster wrote:
> > Igor or others, do you have any insight into which exact conditio...
- 03:29 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Konstantin Shalygin wrote:
> > Seena, Igor already pushed fixes for hybrid allocator to review ...
- 03:25 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > So generally the issue is that hybrid allocator might return out-of-b...
- 03:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Dan van der Ster wrote:
> Igor or others, do you have any insight into which exact conditions can trigger the alloca...
- 02:54 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Bastian Mäuser wrote:
> How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is w...
- 02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Konstantin Shalygin wrote:
> Seena, Igor already pushed fixes for hybrid allocator to review for next Nautilus release...
- 01:33 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena, Igor already pushed fixes for the hybrid allocator to review for the next Nautilus release.
- 12:27 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Isn't it better to change the default allocator to bitmap while the bug is being fixed? I have various heartbeat_map timeout i...
- 10:26 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Konstantin Shalygin wrote:
> > (I'm just checking all bases -- we have been lucky so far to not see a single instanc...
- 06:18 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- > (I'm just checking all bases -- we have been lucky so far to not see a single instance or this crash on 5000 osds)
...
- 01:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- > For (2), Do you mean if we enable for example bluefs_buffered_io?
Deferred writes have nothing to do with buffer...
- 12:59 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- How does it behave on rbd only usage? Do deferred writes occur there at all? Maybe this is why my errors allowed all ...
12/22/2020
- 10:50 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> So generally the issue is that hybrid allocator might return out-of-bound extent while process...
- 06:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor or others, do you have any insight into which exact conditions can trigger the allocation bug? Any particular us...
- 05:52 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- So generally the issue is that hybrid allocator might return out-of-bound extent while processing write request.
Dep...
- 01:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Interesting. In my case (I am on .11, OSDs were initially created on .8) I don't need to recreate the OSDs. Maybe ...
- 12:42 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Yes, dead as in permanently dead until I recreate the OSD. The drive itself is fine, just the OSD data corrupts. The star...
- 12:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- So is the proposed workaround to set bluefs_allocator to "bitmap" or what? Can I do that on a running cluster?
@Ig...
- 12:16 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- @Jonas:
Dead OSDs? Permanently? In my case "just" the osd process died and restarted itself.
- 08:23 PM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
- The page of the user who raised this issue shows the following:
Registered on: 05/10/2018
Last connection...
- 03:29 AM Bug #48696 (Resolved): osd assert because of aios will be truncated.
- * 1. Anomalies
OSD asserts after its reboot, just like the following:...
12/21/2020
- 07:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It's set by @bluefs_allocator@ at bluestore @mkfs@ time: https://github.com/ceph/ceph/blob/b1d0fa70590c23e80a09638df9...
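For readers following the workaround discussed throughout this thread, here is an illustrative config sketch of falling back from the hybrid allocator to bitmap. This is a sketch rather than a verified procedure: the option names are the upstream @bluestore_allocator@ / @bluefs_allocator@ settings, OSDs must be restarted to pick the change up, and (per the comment above) @bluefs_allocator@ takes effect when bluefs starts, not at runtime.

```ini
# Illustrative sketch of the workaround discussed in this thread:
# fall back from the hybrid allocator to bitmap, then restart the OSDs.
[osd]
bluestore_allocator = bitmap
bluefs_allocator = bitmap
```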
- 11:37 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It would be good if you could publish a workaround how-to.
12/18/2020
- 02:39 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37794
m...
- 08:27 AM Backport #47671 (Resolved): octopus: Hybrid allocator might cause duplicate admin socket command ...
- 02:39 PM Backport #47708 (Resolved): octopus: Potential race condition regression around new OSD flock()s
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37860
m...
- 01:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
- Igor wrote in the mimic backport issue: "We don't have hybrid allocator in mimic and there are no related (claim_free...
12/17/2020
- 11:32 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37794
merged
- 10:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Things keep occurring to me after I press <enter>. :)
When this issue occurs on our spinners, the read rate is ver...
- 10:05 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Actually, I should be careful - we have _definitely_ seen the symptom of high read rate on Luminous (https://tracker....
- 09:38 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- I doubt it helps, but I just wanted to add a "me too" here on 14.2.11. We're augmenting a cluster and had moved a few...
- 08:53 PM Backport #47708: octopus: Potential race condition regression around new OSD flock()s
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37860
merged
- 05:36 PM Backport #47892 (Resolved): octopus: Compressed blobs lack checksums
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37861
m...
- 05:22 PM Backport #47892: octopus: Compressed blobs lack checksums
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37861
merged
- 01:14 PM Bug #48276 (Triaged): OSD Crash with ceph_assert(is_valid_io(off, len))
- 01:14 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- After some analysis IMO the root cause is highly likely the same as for https://tracker.ceph.com/issues/47751
Under ...
- 01:05 PM Backport #48093 (In Progress): nautilus: Hybrid allocator might segfault when fallback allocator ...
- https://github.com/ceph/ceph/pull/38637
- 01:02 PM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
- We don't have hybrid allocator in mimic and there are no related (*claim_free_to_*) methods in bitmap one. Hence no n...
- 01:37 AM Bug #48036: bluefs corrupted in a OSD
- Igor Fedotov wrote:
> On the other hand log directory is shared among containers as we could see output from multip...
12/16/2020
- 05:39 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- here ya go, fsck crashes in @BlueStore::_fsck_check_extents@ with @ceph_assert(pos < bs.size())@, so fsck also seeks ...
- 05:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Jonas Jelten wrote:
> here's the two dumps. first i launched the osd itself and it crashed. then the bluestore f...
- 04:05 PM Backport #48478 (In Progress): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ce...
- 04:05 PM Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- https://github.com/ceph/ceph/pull/38474
- 04:04 PM Backport #48479 (In Progress): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause c...
- https://github.com/ceph/ceph/pull/38475
- 02:55 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
>
> > It's still unclear to me why multiple OSD instances are able to bypass exclusive lock...
- 06:41 AM Bug #48036: bluefs corrupted in a OSD
- > it seems to me that this is OSD main device (or corresponding symlink in OSD folder) not fsid file which is locked ...
- 12:39 AM Bug #48389: _do_read bdev-read failed
- Igor Fedotov wrote:
> Seena,
> mind this to be closed as invalid?
I've changed my disk and it seems it was because o...
12/15/2020
- 02:05 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38310
m...
- 12:08 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- here's the two dumps. first i launched the osd itself and it crashed. then the bluestore free-dump (which also cr...
- 11:20 AM Bug #48389: _do_read bdev-read failed
- Seena,
do you mind if this is closed as invalid?
12/14/2020
- 09:18 PM Backport #47669 (Resolved): nautilus: Some structs aren't bound to mempools properly
- 06:55 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38310
merged
- 12:40 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
> @Igor
>
> This problem can be fixed by providing an option to move fsid file to other pl...
12/12/2020
- 08:28 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Jonas Jelten wrote:
> Another OSD died, this time on a different host. Also 1.1TiB 10k HDD. I can dump and analyze t...
- 02:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Another OSD died, this time on a different host. Also 1.1TiB 10k HDD. I can dump and analyze things there if you like.
12/11/2020
- 06:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Is there any progress on tracking this down? Is octopus also affected (#46800)?
I've lost 2 different OSDs runn...
12/07/2020
- 01:33 PM Backport #48479 (Resolved): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph...
- https://github.com/ceph/ceph/pull/38475
- 01:33 PM Backport #48478 (Resolved): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_...
- https://github.com/ceph/ceph/pull/38474
- 01:31 PM Backport #48477 (Rejected): octopus: osd: fix bluestore avl allocator
- 01:27 PM Backport #48477 (Resolved): octopus: osd: fix bluestore avl allocator
- https://github.com/ceph/ceph/pull/43747
- 01:31 PM Backport #48476 (Rejected): nautilus: osd: fix bluestore avl allocator
- 01:27 PM Backport #48476 (Rejected): nautilus: osd: fix bluestore avl allocator
- 01:31 PM Fix #48272 (Resolved): osd: fix bluestore avl allocator
- 01:26 PM Fix #48272 (Pending Backport): osd: fix bluestore avl allocator
12/06/2020
- 04:54 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- ok sorry, didn't know that. Yes, then it changed while installing the .12 release.
- 10:00 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- The default allocator was changed from bitmap to hybrid starting with v14.2.11.
Hence unless you had made any custom sett...
- 08:21 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Igor Fedotov wrote:
> Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then...
12/05/2020
- 01:51 PM Bug #47174 (Resolved): [BlueStore] Pool/PG deletion(space reclamation) is very slow
- 01:49 PM Bug #47883 (Pending Backport): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert...
- 12:11 AM Bug #48036: bluefs corrupted in a OSD
- @Igor
This problem can be fixed by providing an option to move fsid file to other place.
Then Rook, and possibly ...
12/04/2020
- 02:10 PM Bug #48389 (Triaged): _do_read bdev-read failed
- 02:04 PM Bug #48036 (Need More Info): bluefs corrupted in a OSD
- 02:03 PM Bug #48036 (Triaged): bluefs corrupted in a OSD
- 02:01 PM Fix #48288 (Need More Info): test/objectstore: allocate function may return -ENOSPC
- Could you please provide more information on the issue?
Which test case is failing, what's the output etc...
- 10:58 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Martin Verges wrote:
> just as an additional Information, the allocator is already bitmap.
Do you mean it WAS bit...
12/03/2020
- 10:09 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- just as an additional Information, the allocator is already bitmap.
- 07:31 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Well, my previous comment is valid for Martin's issue. It's still unclear to me whether it applies to the original r...
- 07:24 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- The issue is caused by a bug in avl/hybrid allocators. The workaround would be switching back to bitmap allocator.
- 07:23 PM Bug #47883 (Fix Under Review): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert...
- 12:25 PM Backport #48281 (In Progress): octopus: osd: fix bluestore bitmap allocator
- 12:24 PM Backport #48194 (In Progress): octopus: bufferlist c_str() sometimes clears assignment to mempool
- 12:23 PM Backport #48094 (In Progress): octopus: Hybrid allocator might segfault when fallback allocator i...
- 01:03 AM Bug #48443 (New): rocksdb: Corruption: missing start of fragmented record(2)
- Hi, Guys!
This happened after a power failure.
It seems that a simple rocksdb corruption, unfortunately, throw...
12/02/2020
- 11:20 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- We triggered the bug on a production cluster having a lot of small files and objects as their workload.
The affected...
11/30/2020
- 01:59 PM Bug #40434: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD
- This is old, but I'm experiencing this today on the newest 15.2.6. Hope this can be fixed.
I'm trying to remove t...
11/29/2020
- 11:48 PM Bug #48036: bluefs corrupted in a OSD
- Thank you for your input!
> I'll continue to investigate this issue with other Rook developers.
I found there i...
11/28/2020
- 12:00 AM Bug #48389: _do_read bdev-read failed
- Seena Fallah wrote:
> You are right. It seems the disk has read error by itself and this occurs 3 times today and I'...
11/27/2020
- 11:51 PM Bug #48389: _do_read bdev-read failed
- You are right. It seems the disk has a read error by itself; this occurred 3 times today, and I'm wondering why Ceph do...
- 11:29 PM Bug #48389: _do_read bdev-read failed
- Thanks for sharing!
Unfortunately the debug level for bdev is too low, hence not much useful info.
Wondering if you're ab...
- 10:58 PM Bug #48389: _do_read bdev-read failed
- Thanks for your review. Here you go.
- 09:47 PM Bug #48389: _do_read bdev-read failed
- I think this is another form of https://tracker.ceph.com/issues/48276
And the root cause is presumably pretty the sa...
- 05:29 PM Bug #48389 (Rejected): _do_read bdev-read failed
- I think it happens because of deep scrubbing as I see the one here https://tracker.ceph.com/issues/36455#note-11
<...
- 11:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It would be good to have a confirmation of which version fixes this regression. It must have been introduced after 14.2.9.
- 04:04 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Did QA and QE run on bitmap allocator too ...
- 03:38 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Seena Fallah wrote:
> > Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
>
>...
- 03:34 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
Sorry I'm not getting the qu...
- 03:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
- 02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Just to prioritize this issue another OSD from my SSD tier fails :(
Mind switching to bitma...
- 01:07 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Just to prioritize this issue another OSD from my SSD tier fails :(
11/26/2020
- 07:31 PM Backport #47669 (In Progress): nautilus: Some structs aren't bound to mempools properly
- https://github.com/ceph/ceph/pull/38310
- 07:19 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Seena Fallah wrote:
> > I faced this issue again in nautilus 14.2.14 and there is a log about...
- 06:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> I faced this issue again in nautilus 14.2.14 and there is a log about the HybridAllocator
> [...
- 03:01 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I faced this issue again in nautilus 14.2.14 and there is a log about the HybridAllocator...
- 06:32 PM Bug #48036: bluefs corrupted in a OSD
- To troubleshoot 2) one might try the following:
- Create two containers that access a single shared folder from a ho...
- 06:24 AM Bug #48036: bluefs corrupted in a OSD
- > You can double check the above by trying to run multiple OSD-0 instance in parallel manually. Highly likely they wi...
- 10:24 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- @Igor another dimension to this that I haven't seen discussed yet -- AFAIU, PG deletion happens concurrently, for exa...
11/25/2020
- 12:11 PM Bug #48036: bluefs corrupted in a OSD
- May be multiple containers attached to the same volume by some chance?
- 12:04 PM Bug #48036: bluefs corrupted in a OSD
- You can double check the above by trying to run multiple OSD-0 instance in parallel manually. Highly likely they will...
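The exclusive-lock behavior being debated above can be illustrated with a small standalone sketch. This is plain Python `fcntl.flock` on a temporary file, not Ceph's actual locking code: it shows that a second non-blocking exclusive lock attempt on an already-locked file fails, which is the mechanism that should normally stop a second ceph-osd instance at startup.

```python
import errno
import fcntl
import os
import tempfile

# Stand-in for the OSD lock file (device symlink / fsid file in the real case).
path = os.path.join(tempfile.mkdtemp(), "fsid")
open(path, "w").close()

# First "OSD instance" acquires an exclusive advisory lock.
first = open(path, "r+")
fcntl.flock(first, fcntl.LOCK_EX | fcntl.LOCK_NB)

# Second "OSD instance" (a separate open file description) tries the same lock.
second = open(path, "r+")
try:
    fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
    locked_twice = True
except OSError as e:
    # Non-blocking flock reports contention as EAGAIN/EWOULDBLOCK (or EACCES).
    assert e.errno in (errno.EACCES, errno.EAGAIN)
    locked_twice = False

print(locked_twice)  # prints False when the lock is enforced
```

If two containers mount the same volume, both instances still contend on the same inode, so the lock should hold; the corruption in this bug suggests the second instance bypassed or never reached this check.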
- 12:02 PM Bug #48036: bluefs corrupted in a OSD
- Hence presumably we have multiple ceph-osd instances using the same bluefs.
I can see at least two issues here. Both...
- 11:58 AM Bug #48036: bluefs corrupted in a OSD
- So my hypothesis about multiple kv_sync_thread-s is confirmed. Here is the log snippet from OSD log:
Thread 7faf0e...
- 05:56 AM Bug #48036: bluefs corrupted in a OSD
- > > could you please reproduce the issue once again, now with both debug_bluefs set to 20 and debug_bluestore set to ...
- 01:47 AM Bug #48036: bluefs corrupted in a OSD
- @Igor
> You mentioned that osd_tail.log was truncated Now I believe I need the full one so could you please send...