Activity
From 11/23/2020 to 12/22/2020
12/22/2020
- 10:50 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> So generally the issue is that hybrid allocator might return out-of-bound extent while process...
- 06:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor or others, do you have any insight into which exact conditions can trigger the allocation bug? Any particular us...
- 05:52 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- So generally the issue is that hybrid allocator might return out-of-bound extent while processing write request.
Dep...
- 01:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Interesting. In my case (I am on .11, OSDs were initially created on .8) I don't need to recreate the OSDs. Maybe ...
- 12:42 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Yes, dead as in permanently dead until I recreate the OSD. The drive itself is fine; just the OSD data corrupts. The star...
- 12:20 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- So is the proposed workaround to set bluefs_allocator to "bitmap" or what? Can I do that on a running cluster?
@Ig...
- 12:16 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- @Jonas:
Dead OSDs? Permanently? In my case "just" the osd process died and restarted itself.
- 08:23 PM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
- The page of the user who raised this issue shows the following:
Registered on: 05/10/2018
Last connection...
- 03:29 AM Bug #48696 (Resolved): osd assert because of aios will be truncated.
- * 1.anomalies
The OSD asserts after its reboot, like the following:...
12/21/2020
- 07:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It's set by @bluefs_allocator@ at bluestore @mkfs@ time: https://github.com/ceph/ceph/blob/b1d0fa70590c23e80a09638df9...
- 11:37 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It would be good if you could publish a workaround how-to.
12/18/2020
- 02:39 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37794
m...
- 08:27 AM Backport #47671 (Resolved): octopus: Hybrid allocator might cause duplicate admin socket command ...
- 02:39 PM Backport #47708 (Resolved): octopus: Potential race condition regression around new OSD flock()s
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37860
m...
- 01:06 PM Bug #47751: Hybrid allocator might segfault when fallback allocator is present
- Igor wrote in the mimic backport issue: "We don't have hybrid allocator in mimic and there are no related (claim_free...
12/17/2020
- 11:32 PM Backport #47671: octopus: Hybrid allocator might cause duplicate admin socket command registration.
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/37794
merged
- 10:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Things keep occurring to me after I press <enter>. :)
When this issue occurs on our spinners, the read rate is ver...
- 10:05 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Actually, I should be careful - we have _definitely_ seen the symptom of high read rate on Luminous (https://tracker....
- 09:38 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- I doubt it helps, but I just wanted to add a "me too" here on 14.2.11. We're augmenting a cluster and had moved a few...
- 08:53 PM Backport #47708: octopus: Potential race condition regression around new OSD flock()s
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37860
merged
- 05:36 PM Backport #47892 (Resolved): octopus: Compressed blobs lack checksums
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37861
m...
- 05:22 PM Backport #47892: octopus: Compressed blobs lack checksums
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37861
merged
- 01:14 PM Bug #48276 (Triaged): OSD Crash with ceph_assert(is_valid_io(off, len))
- 01:14 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- After some analysis IMO the root cause is highly likely the same as for https://tracker.ceph.com/issues/47751
Under ...
- 01:05 PM Backport #48093 (In Progress): nautilus: Hybrid allocator might segfault when fallback allocator ...
- https://github.com/ceph/ceph/pull/38637
- 01:02 PM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
- We don't have hybrid allocator in mimic and there are no related (*claim_free_to_*) methods in bitmap one. Hence no n...
- 01:37 AM Bug #48036: bluefs corrupted in a OSD
- Igor Fedotov wrote:
> On the other hand log directory is shared among containers as we could see output from multip...
12/16/2020
- 05:39 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- here ya go, fsck crashes in @BlueStore::_fsck_check_extents@ with @ceph_assert(pos < bs.size())@, so fsck also seeks ...
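For context, the fsck and free-dump runs discussed in this issue can be reproduced offline with ceph-bluestore-tool against a stopped OSD. This is a hedged ops sketch, not runnable outside a Ceph node: the OSD id and data path below are the conventional defaults and may differ per deployment.

```shell
# Stop the OSD first; ceph-bluestore-tool requires exclusive access.
systemctl stop ceph-osd@0
# Consistency check (this is where the ceph_assert(pos < bs.size()) fired above).
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
# Dump the allocator's free extents, as done earlier in this thread.
ceph-bluestore-tool free-dump --path /var/lib/ceph/osd/ceph-0
```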
- 05:21 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Jonas Jelten wrote:
> here's the two dumps. first i launched the osd itself and it crashed. then the bluestore f...
- 04:05 PM Backport #48478 (In Progress): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ce...
- 04:05 PM Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- https://github.com/ceph/ceph/pull/38474
- 04:04 PM Backport #48479 (In Progress): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause c...
- https://github.com/ceph/ceph/pull/38475
- 02:55 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
>
> > It's still unclear to me why multiple OSD instances are able to bypass exclusive lock...
- 06:41 AM Bug #48036: bluefs corrupted in a OSD
- > it seems to me that this is the OSD main device (or the corresponding symlink in the OSD folder), not the fsid file, which is locked ...
- 12:39 AM Bug #48389: _do_read bdev-read failed
- Igor Fedotov wrote:
> Seena,
> mind this to be closed as invalid?
I've changed my disk and it seems it was because o...
12/15/2020
- 02:05 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38310
m...
- 12:08 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- here's the two dumps. first i launched the osd itself and it crashed. then the bluestore free-dump (which also cr...
- 11:20 AM Bug #48389: _do_read bdev-read failed
- Seena,
mind this to be closed as invalid?
12/14/2020
- 09:18 PM Backport #47669 (Resolved): nautilus: Some structs aren't bound to mempools properly
- 06:55 PM Backport #47669: nautilus: Some structs aren't bound to mempools properly
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38310
merged
- 12:40 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
> @Igor
>
> This problem can be fixed by providing an option to move fsid file to other pl...
12/12/2020
- 08:28 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Jonas Jelten wrote:
> Another OSD died, this time on a different host. Also 1.1TiB 10k HDD. I can dump and analyze t...
- 02:10 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Another OSD died, this time on a different host. Also 1.1TiB 10k HDD. I can dump and analyze things there if you like.
12/11/2020
- 06:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Is there any progress on tracking this down? Is octopus also affected (#46800)?
I've lost 2 different OSDs runn...
12/07/2020
- 01:33 PM Backport #48479 (Resolved): nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph...
- https://github.com/ceph/ceph/pull/38475
- 01:33 PM Backport #48478 (Resolved): octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_...
- https://github.com/ceph/ceph/pull/38474
- 01:31 PM Backport #48477 (Rejected): octopus: osd: fix bluestore avl allocator
- 01:27 PM Backport #48477 (Resolved): octopus: osd: fix bluestore avl allocator
- https://github.com/ceph/ceph/pull/43747
- 01:31 PM Backport #48476 (Rejected): nautilus: osd: fix bluestore avl allocator
- 01:27 PM Backport #48476 (Rejected): nautilus: osd: fix bluestore avl allocator
- 01:31 PM Fix #48272 (Resolved): osd: fix bluestore avl allocator
- 01:26 PM Fix #48272 (Pending Backport): osd: fix bluestore avl allocator
12/06/2020
- 04:54 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- OK, sorry, I didn't know that. Then yes, it changed while installing the .12 release.
- 10:00 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- The default allocator was changed from bitmap to hybrid starting v14.2.11.
Hence unless you had made any custom sett...
- 08:21 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Igor Fedotov wrote:
> Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then...
12/05/2020
- 01:51 PM Bug #47174 (Resolved): [BlueStore] Pool/PG deletion(space reclamation) is very slow
- 01:49 PM Bug #47883 (Pending Backport): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert...
- 12:11 AM Bug #48036: bluefs corrupted in a OSD
- @Igor
This problem can be fixed by providing an option to move fsid file to other place.
Then Rook, and possibly ...
12/04/2020
- 02:10 PM Bug #48389 (Triaged): _do_read bdev-read failed
- 02:04 PM Bug #48036 (Need More Info): bluefs corrupted in a OSD
- 02:03 PM Bug #48036 (Triaged): bluefs corrupted in a OSD
- 02:01 PM Fix #48288 (Need More Info): test/objectstore: allocate function may return -ENOSPC
- Could you please provide more information on the issue?
Which test case is failing, what's the output etc...
- 10:58 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Martin Verges wrote:
> just as an additional Information, the allocator is already bitmap.
Do you mean it WAS bit...
12/03/2020
- 10:09 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Just as additional information: the allocator is already bitmap.
- 07:31 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- Well, my previous comment is valid for Martin's issue. It's still unclear to me whether it applies to the original r...
- 07:24 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- The issue is caused by a bug in avl/hybrid allocators. The workaround would be switching back to bitmap allocator.
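A sketch of the workaround described above, as a ceph.conf fragment. This is an assumption-laden example, not a verified fix: it uses the `bluestore_allocator`/`bluefs_allocator` option names discussed in this thread, OSDs would need a restart to pick it up, and per the mkfs note elsewhere in this thread `bluefs_allocator` may only take effect for newly created OSDs.

```ini
[osd]
# Fall back from the hybrid allocator (the default since v14.2.11)
# to bitmap, as suggested above. Verify applicability for your release.
bluestore_allocator = bitmap
bluefs_allocator = bitmap
```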
- 07:23 PM Bug #47883 (Fix Under Review): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert...
- 12:25 PM Backport #48281 (In Progress): octopus: osd: fix bluestore bitmap allocator
- 12:24 PM Backport #48194 (In Progress): octopus: bufferlist c_str() sometimes clears assignment to mempool
- 12:23 PM Backport #48094 (In Progress): octopus: Hybrid allocator might segfault when fallback allocator i...
- 01:03 AM Bug #48443 (New): rocksdb: Corruption: missing start of fragmented record(2)
- Hi, Guys!
This happened after a power failure.
It seems that a simple rocksdb corruption, unfortunately, throw...
12/02/2020
- 11:20 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
- We triggered the bug on a production cluster having a lot of small files and objects as their workload.
The affected...
11/30/2020
- 01:59 PM Bug #40434: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD
- This is old, but I'm experiencing it today on the newest 15.2.6. Hopefully this can be fixed.
I'm trying to remove t...
11/29/2020
- 11:48 PM Bug #48036: bluefs corrupted in a OSD
- Thank you for your input!
> I'll continue to investigate this issue with other Rook developers.
I found there i...
11/28/2020
- 12:00 AM Bug #48389: _do_read bdev-read failed
- Seena Fallah wrote:
> You are right. It seems the disk has read error by itself and this occurs 3 times today and I'...
11/27/2020
- 11:51 PM Bug #48389: _do_read bdev-read failed
- You are right. It seems the disk has read error by itself and this occurs 3 times today and I'm wondering why Ceph do...
- 11:29 PM Bug #48389: _do_read bdev-read failed
- Thanks for sharing!
Unfortunately the debug level for bdev is too low, hence not much useful info.
Wondering if you're ab...
- 10:58 PM Bug #48389: _do_read bdev-read failed
- Thanks for your review. Here you go.
- 09:47 PM Bug #48389: _do_read bdev-read failed
- I think this is another form of https://tracker.ceph.com/issues/48276
And the root cause is presumably pretty the sa... - 05:29 PM Bug #48389 (Rejected): _do_read bdev-read failed
- I think it happens because of deep scrubbing as I see the one here https://tracker.ceph.com/issues/36455#note-11
<...
- 11:17 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It would be good to have confirmation of which version fixes this regression. It must have been introduced after 14.2.9.
- 04:04 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > Seena Fallah wrote:
> > > Did QA and QE run on bitmap allocator too ...
- 03:38 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Seena Fallah wrote:
> > Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
>
>...
- 03:34 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
Sorry I'm not getting the qu...
- 03:02 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Did QA and QE run on bitmap allocator too in nautilus 14.2.14?
- 02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> Just to prioritize this issue another OSD from my SSD tier fails :(
Mind switching to bitma...
- 01:07 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Just to prioritize this issue another OSD from my SSD tier fails :(
11/26/2020
- 07:31 PM Backport #47669 (In Progress): nautilus: Some structs aren't bound to mempools properly
- https://github.com/ceph/ceph/pull/38310
- 07:19 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Seena Fallah wrote:
> > I faced this issue again in nautilus 14.2.14 and there is a log about...
- 06:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Seena Fallah wrote:
> I faced this issue again in nautilus 14.2.14 and there is a log about the HybridAllocator
> [...
- 03:01 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I faced this issue again in nautilus 14.2.14 and there is a log about the HybridAllocator...
- 06:32 PM Bug #48036: bluefs corrupted in a OSD
- To troubleshoot 2) one might try the following:
- Create two containers that access a single shared folder from a ho...
- 06:24 AM Bug #48036: bluefs corrupted in a OSD
- > You can double check the above by trying to run multiple OSD-0 instance in parallel manually. Highly likely they wi...
- 10:24 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- @Igor another dimension to this that I haven't seen discussed yet -- AFAIU, PG deletion happens concurrently, for exa...
11/25/2020
- 12:11 PM Bug #48036: bluefs corrupted in a OSD
- Maybe multiple containers are attached to the same volume by some chance?
- 12:04 PM Bug #48036: bluefs corrupted in a OSD
- You can double check the above by trying to run multiple OSD-0 instances in parallel manually. Highly likely they will...
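The exclusive-lock behavior being discussed can be illustrated outside Ceph. The sketch below (not Ceph code; it assumes the util-linux `flock` tool) shows why a second instance holding an open on the same file is normally refused a non-blocking exclusive lock, which is what should stop a parallel OSD-0 instance on the main device.

```shell
# Create a throwaway lock file standing in for the OSD's locked device/file.
lockfile=$(mktemp)
# While the outer flock holds an exclusive lock, a second non-blocking
# attempt on the same file (a fresh open, like a second process) is refused:
flock -n "$lockfile" sh -c "flock -n '$lockfile' true || echo 'second attempt blocked'"
```

If two containers shared the folder but not the lock (e.g. via copied files rather than a shared volume), this protection would not trigger, which matches the hypothesis above.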
- 12:02 PM Bug #48036: bluefs corrupted in a OSD
- Hence presumably we have multiple ceph-osd instances using the same bluefs.
I can see at least two issues here. Both...
- 11:58 AM Bug #48036: bluefs corrupted in a OSD
- So my hypothesis about multiple kv_sync_thread-s is confirmed. Here is the log snippet from OSD log:
Thread 7faf0e...
- 05:56 AM Bug #48036: bluefs corrupted in a OSD
- > > could you please reproduce the issue once again, now with both debug_bluefs set to 20 and debug_bluestore set to ...
- 01:47 AM Bug #48036: bluefs corrupted in a OSD
- @Igor
> You mentioned that osd_tail.log was truncated. Now I believe I need the full one, so could you please send...
11/24/2020
- 06:00 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Bastian, thanks!
- 12:40 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Bastian Mäuser wrote:
> Hi Igor,
>
> it's there, but not prior but after - assertion occured at 20:24:41:
I me...
- 12:39 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Hi Igor,
it's there, but not before, rather after - the assertion occurred at 20:24:41:
root@px2# zgrep fallback *
ceph-... - 11:57 AM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- It would be great if someone could grep a failing OSD's logs (prior to the assertion) for "constructing fallback allocator...
11/23/2020
- 06:32 PM Bug #48036: bluefs corrupted in a OSD
- @Satoru,
could you please reproduce the issue once again, now with both debug_bluefs set to 20 and debug_bluestore s... - 06:06 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
> @Igor
>
> Do you have any progress?
Hi Satoru,
sorry for a long response.
At the se...
- 04:13 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- v14.2.11 has got the hybrid allocator enabled, but bluestore_volume_selection_policy was still at its original value there. Hence th...
- 03:00 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I got the same issue in nautilus 14.2.11
it happened four times on different nodes..
- 02:49 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Thanks everybody for updates. Yeah I understand all the complexities for the debugging this so...
- 02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Thanks everybody for the updates. Yeah, I understand all the complexities of debugging this sort of issue in a produ...
- 01:55 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I got the same issue in nautilus 14.2.14
it happened four times on different nodes..
- 01:01 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I got the same issue in nautilus 14.2.14
Full trace: https://paste.ubuntu.com/p/4KHcCG9YQx/
- 12:49 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Hi Igor,
thanks for answering.
The thing is:
- Issue isn't reproducible
- Happens on Production Systems.
... - 12:45 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Meanwhile I see no way to troubleshoot this unless one is able to repro the issue with debug-bdev set to 20.
- 12:43 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- The following patch once merged [and backported] will provide more insight on the issue's root cause.
https://gith... - 01:46 PM Documentation #23443 (Resolved): doc: object -> file -> disk is wrong for bluestore