Activity

From 12/29/2021 to 01/27/2022

01/27/2022

02:11 PM Bug #53466: OSD is unable to allocate free space for BlueFS
Sorry for the late response, Jan and I have been looking into the issue together.
We were unable to retrieve the ...
Alexander Trost

01/26/2022

11:42 PM Backport #53609: octopus: crash BlueStore::Onode::put from BlueStore::TransContext::~TransContext
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44724
merged
Yuri Weinstein
01:26 PM Bug #54019 (Resolved): OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
When upgrading to rook 1.8.3 (ceph 16.2.7) we experience issues with the OSD initialization; basically only +/- 50% ... Paul Bormans

01/24/2022

10:58 AM Bug #53899 (Need More Info): bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg...
Igor Fedotov
10:58 AM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Thanks for the information!
Pivert Dubuisson wrote:
> Hi,
>
> I added the 2 other dumps to the same location....
Igor Fedotov

01/22/2022

12:46 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Re-posting the script as indentation broke it:... Pivert Dubuisson
12:43 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Hi,
I added the 2 other dumps to the same location. But I expect the same:
* https://www.pivert.org/osd.0_bluesto...
Pivert Dubuisson

01/21/2022

08:53 PM Feature #53571 (Resolved): test/allocator_replay_test: Add replay_alloc option
Igor Fedotov
05:44 PM Backport #53609 (In Progress): octopus: crash BlueStore::Onode::put from BlueStore::TransContext:...
Cory Snyder
05:44 PM Backport #53608 (In Progress): pacific: crash BlueStore::Onode::put from BlueStore::TransContext:...
Cory Snyder

01/20/2022

12:26 AM Bug #53184: failed to start new osd due to SIGSEGV in BlueStore::read()
I'll update my ceph clusters to 16.2.7 next week or the week after (upgrading rook for now). I'll run the reproduc... Satoru Takeuchi

01/19/2022

10:58 PM Bug #53184 (Need More Info): failed to start new osd due to SIGSEGV in BlueStore::read()
@Satoru - any updates? Igor Fedotov
10:56 PM Bug #53426 (Duplicate): rbd_directory (and some others) corrupted after update from 15.2.10 to 16...
Igor Fedotov
03:30 PM Bug #53906 (Duplicate): BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size)
Neha Ojha
12:36 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Pivert Dubuisson wrote:
> Hi,
> the file size exceeds the max 1MB rule.... The file is temporarily there:
> www.pi...
Igor Fedotov

01/18/2022

12:01 PM Bug #53814: Pacific cluster crash
So unfortunately this last failure looks like a bug caused by downsizing allocation granularity to 4K. Which as I men... Igor Fedotov
12:28 AM Bug #53907: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)
- OSD.0 also had a similar crash... Vikhyat Umrao
12:25 AM Bug #53907: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)
On the same cluster, OSD.95 also hit the same assert, was restarted by systemd, and has been running fine since!... Vikhyat Umrao
12:21 AM Bug #53907: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)
- After this crash, systemd restarted the OSD container pod, and since the restart the OSD has been running fine!
Vikhyat Umrao
12:20 AM Bug #53907 (Resolved): BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)
... Vikhyat Umrao

01/17/2022

10:45 PM Bug #53906: BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size)
OSD.138 logs for thread id - 7f2bd48ec700:... Vikhyat Umrao
10:38 PM Bug #53906: BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size)
- Looks like systemd restarted the OSD container pod after this crash, and after the restart the OSD is runni... Vikhyat Umrao
10:37 PM Bug #53906 (Duplicate): BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size)
... Vikhyat Umrao
06:38 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Hi,
the file size exceeds the max 1MB rule.... The file is temporarily there:
www.pivert.org/osd.2_bluestore_alloca...
Pivert Dubuisson
06:28 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Pivert Dubuisson wrote:
> Hi,
>
> Obviously, I've been able to get out of this terrible situation.
> Since I'm o...
Igor Fedotov
12:31 AM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Hi,
Obviously, I've been able to get out of this terrible situation.
Since I'm on lvm (on top of nvme) and I was ...
Pivert Dubuisson
12:00 AM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Just realized that I forgot to mention the tar extension for some files, which could be confusing.
The files like:
c...
Pivert Dubuisson
06:30 PM Bug #53590 (Duplicate): ceph abort at bluefs enospc
Igor Fedotov
05:33 PM Backport #53890 (In Progress): pacific: High memory usage in fsck/repair
https://github.com/ceph/ceph/pull/44613 Igor Fedotov
05:32 PM Backport #53891 (In Progress): octopus: High memory usage in fsck/repair
https://github.com/ceph/ceph/pull/44614 Igor Fedotov
09:14 AM Bug #53814: Pacific cluster crash
Igor Fedotov wrote:
> Could you please collect OSD startup log with debug-bluefs set to 10?
Yes, you could downl...
Cyprien Devillez

01/16/2022

11:37 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Thanks Igor,
Just realized that the existing logs are already verbose and quite huge.
In attachment are the 3 res...
Pivert Dubuisson
09:07 PM Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc")
Hey Pivert,
is it possible to get a more verbose OSD startup log than just the stderr output? Actually I need one with ...
Igor Fedotov
05:32 PM Bug #53899 (Need More Info): bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg...
All OSDs failing to start after OSD near full. Cluster down.
3-node cluster (pve1, pve2, pve3) - Bluestore on a sin...
Pivert Dubuisson
09:34 PM Bug #53814: Pacific cluster crash
Cyprien Devillez wrote:
> I haven't had much time since my last post, so I only tried another OSD today, and it's not worki...
Igor Fedotov

01/14/2022

07:02 PM Backport #53891 (Resolved): octopus: High memory usage in fsck/repair
Backport Bot
07:01 PM Backport #53890 (Resolved): pacific: High memory usage in fsck/repair
Backport Bot
06:56 PM Bug #44924 (Pending Backport): High memory usage in fsck/repair
Igor Fedotov
05:11 PM Bug #53699 (Resolved): NCB code doesn't update allocation file when we expand-device
Neha Ojha

01/13/2022

01:06 PM Bug #53814: Pacific cluster crash
I haven't had much time since my last post, so I only tried another OSD today, and it's not working.
Now I have this erro...
Cyprien Devillez

01/10/2022

04:18 PM Bug #53814: Pacific cluster crash
Cyprien Devillez wrote:
> With *ceph config set osd.0 bluefs_shared_alloc_size 4096* and *systemctl start ceph-osd@0...
Igor Fedotov
03:53 PM Bug #53814: Pacific cluster crash
With *ceph config set osd.0 bluefs_shared_alloc_size 4096* and *systemctl start ceph-osd@0.service*, osd.0 is back an... Cyprien Devillez
03:22 PM Bug #53814: Pacific cluster crash

Sorry, I meant bluefs_shared_alloc_size indeed.
As for space usage growth - I don't have any ideas at this point...
Igor Fedotov
02:55 PM Bug #53814: Pacific cluster crash
Thanks for your reply.
I can't allocate more space, because it's on remote servers with fully allocated disks.
I tr...
Cyprien Devillez
11:55 AM Bug #53814 (Won't Fix): Pacific cluster crash
Given the following output
-4> 2022-01-06T15:30:36.463+0100 7fb49cf7bf00 1 bluefs _allocate unable to allocate 0x90...
Igor Fedotov
11:13 AM Bug #53814 (Won't Fix): Pacific cluster crash
Hi all,
Last Thursday, a few days after an Octopus-to-Pacific upgrade on a 4-host Proxmox install, my Ceph cluster ...
Cyprien Devillez

12/31/2021

02:33 AM Bug #53748 (New): scrape-health-metrics swallows sudo error messages
When you run `ceph device scrape-health-metrics` and `sudo` fails, sudo's stderr is discarded, making debugging hard.... Niklas Hambuechen
 
