Activity

From 06/21/2020 to 07/20/2020

07/20/2020

03:20 PM Backport #46643 (Rejected): octopus: segv in LruOnodeCacheShard::_pin
Nathan Cutler
01:03 PM Bug #44213 (Resolved): Erasure coded pool might need much more disk space than expected
Igor Fedotov
06:37 AM Bug #38745: spillover that doesn't make sense
I found... Seena Fallah

07/18/2020

12:13 PM Backport #46584 (Need More Info): octopus: os/bluestore: simplify Onode pin/unpin logic.
this backport is on hold because the fix is baking in master Nathan Cutler
12:12 PM Bug #43147 (Pending Backport): segv in LruOnodeCacheShard::_pin
Neha - is it OK to backport this to Octopus now? Nathan Cutler

07/17/2020

05:35 PM Bug #38554: ObjectStore/StoreTestSpecificAUSize.TooManyBlobsTest/2 fail, Expected: (res_stat.allo...
Igor, I am seeing this failure on latest nautilus.... Neha Ojha
12:58 PM Bug #46490: osds crashing during deep-scrub
... Sorry, the preceding "921747443:" in line 6 is just a remnant line number of the grep -n I did initially and forgo... Lawrence Smith
12:52 PM Bug #46490: osds crashing during deep-scrub
The output for grep -e "_verify_csum bad" -e "fsck error" on the log file is:... Lawrence Smith
12:47 PM Bug #46490: osds crashing during deep-scrub
I'm sorry, I wasn't clear about that. Yes, this is the only output. (The line "-9999>[...]" appears only in the console ... Lawrence Smith
10:45 AM Bug #46490: osds crashing during deep-scrub
Lawrence Smith wrote:
> The whole logfile is 60G. The file 'osd-164-fsck.out.gz' that I uploaded with the last updat...
Igor Fedotov
12:22 PM Backport #46599 (In Progress): octopus: Rescue procedure for extremely large bluefs log
Nathan Cutler
12:22 PM Backport #46599 (Resolved): octopus: Rescue procedure for extremely large bluefs log
https://github.com/ceph/ceph/pull/36123
Nathan Cutler
12:17 PM Backport #46598 (Resolved): luminous: Rescue procedure for extremely large bluefs log
Nathan Cutler
12:16 PM Backport #46598 (Resolved): luminous: Rescue procedure for extremely large bluefs log
https://github.com/ceph/ceph/pull/35776
(Note: this is not a cherry-pick from master. Rather, the master PR is bas...
Nathan Cutler
11:14 AM Backport #46584 (Resolved): octopus: os/bluestore: simplify Onode pin/unpin logic.
https://github.com/ceph/ceph/pull/36795 Nathan Cutler
11:13 AM Backport #45684 (Resolved): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffered_i...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/35404
m...
Nathan Cutler

07/16/2020

05:11 PM Backport #45684: nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/35404
merged
Yuri Weinstein
03:24 PM Bug #46575: os/bluestore: simplify Onode pin/unpin logic.
Should fix issues like https://tracker.ceph.com/issues/43147 that we've been seeing. Neha Ojha
03:23 PM Bug #46575 (Resolved): os/bluestore: simplify Onode pin/unpin logic.
We want to let this bake in master before backporting to octopus. Neha Ojha

07/15/2020

10:58 PM Bug #46490: osds crashing during deep-scrub
The whole logfile is 60G. The file 'osd-164-fsck.out.gz' that I uploaded with the last update is the console output o... Lawrence Smith
08:37 PM Bug #46490: osds crashing during deep-scrub
Hi Lawrence
looks like some data corruption (multiple objects referring to the same disk extent) which causes decomp...
Igor Fedotov
08:02 AM Bug #46490: osds crashing during deep-scrub
Hi Igor, thanks for looking into this.
You were right about the corrupted backtrace. Although select_prefer_bdev ca...
Lawrence Smith
09:28 PM Bug #43147: segv in LruOnodeCacheShard::_pin
/a/yuriw-2020-07-13_23:06:23-rados-wip-yuri5-testing-2020-07-13-1944-octopus-distro-basic-smithi/5224399 Neha Ojha
05:35 PM Bug #46552: Rescue procedure for extremely large bluefs log
Neha Ojha wrote:
> Octopus: -https://github.com/ceph/ceph/pull/36112-
https://github.com/ceph/ceph/pull/36123
> ...
Neha Ojha
04:18 PM Bug #46552 (Resolved): Rescue procedure for extremely large bluefs log
This feature was developed on luminous before being merged into master.
Luminous: https://github.com/ceph/ceph/pul...
Neha Ojha
09:51 AM Bug #46411: mimic: Disks associated to osds have small write io even on an idle ceph cluster
I have fixed this bug in PR https://github.com/ceph/ceph/pull/36108; could you help review it? @Igor Fedotov... hongsong wu
08:51 AM Bug #46411: mimic: Disks associated to osds have small write io even on an idle ceph cluster
I am sorry, there is a problem with the PR link given above (this abnormal phenomenon is caused by the PR (https://git... hongsong wu
08:36 AM Bug #46411: mimic: Disks associated to osds have small write io even on an idle ceph cluster
hongsong wu wrote:
> Affected Versions: v12.2.12~~v12.2.13, v13.2.5 ~~ v13.2.10
hongsong wu

07/14/2020

03:31 AM Bug #46525 (Need More Info): osd crush
my env:
ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
CentOS Linux release 7....
伟杰 谭

07/13/2020

09:37 AM Bug #46490: osds crashing during deep-scrub
First of all IMO the Nautilus back trace isn't valid - there is no RocksDBBlueFSVolumeSelector::select_prefer_bdev ca... Igor Fedotov

07/11/2020

03:08 PM Bug #46490 (Need More Info): osds crashing during deep-scrub
During scrubbing osds from our 8+3 EC-pool seem to be randomly crashing with the backtrace:... Lawrence Smith
08:52 AM Backport #46009 (In Progress): octopus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentatio...
Nathan Cutler

07/10/2020

05:54 PM Bug #44880 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:50 PM Bug #38745: spillover that doesn't make sense
And also here in graphs you can see the bluefs db used is decreasing after slow used increases but still slow bytes i... Seena Fallah
12:58 PM Bug #38745: spillover that doesn't make sense
Thanks @Igor for your help again.
I saw a new behavior now and I don't see any level gets score 1.0 but ceph says th...
Seena Fallah
06:07 AM Backport #45426 (Resolved): octopus: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/34943
m...
Nathan Cutler

07/08/2020

07:23 PM Bug #46055: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
https://github.com/ceph/ceph/pull/34943 merged Yuri Weinstein
07:22 PM Backport #45426: octopus: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
https://github.com/ceph/ceph/pull/34943 merged Yuri Weinstein
09:59 AM Bug #46411: mimic: Disks associated to osds have small write io even on an idle ceph cluster
Adam, mind taking a look? Igor Fedotov
06:44 AM Bug #46411: mimic: Disks associated to osds have small write io even on an idle ceph cluster
Affected Versions: v12.2.12~~v12.2.13, v13.2.5 ~~ v13.2.9 hongsong wu
06:09 AM Bug #46411 (Rejected): mimic: Disks associated to osds have small write io even on an idle ceph c...
* 1. Anomalies
When the ceph cluster is idle, you can find that the disks associated to osds have small write io e...
hongsong wu
09:33 AM Bug #46124: Potential race condition regression around new OSD flock()s
I'm able to reproduce the issue with src/objectstore/store_test and pretty powerful HW:
-10> 2020-07-08T09:28:28....
Igor Fedotov

07/06/2020

10:33 PM Bug #46124 (New): Potential race condition regression around new OSD flock()s
Neha Ojha
10:30 PM Bug #46124: Potential race condition regression around new OSD flock()s
@Neha Ojha:
I think the bug remains as real as it gets; I did not retract that this is a bug. With my last comment...
Niklas Hambuechen

07/03/2020

03:37 PM Backport #46350 (Resolved): octopus: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompressi...
https://github.com/ceph/ceph/pull/37373 Nathan Cutler

07/02/2020

02:09 PM Bug #46124 (Closed): Potential race condition regression around new OSD flock()s
Please feel free to reopen if you find a real bug somewhere. Neha Ojha
01:59 PM Bug #45613 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
Igor Fedotov

07/01/2020

12:56 PM Bug #46054 (Resolved): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion...
Kefu Chai

06/30/2020

09:45 PM Bug #44880: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Backporting note: needs to be backported together with follow-on fix. See the octopus backport PR and #45426 Nathan Cutler
09:44 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
backport tracked via https://tracker.ceph.com/issues/44880 Nathan Cutler
02:21 AM Bug #46055: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Being backported in https://github.com/ceph/ceph/pull/34943 Neha Ojha
02:20 AM Bug #46055 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Neha Ojha
04:48 PM Bug #46270: mimic:osd can not start
This just looks like bluefs is running out of space. Mimic is EOL; I'd recommend that you upgrade and report back if yo... Neha Ojha
06:40 AM Bug #46270 (Can't reproduce): mimic:osd can not start
My env:
[root@mon1 test]# ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
[r...
伟杰 谭

06/27/2020

02:52 PM Bug #46054 (Fix Under Review): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): A...
Kefu Chai
02:51 PM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
Hi Adam, I am working on this issue, as I've run into it twice and feel obliged to fix it. As I failed to identify... Kefu Chai
08:18 AM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
/a//kchai-2020-06-27_07:37:00-rados-wip-kefu-testing-2020-06-27-1407-distro-basic-smithi/5183643 Kefu Chai

06/26/2020

05:35 PM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
/a/sseshasa-2020-06-24_17:46:09-rados-wip-sseshasa-testing-2020-06-24-1858-distro-basic-smithi/5176446 Neha Ojha

06/24/2020

08:43 PM Backport #46195 (Resolved): luminous: BlueFS replay log grows without end
https://github.com/ceph/ceph/pull/35776 Nathan Cutler
08:43 PM Backport #46194 (Resolved): nautilus: BlueFS replay log grows without end
https://github.com/ceph/ceph/pull/37948 Nathan Cutler
08:43 PM Backport #46193 (Resolved): octopus: BlueFS replay log grows without end
https://github.com/ceph/ceph/pull/36621 Nathan Cutler
08:43 PM Backport #46192 (Rejected): mimic: BlueFS replay log grows without end
Nathan Cutler
02:35 PM Bug #45903 (Pending Backport): BlueFS replay log grows without end
It will be good to get the fix into luminous and mimic for affected users. Neha Ojha
10:37 AM Backport #45682 (Resolved): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/35446
m...
Nathan Cutler

06/23/2020

08:12 PM Backport #45682: octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/35446
merged
Yuri Weinstein

06/21/2020

09:09 PM Bug #46124: Potential race condition regression around new OSD flock()s
> I suspect that Ceph starts other threads (using clone() on Linux) while the lock is held
Sorry, this should be t...
Niklas Hambuechen
05:27 PM Bug #46124: Potential race condition regression around new OSD flock()s
From the strace above, we can see that there's always a `close()` after a matching `flock()` within the same PID, so ... Niklas Hambuechen
01:53 PM Bug #46124: Potential race condition regression around new OSD flock()s
Another question:
Would it not be better to use OFD locks (Open File Description locks), that is via ...
Niklas Hambuechen
03:18 AM Bug #46124: Potential race condition regression around new OSD flock()s
In case it helps, here are `strace` invocations, each showing slightly different behaviour and error messages, that i... Niklas Hambuechen
03:08 AM Bug #46124: Potential race condition regression around new OSD flock()s
I did not experience that in Mimic. Niklas Hambuechen
03:07 AM Bug #46124 (Resolved): Potential race condition regression around new OSD flock()s
In #38150 and PR https://github.com/ceph/ceph/pull/26245, a new `flock()` approach was introduced.
When I use `ce...
Niklas Hambuechen
03:08 AM Bug #38150: KernelDevice exclusive lock broken
I suspect this may have introduced a regression: #46124 Niklas Hambuechen
 
