Activity

From 11/04/2018 to 12/03/2018

12/03/2018

11:32 PM Backport #37495 (In Progress): luminous: bluefs-bdev-expand aborts
Igor Fedotov
11:17 PM Backport #37495: luminous: bluefs-bdev-expand aborts
https://github.com/ceph/ceph/pull/25384 Igor Fedotov
08:36 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
Igor Fedotov wrote:
> Maybe benchmark this drive using FIO?
> And try to simulate the use pattern: mixed read + w...
Gavin Baker
08:35 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
Igor Fedotov wrote:
> BTW - do these drives/controllers have write caching enabled? Maybe try disabling it if so? AFA...
Gavin Baker

12/01/2018

06:38 AM Backport #37494 (In Progress): mimic: bluefs-bdev-expand aborts
Nathan Cutler
06:37 AM Backport #37494 (Resolved): mimic: bluefs-bdev-expand aborts
https://github.com/ceph/ceph/pull/25348 Nathan Cutler
06:37 AM Backport #37495 (Resolved): luminous: bluefs-bdev-expand aborts
https://github.com/ceph/ceph/pull/25384 Nathan Cutler
06:37 AM Bug #37360 (Pending Backport): bluefs-bdev-expand aborts
Nathan Cutler

11/30/2018

06:49 PM Bug #37360: bluefs-bdev-expand aborts
mimic fix (which is completely different from the Nautilus one, as we don't backport the main device expansion feature): https... Igor Fedotov
01:30 PM Bug #20236: bluestore: ObjectStore/StoreTestSpecificAUSize.Many4KWritesNoCSumTest/2 failure
I haven't seen this in a while.. have you? Sage Weil
01:29 PM Bug #26896 (Can't reproduce): store_test.cc: FAILED ObjectStore/StoreTest.Rename/2
Sage Weil

11/29/2018

08:22 PM Bug #23463 (Can't reproduce): src/os/bluestore/StupidAllocator.cc: 336: FAILED assert(rm.empty())
Sage Weil
08:21 PM Bug #25006 (Can't reproduce): bad csum during upgrade test
http://pulpito.ceph.com/sage-2018-11-29_15:08:26-upgrade:luminous-x-mimic-distro-basic-smithi/
Sage Weil
07:30 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
BTW - do these drives/controllers have write caching enabled? Maybe try disabling it if so? AFAIR there were some talk... Igor Fedotov
07:19 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
Maybe benchmark this drive using FIO?
And try to simulate the usage pattern: mixed read + write + fdatasync.
Igor Fedotov
05:25 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
[2041833.966145] INFO: task bstore_kv_sync:79243 blocked for more than 120 seconds.
[2041833.966148] "echo 0 > /proc...
Gavin Baker
05:24 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
I also get these dmesg errors, though less frequently. Not sure if they are related.
[2041833.966150] bstore_kv_sync D ff...
Gavin Baker
05:03 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
No, not SMR; these drives are Seagate Exos 10TB Enterprise SATA drives. We are seeing this behavior on multiple types ... Gavin Baker
03:31 PM Bug #36364: Bluestore OSD IO Hangs near Flush (flush in 90.330556)
The code is just timing fdatasync(2), so the problem is almost certainly going to be below Ceph (kernel or hardware).
...
Sage Weil
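
For readers who want to see what "timing fdatasync(2)" looks like in isolation, here is a minimal sketch, assuming Linux and Python 3. It is not Ceph code: the helper name `time_fdatasync` and its parameters are hypothetical, chosen only for illustration. It writes one block per round to a scratch file and measures how long each `os.fdatasync()` call takes.

```python
import os
import tempfile
import time


def time_fdatasync(directory, rounds=5, block_size=4096):
    """Write one block per round, then time os.fdatasync() on the file.

    Hypothetical helper for illustration only; not part of Ceph.
    Returns a list of per-call latencies in seconds.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = os.urandom(block_size)
        for _ in range(rounds):
            os.write(fd, block)
            start = time.monotonic()
            os.fdatasync(fd)  # flush file data to the device
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    # Point `directory` at a filesystem on the drive under suspicion.
    for i, seconds in enumerate(time_fdatasync(tempfile.gettempdir())):
        print(f"fdatasync #{i}: {seconds:.6f}s")
```

If calls like these take tens of seconds on an otherwise idle drive, that would be consistent with the delay originating below the application. The sketch does not reproduce the mixed read and write load suggested elsewhere in this thread.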
03:29 PM Bug #36364 (Need More Info): Bluestore OSD IO Hangs near Flush (flush in 90.330556)
This flush time is suspiciously close to 90s (flush in 90.330556)...
These aren't SMR drives, right?
Sage Weil
03:54 PM Bug #36268 (Fix Under Review): Unable to recover from ENOSPC in BlueFS
https://github.com/ceph/ceph/pull/25132 Igor Fedotov
03:45 PM Bug #23120 (Can't reproduce): OSDs continously crash during recovery
Sage Weil
03:44 PM Bug #25207 (Can't reproduce): ceph-volume lvm create gives segmentation fault
Sage Weil
03:38 PM Bug #36284 (Duplicate): Bluestore might be hanging OSD
Sage Weil
03:35 PM Bug #36303 (Duplicate): luminous: 12.2.8 - FAILED assert(0 == "put on missing extent (nothing bef...
Josh Durgin
03:34 PM Bug #36331: FAILED ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixNoCsum/2 (zeros)
... Sage Weil
03:26 PM Bug #36455: BlueStore: ENODATA not fully handled
Sage Weil
03:19 PM Bug #36567 (Duplicate): Segmentation fault in BlueStore::Blob::discard_unallocated
Sage Weil
03:18 PM Bug #37090 (Can't reproduce): BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
I have a feeling this is caused by http://tracker.ceph.com/issues/36526, the fix for which is in 12.2.10.
Sage Weil
03:15 PM Bug #37282 (Need More Info): rocksdb: submit_transaction_sync error: Corruption: block checksum m...
Josh Durgin
02:58 PM Bug #37282: rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2
Somewhat similar issue, may be useful as recovery guidance:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018...
Igor Fedotov
03:11 PM Bug #25001 (Can't reproduce): Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
I believe this is related to the SharedBlob refcounting bugs. See 7031addfe6fcd070df8c4c7b175f374bda77a671 and ff883... Sage Weil
03:06 PM Bug #25050 (Need More Info): osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
Josh Durgin
02:55 PM Bug #37360 (Fix Under Review): bluefs-bdev-expand aborts
Igor Fedotov
02:55 PM Bug #37360: bluefs-bdev-expand aborts
https://github.com/ceph/ceph/pull/25308 Igor Fedotov
09:11 AM Bug #32731 (Resolved): fsck: cid is improperly matched to oid
Nathan Cutler
09:11 AM Backport #36145 (Resolved): luminous: fsck: cid is improperly matched to oid
Nathan Cutler
01:07 AM Backport #36145: luminous: fsck: cid is improperly matched to oid
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24705
merged
Yuri Weinstein
09:09 AM Backport #36638 (Resolved): luminous: rename does not old ref to replacement onode at old name
Nathan Cutler
01:04 AM Backport #36638: luminous: rename does not old ref to replacement onode at old name
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/24989
merged
Yuri Weinstein
06:14 AM Bug #24439 (Resolved): os/bluestore/BlueStore.cc: 1025: FAILED assert(buffer_bytes >= b->length) ...
Nathan Cutler
06:14 AM Backport #26943 (Resolved): luminous: os/bluestore/BlueStore.cc: 1025: FAILED assert(buffer_bytes...
Nathan Cutler
01:04 AM Backport #26943: luminous: os/bluestore/BlueStore.cc: 1025: FAILED assert(buffer_bytes >= b->leng...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24992
merged
Yuri Weinstein
04:28 AM Backport #36639 (In Progress): mimic: rename does not old ref to replacement onode at old name
https://github.com/ceph/ceph/pull/25313 Prashant D
01:08 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
https://github.com/ceph/ceph/pull/24649 merged Yuri Weinstein

11/26/2018

11:41 PM Backport #36754 (Resolved): mimic: _aio_log_start inflight overlap of 0x10000~1000 with [65536~4096]
Nathan Cutler
08:49 PM Backport #36754: mimic: _aio_log_start inflight overlap of 0x10000~1000 with [65536~4096]
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25062
merged
Yuri Weinstein

11/22/2018

11:25 AM Bug #37360: bluefs-bdev-expand aborts
Got it. Thanks, Mark!
So, as I said before, main device resize isn't supported at the moment.
Will probably start a...
Igor Fedotov
11:10 AM Bug #37360: bluefs-bdev-expand aborts
I decided to enlarge the OSD backing store device so that this OSD can store more data without being re-created.
Se...
Марк Коренберг
10:17 AM Bug #37360: bluefs-bdev-expand aborts
Actually, there are two aspects to this ticket:
1) the tool improperly handles OSD deployments that lack DB and/or WAL...
Igor Fedotov
09:34 AM Bug #37360 (In Progress): bluefs-bdev-expand aborts
Igor Fedotov
09:04 AM Bug #37360: bluefs-bdev-expand aborts
The problem is still triggered every time. Марк Коренберг
09:04 AM Bug #37360: bluefs-bdev-expand aborts
... Марк Коренберг
09:03 AM Bug #37360: bluefs-bdev-expand aborts
... Марк Коренберг
08:46 AM Bug #37360: bluefs-bdev-expand aborts
Does the bluefs-bdev-sizes command work fine? What about fsck? Igor Fedotov

11/21/2018

09:35 PM Bug #37360 (Resolved): bluefs-bdev-expand aborts
root@node1:~# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-16
infering bluefs devices from b...
Марк Коренберг

11/16/2018

05:31 PM Bug #37282: rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2
I have checked the kernel log and smartctl and do not see any errors. Jeff Smith
09:48 AM Bug #37282: rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2
First, I suggest checking the disk drive behind the DB volume for physical errors. Igor Fedotov
05:28 AM Bug #37282 (Need More Info): rocksdb: submit_transaction_sync error: Corruption: block checksum m...
I have an OSD that will not start. It keeps crashing. Not sure where to go from here. Unfortunately, it happened ri... Jeff Smith

11/14/2018

09:13 PM Bug #37090: BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Kjetil Joergensen wrote:
> Kjetil Joergensen wrote:
> > Ok - I think you can close this one. This is in all likelih...
Kjetil Joergensen
08:56 PM Bug #37090: BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Kjetil Joergensen wrote:
> Ok - I think you can close this one. This is in all likelihood a hardware error of some s...
Kjetil Joergensen
08:41 PM Bug #37090: BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Ok - I think you can close this one. This is in all likelihood a hardware error of some sort, on the same machine I h... Kjetil Joergensen
06:11 PM Bug #37090: BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Log posted with ceph-upload-file: fbc90b08-887d-40b9-99b9-0a843465a313
Console output below...
Kjetil Joergensen
09:47 AM Bug #37090: BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Could you please run fsck on this OSD with "debug bluestore" set to 20 and share the log? Igor Fedotov

11/13/2018

07:49 PM Bug #37090: BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Part of the OSD log, should include the first crash and maybe a couple of the subsequent ones, to make it fit within t... Kjetil Joergensen
07:27 PM Bug #37090 (Can't reproduce): BlueStore.cc: 3099: FAILED assert(0 == "uh oh, missing shared_blob")
Possibly a duplicate of #36303
What is slightly interesting, after setting the osd out and migrating off of it, it...
Kjetil Joergensen
06:38 PM Backport #36641 (Need More Info): mimic: Unable to recover from ENOSPC in BlueFS
Igor writes in the parent issue: "In fact previously mentioned PR is just a workaround to be able to manually fix the... Nathan Cutler
06:37 PM Backport #36640 (Need More Info): luminous: Unable to recover from ENOSPC in BlueFS
Igor writes in the parent issue: "In fact previously mentioned PR is just a workaround to be able to manually fix the... Nathan Cutler
10:36 AM Bug #36268 (In Progress): Unable to recover from ENOSPC in BlueFS
In fact, the previously mentioned PR is just a workaround to be able to manually fix the issue.
Working on the actual sol...
Igor Fedotov

11/12/2018

06:16 PM Backport #36755 (In Progress): luminous: _aio_log_start inflight overlap of 0x10000~1000 with [65...
Jonathan Brielmaier
04:26 PM Backport #36754 (In Progress): mimic: _aio_log_start inflight overlap of 0x10000~1000 with [65536...
Jonathan Brielmaier

11/10/2018

08:54 AM Backport #36755 (Rejected): luminous: _aio_log_start inflight overlap of 0x10000~1000 with [65536...
https://github.com/ceph/ceph/pull/25064 Nathan Cutler
08:54 AM Backport #36754 (Resolved): mimic: _aio_log_start inflight overlap of 0x10000~1000 with [65536~4096]
https://github.com/ceph/ceph/pull/25062 Nathan Cutler

11/08/2018

11:04 PM Bug #36606 (Resolved): osd: checksum failure during upgrade test
Igor Fedotov
11:04 PM Bug #36606: osd: checksum failure during upgrade test
Sage, no, it's specific to Nautilus for now. We will need it when/if we backport the BlueFS migrate stuff. Igor Fedotov
10:28 PM Bug #36606 (Pending Backport): osd: checksum failure during upgrade test
Igor, we should backport this, right? Sage Weil
10:29 PM Bug #36625 (Pending Backport): _aio_log_start inflight overlap of 0x10000~1000 with [65536~4096]
Sage Weil
01:56 PM Backport #26943 (In Progress): luminous: os/bluestore/BlueStore.cc: 1025: FAILED assert(buffer_by...
Jonathan Brielmaier
09:53 AM Backport #36638 (In Progress): luminous: rename does not old ref to replacement onode at old name
Jonathan Brielmaier

11/06/2018

03:37 PM Bug #36606: osd: checksum failure during upgrade test
https://github.com/ceph/ceph/pull/24948 Igor Fedotov
01:45 PM Bug #36606 (Fix Under Review): osd: checksum failure during upgrade test
Igor Fedotov
01:28 PM Bug #36606 (In Progress): osd: checksum failure during upgrade test
Igor Fedotov

11/05/2018

10:27 PM Bug #36526 (Resolved): segv in BlueStore::OldExtent::create
Nathan Cutler
10:26 PM Backport #36591 (Resolved): luminous: segv in BlueStore::OldExtent::create
Nathan Cutler
10:08 PM Backport #36591: luminous: segv in BlueStore::OldExtent::create
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24746
merged
Yuri Weinstein
 
