Activity

From 04/23/2018 to 05/22/2018

05/22/2018

02:47 PM Bug #21480 (Pending Backport): bluestore: flush_commit is racy
Sage Weil
10:37 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Sage Weil wrote:
> Artemy: which CentOS release and kernel version is it?
CentOS 7.2 with custom kernel (a rebuil...
Artemy Kapitula

05/21/2018

04:18 PM Backport #23672: luminous: bluestore: ENODATA on aio
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21405
merged
Yuri Weinstein
04:17 PM Backport #23700: luminous: osd: KernelDevice.cc: 539: FAILED assert(r == 0)
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21407
merged
Yuri Weinstein
03:07 PM Bug #24211 (Fix Under Review): SharedBlob::put() racy
https://github.com/ceph/ceph/pull/22123 Sage Weil
03:06 PM Bug #24211 (Resolved): SharedBlob::put() racy
There is a narrow race possible:
A: lookup foo
A: put on foo
A: foo --nref == 0
B: lookup foo
B: put foo
B:...
Sage Weil
11:45 AM Backport #24154 (Resolved): mimic: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in roc...
Kefu Chai

05/18/2018

02:18 PM Bug #21480 (Fix Under Review): bluestore: flush_commit is racy
https://github.com/ceph/ceph/pull/22083 Sage Weil
01:40 PM Bug #21480: bluestore: flush_commit is racy
Hrm, this is used by peering,... Sage Weil
12:16 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Emmanuel Lacour wrote:
> Hi, it seems we have the same problem here. We just started using a new cluster and already saw 3 scr...
Emmanuel Lacour
09:58 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi, it seems we have the same problem here. We just started using a new cluster and already saw 3 scrub errors in one week, alw... Emmanuel Lacour

05/17/2018

06:09 AM Backport #24154 (In Progress): mimic: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in ...
Kefu Chai
06:09 AM Backport #24154 (Resolved): mimic: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in roc...
https://github.com/ceph/ceph/pull/22048 Kefu Chai
06:05 AM Bug #23653 (Resolved): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCac...
Kefu Chai
03:04 AM Bug #23653 (Fix Under Review): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb...
Kefu Chai
02:51 AM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
https://github.com/ceph/ceph/pull/22046 to drop the check for tcmalloc
https://github.com/facebook/rocksdb/pull/38...
Kefu Chai

05/16/2018

12:47 AM Bug #24051 (Resolved): "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
Kefu Chai
12:46 AM Backport #24132 (Resolved): luminous: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
Kefu Chai

05/15/2018

10:06 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Michael Prokop wrote:
> We have a memory upgrade scheduled for the cluster where we're running into this issue, on...
Michael Prokop
08:30 PM Backport #24132: luminous: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21995
merged
Yuri Weinstein
06:21 AM Backport #24132 (Resolved): luminous: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
https://github.com/ceph/ceph/pull/21995 Kefu Chai
12:58 PM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
We are now using CentOS 7.5 for building RPMs, so we should drop this change in cmake.... Kefu Chai
06:22 AM Bug #24051 (Pending Backport): "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
master: https://github.com/ceph/ceph/pull/20430 Kefu Chai

05/14/2018

09:36 PM Bug #23577 (Can't reproduce): Inconsistent PG refusing to deep-scrub or repair
David Zafman
04:20 PM Bug #23577: Inconsistent PG refusing to deep-scrub or repair
It took a month for our deep-scrub cycle to complete, but eventually scrubs started working on these PGs on their own. David Turner
02:42 PM Bug #23459 (Can't reproduce): BlueStore kv_sync_thread() crash
Sage Weil
02:41 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Artemy: which CentOS release and kernel version is it? Sage Weil
02:37 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
downgrading the priority here:
- not data loss, "just" osd crash
- appears to be an issue with the kernel, not ceph
Sage Weil
01:50 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
That can be a problem because I disabled compression and the cluster is running in production. If I enable compressio... Yohay Azulay
09:34 AM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
Hi Yohay,
could you please collect a log for the crash with debug bluestore set to 20?
Igor Fedotov

05/11/2018

09:38 PM Bug #24051: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
I see it every time in luminous PR testing (see on smithi123)
It's reproducible and also failed as standalone...
Yuri Weinstein
11:01 AM Bug #24051: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
Mine isn't crashing.
Wondering if this is reproducible?
And is standalone bin/unittest_bluefs run failing?
Igor Fedotov
02:22 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
I had the same issue with 3 new clusters. Compression was set to FORCE; once I changed it to none and restarted the whole cl... Yohay Azulay
10:29 AM Backport #22264: luminous: bluestore: db.slow used when db is not full
Sage Weil wrote:
> luminous cherry-pick is merged.
Just to clarify, the luminous commit is not a cherry-pick. The...
Nathan Cutler

05/10/2018

12:51 PM Bug #23459: BlueStore kv_sync_thread() crash
I have not seen this happen in 12.2.5 any more Alex Gorbachev
08:21 AM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
I can't find any bluefs or bluestore configuration documentation. There is only a configuration page [http://docs.... Emre Eryilmaz

05/08/2018

09:09 PM Bug #24051 (Resolved): "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
This was caught during PRs testing ctest run:... Yuri Weinstein

05/05/2018

01:48 PM Bug #23840 (Resolved): Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
Nathan Cutler
01:48 PM Backport #23881 (Resolved): luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.bloc...
Nathan Cutler

05/04/2018

05:17 PM Backport #23881: luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) =...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21740
merged
Yuri Weinstein

04/30/2018

02:13 PM Backport #23881 (In Progress): luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.b...
https://github.com/ceph/ceph/pull/21740 Prashant D
11:18 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Thanks, Brian, for the hint to rerun the deep scrub on the broken pg. It worked fine!
Previously I've been doing repa...
Dennis Björklund
10:09 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
FWIW, I got this error with checksum 0x6706be76 on 12.2.5. I upgraded a couple of days ago and the bug is still there... Dennis Björklund

04/28/2018

08:26 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I've managed to reproduce this on a test cluster but it's somewhat unreliable and took a few attempts.
1. fill tes...
Paul Emmerich

04/27/2018

11:48 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Oh, I just remembered something:
We did reduce the Bluestore cache size from 2 GB to 1 GB at around the same time ...
Paul Emmerich
07:18 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Sage Weil wrote:
> Paul: was swap ever enabled on the node(s) where you saw the issue?
Not being Paul but to ad...
Michael Prokop
07:39 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Sage Weil wrote:
> Artemy Kapitula wrote:
> > Sage Weil wrote:
> > > Artemy, is it possible the machine where you ...
Artemy Kapitula

04/26/2018

08:54 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
No, never. We initially deployed the OSDs with Filestore on kernel 4.9 and then switched to Bluestore and kernel 4.14. Paul Emmerich
06:56 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Paul Emmerich wrote:
> Update from the cluster where I saw this: the problem suddenly disappeared after a few seemin...
Sage Weil
02:48 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Artemy Kapitula wrote:
> Sage Weil wrote:
> > Artemy, is it possible the machine where you saw this was swapping?
...
Sage Weil
08:48 AM Backport #23881 (Resolved): luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.bloc...
https://github.com/ceph/ceph/pull/21740 Nathan Cutler
06:28 AM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
I have met the same issue in Luminous. Enming Zhang
04:40 AM Bug #23840 (Pending Backport): Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask(...
Kefu Chai

04/25/2018

12:49 PM Bug #23653 (Resolved): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCac...
Kefu Chai
03:59 AM Bug #23653 (Fix Under Review): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb...
- https://github.com/ceph/ceph/pull/21632
- https://github.com/ceph/rocksdb/pull/36
Kefu Chai
10:09 AM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
This change is working; I think it could be merged into master. Rafal Wadolowski

04/24/2018

10:11 PM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
https://github.com/ceph/ceph/pull/21629 Sage Weil
05:17 PM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
Moment of planned shutdown Rafal Wadolowski
05:12 PM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
Interestingly, it looks like the maximum read of a file is limited to 0xffffffff.... Rafal Wadolowski
04:36 PM Bug #23840 (Resolved): Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
We're using Ceph 12.2.4, where db and wal are on separate NVMe devices.
After restart on some OSDs we see the following er...
Rafal Wadolowski
06:08 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Update from the cluster where I saw this: the problem suddenly disappeared after a few seemingly random changes to th... Paul Emmerich
03:52 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I'm having the same problem on a 3-node Ceph cluster; all three nodes have swap enabled and used, not too much, but ... Marco Baldini
03:26 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
This bug is starting to sound like #22102, which *looks* like pread() is getting zeros.
Christoph, does the machine...
Sage Weil
08:20 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Just a follow-up: We downgraded the kernel at the end of last week, and didn't get the scrub error any more. While before, i... Christoph Glaubitz
03:30 PM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
Discussed on irc, it appears we can work around this by replacing the single aligned_alloc() call in rocksdb with pos... Josh Durgin
06:55 AM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
Or we can note this down as a known issue on RHEL7.5 and gperftools-libs 2.6.1-1. Kefu Chai
06:44 AM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
I was thinking about statically linking against tcmalloc, but it seems to be a dead end.
see https://sourceware.org/bu...
Kefu Chai
03:28 PM Bug #23426 (Won't Fix): aio thread got No space left on device
this looks like a provisioning/test error, not a bug, if we're getting ENOSPC. Sage Weil
03:24 PM Bug #23426: aio thread got No space left on device
see remote/*/log/syslog/* Sage Weil

04/23/2018

10:13 PM Bug #23540 (Need More Info): FAILED assert(0 == "can't mark unloaded shard dirty") with compressi...
Francisco, any update? Sage Weil
03:09 AM Bug #23819 (Won't Fix): how to make compactions smooth
When I set db_path to an SSD (12 OSDs' db on sdy), I found the read speed is too high when compaction is runn... yuanli zhu
 
