Activity
From 04/23/2018 to 05/22/2018
05/22/2018
- 02:47 PM Bug #21480 (Pending Backport): bluestore: flush_commit is racy
- 10:37 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
- Sage Weil wrote:
> Artemy: which CentOS release and kernel version is it?
CentOS 7.2 with custom kernel (a rebuil...
05/21/2018
- 04:18 PM Backport #23672: luminous: bluestore: ENODATA on aio
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21405
merged
- 04:17 PM Backport #23700: luminous: osd: KernelDevice.cc: 539: FAILED assert(r == 0)
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21407
merged
- 03:07 PM Bug #24211 (Fix Under Review): SharedBlob::put() racy
- https://github.com/ceph/ceph/pull/22123
- 03:06 PM Bug #24211 (Resolved): SharedBlob::put() racy
- There is a narrow race possible:
A: lookup foo
A: put on foo
A: foo --nref == 0
B: lookup foo
B: put foo
B:...
- 11:45 AM Backport #24154 (Resolved): mimic: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in roc...
05/18/2018
- 02:18 PM Bug #21480 (Fix Under Review): bluestore: flush_commit is racy
- https://github.com/ceph/ceph/pull/22083
- 01:40 PM Bug #21480: bluestore: flush_commit is racy
- Hrm, this is used by peering,...
- 12:16 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Emmanuel Lacour wrote:
> Hi, it seems we have the same problem here. We just started using a new cluster and already saw 3 scr...
- 09:58 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Hi, it seems we have the same problem here. We just started using a new cluster and already saw 3 scrub errors in one week, alw...
05/17/2018
- 06:09 AM Backport #24154 (In Progress): mimic: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in ...
- 06:09 AM Backport #24154 (Resolved): mimic: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in roc...
- https://github.com/ceph/ceph/pull/22048
- 06:05 AM Bug #23653 (Resolved): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCac...
- 03:04 AM Bug #23653 (Fix Under Review): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb...
- 02:51 AM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
- https://github.com/ceph/ceph/pull/22046 to drop the check for tcmalloc
https://github.com/facebook/rocksdb/pull/38...
05/16/2018
- 12:47 AM Bug #24051 (Resolved): "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- 12:46 AM Backport #24132 (Resolved): luminous: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
05/15/2018
- 10:06 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Michael Prokop wrote:
> We have a memory upgrade scheduled for the cluster where we're running into this issue, on...
- 08:30 PM Backport #24132: luminous: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21995
merged
- 06:21 AM Backport #24132 (Resolved): luminous: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- https://github.com/ceph/ceph/pull/21995
- 12:58 PM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
- We are now using CentOS 7.5 for building RPMs, so we should drop this change in cmake....
- 06:22 AM Bug #24051 (Pending Backport): "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- master: https://github.com/ceph/ceph/pull/20430
05/14/2018
- 09:36 PM Bug #23577 (Can't reproduce): Inconsistent PG refusing to deep-scrub or repair
- 04:20 PM Bug #23577: Inconsistent PG refusing to deep-scrub or repair
- This took a month for our deep-scrub cycle to complete, but eventually scrubs started working on these PGs on their own.
- 02:42 PM Bug #23459 (Can't reproduce): BlueStore kv_sync_thread() crash
- 02:41 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
- Artemy: which CentOS release and kernel version is it?
- 02:37 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
- downgrading the priority here:
- not data loss, "just" osd crash
- appears to be an issue with the kernel, not ceph
- 01:50 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
- That can be a problem because I disabled compression and the cluster is running in production. If I enable compressio...
- 09:34 AM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
- Hi Yohay,
could you please collect a log for the crash with debug bluestore set to 20?
05/11/2018
- 09:38 PM Bug #24051: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- I see it every time on @luminous@ PRs testing (see on @smithi123@)
It's reproducible and also failed as standalone...
- 11:01 AM Bug #24051: "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- Mine isn't crashing.
Wondering if this is reproducible?
And does a standalone bin/unittest_bluefs run fail?
- 02:22 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
- I had the same issue with 3 new clusters. compression set to FORCE, once I changed to none and restarted the whole cl...
- 10:29 AM Backport #22264: luminous: bluestore: db.slow used when db is not full
- Sage Weil wrote:
> luminous cherry-pick is merged.
Just to clarify, the luminous commit is not a cherry-pick. The...
05/10/2018
- 12:51 PM Bug #23459: BlueStore kv_sync_thread() crash
- I have not seen this happen in 12.2.5 any more
- 08:21 AM Documentation #24075 (Resolved): Bluestore and Bluefs Config Reference
- I can't find any BlueFS or BlueStore configuration documentation. There is only a configuration page [http://docs....
05/08/2018
- 09:09 PM Bug #24051 (Resolved): "122 - unittest_bluefs (OTHER_FAULT)" during ctest run
- This was caught during a ctest run in PR testing:...
05/05/2018
- 01:48 PM Bug #23840 (Resolved): Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- 01:48 PM Backport #23881 (Resolved): luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.bloc...
05/04/2018
- 05:17 PM Backport #23881: luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) =...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21740
merged
04/30/2018
- 02:13 PM Backport #23881 (In Progress): luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.b...
- https://github.com/ceph/ceph/pull/21740
- 11:18 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Thanks to Brian for the hint to rerun the deep scrub on the broken PG. It worked fine!
Previously I've been doing repa...
- 10:09 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- FWIW, I got this error with checksum 0x6706be76 on 12.2.5. I upgraded a couple of days ago and the bug is still there...
04/28/2018
- 08:26 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- I've managed to reproduce this on a test cluster but it's somewhat unreliable and took a few attempts.
1. fill tes...
04/27/2018
- 11:48 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Oh, I just remembered something:
We did reduce the Bluestore cache size from 2 GB to 1 GB at around the same time ...
- 07:18 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Sage Weil wrote:
> Paul: was swap ever enabled on the node(s) where you saw the issue?
Not being Paul but to ad...
- 07:39 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
- Sage Weil wrote:
> Artemy Kapitula wrote:
> > Sage Weil wrote:
> > > Artemy, is it possible the machine where you ...
04/26/2018
- 08:54 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- No, never. We initially deployed the OSDs with Filestore on kernel 4.9 and then switched to Bluestore and kernel 4.14.
- 06:56 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Paul Emmerich wrote:
> Update from the cluster where I saw this: the problem suddenly disappeared after a few seemin...
- 02:48 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
- Artemy Kapitula wrote:
> Sage Weil wrote:
> > Artemy, is it possible the machine where you saw this was swapping?
...
- 08:48 AM Backport #23881 (Resolved): luminous: Bluestore OSD hit assert((log_reader->buf.pos & ~super.bloc...
- https://github.com/ceph/ceph/pull/21740
- 06:28 AM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- I have met the same issue in Luminous.
- 04:40 AM Bug #23840 (Pending Backport): Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask(...
04/25/2018
- 12:49 PM Bug #23653 (Resolved): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCac...
- 03:59 AM Bug #23653 (Fix Under Review): tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb...
- - https://github.com/ceph/ceph/pull/21632
- https://github.com/ceph/rocksdb/pull/36
- 10:09 AM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- This change is working; I think it could be merged into master.
04/24/2018
- 10:11 PM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- https://github.com/ceph/ceph/pull/21629
- 05:17 PM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- Moment of planned shutdown
- 05:12 PM Bug #23840: Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- Interestingly, it looks like the maximum file read is limited to 0xffffffff....
- 04:36 PM Bug #23840 (Resolved): Bluestore OSD hit assert((log_reader->buf.pos & ~super.block_mask()) == 0)
- We're using Ceph 12.2.4, where the DB and WAL are on separate NVMe.
After restart on some OSDs we see the following er...
- 06:08 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Update from the cluster where I saw this: the problem suddenly disappeared after a few seemingly random changes to th...
- 03:52 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- I'm having the same problem on a 3-node Ceph cluster; all three nodes have swap enabled and used, not too much, but ...
- 03:26 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- This bug is starting to sound like #22102, which *looks* like pread() is getting zeros.
Cristoph, does the machine...
- 08:20 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Just a follow-up: we downgraded the kernel at the end of last week and haven't gotten the scrub error since. While before, i...
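Note on Bug #22464: the title says the recurring checksum "matches a zero block", and a comment above suggests pread() may be returning zeros. A minimal sketch of how one might compute a CRC-32C over an all-zero buffer for comparison is given here. It assumes the standard Castagnoli polynomial; the 4096-byte block size, the seed, and the final-XOR convention are assumptions, since the feed does not state which conventions BlueStore uses, so the result may or may not equal 0x6706be76 under these defaults.

```python
# Hypothetical helper, not Ceph code: a bitwise CRC-32C (Castagnoli).
# The seed and final-XOR defaults follow the common CRC-32C convention;
# BlueStore's own convention may differ (assumption).
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, seed: int = 0xFFFFFFFF, xorout: int = 0xFFFFFFFF) -> int:
    """Return the CRC-32C of data, processed bit by bit (slow but simple)."""
    crc = seed
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the reflected polynomial when the low bit is set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ xorout


# Assumed checksum block size of 4096 bytes, all zeros.
zero_block_crc = crc32c(bytes(4096))
print(f"crc32c of zero block: 0x{zero_block_crc:08x}")
```

Passing `xorout=0` gives the variant without the final inversion, which is worth trying if the default result does not match the value reported by scrub.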
- 03:30 PM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
- Discussed on irc, it appears we can work around this by replacing the single aligned_alloc() call in rocksdb with pos...
- 06:55 AM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
- Or we can note this down as a known issue on RHEL 7.5 and gperftools-libs 2.6.1-1.
- 06:44 AM Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCac...
- I was thinking about statically linking against tcmalloc, but it seems to be a dead end.
see https://sourceware.org/bu...
- 03:28 PM Bug #23426 (Won't Fix): aio thread got No space left on device
- this looks like a provisioning/test error, not a bug, if we're getting ENOSPC.
- 03:24 PM Bug #23426: aio thread got No space left on device
- see remote/*/log/syslog/*
04/23/2018
- 10:13 PM Bug #23540 (Need More Info): FAILED assert(0 == "can't mark unloaded shard dirty") with compressi...
- Francisco, any update?
- 03:09 AM Bug #23819 (Won't Fix): how to make compactions smooth
- When I set db_path to an SSD (12 OSDs' DBs on sdy), I found the read speed is too high when compactions are runn...