Activity

From 03/06/2018 to 04/04/2018

04/04/2018

07:49 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Scratch that: Mark didn't hit the assert for num_readers == 0, and the core indicates the file isn't deleted.
_read_ra...
Sage Weil
03:40 PM Bug #22678: block checksum mismatch from rocksdb
I have a similar issue with two OSDs (12.2.4) running on the same host. Recreating the OSDs did not have any effect; I get ... Sergey Malinin
03:12 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
Francisco,
thanks for the update, very appreciated.
Curious if you can collect a log for the crashing OSD, with d...
Igor Fedotov
02:53 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
I disabled compression for a while and no OSDs got the error; after enabling it again they went back to getting the proble... Francisco Freire

04/03/2018

09:36 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Current theory: bluefs is not protecting a file that is open for read against being deleted. Mark observes that he sees th... Sage Weil
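A minimal sketch of the race this theory describes, assuming a BlueFS-like reader counter (num_readers and _read_random mirror the names in the comments above; everything else is illustrative, not actual BlueFS code):

    #include <atomic>
    #include <cassert>

    struct File {
      std::atomic<int> num_readers{0};
      bool deleted = false;
    };

    void read_random(File& f) {
      ++f.num_readers;   // register as a reader before touching the data
      // ... the actual read work would happen here ...
      --f.num_readers;
    }

    void delete_file(File& f) {
      // The theory above: deletion proceeds while a read is in flight.
      // A protected variant would wait for (or fail on) num_readers > 0.
      assert(f.num_readers == 0);  // the assert Mark did not hit
      f.deleted = true;
    }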

04/02/2018

11:50 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
Yeah. The whole cluster has compression enabled. Francisco Freire
09:56 PM Bug #23540: FAILED assert(0 == "can't mark unloaded shard dirty") with compression enabled
Hi Francisco,
wondering if you have compression enabled for specific pools or for the whole bluestore?
Igor Fedotov
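For context, compression in Luminous can be enabled at either level Igor asks about. A hedged sketch of both knobs (option names as documented for 12.2.x; values illustrative):

    # per-pool:
    ceph osd pool set <pool> compression_mode aggressive
    ceph osd pool set <pool> compression_algorithm snappy
    # store-wide, in ceph.conf under [osd]:
    bluestore_compression_mode = aggressive
    bluestore_compression_algorithm = snappy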
08:22 PM Bug #23540 (Resolved): FAILED assert(0 == "can't mark unloaded shard dirty") with compression ena...
We are using the latest Ceph Luminous version (12.2.4), and we have a SATA pool tiered by an SSD pool. All using blue... Francisco Freire
08:25 AM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
We have also had this fault a number of times.
This was during a migration to bluestore - so we were backfilling for...
Shane Voss

03/29/2018

06:38 PM Bug #23040 (Resolved): bluestore: statfs available can go negative
Nathan Cutler
06:38 PM Backport #23074 (Resolved): luminous: bluestore: statfs available can go negative
Nathan Cutler
01:19 PM Backport #23074: luminous: bluestore: statfs available can go negative
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20554
merged
Yuri Weinstein
12:43 PM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
Igor Fedotov
08:17 AM Bug #23141 (Resolved): BlueFS reports rotational journals if BDEV_WAL is not set
Nathan Cutler
08:17 AM Backport #23173 (Resolved): luminous: BlueFS reports rotational journals if BDEV_WAL is not set
Nathan Cutler

03/28/2018

10:27 PM Backport #23173: luminous: BlueFS reports rotational journals if BDEV_WAL is not set
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20651
merged
Yuri Weinstein
04:14 AM Backport #23226 (In Progress): luminous: bluestore_cache_data uses too much memory
https://github.com/ceph/ceph/pull/21059 Prashant D

03/26/2018

03:29 PM Bug #23463 (Can't reproduce): src/os/bluestore/StupidAllocator.cc: 336: FAILED assert(rm.empty())
The ceph-volume nightly tests have seen this failure on one run so far (March 25th) with 2 out of 6 OSDs deployed. We... Alfredo Deza
03:58 AM Bug #23459: BlueStore kv_sync_thread() crash
crash dump attached Alex Gorbachev
03:55 AM Bug #23459 (Can't reproduce): BlueStore kv_sync_thread() crash
2018-03-25 06:49:02.894926 7ff4fdc97700 -1 *** Caught signal (Aborted) **
in thread 7ff4fdc97700 thread_name:bstore...
Alex Gorbachev

03/22/2018

08:21 PM Documentation #23443 (Resolved): doc: object -> file -> disk is wrong for bluestore
http://docs.ceph.com/docs/master/architecture/#storing-data
object -> file -> disk
is wrong now (for bluesto...
Марк Коренберг
10:54 AM Bug #23372: osd: segfault
Nokia ceph-users wrote:
> We have a 5-node cluster with 5 mons and 120 OSDs.
>
> One of the OSDs (osd.7) crash...
Nokia ceph-users

03/21/2018

08:45 PM Bug #23246 (Fix Under Review): [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Pull request: https://github.com/ceph/ceph/pull/20996. Radoslaw Zarzynski
04:33 PM Support #23433 (New): Ceph cluster doesn't start - ERROR: error creating empty object store in /d...
This happened after running make vstart. When I try to start a ceph cluster with
@MON=3 OSD=1 MDS=1 MGR=1 ../src/vstart.sh ...
Neha Gupta

03/20/2018

11:28 PM Bug #23426: aio thread got No space left on device
Yeah, "the assertion came from _aio_t::get_return_value_":https://github.com/ceph/ceph/blob/820dac980e9416fe05998d50c... Radoslaw Zarzynski
10:33 PM Bug #23426: aio thread got No space left on device
might be a dupe of #23333 Yuri Weinstein
10:32 PM Bug #23426 (Won't Fix): aio thread got No space left on device
Seems reproducible on all distros
Runs:
http://pulpito.ceph.com/teuthology-2018-03-20_05:02:01-smoke-master-tes...
Yuri Weinstein
10:08 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I'm seeing the same problem here.
When I get the notification about the deep scrub error, I don't need to do "repa...
Brian Marcotte
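The 0x6706be76 value can be sanity-checked locally. A minimal sketch, assuming BlueStore's crc32c convention (reflected Castagnoli polynomial, seed -1, no final XOR); under that assumption a zeroed 4 KiB block should reproduce the checksum from the title:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Bitwise crc32c (reflected Castagnoli polynomial 0x82F63B78).
    static uint32_t crc32c(uint32_t crc, const uint8_t* buf, std::size_t len) {
      for (std::size_t i = 0; i < len; ++i) {
        crc ^= buf[i];
        for (int b = 0; b < 8; ++b)
          crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
      }
      return crc;
    }

    int main() {
      std::vector<uint8_t> block(4096, 0);   // an all-zero 4 KiB block
      printf("0x%08x\n", crc32c(0xFFFFFFFFu, block.data(), block.size()));
    }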
12:57 PM Bug #23333 (In Progress): bluestore: ENODATA on aio
> Mar 13 15:55:45 ceph02 kernel: [362540.919407] print_req_error: critical medium error, dev sde, sector 5245986552
...
Radoslaw Zarzynski
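For readers following along: with libaio, a device-level failure like the medium error above does not fail io_submit; it surfaces later as a negative res on the completion event, which is the value _aio_t::get_return_value_ reports. A minimal self-contained sketch of that path (reading /dev/zero as a stand-in for the OSD's block device; not Ceph code):

    #include <libaio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
      io_context_t ctx = 0;
      if (io_setup(128, &ctx) < 0) return 1;      // 128: illustrative queue depth
      int fd = open("/dev/zero", O_RDONLY);
      char buf[4096];
      struct iocb cb, *cbs[1] = { &cb };
      io_prep_pread(&cb, fd, buf, sizeof(buf), 0);
      if (io_submit(ctx, 1, cbs) != 1) return 1;  // submission itself succeeds
      struct io_event ev;
      io_getevents(ctx, 1, 1, &ev, nullptr);      // wait for the completion
      long r = (long)ev.res;  // bytes read, or a negative errno on I/O failure
                              // (e.g. -ENODATA for a medium error like the above)
      printf("aio result: %ld\n", r);
      io_destroy(ctx);
      close(fd);
      return r < 0;
    }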

03/19/2018

09:25 PM Bug #23333: bluestore: ENODATA on aio
Robert Sander wrote:
> It's a little bit strange, but it seems to have healed itself.
Looking around I just fou...
Robert Sander
09:22 PM Bug #23333: bluestore: ENODATA on aio
Radoslaw Zarzynski wrote:
> Instead, could you please:
> * provide a log with increased debug log levels (_debug_osd ...
Robert Sander
08:55 PM Bug #23333: bluestore: ENODATA on aio
Erratum: the issue is related to _aio_t::get_return_value_, not to _io_getevents_. Our debug message is misleading. I'... Radoslaw Zarzynski
05:17 PM Bug #23333: bluestore: ENODATA on aio
Hi Robert,
thanks for providing the info! I've taken a look at the implementation of _io_getevents_ in your kernel ...
Radoslaw Zarzynski
03:01 PM Bug #23333: bluestore: ENODATA on aio
Radoslaw Zarzynski wrote:
> Could you please provide more information about the OS and hardware used? Especially kerne...
Robert Sander
01:50 PM Bug #23333 (Need More Info): bluestore: ENODATA on aio
This looks really interesting. The assertion failure came from the _io_getevents_ (called by one of the _bstore_aio_ ... Radoslaw Zarzynski
10:17 AM Bug #21259 (Fix Under Review): bluestore: segv in BlueStore::TwoQCache::_trim
https://github.com/ceph/ceph/pull/20956
Sage Weil

03/16/2018

02:45 PM Bug #23390 (Resolved): Identifying NVMe via PCI serial isn't sufficient (Bluestore/SPDK)
Hi,
the manual requires entering a serial number to be able to use the NVMe with SPDK.
http://docs.ceph.com/docs/master/r...
Andreas Merk
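For reference, the linked docs page has BlueStore select the device by that serial in ceph.conf, roughly as follows (<SERIAL> is a placeholder; per the docs it is read from the device's sysfs/lspci serial number):

    [osd]
    bluestore_block_path = spdk:<SERIAL>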
02:42 PM Bug #23246 (In Progress): [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Sage Weil

03/15/2018

02:40 PM Bug #23266 (Won't Fix): "terminate called after throwing an instance of 'std::bad_alloc'" in upgr...
don't care about kraken Sage Weil
05:51 AM Bug #23372 (Can't reproduce): osd: segfault
We have a 5-node cluster with 5 mons and 120 OSDs.
One of the OSDs (osd.7) crashed with the following logs:...
Nokia ceph-users

03/14/2018

01:57 PM Bug #23333: bluestore: ENODATA on aio
Sage Weil
08:29 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi Eric,
I am trying to pinpoint whether the problem is related to AIO. Have you tested on official Ceph builds, or tri...
Adam Kupczyk

03/13/2018

04:27 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I am seeing this issue too.
We were running Proxmox 4.x with Ceph 12.2.2 until a few weeks ago and never had a problem....
Eric Blevins
02:05 PM Bug #23333 (Resolved): bluestore: ENODATA on aio
For the last 3 days, one of 18 BlueStore OSDs has been constantly crashing:
2018-03-10 04:01:45.366202 mon.ceph01 mon.0 192.168....
Robert Sander

03/12/2018

10:40 AM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
Another one fired:
https://tracker.ceph.com/issues/23283
Igor Fedotov

03/10/2018

04:36 AM Bug #22534: Debian's bluestore *rocksdb* supports neither fast CRC nor compression
Well, thanks! What about compression? Марк Коренберг

03/09/2018

08:12 PM Bug #22534 (Fix Under Review): Debian's bluestore *rocksdb* supports neither fast CRC nor...
Pull requests:
* https://github.com/ceph/rocksdb/pull/35,
* https://github.com/ceph/ceph/pull/20825.
Radoslaw Zarzynski
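A quick way to check what a given librocksdb build supports is to ask the library directly. A sketch assuming rocksdb's public GetSupportedCompressions() helper (there is no equivalent public probe for the fast-CRC path, so only compression is covered):

    #include <cstdio>
    #include <rocksdb/convenience.h>

    int main() {
      // Each entry is a rocksdb::CompressionType the library was built with;
      // a build lacking snappy/lz4/zstd returns a correspondingly shorter list.
      for (auto t : rocksdb::GetSupportedCompressions())
        std::printf("supported compression type: %d\n", static_cast<int>(t));
      return 0;
    }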

03/08/2018

10:07 PM Bug #22977: High CPU load caused by operations on onode_map
Awesome, that fixed it, thanks :)... Paul Emmerich
01:07 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Ah, yeah I see that. Oops.
Corrected, and I set bdev_aio_max_queue_depth = 65536 aaand all OSDs are up!
Christoffer Lilja
12:44 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Hmm, that's surprising. Setting extreme values should cause *EINVAL* during *io_setup* because of exceeding the system limit... Radoslaw Zarzynski

03/07/2018

11:57 PM Bug #22977: High CPU load caused by operations on onode_map
Paul,
Just improved hashing. Please test.
https://shaman.ceph.com/builds/ceph/wip-22977-inspect-onode-map/
Adam Kupczyk
02:27 PM Bug #22977: High CPU load caused by operations on onode_map
... Paul Emmerich
02:23 PM Bug #22977: High CPU load caused by operations on onode_map
Yes, perf top now shows the new hash_helper struct as the key in the table. Paul Emmerich
02:22 PM Bug #22977: High CPU load caused by operations on onode_map
Paul,
Have you been using the latest builds? IDs 93123 - 93126?
This is the only build that actually tries to fix has...
Adam Kupczyk
01:56 PM Bug #22977: High CPU load caused by operations on onode_map
CPU load is ~30% lower than my "control group" now, but it's still pretty bad (as in: > 90% of the CPU time is spent ... Paul Emmerich
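For readers following the thread: the symptom (most CPU time inside the hash table) is what a weak hash over a composite key looks like, since colliding keys pile into a few buckets. A minimal sketch of the boost-style remedy, with OnodeKey as a hypothetical stand-in for the real ghobject_t key (illustrative only, not the wip-22977 code):

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <string>
    #include <unordered_map>

    // Mix a new value into an existing seed (boost::hash_combine style).
    inline void hash_combine(std::size_t& seed, std::size_t v) {
      seed ^= v + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }

    struct OnodeKey {            // hypothetical stand-in for ghobject_t
      std::string oid;
      std::int64_t pool;
      std::uint32_t hash;        // the rados object hash
      bool operator==(const OnodeKey& o) const {
        return oid == o.oid && pool == o.pool && hash == o.hash;
      }
    };

    struct OnodeKeyHash {        // combine all members, not just one
      std::size_t operator()(const OnodeKey& k) const {
        std::size_t seed = std::hash<std::string>{}(k.oid);
        hash_combine(seed, std::hash<std::int64_t>{}(k.pool));
        hash_combine(seed, std::hash<std::uint32_t>{}(k.hash));
        return seed;
      }
    };

    using OnodeMap = std::unordered_map<OnodeKey, int, OnodeKeyHash>;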
08:53 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Hi,
Thank you for looking into this.
I'd set bdev_aio_max_queue_depth higher and increased it until I reached 6871...
Christoffer Lilja
04:03 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Yeah, saw that. However, your _IOContext_ is so large it exceeds even that. :-)
I would try bigger values just as a ...
Radoslaw Zarzynski
03:16 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
As I mentioned in the initial bug report :-)
"I've tested with "dev_aio_max_queue_depth = 4096" as some gave as a wo...
Christoffer Lilja
03:16 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Radoslaw Zarzynski
02:48 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Thanks for taking the log once again! It looks like the *IOContext* carried a really huge number of operations:... Radoslaw Zarzynski
12:24 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Here you have an OSD log with debug_bluestore=20 and debug_bdev=20:
https://drive.google.com/open?id=11oW6yAG0M6rdMSz...
Christoffer Lilja
12:17 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
It still occurs and I'll come back with the logs asap. Christoffer Lilja
12:09 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Could be that the kernel was just reasonably rejecting the requests because of a HW issue.
I would need even more logs...
Radoslaw Zarzynski
06:59 AM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Now I remember that the SATA controller that this drive was connected to had a glitch and took down the drives for a ... Christoffer Lilja
04:59 PM Bug #23266 (Won't Fix): "terminate called after throwing an instance of 'std::bad_alloc'" in upgr...
Out of memory?
Run: http://pulpito.ceph.com/teuthology-2018-03-07_03:25:02-upgrade:kraken-x-luminous-distro-basic...
Yuri Weinstein
02:39 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
We didn't run any deep scrubs the last day or so due to a longer backfill, will report back later. Paul Emmerich
01:45 PM Bug #23251 (Rejected): ceph daemon osd.NNN slow_used_bytes and slow_total_bytes wrong?
BlueStore has a BlueFS rebalance feature that dynamically reserves some amount of space for BlueFS at the 'slow' device -... Igor Fedotov
01:24 PM Bug #23251: ceph daemon osd.NNN slow_used_bytes and slow_total_bytes wrong?
Thanks for responding. I didn't realize that; from looking at the code I thought it was used for data as well. You ca... Ben England
09:31 AM Bug #23251: ceph daemon osd.NNN slow_used_bytes and slow_total_bytes wrong?
"slow_total_bytes" and "slow_used_bytes" are under "BlueFS" section and denotes just a fraction of BlueStore block de... Igor Fedotov

03/06/2018

08:41 PM Bug #22977: High CPU load caused by operations on onode_map
Daniel,
Version for 12.2.4:
https://shaman.ceph.com/builds/ceph/wip-22977-inspect-onode-map-12-2-4/
Adam Kupczyk
03:53 PM Bug #22977: High CPU load caused by operations on onode_map
Thanks! I've updated one host and will report back tomorrow.
This is what the output looked like after ~20 hours w...
Paul Emmerich
03:31 PM Bug #22977: High CPU load caused by operations on onode_map
Adam, can you get me a build for 12.2.4? My results are pretty immediate.
Daniel Pryor
09:47 AM Bug #22977: High CPU load caused by operations on onode_map
Hi Paul,
An attempt to rectify the hash problem is here:
https://shaman.ceph.com/builds/ceph/wip-22977-inspect-onode...
Adam Kupczyk
09:17 AM Bug #22977: High CPU load caused by operations on onode_map
Paul Emmerich wrote:
> Thank you!
>
> I'll run this on one OSD and report back tomorrow as it takes some time for...
Daniel Pryor
09:14 AM Bug #22977: High CPU load caused by operations on onode_map
Paul Emmerich wrote:
> Thank you!
>
> I'll run this on one OSD and report back tomorrow as it takes some time for...
Daniel Pryor
07:52 PM Bug #23251 (Rejected): ceph daemon osd.NNN slow_used_bytes and slow_total_bytes wrong?
version: ceph-osd-12.2.1-34.el7cp.x86_64 = RHCS 3.0z1
In trying to understand ceph daemon osd.NNN perf dump counte...
Ben England
07:50 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Hi,
Here comes a logfile with debug info; it's pretty big, so I share it through Google Drive:
https://drive.googl...
Christoffer Lilja
06:36 PM Bug #23246 (Need More Info): [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Radoslaw Zarzynski
06:35 PM Bug #23246: [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Yeah, looks like _io_submit()_ was constantly returning *EAGAIN* and the number of retries (16) had been exhausted. L... Radoslaw Zarzynski
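A minimal sketch of the pattern being described, assuming libaio semantics: a bounded retry on EAGAIN around io_submit, where exhausting the budget trips the assert. The retry count of 16 mirrors the comment above; the rest is illustrative, not KernelDevice code (which also has to handle partial submission):

    #include <libaio.h>
    #include <cassert>
    #include <cerrno>

    // Submit 'nr' prepared iocbs, retrying while the kernel's in-flight
    // queue is full (io_submit returns -EAGAIN). Exhausting the budget
    // is the analogue of KernelDevice.cc's FAILED assert(r == 0).
    int submit_with_retries(io_context_t ctx, long nr, struct iocb** cbs) {
      int attempts = 16;                  // retry budget from the comment above
      int r;
      do {
        r = io_submit(ctx, nr, cbs);     // count submitted, or -errno
      } while (r == -EAGAIN && --attempts > 0);
      assert(r >= 0);                     // fires when EAGAIN persisted
      return r;
    }

Raising bdev_aio_max_queue_depth (as tried elsewhere in the thread) should enlarge the queue given to io_setup, leaving more headroom before EAGAIN.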
03:08 PM Bug #23246 (Resolved): [OSD bug] KernelDevice.cc: 539: FAILED assert(r == 0)
Hi,
Got a bug on one of my OSDs; see a snippet below. Full logfile also attached.
I've tested with "bdev_aio_max_...
Christoffer Lilja
01:50 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Packages with the paranoid checker "are available":https://shaman.ceph.com/builds/ceph/wip-bug22102-paranoid-checker-... Radoslaw Zarzynski
11:59 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
@Rory
I can't find a call to _RocksDBStore::get()_ in the attached trace. The process also died because of _S...
Radoslaw Zarzynski
09:30 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi Paul,
Currently, I am continuing research on the "crc 0x6706be76" issue. Its relation to deep-scrub errors will b... Adam Kupczyk
Adam Kupczyk