Activity

From 09/29/2017 to 10/28/2017

10/26/2017

09:14 PM Backport #21544 (Resolved): luminous: mon osd feature checks for osdmap flags and require-osd-rel...
Sage Weil
09:13 PM Backport #21693 (Resolved): luminous: interval_map.h: 161: FAILED assert(len > 0)
Sage Weil
12:10 AM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
Logs and coredump saved in: teuthology:/home/pdonnell/1773350 Patrick Donnelly

10/25/2017

10:16 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
Dead OSD is accessible at smithi171 as of now.
http://pulpito.ceph.com/pdonnell-2017-10-25_18:05:03-kcephfs-wip-pd...
Patrick Donnelly
10:15 PM Bug #21931 (Resolved): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range...
... Patrick Donnelly
06:22 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...

Wei Jin,
According to the inconsistent output, the object info said the object size is 3461120, but all the shar...
David Zafman
10:19 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I also ran into the same issue and fixed it with the rados get/put commands.
The inconsistent info is not exactly the same ...
wei jin
09:07 AM Bug #21925 (Need More Info): cluster capacity is much more smaller than it should be
Hi all,
I built a Ceph cluster on a single host, with 1 mon and 3 osds. One osd was created on a file path, two osds crea...
Jing Li
08:14 AM Backport #21924 (Resolved): luminous: ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic...
Nathan Cutler
08:14 AM Backport #21923 (Resolved): jewel: Objecter::C_ObjectOperation_sparse_read throws/catches excepti...
https://github.com/ceph/ceph/pull/18743 Nathan Cutler
08:14 AM Backport #21922 (Resolved): luminous: Objecter::C_ObjectOperation_sparse_read throws/catches exce...
https://github.com/ceph/ceph/pull/18744 Nathan Cutler
08:13 AM Backport #21921 (Resolved): luminous: Objecter::_send_op unnecessarily constructs costly hobject_t
https://github.com/ceph/ceph/pull/18745 Nathan Cutler
05:55 AM Bug #21880 (Resolved): ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap enabled
Kefu Chai
02:36 AM Bug #21878 (Resolved): bluefs: os/bluestore/BlueFS.cc: 1505: FAILED assert(h->file->fnode.ino != 1)
Sage Weil
02:35 AM Bug #21766 (Resolved): os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec...
Sage Weil

10/24/2017

08:16 PM Bug #21907 (Resolved): On pg repair the primary is not favored as was intended

The commit cd0d8b0 was supposed to favor the primary as the authoritative copy, but did the opposite.
David Zafman
12:11 PM Feature #21902 (New): Support bytearray in python binding
This will avoid some memory copy to prepare data for rados.
But this requires cython>=0.20, see
https://github....
Mehdi Abaakouk
06:33 AM Bug #21847: osd frequently been marked down and up

Hi Sage, this same issue happened again. Is there any possible reason for this issue?
How should I go forwa...
dongdong tao
03:53 AM Bug #21573 (Resolved): [upgrade] buffer::list ABI broken in luminous release
Kefu Chai
03:53 AM Backport #21899 (Resolved): luminous: [upgrade] buffer::list ABI broken in luminous release
https://github.com/ceph/ceph/pull/18491 Kefu Chai
03:52 AM Backport #21899 (Resolved): luminous: [upgrade] buffer::list ABI broken in luminous release
https://github.com/ceph/ceph/pull/18491 Kefu Chai
03:30 AM Bug #21878 (Pending Backport): bluefs: os/bluestore/BlueFS.cc: 1505: FAILED assert(h->file->fnode...
Sage Weil
02:35 AM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
backport https://github.com/ceph/ceph/pull/18501 Sage Weil
02:34 AM Bug #21766 (Pending Backport): os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.en...
Sage Weil

10/23/2017

03:53 PM Bug #21573 (Pending Backport): [upgrade] buffer::list ABI broken in luminous release
Sage Weil
03:31 PM Bug #21846: Default ms log level results in ~40% performance degradation on RBD 4K random read IO
Ken Dreyer wrote:
> How should we indicate that PR 18418 needs to go into Luminous (v12.2.2?)
Using the magic for...
Jason Dillaman
03:00 PM Bug #21846: Default ms log level results in ~40% performance degradation on RBD 4K random read IO
Jason Dillaman wrote:
> I posted PR https://github.com/ceph/ceph/pull/18418 as a temporary workaround for clients. I...
Ken Dreyer

10/22/2017

05:32 AM Bug #21847: osd frequently been marked down and up
From dmesg, I found lots of "libceph: osd.6 down" messages.
The disks are all good. If you have never seen this kind of weird log, ...
dongdong tao
03:17 AM Bug #21887 (Duplicate): degraded calculation is off during backfill
Sage Weil

10/21/2017

07:53 PM Bug #21887: degraded calculation is off during backfill
We should backport the fix to luminous. It is confusing/scary that the 'degraded' health warning comes up during a r... Sage Weil
04:23 PM Bug #21887 (Duplicate): degraded calculation is off during backfill
The PG is active+remapped+backfill_wait. There are 2 backfill targets, and 3 acting which are all up to date. There... Sage Weil
05:48 PM Bug #21750: scrub stat mismatch on bytes
Yeah, same here. https://github.com/ceph/ceph/pull/18396 was included in my run.
http://pulpito.ceph.com/sage-201...
Sage Weil
01:08 PM Bug #21750: scrub stat mismatch on bytes
Seeing more scrub-errors after https://github.com/ceph/ceph/pull/18396 is applied.
http://pulpito.ceph.com/sage-20...
xie xingguo
05:47 PM Bug #21844 (Pending Backport): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions ...
Sage Weil
05:45 PM Bug #21845 (Pending Backport): Objecter::_send_op unnecessarily constructs costly hobject_t
Sage Weil
04:16 PM Bug #20759: mon: valgrind detects a few leaks
/a/kchai-2017-10-21_09:27:38-rados-wip-kefu-testing-2017-10-21-1049-distro-basic-mira/1757648/remote/mira121/log/ceph... Kefu Chai
04:10 AM Backport #21543 (Resolved): luminous: bluestore fsck took 224.778802 seconds to complete which ca...
Sage Weil
04:09 AM Backport #21783 (Resolved): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
Sage Weil
02:49 AM Bug #21880 (Fix Under Review): ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap en...
Kefu Chai

10/20/2017

09:33 PM Bug #21880: ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap enabled
https://github.com/ceph/ceph/pull/18452 Sage Weil
09:29 PM Bug #21880 (Resolved): ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap enabled
... Sage Weil
09:29 PM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Enabling fiemap makes FileStore's Synthetic/1 fail reliably:... Sage Weil
03:27 PM Bug #21825: OSD won't stay online and crashes with abort
Would you be interested in having a copy of the 2 GB PG which causes ceph-objectstore-tool to crash? Jérôme Poulin
03:25 PM Bug #21825: OSD won't stay online and crashes with abort
I did a quick check on my 4 hosts and jemalloc is not enabled. The cluster is now back to active+clean. Jérôme Poulin
02:25 PM Bug #21825: OSD won't stay online and crashes with abort
Can you confirm you're not using jemalloc (check /etc/{default,sysconfig}/ceph)?
Sage Weil
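The check requested above can be scripted. The following is a minimal sketch, assuming that a jemalloc preload would appear as an uncommented line mentioning "jemalloc" in one of the two files named in the comment; the helper name `find_jemalloc_lines` is hypothetical and is not part of any Ceph tooling.

```python
from pathlib import Path

# The two candidate files named in the comment; which one exists depends on the distro.
CANDIDATES = ("/etc/default/ceph", "/etc/sysconfig/ceph")


def find_jemalloc_lines(paths=CANDIDATES):
    """Return (path, line) pairs for uncommented lines that mention jemalloc.

    Hypothetical helper for illustration only.
    """
    hits = []
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue  # this file is absent on the current host
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            if "jemalloc" in stripped.lower():
                hits.append((name, stripped))
    return hits


if __name__ == "__main__":
    for path, line in find_jemalloc_lines():
        print(f"{path}: {line}")
```

An empty result only suggests that jemalloc is not enabled through these two files; it does not rule out other ways of preloading an allocator.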
02:27 PM Bug #21846: Default ms log level results in ~40% performance degradation on RBD 4K random read IO
I posted PR https://github.com/ceph/ceph/pull/18418 as a temporary workaround for clients. I figured I would leave th... Jason Dillaman
02:19 PM Bug #21846: Default ms log level results in ~40% performance degradation on RBD 4K random read IO
Two options?
1. Just set debug ms = 0 by default for clients.
2. Fix the async msgr to not log the second message...
Sage Weil
02:17 PM Bug #21847 (Need More Info): osd frequently been marked down and up
Is there anything in 'dmesg' output? Maybe a bad disk? Sage Weil
01:54 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
Indeed -- in the OSDs. I only benchmarked the librbd clients under high IOPS workloads and that call is only executed... Jason Dillaman
01:46 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
We have lots of "xx == hobject_t()" checks in the code... Haomai Wang
01:43 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
... I should also note that in unrelated "perf record" sessions for the "debug ms = 0/1" performance degradations, yo... Jason Dillaman
01:39 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
multiple runs of "perf record" didn't lie -- and neither did the fact that moving it increased performance by ~10% un... Jason Dillaman
01:36 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
of course, it's a good cleanup Haomai Wang
01:35 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
I really don't agree that this would cause a 10% performance degradation... the construction should take nanoseconds.. Haomai Wang
01:33 PM Bug #21845 (Fix Under Review): Objecter::_send_op unnecessarily constructs costly hobject_t
*PR*: https://github.com/ceph/ceph/pull/18427 Jason Dillaman
01:31 PM Bug #21845 (In Progress): Objecter::_send_op unnecessarily constructs costly hobject_t
Jason Dillaman
01:51 PM Bug #21878 (Fix Under Review): bluefs: os/bluestore/BlueFS.cc: 1505: FAILED assert(h->file->fnode...
https://github.com/ceph/ceph/pull/18428 Sage Weil
01:29 PM Bug #21878 (Resolved): bluefs: os/bluestore/BlueFS.cc: 1505: FAILED assert(h->file->fnode.ino != 1)
... Sage Weil
01:38 PM Bug #21842 (Resolved): "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
Kefu Chai
09:29 AM Backport #21872 (Resolved): jewel: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
https://github.com/ceph/ceph/pull/20143 Nathan Cutler
09:29 AM Backport #21871 (Rejected): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
-https://github.com/ceph/ceph/pull/20448- Nathan Cutler
05:30 AM Bug #21573 (Fix Under Review): [upgrade] buffer::list ABI broken in luminous release
https://github.com/ceph/ceph/pull/18408 Kefu Chai
01:24 AM Backport #21693 (Fix Under Review): luminous: interval_map.h: 161: FAILED assert(len > 0)
Anonymous

10/19/2017

09:47 PM Bug #21204 (Resolved): DNS SRV default service name not used anymore
Nathan Cutler
09:43 PM Bug #21365 (Resolved): Daemons(OSD, Mon...) exit abnormally at injectargs command
Nathan Cutler
09:42 PM Backport #21343 (Resolved): luminous: DNS SRV default service name not used anymore
Sage Weil
09:40 PM Backport #21438 (Resolved): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
Sage Weil
01:37 PM Bug #21844 (Fix Under Review): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions ...
*PR*: https://github.com/ceph/ceph/pull/18400 Jason Dillaman
01:04 PM Bug #21844 (In Progress): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -E...
Jason Dillaman
01:56 AM Bug #21844 (Resolved): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT
Running RBD small IO performance tests against a mostly sparse image shows that the Objecter is throwing/catching a b... Jason Dillaman
01:30 PM Bug #21750: scrub stat mismatch on bytes
https://github.com/ceph/ceph/pull/18396 probably fixes this! Sage Weil
12:15 PM Backport #21783 (In Progress): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make...
Nathan Cutler
09:14 AM Bug #21573: [upgrade] buffer::list ABI broken in luminous release
this would be a little bit tricky:... Kefu Chai
08:54 AM Backport #21150 (Fix Under Review): jewel: tests: btrfs copy_clone returns errno 95 (Operation no...
https://github.com/ceph/ceph/pull/18165 Kefu Chai
07:35 AM Bug #21842 (Fix Under Review): "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
Kefu Chai
07:34 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
> I can't figure out what creates "stat file: db". I guess we use this "stat file: db" as our dbname.
and it cons...
Kefu Chai
06:44 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
Kefu, I know why repair failed.
rocksdb's Env imports a new member function called AreSameFiles. But our BlueRocks...
Chang Liu
06:31 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
Chang, no worries. i am fixing it. Kefu Chai
03:36 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
I can't figure out what creates "stat file: db". I guess we use this "stat file: db" as our dbname. Chang Liu
03:13 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
Rocksdb::RepairDB tries to find all files: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L168, then it... Chang Liu
02:14 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
Working on it Chang Liu
01:32 AM Bug #21842 (Resolved): "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
... Kefu Chai
03:53 AM Bug #21847 (Need More Info): osd frequently been marked down and up
Our ceph version is 10.2.5.
We have encountered an issue where one of our osds has been marked down and up about 3 times ...
dongdong tao
02:41 AM Bug #21846 (Closed): Default ms log level results in ~40% performance degradation on RBD 4K rando...
Luminous is now 15% slower than Jewel and over 40% slower as compared to when the ms logs are disabled.
v10.2.10 d...
Jason Dillaman
02:25 AM Bug #21845 (Resolved): Objecter::_send_op unnecessarily constructs costly hobject_t
With zero backoffs, just constructing an hobject_t ("hobject_t hoid = op->target.get_hobj();") results in an approxim... Jason Dillaman

10/18/2017

11:30 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts

In the context of the newly created PGs:
pg[10.5a5s3( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0...
David Zafman
09:12 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
... Greg Farnum
09:46 PM Bug #21825: OSD won't stay online and crashes with abort
I had a chance to try and rm osd 3 today and replace the hard disk with a new one, no crash so far, it is rebalancing... Jérôme Poulin
06:26 AM Bug #21825: OSD won't stay online and crashes with abort
I think there is more to this, after active+clean, I shutdown osd.3 and then the PG went active+clean+snaptrim then o... Jérôme Poulin
05:11 AM Bug #21825: OSD won't stay online and crashes with abort
After tampering around with OSD kill and starting many, marking lost and unfound, I finally was able to recover all b... Jérôme Poulin
04:26 AM Bug #21825: OSD won't stay online and crashes with abort
You should bump up the OSD logging to see more of what is happening. David Zafman
03:33 AM Bug #21825 (Closed): OSD won't stay online and crashes with abort
I have an issue where 2 OSDs can't stay up at the same time and one will crash the other causing down PGs,
Exporti...
Jérôme Poulin
05:36 PM Bug #20243: Improve size scrub error handling and ignore system attrs in xattr checking

If we wanted to backport to Jewel it would be helpful to include this pull request first.
https://github.com/cep...
David Zafman

10/17/2017

09:28 PM Bug #21823 (Can't reproduce): on_flushed: object ... obc still alive (ec + cache tiering)
... Sage Weil
08:41 PM Bug #21573: [upgrade] buffer::list ABI broken in luminous release
@Kefu can you pls take a look? Yuri Weinstein
08:40 PM Backport #21544 (Fix Under Review): luminous: mon osd feature checks for osdmap flags and require...
Anonymous
08:20 PM Backport #21544 (In Progress): luminous: mon osd feature checks for osdmap flags and require-osd-...
Anonymous
07:03 PM Backport #21543 (Fix Under Review): luminous: bluestore fsck took 224.778802 seconds to complete ...
Anonymous
06:58 PM Backport #21543 (In Progress): luminous: bluestore fsck took 224.778802 seconds to complete which...
Anonymous
06:40 PM Feature #21760: add tools to stress RADOS omap
https://github.com/ceph/ceph/pull/18361 Douglas Fuller
05:29 PM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
Chang Liu
12:41 PM Bug #19198 (Closed): Bluestore doubles mem usage when caching object content
I talked to Igor. It seems this really is a non-bug, as the UTs use the glibc allocator. A follow-up will be to us... Mohamad Gebai
04:56 AM Bug #21818 (Resolved): ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filestore) ...
... Kefu Chai
02:50 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
same problem: http://tracker.ceph.com/issues/21174 huang jun

10/16/2017

11:29 PM Bug #20981: ./run_seed_to_range.sh errored out
I was never able to reproduce this with the following command line test.
rm -rf /tmp/td td ; mkdir /tmp/td td ; cd...
David Zafman
09:12 PM Bug #18162 (Fix Under Review): osd/ReplicatedPG.cc: recover_replicas: object added to missing set...
https://github.com/ceph/ceph/pull/18145 David Zafman
06:41 AM Bug #20053: crush compile / decompile looses precision on weight
Loïc Dachary

10/13/2017

08:48 PM Bug #21750 (In Progress): scrub stat mismatch on bytes
Sage Weil
08:48 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
Sage Weil
05:15 PM Bug #21716 (Pending Backport): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Kefu Chai
05:15 PM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Kefu Chai
12:13 PM Backport #21794 (Resolved): luminous: backoff causes out of order op
Nathan Cutler
12:13 PM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
https://github.com/ceph/ceph/pull/21184 Nathan Cutler
12:13 PM Backport #21785 (Resolved): luminous: OSDMap cache assert on shutdown
Nathan Cutler
12:13 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
https://github.com/ceph/ceph/pull/21158 Nathan Cutler
12:12 PM Backport #21783 (Resolved): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
https://github.com/ceph/ceph/pull/18398 Nathan Cutler
04:11 AM Bug #21603 (Resolved): rocksdb is using slow crc
Kefu Chai
03:30 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
David Zafman

10/12/2017

08:08 PM Bug #21737 (Pending Backport): OSDMap cache assert on shutdown
Greg Farnum
05:32 PM Feature #21760 (In Progress): add tools to stress RADOS omap
Douglas Fuller
04:16 PM Bug #21750: scrub stat mismatch on bytes
http://pulpito.front.sepia.ceph.com/yuriw-2017-10-11_19:25:41-rados-wip-yuri3-testing-2017-10-11-1645-distro-basic-sm... Sage Weil
08:26 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
I have also hit this problem when testing pulling out a disk and reinserting it; ceph version 0.94.5. According to @huang jun's osd lo... mingyue zhao
05:01 AM Bug #21603 (Fix Under Review): rocksdb is using slow crc
https://github.com/ceph/ceph/pull/18262 Kefu Chai
04:43 AM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
... Kefu Chai
12:46 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Kefu, thanks for fixing this. Can you also indicate which of the mentioned PRs need to be backported to fix the test ... Nathan Cutler

10/11/2017

09:49 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
problem seems to be that the unsharing code isn't handling compressed extents properly.
https://github.com/ceph/ce...
Sage Weil
09:47 PM Bug #21766 (Resolved): os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec...
... Sage Weil
05:20 PM Bug #21331 (Resolved): pg recovery priority inversion
https://github.com/ceph/ceph/pull/18025 is luminous backport
Sage Weil
05:19 PM Bug #21417 (Resolved): buffer_anon leak during deep scrub (on otherwise idle osd)
Sage Weil
01:49 PM Feature #21760: add tools to stress RADOS omap
I had a discussion with Douglas, and in the current implementation we can enhance the following points:
1. Adding --he...
Vikhyat Umrao
01:37 PM Feature #21760 (In Progress): add tools to stress RADOS omap
Add the tools omap_create and omap_delete to stress the RADOS object map directly. Douglas Fuller
01:45 PM Bug #21758 (Pending Backport): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
Sage Weil
09:51 AM Bug #21758 (Fix Under Review): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
https://github.com/ceph/ceph/pull/18242 Kefu Chai
09:49 AM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
... Kefu Chai
09:37 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
https://github.com/ceph/ceph/pull/18241 huang jun
06:13 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
comment out in ceph.conf
#osd copyfrom max chunk = 524288
if we use this config, it works fine.
but if we comment ...
huang jun
06:01 AM Bug #21756 (New): /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i....
steps to reproduce:... huang jun
08:09 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
https://github.com/ceph/ceph/pull/18240 Kefu Chai
07:50 AM Bug #21757 (New): snapshotted RBD objects can't be automatically evicted from a cache tier when c...
[environment]
1. Ceph version: Jewel 10.2.6 or Firefly 0.80.7
2. Kernel: 3.10.0-229.14.1.el7.x86_64
[procedure to ...
Xiaojun Liao
02:26 AM Bug #21750: scrub stat mismatch on bytes
/a/sage-2017-10-10_20:19:10-rados-wip-sage-testing2-2017-10-10-1320-distro-basic-smithi/1723818
rados/thrash/{0-size...
Sage Weil

10/10/2017

06:17 PM Bug #21407 (Pending Backport): backoff causes out of order op
Sage Weil
01:50 PM Bug #21750 (Resolved): scrub stat mismatch on bytes
... Sage Weil
01:32 PM Bug #21744 (Fix Under Review): Core when `ceph-kvstore-tool exists`
https://github.com/ceph/ceph/pull/16745/commits/46bbd32fad14579f9260765a0cb9bcfe0ba7defa Chang Liu
09:10 AM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
http://pulpito.ceph.com/sage-2017-10-09_22:17:19-rados-wip-sage-testing2-2017-10-09-1528-distro-basic-smithi/1718563/... Chang Liu

10/09/2017

09:09 PM Bug #21737 (Fix Under Review): OSDMap cache assert on shutdown
https://github.com/ceph/ceph/pull/18201 Greg Farnum
08:19 PM Bug #21737 (Resolved): OSDMap cache assert on shutdown
We don't want users to hit asserts if we've leaked memory references on shutdown. For instance:... Greg Farnum
08:44 PM Feature #18206 (Resolved): osd: osd_scrub_during_recovery only considers primary, not replicas
Nathan Cutler
08:43 PM Backport #21117 (Resolved): jewel: osd: osd_scrub_during_recovery only considers primary, not rep...
Nathan Cutler
05:01 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
Kefu Chai
12:25 PM Documentation #21733 (In Progress): OSD-Config-ref(osd max object size) section malformed
https://github.com/ceph/ceph/pull/18188 Jos Collin
12:09 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
Syntax error in
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
at section osd max object...
Joshua Schmid
11:21 AM Bug #21717 (Resolved): doc fails build with latest breathe
Kefu Chai
11:21 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
Kefu Chai
06:44 AM Bug #21721 (Can't reproduce): ceph pg force-backfill cmd failed with ENOENT error
Command failed on mira025 with status 2: u'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage t... huang jun

10/08/2017

04:31 PM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
Kefu Chai
08:13 AM Backport #21719 (In Progress): luminous: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18167 Kefu Chai
08:11 AM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18167 Kefu Chai
08:15 AM Bug #21717: doc fails build with latest breathe
recently breathe introduced a change not compatible with old sphinx, see https://github.com/michaeljones/breathe/comm... Kefu Chai
08:09 AM Bug #21717 (Pending Backport): doc fails build with latest breathe
https://github.com/ceph/ceph/pull/17025 Kefu Chai
08:09 AM Bug #21717 (Resolved): doc fails build with latest breathe
... Kefu Chai
08:10 AM Backport #21718 (In Progress): jewel: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18166 Kefu Chai
08:09 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18166 Kefu Chai
07:46 AM Bug #21716 (Fix Under Review): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
-https://github.com/ceph/ceph/pull/17550- Kefu Chai
07:42 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
https://github.com/ceph/ceph/pull/17313 might be relevant. Kefu Chai
07:41 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
... Kefu Chai
05:32 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
I suspected that btrfs somehow failed to handle the ioctl(BTRFS_IOC_CLONE_RANGE) call, but I checked the Linux kernel of ... Kefu Chai
04:20 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
David, sorry for the latency. yeah, it is causing test failures. the errno is 95 (Operation not supported), -it's not... Kefu Chai

10/06/2017

08:18 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
fast-tracking the backport, since it's already open Nathan Cutler
07:49 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Greg Farnum wrote:
> https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks...
Yuri Weinstein
02:01 AM Bug #20416 (Resolved): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Sage Weil
07:46 PM Bug #19300 (Can't reproduce): "Segmentation fault ceph_test_objectstore --gtest_filter=-*/3"
Sage Weil
07:36 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
@sage is this just a matter of executing the "/usr/bin/rbd ls" line at some point in the tests? I'd be happy to add this. P... Yuri Weinstein
05:15 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
@Yuri, @Sage - I guess the upgrade/kraken-x suite did not catch this because it does not do "/usr/bin/rbd ls" ? Nathan Cutler
01:17 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Much appreciated! Sarah Brofeldt
12:39 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Sarah, the fix is in the current luminous branch now. Once it builds (~1 hr), you can install the packages from htt... Sage Weil
12:39 PM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
Sage Weil
05:48 PM Feature #21710 (New): add wildcard for namespaces
implement * wildcard to allow access to namespaces starting with a given string
allow rw namespace=cephfs_a*
wo...
Douglas Fuller
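The matching rule requested in this feature can be illustrated with a minimal sketch. It assumes that a trailing `*` in the cap's namespace means "any namespace starting with the preceding string" and that a pattern without `*` must match exactly. The function name `namespace_allowed` is hypothetical; this is not the Ceph implementation.

```python
def namespace_allowed(pattern: str, namespace: str) -> bool:
    """Check a namespace against a cap pattern with an optional trailing '*'.

    Hypothetical helper: 'cephfs_a*' matches every namespace that starts
    with 'cephfs_a'; a pattern without '*' must match the name exactly.
    """
    if pattern.endswith("*"):
        return namespace.startswith(pattern[:-1])
    return namespace == pattern


if __name__ == "__main__":
    for ns in ("cephfs_a", "cephfs_a1", "cephfs_b"):
        print(ns, namespace_allowed("cephfs_a*", ns))
```

With the pattern from the request, `cephfs_a*`, the namespaces `cephfs_a` and `cephfs_a1` would be allowed and `cephfs_b` would not.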
12:39 PM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
Sage Weil
03:22 AM Backport #21692 (In Progress): luminous: Kraken client crash after upgrading cluster from Kraken ...
Nathan Cutler
03:18 AM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
https://github.com/ceph/ceph/pull/18140 Nathan Cutler
03:21 AM Backport #21702 (Resolved): luminous: BlueStore::umount will crash when the BlueStore is opened b...
https://github.com/ceph/ceph/pull/18750 Nathan Cutler
03:21 AM Backport #21701 (Resolved): luminous: ceph-kvstore-tool does not call bluestore's umount when exit
https://github.com/ceph/ceph/pull/18751 Nathan Cutler
03:21 AM Bug #21625: ceph-kvstore-tool does not call bluestore's umount when exit
https://github.com/ceph/ceph/pull/18083 Nathan Cutler
03:20 AM Bug #21624: BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
https://github.com/ceph/ceph/pull/18082 Nathan Cutler
03:18 AM Backport #21697 (Resolved): luminous: OSDService::recovery_need_sleep read+updated without locking
https://github.com/ceph/ceph/pull/18753 Nathan Cutler
03:18 AM Backport #21693 (Resolved): luminous: interval_map.h: 161: FAILED assert(len > 0)
https://github.com/ceph/ceph/pull/18413 Nathan Cutler
02:02 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
Sage Weil
02:00 AM Bug #21686 (Can't reproduce): osd/PrimaryLogPG.cc: 10195: FAILED assert(i->second == obc) in fini...
... Sage Weil

10/05/2017

10:33 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
https://github.com/ceph/ceph/pull/18140 backport Sage Weil
10:30 PM Bug #21660 (Pending Backport): Kraken client crash after upgrading cluster from Kraken to Luminous
Greg Farnum
08:27 PM Bug #21660 (Fix Under Review): Kraken client crash after upgrading cluster from Kraken to Luminous
... Sage Weil
04:47 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
fc655d9b-16cd-4342-bf4b-689a3c0d2891 generated on a Luminous client.
On the Kraken client, this results in:
<pr...
Sarah Brofeldt
04:08 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Hi Sarah,
Can you 'ceph osd getmap 308 -o 308' and 'ceph-post-file 308'?
Sage Weil
02:50 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
I wasn't clever enough to save the core file initially, so I've reproduced the issue on a reinstall of Kraken after u... Sarah Brofeldt
06:01 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Yuri's testing it (it will pass), so I went ahead and created a backport PR: https://github.com/ceph/ceph/pull/18132 Greg Farnum
04:17 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
https://github.com/ceph/ceph/pull/18130 Sage Weil
03:05 AM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
Sage Weil
11:59 AM Bug #21470 (Pending Backport): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
https://github.com/ceph/ceph/pull/18127 for the backport Sage Weil
03:04 AM Bug #21629 (Pending Backport): interval_map.h: 161: FAILED assert(len > 0)
Sage Weil

10/04/2017

10:19 PM Bug #21660 (Need More Info): Kraken client crash after upgrading cluster from Kraken to Luminous
Do you still have the core file? I would be very interested in seeing the epoch for the OSDMap that was being decode... Sage Weil
01:10 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Crash in the messenger layer of librados. Jason Dillaman
09:54 PM Bug #21470 (Fix Under Review): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
https://github.com/ceph/ceph/pull/18118
Thanks, Bob! Please let me know if you see it fail. This should be inclu...
Sage Weil
04:56 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Yep, left it running an entire night and wrote 1.5TB without crashing. Seems to be fixed. Thanks! Bob Bobington
05:52 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
This time I couldn't apply your changes to the original Luminous source release so I pulled the entire Git branch and... Bob Bobington
07:40 PM Bug #20910 (In Progress): spurious MON_DOWN, apparently slow/laggy mon
Not resolved yet! Sage Weil
06:58 PM Bug #21624 (Pending Backport): BlueStore::umount will crash when the BlueStore is opened by start...
Sage Weil
06:56 PM Bug #21625 (Pending Backport): ceph-kvstore-tool does not call bluestore's umount when exit
Sage Weil
02:32 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
Kefu Chai

10/03/2017

09:49 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
I've pushed another patch to the same branch. Can you give it a try? Sage Weil
09:46 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
From that log I've narrowed the problem down to this line... Sage Weil
08:42 PM Bug #21303 (Resolved): rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Thanks! Sage Weil
06:40 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
/a/sage-2017-10-03_12:00:34-rados-wip-sage-testing2-2017-10-02-2121-distro-basic-smithi/1698722 Sage Weil
02:37 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
I managed to get some debug symbols working.... Sarah Brofeldt
05:41 AM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
I'm having some trouble making the debug symbols work (I installed ceph-common-dbg, librbd1-dbg and librados2-dbg to... Sarah Brofeldt
02:58 AM Backport #21653 (Resolved): luminous: Erasure code recovery should send additional reads if neces...
https://github.com/ceph/ceph/pull/20081
With http://tracker.ceph.com/issues/22069
Nathan Cutler
02:58 AM Backport #21650 (Resolved): luminous: buffer_anon leak during deep scrub (on otherwise idle osd)
https://github.com/ceph/ceph/pull/18227 Nathan Cutler
02:57 AM Backport #21636 (Resolved): luminous: ceph-monstore-tool --readable mode doesn't understand FSMap...
https://github.com/ceph/ceph/pull/18754 Nathan Cutler

10/02/2017

11:14 PM Bug #18162 (In Progress): osd/ReplicatedPG.cc: recover_replicas: object added to missing set for ...
David Zafman
09:35 PM Bug #21629 (Fix Under Review): interval_map.h: 161: FAILED assert(len > 0)
*PR*: https://github.com/ceph/ceph/pull/18088 Jason Dillaman
09:34 PM Bug #21629: interval_map.h: 161: FAILED assert(len > 0)
The compare-extent op was beyond the truncated extent of the object. The EC async read code does not handle zero-leng... Jason Dillaman
07:39 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
... Jason Dillaman
04:47 PM Bug #21611 (Closed): rename in BlueFS is not atomic
ceph-kvstore-tool doesn't call umount() of BlueStore. Chang Liu
04:12 PM Bug #21625 (Resolved): ceph-kvstore-tool does not call bluestore's umount when exit
It will not flush the dirty log to durable storage, so some data is lost. For example, a user sets a KV pair by ceph-kvstore-to... Chang Liu
04:03 PM Bug #21624 (Resolved): BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
ceph-kvstore-tool uses `start_kv_only` to mount a BlueStore. Chang Liu
01:50 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
Nathan Cutler
01:50 PM Bug #21243 (Resolved): incorrect erasure-code space in command ceph df
Nathan Cutler
01:24 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
https://github.com/ceph/ceph/pull/18079 Sage Weil
01:21 PM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
... Sage Weil
12:21 PM Bug #21614 (Fix Under Review): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/s...
https://github.com/ceph/ceph/pull/18078 Sage Weil
03:47 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
http://pulpito.ceph.com/kchai-2017-10-01_17:38:10-rados-wip-kefu-testing-2017-10-01-2202-distro-basic-mira/1692959/
...
Kefu Chai
08:15 AM Backport #21283 (Resolved): luminous: spurious MON_DOWN, apparently slow/laggy mon
Abhishek Lekshmanan
08:14 AM Backport #21374 (Resolved): luminous: incorrect erasure-code space in command ceph df
Abhishek Lekshmanan
03:42 AM Bug #21566 (Pending Backport): OSDService::recovery_need_sleep read+updated without locking
Kefu Chai

10/01/2017

09:08 AM Bug #21611 (Closed): rename in BlueFS is not atomic
I was testing the repair command and found that:
1. rocksdb creates a new MANIFEST file while repairing the database, and wants t...
Chang Liu
02:20 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
TSAN unfortunately just caused the OSDs to core dump instantly. I'll see if I can find another way to find threading ... Bob Bobington

09/30/2017

07:22 AM Bug #21603: rocksdb is using slow crc
Mark, please let me know if I should update ceph/rocksdb with this fix and pick it up in ceph/ceph if you think we ne... Kefu Chai
07:20 AM Bug #21603: rocksdb is using slow crc
https://github.com/facebook/rocksdb/pull/2950 Kefu Chai
06:33 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
While I'm not intimately familiar with threaded programming, I'm okay with general C++. Could you possibly explain wh... Bob Bobington
03:05 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
No luck. I applied 1918c57c7c6304875501f4f4b04b9c82834395a3 from the aforementioned repo to my copy of the official L... Bob Bobington
05:31 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
After merging the following patches, the error did not happen again. You can close the issue. Thanks!

Patch list:
h...
黄 维
04:11 AM Bug #21577 (Pending Backport): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
Kefu Chai

09/29/2017

10:36 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks good. Greg Farnum
09:18 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Ah, found it: https://github.com/ceph/ceph-ci/tree/wip-21470-test Bob Bobington
09:12 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
I'm not on a Debian or Red Hat derivative. Is there a Git repository I can get the source from or a tarball you can li... Bob Bobington
06:54 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
OK, that's kind of embarrassing; I think the fix is pretty simple. Can you please test out this branch?
wip-21470-...
Sage Weil
06:39 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Can you repeat the fsck with --debug-bluefs 20?
CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr ...
Sage Weil
06:11 PM Bug #21382 (Pending Backport): Erasure code recovery should send additional reads if necessary
David Zafman
06:08 PM Bug #21603: rocksdb is using slow crc
Kefu Chai wrote:
> I set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fa...
Mark Nelson
05:37 PM Bug #21603: rocksdb is using slow crc
@kefu, that's really elegant work, thanks for the info
Matt
Matt Benjamin
04:49 PM Bug #21603: rocksdb is using slow crc
I set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fast_CRC32() is always... Kefu Chai
03:08 PM Bug #21603: rocksdb is using slow crc
Matt Benjamin wrote:
> Just randomly, is this output just from ceph-osd running under perf?
This is output from m...
Mark Nelson
03:00 PM Bug #21603: rocksdb is using slow crc
Just randomly, is this output just from ceph-osd running under perf?
Matt
Matt Benjamin
02:42 PM Bug #21603 (Resolved): rocksdb is using slow crc
... Sage Weil
03:00 PM Bug #21249 (Resolved): Client client.admin marked osd.2 out, after it was down for 1504627577 sec...
Nathan Cutler
02:58 PM Bug #20944 (Resolved): OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple dep...
Nathan Cutler
02:38 PM Bug #21566 (Fix Under Review): OSDService::recovery_need_sleep read+updated without locking
https://github.com/ceph/ceph/pull/18022 should take care of this. Neha Ojha
12:11 PM Backport #21307 (Resolved): luminous: Client client.admin marked osd.2 out, after it was down for...
Sage Weil
12:11 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
Sage Weil
10:43 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
osd.6 remove object "0#2:c4b0339b:::benchmark_data_mira035.xsky.com_17216_object7868:head#" from backfillinfo.objects... huang jun
03:53 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
... huang jun
01:48 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
... huang jun
12:01 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
Is this on master?
Shouldn't osd.7 have the 149'793 log entry for the delete, and thus detect the retry as a dupli...
Josh Durgin
 
