Activity
From 09/23/2017 to 10/22/2017
10/22/2017
- 05:32 AM Bug #21847: osd frequently been marked down and up
- from the dmesg, I found lots of "libceph: osd.6 down"
disks are all good. if you have never seen this kind of weird log, ...
- 03:17 AM Bug #21887 (Duplicate): degraded calculation is off during backfill
10/21/2017
- 07:53 PM Bug #21887: degraded calculation is off during backfill
- We should backport the fix to luminous. It is confusing/scary that the 'degraded' health warning comes up during a r...
- 04:23 PM Bug #21887 (Duplicate): degraded calculation is off during backfill
- The PG is active+remapped+backfill_wait. There are 2 backfill targets, and 3 acting which are all up to date. There...
- 05:48 PM Bug #21750: scrub stat mismatch on bytes
- Yeah, same here. https://github.com/ceph/ceph/pull/18396 was included in my run.
http://pulpito.ceph.com/sage-201...
- 01:08 PM Bug #21750: scrub stat mismatch on bytes
- Seeing more scrub-errors after https://github.com/ceph/ceph/pull/18396 is applied.
http://pulpito.ceph.com/sage-20...
- 05:47 PM Bug #21844 (Pending Backport): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions ...
- 05:45 PM Bug #21845 (Pending Backport): Objecter::_send_op unnecessarily constructs costly hobject_t
- 04:16 PM Bug #20759: mon: valgrind detects a few leaks
- /a/kchai-2017-10-21_09:27:38-rados-wip-kefu-testing-2017-10-21-1049-distro-basic-mira/1757648/remote/mira121/log/ceph...
- 04:10 AM Backport #21543 (Resolved): luminous: bluestore fsck took 224.778802 seconds to complete which ca...
- 04:09 AM Backport #21783 (Resolved): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
- 02:49 AM Bug #21880 (Fix Under Review): ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap en...
10/20/2017
- 09:33 PM Bug #21880: ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap enabled
- https://github.com/ceph/ceph/pull/18452
- 09:29 PM Bug #21880 (Resolved): ObjectStore/StoreTest.Synthetic/1 (filestore) fails with fiemap enabled
- ...
- 09:29 PM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- Enabling fiemap makes FileStore's Synthetic/1 fail reliably:...
- 03:27 PM Bug #21825: OSD won't stay online and crashes with abort
- Would you be interested in having a copy of the 2 GB PG which causes ceph-objectstore-tool to crash?
- 03:25 PM Bug #21825: OSD won't stay online and crashes with abort
- I did a quick check on my 4 hosts and jemalloc is not enabled. The cluster is now back to active+clean.
- 02:25 PM Bug #21825: OSD won't stay online and crashes with abort
- Can you confirm you're not using jemalloc (check /etc/{default,sysconfig}/ceph)?
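The check asked for above can be scripted; a minimal sketch (the `mentions_jemalloc` helper is ours, not a Ceph tool; the file paths are the ones named in the comment):

```shell
# Return success if any given ceph environment file mentions jemalloc
# (e.g. an LD_PRELOAD=...libjemalloc... line).
# grep -s suppresses errors for files that don't exist on this distro.
mentions_jemalloc() {
    grep -qs 'jemalloc' "$@"
}

if mentions_jemalloc /etc/default/ceph /etc/sysconfig/ceph; then
    echo "jemalloc is configured"
else
    echo "jemalloc not configured"
fi
```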
- 02:27 PM Bug #21846: Default ms log level results in ~40% performance degradation on RBD 4K random read IO
- I posted PR https://github.com/ceph/ceph/pull/18418 as a temporary workaround for clients. I figured I would leave th...
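The client-side relief discussed here amounts to a ceph.conf override; a sketch (file path and exact value are assumptions, per the "set debug ms = 0 for clients" suggestion in the ticket):

```shell
# Append a [client] section that silences messenger debug logging,
# the temporary workaround discussed in this ticket.
# ./ceph.conf.example is a placeholder path; adjust per host.
conf=./ceph.conf.example
cat >> "$conf" <<'EOF'
[client]
    debug ms = 0
EOF
```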
- 02:19 PM Bug #21846: Default ms log level results in ~40% performance degradation on RBD 4K random read IO
- Two options?
1. Just set debug ms = 0 by default for clients.
2. Fix the async msgr to not log the second message...
- 02:17 PM Bug #21847 (Need More Info): osd frequently been marked down and up
- Is there anything in 'dmesg' output? Maybe a bad disk?
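Before attaching raw dmesg output to a ticket like this, it can be summarized; a sketch (the `count_osd_flaps` helper is ours) that tallies "libceph: osd.N down/up" transitions from kernel log text on stdin:

```shell
# Tally "libceph: osd.N down/up" lines from kernel log text on stdin.
# A high count for a single OSD suggests flapping; pair it with a search
# for I/O errors to separate network trouble from a bad disk.
count_osd_flaps() {
    grep -oE 'libceph: osd\.[0-9]+ (down|up)' | sort | uniq -c | sort -rn
}

# Typical use (privileged):
#   dmesg | count_osd_flaps
#   dmesg | grep -iE 'I/O error|blk_update_request|medium error'
```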
- 01:54 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
- Indeed -- in the OSDs. I only benchmarked the librbd clients under high IOPS workloads and that call is only executed...
- 01:46 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
- we have a lot of "xx == hobject_t()" checks in the code...
- 01:43 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
- ... I should also note that in unrelated "perf record" sessions for the "debug ms = 0/1" performance degradations, yo...
- 01:39 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
- multiple runs of "perf record" didn't lie -- and neither did the fact that moving it increased performance by ~10% un...
- 01:36 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
- of course, it's a good cleanup
- 01:35 PM Bug #21845: Objecter::_send_op unnecessarily constructs costly hobject_t
- I really don't agree that this will cause a 10% performance degradation... the construction should be at the nanosecond level...
- 01:33 PM Bug #21845 (Fix Under Review): Objecter::_send_op unnecessarily constructs costly hobject_t
- *PR*: https://github.com/ceph/ceph/pull/18427
- 01:31 PM Bug #21845 (In Progress): Objecter::_send_op unnecessarily constructs costly hobject_t
- 01:51 PM Bug #21878 (Fix Under Review): bluefs: os/bluestore/BlueFS.cc: 1505: FAILED assert(h->file->fnode...
- https://github.com/ceph/ceph/pull/18428
- 01:29 PM Bug #21878 (Resolved): bluefs: os/bluestore/BlueFS.cc: 1505: FAILED assert(h->file->fnode.ino != 1)
- ...
- 01:38 PM Bug #21842 (Resolved): "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- 09:29 AM Backport #21872 (Resolved): jewel: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- https://github.com/ceph/ceph/pull/20143
- 09:29 AM Backport #21871 (Rejected): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- -https://github.com/ceph/ceph/pull/20448-
- 05:30 AM Bug #21573 (Fix Under Review): [upgrade] buffer::list ABI broken in luminous release
- https://github.com/ceph/ceph/pull/18408
- 01:24 AM Backport #21693 (Fix Under Review): luminous: interval_map.h: 161: FAILED assert(len > 0)
10/19/2017
- 09:47 PM Bug #21204 (Resolved): DNS SRV default service name not used anymore
- 09:43 PM Bug #21365 (Resolved): Daemons(OSD, Mon...) exit abnormally at injectargs command
- 09:42 PM Backport #21343 (Resolved): luminous: DNS SRV default service name not used anymore
- 09:40 PM Backport #21438 (Resolved): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
- 01:37 PM Bug #21844 (Fix Under Review): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions ...
- *PR*: https://github.com/ceph/ceph/pull/18400
- 01:04 PM Bug #21844 (In Progress): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -E...
- 01:56 AM Bug #21844 (Resolved): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT
- Running RBD small IO performance tests against a mostly sparse image shows that the Objecter is throwing/catching a b...
- 01:30 PM Bug #21750: scrub stat mismatch on bytes
- https://github.com/ceph/ceph/pull/18396 probably fixes this!
- 12:15 PM Backport #21783 (In Progress): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make...
- 09:14 AM Bug #21573: [upgrade] buffer::list ABI broken in luminous release
- this would be a little bit tricky:...
- 08:54 AM Backport #21150 (Fix Under Review): jewel: tests: btrfs copy_clone returns errno 95 (Operation no...
- https://github.com/ceph/ceph/pull/18165
- 07:35 AM Bug #21842 (Fix Under Review): "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- 07:34 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- > I can't figure out what creates "stat file: db". I guess we use this "stat file: db" as our dbname.
and it cons...
- 06:44 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- Kefu, I know why repair failed.
rocksdb's Env imports a new member function called AreSameFiles. But our BlueRocks...
- 06:31 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- Chang, no worries. I am fixing it.
- 03:36 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- I can't figure out what creates "stat file: db". I guess we use this "stat file: db" as our dbname.
- 03:13 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- Rocksdb::RepairDB tries to find all files: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L168, then it...
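RepairDB's first step, per the linked source, is an enumeration of the DB directory; a rough shell analogue (file-name patterns assumed from RocksDB's usual layout: numbered .sst/.log/.ldb files, MANIFEST-*, CURRENT) of that scan:

```shell
# Rough analogue of RocksDB RepairDB's initial scan: list the files it
# would consider in a db directory (table files, WALs, manifests),
# ignoring anything else.
list_db_files() {
    ls "$1" 2>/dev/null | grep -E '^(MANIFEST-[0-9]+|CURRENT|[0-9]+\.(sst|log|ldb))$'
}
```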
- 02:14 AM Bug #21842: "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- Working on it
- 01:32 AM Bug #21842 (Resolved): "repair kvstore failed" in qa/workunits/cephtool/test_kvstore_tool.sh
- ...
- 03:53 AM Bug #21847 (Need More Info): osd frequently been marked down and up
- our ceph version is 10.2.5
we have encountered an issue where one of our osds has been marked down and up about 3 times ...
- 02:41 AM Bug #21846 (Closed): Default ms log level results in ~40% performance degradation on RBD 4K rando...
- Luminous is now 15% slower than Jewel and over 40% slower as compared to when the ms logs are disabled.
v10.2.10 d...
- 02:25 AM Bug #21845 (Resolved): Objecter::_send_op unnecessarily constructs costly hobject_t
- With zero backoffs, just constructing an hobject_t ("hobject_t hoid = op->target.get_hobj();") results in an approxim...
10/18/2017
- 11:30 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
In the context of the newly created PGs:
pg[10.5a5s3( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0...
- 09:12 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- ...
- 09:46 PM Bug #21825: OSD won't stay online and crashes with abort
- I had a chance to try and rm osd 3 today and replace the hard disk with a new one, no crash so far, it is rebalancing...
- 06:26 AM Bug #21825: OSD won't stay online and crashes with abort
- I think there is more to this: after active+clean, I shut down osd.3 and then the PG went active+clean+snaptrim then o...
- 05:11 AM Bug #21825: OSD won't stay online and crashes with abort
- After tinkering around with killing OSDs and starting many, marking lost and unfound, I finally was able to recover all b...
- 04:26 AM Bug #21825: OSD won't stay online and crashes with abort
- You should bump up the OSD logging to see more of what is happening.
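Bumping OSD logging is usually done with `ceph tell ... injectargs`; a sketch (helper name and debug levels are ours) that composes the command for review rather than running it against a live cluster:

```shell
# Compose (not execute) an injectargs command that raises logging on one
# OSD; printing it lets the operator inspect it before use on a cluster.
osd_debug_cmd() {
    printf "ceph tell osd.%s injectargs -- --debug-osd 20 --debug-ms 1\n" "$1"
}

osd_debug_cmd 3
```

Running the printed command requires admin access to the cluster; remember to lower the levels again afterwards, since debug 20 logging is voluminous.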
- 03:33 AM Bug #21825 (Closed): OSD won't stay online and crashes with abort
- I have an issue where 2 OSDs can't stay up at the same time and one will crash the other causing down PGs,
Exporti...
- 05:36 PM Bug #20243: Improve size scrub error handling and ignore system attrs in xattr checking
If we wanted to backport to Jewel it would be helpful to include this pull request first.
https://github.com/cep...
10/17/2017
- 09:28 PM Bug #21823 (Can't reproduce): on_flushed: object ... obc still alive (ec + cache tiering)
- ...
- 08:41 PM Bug #21573: [upgrade] buffer::list ABI broken in luminous release
- @Kefu can you pls take a look?
- 08:40 PM Backport #21544 (Fix Under Review): luminous: mon osd feature checks for osdmap flags and require...
- 08:20 PM Backport #21544 (In Progress): luminous: mon osd feature checks for osdmap flags and require-osd-...
- 07:03 PM Backport #21543 (Fix Under Review): luminous: bluestore fsck took 224.778802 seconds to complete ...
- 06:58 PM Backport #21543 (In Progress): luminous: bluestore fsck took 224.778802 seconds to complete which...
- 06:40 PM Feature #21760: add tools to stress RADOS omap
- https://github.com/ceph/ceph/pull/18361
- 05:29 PM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
- 12:41 PM Bug #19198 (Closed): Bluestore doubles mem usage when caching object content
- I talked to Igor. It seems this really is a non-bug, as the UTs use the glibc allocator. A follow-up will be to us...
- 04:56 AM Bug #21818 (Resolved): ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filestore) ...
- ...
- 02:50 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- same problem: http://tracker.ceph.com/issues/21174
10/16/2017
- 11:29 PM Bug #20981: ./run_seed_to_range.sh errored out
- I was never able to reproduce this with the following command line test.
rm -rf /tmp/td td ; mkdir /tmp/td td ; cd...
- 09:12 PM Bug #18162 (Fix Under Review): osd/ReplicatedPG.cc: recover_replicas: object added to missing set...
- https://github.com/ceph/ceph/pull/18145
- 06:41 AM Bug #20053: crush compile / decompile looses precision on weight
10/13/2017
- 08:48 PM Bug #21750 (In Progress): scrub stat mismatch on bytes
- 08:48 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
- 05:15 PM Bug #21716 (Pending Backport): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- 05:15 PM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- 12:13 PM Backport #21794 (Resolved): luminous: backoff causes out of order op
- 12:13 PM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
- https://github.com/ceph/ceph/pull/21184
- 12:13 PM Backport #21785 (Resolved): luminous: OSDMap cache assert on shutdown
- 12:13 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
- https://github.com/ceph/ceph/pull/21158
- 12:12 PM Backport #21783 (Resolved): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
- https://github.com/ceph/ceph/pull/18398
- 04:11 AM Bug #21603 (Resolved): rocksdb is using slow crc
- 03:30 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
10/12/2017
- 08:08 PM Bug #21737 (Pending Backport): OSDMap cache assert on shutdown
- 05:32 PM Feature #21760 (In Progress): add tools to stress RADOS omap
- 04:16 PM Bug #21750: scrub stat mismatch on bytes
- http://pulpito.front.sepia.ceph.com/yuriw-2017-10-11_19:25:41-rados-wip-yuri3-testing-2017-10-11-1645-distro-basic-sm...
- 08:26 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- I have also met this problem when testing disk pull-out and re-insert; ceph version 0.94.5. According to @huang jun's osd lo...
- 05:01 AM Bug #21603 (Fix Under Review): rocksdb is using slow crc
- https://github.com/ceph/ceph/pull/18262
- 04:43 AM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- ...
- 12:46 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- Kefu, thanks for fixing this. Can you also indicate which of the mentioned PRs need to be backported to fix the test ...
10/11/2017
- 09:49 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
- problem seems to be that the unsharing code isn't handling compressed extents properly.
https://github.com/ceph/ce...
- 09:47 PM Bug #21766 (Resolved): os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec...
- ...
- 05:20 PM Bug #21331 (Resolved): pg recovery priority inversion
- https://github.com/ceph/ceph/pull/18025 is luminous backport
- 05:19 PM Bug #21417 (Resolved): buffer_anon leak during deep scrub (on otherwise idle osd)
- 01:49 PM Feature #21760: add tools to stress RADOS omap
- I had a discussion with Douglas; in the current implementation, we can enhance the following points:
1. Adding --he...
- 01:37 PM Feature #21760 (In Progress): add tools to stress RADOS omap
- Add the tools omap_create and omap_delete to stress the RADOS object map directly.
- 01:45 PM Bug #21758 (Pending Backport): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- 09:51 AM Bug #21758 (Fix Under Review): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- https://github.com/ceph/ceph/pull/18242
- 09:49 AM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- ...
- 09:37 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
- https://github.com/ceph/ceph/pull/18241
- 06:13 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
- comment out in ceph.conf
#osd copyfrom max chunk = 524288
if we use this config, it works fine.
but if we comment ...
- 06:01 AM Bug #21756 (New): /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i....
- steps to reproduce:...
- 08:09 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- https://github.com/ceph/ceph/pull/18240
- 07:50 AM Bug #21757 (New): snapshotted RBD objects can't be automatically evicted from a cache tier when c...
- [environment]
1. ceph version: Jewel 10.2.6 or firefly 0.80.7
2. kernel: 3.10.0-229.14.1.el7.x86_64
[procedure to ...
- 02:26 AM Bug #21750: scrub stat mismatch on bytes
- /a/sage-2017-10-10_20:19:10-rados-wip-sage-testing2-2017-10-10-1320-distro-basic-smithi/1723818
rados/thrash/{0-size...
10/10/2017
- 06:17 PM Bug #21407 (Pending Backport): backoff causes out of order op
- 01:50 PM Bug #21750 (Resolved): scrub stat mismatch on bytes
- ...
- 01:32 PM Bug #21744 (Fix Under Review): Core when `ceph-kvstore-tool exists`
- https://github.com/ceph/ceph/pull/16745/commits/46bbd32fad14579f9260765a0cb9bcfe0ba7defa
- 09:10 AM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
- http://pulpito.ceph.com/sage-2017-10-09_22:17:19-rados-wip-sage-testing2-2017-10-09-1528-distro-basic-smithi/1718563/...
10/09/2017
- 09:09 PM Bug #21737 (Fix Under Review): OSDMap cache assert on shutdown
- https://github.com/ceph/ceph/pull/18201
- 08:19 PM Bug #21737 (Resolved): OSDMap cache assert on shutdown
- We don't want users to hit asserts if we've leaked memory references on shutdown. For instance:...
- 08:44 PM Feature #18206 (Resolved): osd: osd_scrub_during_recovery only considers primary, not replicas
- 08:43 PM Backport #21117 (Resolved): jewel: osd: osd_scrub_during_recovery only considers primary, not rep...
- 05:01 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
- 12:25 PM Documentation #21733 (In Progress): OSD-Config-ref(osd max object size) section malformed
- https://github.com/ceph/ceph/pull/18188
- 12:09 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
- Syntax error in
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
at section osd max object...
- 11:21 AM Bug #21717 (Resolved): doc fails build with latest breathe
- 11:21 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
- 06:44 AM Bug #21721 (Can't reproduce): ceph pg force-backfill cmd failed with ENOENT error
- Command failed on mira025 with status 2: u'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage t...
10/08/2017
- 04:31 PM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
- 08:13 AM Backport #21719 (In Progress): luminous: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18167
- 08:11 AM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18167
- 08:15 AM Bug #21717: doc fails build with latest breathe
- recently breathe introduced a change not compatible with old sphinx, see https://github.com/michaeljones/breathe/comm...
- 08:09 AM Bug #21717 (Pending Backport): doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/17025
- 08:09 AM Bug #21717 (Resolved): doc fails build with latest breathe
- ...
- 08:10 AM Backport #21718 (In Progress): jewel: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18166
- 08:09 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18166
- 07:46 AM Bug #21716 (Fix Under Review): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- -https://github.com/ceph/ceph/pull/17550-
- 07:42 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- https://github.com/ceph/ceph/pull/17313 might be relevant.
- 07:41 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- ...
- 05:32 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
- i suspected that btrfs somehow failed to handle the ioctl(BTRFS_IOC_CLONE_RANGE) call. but i checked linux kernel of ...
- 04:20 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
- David, sorry for the latency. yeah, it is causing test failures. the errno is 95 (Operation not supported), -it's not...
10/06/2017
- 08:18 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- fast-tracking the backport, since it's already open
- 07:49 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Greg Farnum wrote:
> https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks...
- 02:01 AM Bug #20416 (Resolved): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- 07:46 PM Bug #19300 (Can't reproduce): "Segmentation fault ceph_test_objectstore --gtest_filter=-*/3"
- 07:36 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- @sage is this just a matter of executing the "/usr/bin/rbd ls" line at some point in a test? I'd be happy to add this. P...
- 05:15 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- @Yuri, @Sage - I guess the upgrade/kraken-x suite did not catch this because it does not do "/usr/bin/rbd ls" ?
- 01:17 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Much appreciated!
- 12:39 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Sarah, the fix is in the current luminous branch now. Once it builds (~1 hrs), you can install the packages from htt...
- 12:39 PM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
- 05:48 PM Feature #21710 (New): add wildcard for namespaces
- implement * wildcard to allow access to namespaces starting with a given string
allow rw namespace=cephfs_a*
wo...
- 12:39 PM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
- 03:22 AM Backport #21692 (In Progress): luminous: Kraken client crash after upgrading cluster from Kraken ...
- 03:18 AM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
- https://github.com/ceph/ceph/pull/18140
- 03:21 AM Backport #21702 (Resolved): luminous: BlueStore::umount will crash when the BlueStore is opened b...
- https://github.com/ceph/ceph/pull/18750
- 03:21 AM Backport #21701 (Resolved): luminous: ceph-kvstore-tool does not call bluestore's umount when exit
- https://github.com/ceph/ceph/pull/18751
- 03:21 AM Bug #21625: ceph-kvstore-tool does not call bluestore's umount when exit
- https://github.com/ceph/ceph/pull/18083
- 03:20 AM Bug #21624: BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
- https://github.com/ceph/ceph/pull/18082
- 03:18 AM Backport #21697 (Resolved): luminous: OSDService::recovery_need_sleep read+updated without locking
- https://github.com/ceph/ceph/pull/18753
- 03:18 AM Backport #21693 (Resolved): luminous: interval_map.h: 161: FAILED assert(len > 0)
- https://github.com/ceph/ceph/pull/18413
- 02:02 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
- 02:00 AM Bug #21686 (Can't reproduce): osd/PrimaryLogPG.cc: 10195: FAILED assert(i->second == obc) in fini...
- ...
10/05/2017
- 10:33 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- https://github.com/ceph/ceph/pull/18140 backport
- 10:30 PM Bug #21660 (Pending Backport): Kraken client crash after upgrading cluster from Kraken to Luminous
- 08:27 PM Bug #21660 (Fix Under Review): Kraken client crash after upgrading cluster from Kraken to Luminous
- ...
- 04:47 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- fc655d9b-16cd-4342-bf4b-689a3c0d2891 generated on a Luminous client.
On the Kraken client, this results in:
<pr...
- 04:08 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Hi Sarah,
Can you 'ceph osd getmap 308 -o 308' and 'ceph-post-file 308'?
- 02:50 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- I wasn't clever enough to save the core file initially, so I've reproduced the issue on a reinstall of Kraken after u...
- 06:01 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Yuri's testing it (it will pass), so I went ahead and created a backport PR: https://github.com/ceph/ceph/pull/18132
- 04:17 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- https://github.com/ceph/ceph/pull/18130
- 03:05 AM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- 11:59 AM Bug #21470 (Pending Backport): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
- https://github.com/ceph/ceph/pull/18127 for the backport
- 03:04 AM Bug #21629 (Pending Backport): interval_map.h: 161: FAILED assert(len > 0)
10/04/2017
- 10:19 PM Bug #21660 (Need More Info): Kraken client crash after upgrading cluster from Kraken to Luminous
- Do you still have the core file? I would be very interested in seeing the epoch for the OSDMap that was being decode...
- 01:10 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Crash in the messenger layer of librados.
- 09:54 PM Bug #21470 (Fix Under Review): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
- https://github.com/ceph/ceph/pull/18118
Thanks, Bob! Please let me know if you see it fail. This should be inclu...
- 04:56 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Yep, left it running an entire night and wrote 1.5TB without crashing. Seems to be fixed. Thanks!
- 05:52 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- This time I couldn't apply your changes to the original Luminous source release so I pulled the entire Git branch and...
- 07:40 PM Bug #20910 (In Progress): spurious MON_DOWN, apparently slow/laggy mon
- not resolved yet!
- 06:58 PM Bug #21624 (Pending Backport): BlueStore::umount will crash when the BlueStore is opened by start...
- 06:56 PM Bug #21625 (Pending Backport): ceph-kvstore-tool does not call bluestore's umount when exit
- 02:32 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
10/03/2017
- 09:49 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- I've pushed another patch to the same branch.. can you give it a try?
- 09:46 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- From that log I've narrowed the problem down to this line...
- 08:42 PM Bug #21303 (Resolved): rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- Thanks!
- 06:40 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- /a/sage-2017-10-03_12:00:34-rados-wip-sage-testing2-2017-10-02-2121-distro-basic-smithi/1698722
- 02:37 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- I managed to get some debug symbols working....
- 05:41 AM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
- I'm having some trouble making the debug symbols work, (I installed ceph-common-dbg, librbd1-dbg and librados2-dbg to...
- 02:58 AM Backport #21653 (Resolved): luminous: Erasure code recovery should send additional reads if neces...
- https://github.com/ceph/ceph/pull/20081
With http://tracker.ceph.com/issues/22069
- 02:58 AM Backport #21650 (Resolved): luminous: buffer_anon leak during deep scrub (on otherwise idle osd)
- https://github.com/ceph/ceph/pull/18227
- 02:57 AM Backport #21636 (Resolved): luminous: ceph-monstore-tool --readable mode doesn't understand FSMap...
- https://github.com/ceph/ceph/pull/18754
10/02/2017
- 11:14 PM Bug #18162 (In Progress): osd/ReplicatedPG.cc: recover_replicas: object added to missing set for ...
- 09:35 PM Bug #21629 (Fix Under Review): interval_map.h: 161: FAILED assert(len > 0)
- *PR*: https://github.com/ceph/ceph/pull/18088
- 09:34 PM Bug #21629: interval_map.h: 161: FAILED assert(len > 0)
- The compare-extent op was beyond the truncated extent of the object. The EC async read code does not handle zero-leng...
- 07:39 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
- ...
- 04:47 PM Bug #21611 (Closed): rename in BlueFS is not atomic
- ceph-kvstore-tool doesn't call umount() of BlueStore.
- 04:12 PM Bug #21625 (Resolved): ceph-kvstore-tool does not call bluestore's umount when exit
- It will not flush the dirty log to durable storage and will lose some data. For example, a user sets a KV pair by ceph-kvstore-to...
- 04:03 PM Bug #21624 (Resolved): BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
- ceph-kvstore-tool uses `start_kv_only` to mount a BlueStore.
- 01:50 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
- 01:50 PM Bug #21243 (Resolved): incorrect erasure-code space in command ceph df
- 01:24 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- https://github.com/ceph/ceph/pull/18079
- 01:21 PM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- ...
- 12:21 PM Bug #21614 (Fix Under Review): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/s...
- https://github.com/ceph/ceph/pull/18078
- 03:47 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
- http://pulpito.ceph.com/kchai-2017-10-01_17:38:10-rados-wip-kefu-testing-2017-10-01-2202-distro-basic-mira/1692959/
...
- 08:15 AM Backport #21283 (Resolved): luminous: spurious MON_DOWN, apparently slow/laggy mon
- 08:14 AM Backport #21374 (Resolved): luminous: incorrect erasure-code space in command ceph df
- 03:42 AM Bug #21566 (Pending Backport): OSDService::recovery_need_sleep read+updated without locking
10/01/2017
- 09:08 AM Bug #21611 (Closed): rename in BlueFS is not atomic
- I was testing the repair command, and found that:
1. rocksdb creates a new MANIFEST file during database repair, and wants t...
- 02:20 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- TSAN unfortunately just caused the OSDs to core dump instantly. I'll see if I can find another way to find threading ...
09/30/2017
- 07:22 AM Bug #21603: rocksdb is using slow crc
- Mark, please let me know if i should update ceph/rocksdb with this fix and pick it up in ceph/ceph if you think we ne...
- 07:20 AM Bug #21603: rocksdb is using slow crc
- https://github.com/facebook/rocksdb/pull/2950
- 06:33 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- While I'm not intimately familiar with threaded programming, I'm okay with general C++. Could you possibly explain wh...
- 03:05 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- No luck. I applied 1918c57c7c6304875501f4f4b04b9c82834395a3 from the aforementioned repo to my copy of the official L...
- 05:31 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- After merging the following patches, the error didn't happen again. You can close the issue. Thanks!
patch list:
h...
- 04:11 AM Bug #21577 (Pending Backport): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
09/29/2017
- 10:36 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks good.
- 09:18 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Ah, found it: https://github.com/ceph/ceph-ci/tree/wip-21470-test
- 09:12 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- I'm not on a Debian or Redhat derivative, is there a Git repository I can get the source from or a tarball you can li...
- 06:54 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Ok, that's kind of embarrassing, I think the fix is pretty simple. Can you please test out this branch?
wip-21470-... - 06:39 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- Can you repeat the fsck with --debug-bluefs 20?
CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr ... - 06:11 PM Bug #21382 (Pending Backport): Erasure code recovery should send additional reads if necessary
- 06:08 PM Bug #21603: rocksdb is using slow crc
- Kefu Chai wrote:
> i set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fa... - 05:37 PM Bug #21603: rocksdb is using slow crc
- @kefu, that's really elegant work, thanks for the info
Matt - 04:49 PM Bug #21603: rocksdb is using slow crc
- i set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fast_CRC32() is always...
- 03:08 PM Bug #21603: rocksdb is using slow crc
- Matt Benjamin wrote:
> Just randomly, is this output just from ceph-osd running under perf?
This is output from m... - 03:00 PM Bug #21603: rocksdb is using slow crc
- Just randomly, is this output just from ceph-osd running under perf?
Matt - 02:42 PM Bug #21603 (Resolved): rocksdb is using slow crc
- ...
- 03:00 PM Bug #21249 (Resolved): Client client.admin marked osd.2 out, after it was down for 1504627577 sec...
- 02:58 PM Bug #20944 (Resolved): OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple dep...
- 02:38 PM Bug #21566 (Fix Under Review): OSDService::recovery_need_sleep read+updated without locking
- https://github.com/ceph/ceph/pull/18022 should take care of this.
- 12:11 PM Backport #21307 (Resolved): luminous: Client client.admin marked osd.2 out, after it was down for...
- 12:11 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
- 10:43 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- osd.6 remove object "0#2:c4b0339b:::benchmark_data_mira035.xsky.com_17216_object7868:head#" from backfillinfo.objects...
- 03:53 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- ...
- 01:48 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- ...
- 12:01 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- Is this on master?
Shouldn't osd.7 have the 149'793 log entry for the delete, and thus detect the retry as a dupli...
09/28/2017
- 01:30 PM Bug #21417 (Pending Backport): buffer_anon leak during deep scrub (on otherwise idle osd)
- 01:27 PM Bug #21592 (Resolved): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- ...
09/27/2017
- 09:13 PM Bug #21577 (Fix Under Review): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
- https://github.com/ceph/ceph/pull/18005
Marking for backport -- I consider this a bugfix because the mdsmap dumpin... - 06:44 PM Bug #21577 (Resolved): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
- Annoying for anyone wanting to inspect these. I never updated it because I don't think I knew it existed :-)
- 07:52 PM Bug #21417: buffer_anon leak during deep scrub (on otherwise idle osd)
- ok, the problem is that as scrub (or whatever) happens, the bluestore cache is populated, but the attrs weren't in th...
- 07:50 PM Bug #21417 (Fix Under Review): buffer_anon leak during deep scrub (on otherwise idle osd)
- https://github.com/ceph/ceph/pull/18001
- 07:41 PM Bug #21580 (Resolved): osd: stalled recovery ends up in recovery_wait
- With https://github.com/ceph/ceph/pull/17839 a stalled recovery (due to remaining unfound objects) goes back into rec...
- 07:28 PM Feature #21579 (Resolved): [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
- [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
Description of problem:
[RFE] Stop OSD's removal... - 02:56 PM Bug #21573 (Resolved): [upgrade] buffer::list ABI broken in luminous release
- A client application that was compiled against a pre-Luminous librados C++ API and therefore utilizing bufferlist wil...
- 01:22 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Another case: http://tracker.ceph.com/issues/21537
- 06:28 AM Bug #21566 (Resolved): OSDService::recovery_need_sleep read+updated without locking
- Unless I'm misreading this, OSD::do_recovery() is invoked from the ShardedOpQueue without holding any locks on global...
- 03:05 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- @Josh do you have time to look at it?
09/26/2017
- 01:34 PM Bug #21557 (Can't reproduce): osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi1443...
- ...
- 10:37 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- osd7:
91'473 (0'0) modify
151'793 (0'0) error
osd.6
91'473 (0'0) modify
149'793 (91'473) delete - 09:01 AM Bug #21555 (New): src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- pg 2.3s0 up/acting is [7,0,2]/[6,0,2]
in backfill_toofull state, osd.6 got a write op; because the object > last_backfill, an... - 03:30 AM Bug #21338 (Resolved): There is a big risk in function bufferlist::claim_prepend()
- 12:24 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Okay. Assuming sortbitwise is just a messaging scheme (I think it is), we should be safe to change the assert to requ...
- 12:10 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Okay, the one I'm looking at is crashing on pg 126.b7, at epoch 5350. Pool 126 does not presently exist; epoch 5350 (...
09/25/2017
- 09:21 PM Backport #21544 (Resolved): luminous: mon osd feature checks for osdmap flags and require-osd-rel...
- https://github.com/ceph/ceph/pull/18364
- 09:21 PM Backport #21543 (Resolved): luminous: bluestore fsck took 224.778802 seconds to complete which ca...
- https://github.com/ceph/ceph/pull/18362
- 05:37 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
- ...although even a slow disk shouldn't stall long enough for the heartbeat to time out. :/
- 05:36 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
- It looks like a zillion threads are blocked at...
- 05:25 PM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
- [10:22:39] <@sage> it looks like everyone is waiting for log flush.. which is deep in snprintf in the core. can't te...
- 05:23 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
- The log ends 14 minutes prior to the signal, which I imagine is related to #21507....
- 03:47 AM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
- ...
- 02:19 AM Bug #21471 (Pending Backport): mon osd feature checks for osdmap flags and require-osd-release fa...
- 02:15 AM Bug #21474 (Pending Backport): bluestore fsck took 224.778802 seconds to complete which caused "t...
- 02:13 AM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
09/23/2017
- 04:39 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- With a similar but slightly different setup, this same crash happened to me.
Installed via ceph-deploy install --r... - 10:15 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- @Daniel,
Yes, no OSD running on xfs shows the problem in question. I think one of the differences between the db based o... - 02:25 AM Bug #21382: Erasure code recovery should send additional reads if necessary
- https://github.com/ceph/ceph/pull/17920
- 02:25 AM Bug #21382 (Fix Under Review): Erasure code recovery should send additional reads if necessary