Activity
From 09/19/2017 to 10/18/2017
10/18/2017
- 11:30 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
In the context of the newly created PGs:
pg[10.5a5s3( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0...
- 09:12 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- ...
- 09:46 PM Bug #21825: OSD won't stay online and crashes with abort
- I had a chance to try and rm osd.3 today and replace the hard disk with a new one; no crash so far, it is rebalancing...
- 06:26 AM Bug #21825: OSD won't stay online and crashes with abort
- I think there is more to this: after active+clean, I shut down osd.3 and then the PG went active+clean+snaptrim, then o...
- 05:11 AM Bug #21825: OSD won't stay online and crashes with abort
- After tinkering around with killing and starting many OSDs, marking lost and unfound, I finally was able to recover all b...
- 04:26 AM Bug #21825: OSD won't stay online and crashes with abort
- You should bump up the OSD logging to see more of what is happening.
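A minimal sketch of doing that (the osd id and debug levels are illustrative, not from the report):
  # at runtime:
  ceph tell osd.3 injectargs '--debug-osd 20 --debug-ms 1'
  # or persistently in the [osd] section of ceph.conf, then restart the daemon:
  #   debug osd = 20
  #   debug ms = 1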
- 03:33 AM Bug #21825 (Closed): OSD won't stay online and crashes with abort
- I have an issue where 2 OSDs can't stay up at the same time and one will crash the other causing down PGs,
Exporti...
- 05:36 PM Bug #20243: Improve size scrub error handling and ignore system attrs in xattr checking
If we wanted to backport to Jewel it would be helpful to include this pull request first.
https://github.com/cep...
10/17/2017
- 09:28 PM Bug #21823 (Can't reproduce): on_flushed: object ... obc still alive (ec + cache tiering)
- ...
- 08:41 PM Bug #21573: [upgrade] buffer::list ABI broken in luminous release
- @Kefu can you pls take a look?
- 08:40 PM Backport #21544 (Fix Under Review): luminous: mon osd feature checks for osdmap flags and require...
- 08:20 PM Backport #21544 (In Progress): luminous: mon osd feature checks for osdmap flags and require-osd-...
- 07:03 PM Backport #21543 (Fix Under Review): luminous: bluestore fsck took 224.778802 seconds to complete ...
- 06:58 PM Backport #21543 (In Progress): luminous: bluestore fsck took 224.778802 seconds to complete which...
- 06:40 PM Feature #21760: add tools to stress RADOS omap
- https://github.com/ceph/ceph/pull/18361
- 05:29 PM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
- 12:41 PM Bug #19198 (Closed): Bluestore doubles mem usage when caching object content
- I talked to Igor. It seems this really is a non-bug, as the UTs use the glibc allocator. A follow-up will be to us...
- 04:56 AM Bug #21818 (Resolved): ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filestore) ...
- ...
- 02:50 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- same problem: http://tracker.ceph.com/issues/21174
10/16/2017
- 11:29 PM Bug #20981: ./run_seed_to_range.sh errored out
- I was never able to reproduce this with the following command line test.
rm -rf /tmp/td td ; mkdir /tmp/td td ; cd...
- 09:12 PM Bug #18162 (Fix Under Review): osd/ReplicatedPG.cc: recover_replicas: object added to missing set...
- https://github.com/ceph/ceph/pull/18145
- 06:41 AM Bug #20053: crush compile / decompile looses precision on weight
10/13/2017
- 08:48 PM Bug #21750 (In Progress): scrub stat mismatch on bytes
- 08:48 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
- 05:15 PM Bug #21716 (Pending Backport): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- 05:15 PM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- 12:13 PM Backport #21794 (Resolved): luminous: backoff causes out of order op
- 12:13 PM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
- https://github.com/ceph/ceph/pull/21184
- 12:13 PM Backport #21785 (Resolved): luminous: OSDMap cache assert on shutdown
- 12:13 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
- https://github.com/ceph/ceph/pull/21158
- 12:12 PM Backport #21783 (Resolved): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
- https://github.com/ceph/ceph/pull/18398
- 04:11 AM Bug #21603 (Resolved): rocksdb is using slow crc
- 03:30 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
10/12/2017
- 08:08 PM Bug #21737 (Pending Backport): OSDMap cache assert on shutdown
- 05:32 PM Feature #21760 (In Progress): add tools to stress RADOS omap
- 04:16 PM Bug #21750: scrub stat mismatch on bytes
- http://pulpito.front.sepia.ceph.com/yuriw-2017-10-11_19:25:41-rados-wip-yuri3-testing-2017-10-11-1645-distro-basic-sm...
- 08:26 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- I have also met this problem when testing disk pull-out and re-insert; ceph version 0.94.5. According to @huang jun's osd lo...
- 05:01 AM Bug #21603 (Fix Under Review): rocksdb is using slow crc
- https://github.com/ceph/ceph/pull/18262
- 04:43 AM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- ...
- 12:46 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- Kefu, thanks for fixing this. Can you also indicate which of the mentioned PRs need to be backported to fix the test ...
10/11/2017
- 09:49 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
- problem seems to be that the unsharing code isn't handling compressed extents properly.
https://github.com/ceph/ce...
- 09:47 PM Bug #21766 (Resolved): os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec...
- ...
- 05:20 PM Bug #21331 (Resolved): pg recovery priority inversion
- https://github.com/ceph/ceph/pull/18025 is luminous backport
- 05:19 PM Bug #21417 (Resolved): buffer_anon leak during deep scrub (on otherwise idle osd)
- 01:49 PM Feature #21760: add tools to stress RADOS omap
- I had a discussion with Douglas, and in the current implementation we can enhance the following points:
1. Adding --he...
- 01:37 PM Feature #21760 (In Progress): add tools to stress RADOS omap
- Add the tools omap_create and omap_delete to stress the RADOS object map directly.
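For context, a sketch of exercising RADOS omap by hand with the stock rados CLI (pool and object names hypothetical); the proposed omap_create/omap_delete tools would drive the same operations at volume:
  rados -p testpool setomapval obj1 key1 value1
  rados -p testpool listomapvals obj1
  rados -p testpool rmomapkey obj1 key1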
- 01:45 PM Bug #21758 (Pending Backport): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- 09:51 AM Bug #21758 (Fix Under Review): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- https://github.com/ceph/ceph/pull/18242
- 09:49 AM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- ...
- 09:37 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
- https://github.com/ceph/ceph/pull/18241
- 06:13 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
- Comment out in ceph.conf (see the conf sketch below):
#osd copyfrom max chunk = 524288
If we use this config, it works fine,
but if we comment ...
- 06:01 AM Bug #21756 (New): /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i....
- steps to reproduce:...
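For reference, a sketch of the ceph.conf fragment discussed in the 06:13 AM comment above (placement in the [osd] section is an assumption):
  [osd]
  #osd copyfrom max chunk = 524288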
- 08:09 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- https://github.com/ceph/ceph/pull/18240
- 07:50 AM Bug #21757 (New): snapshotted RBD objects can't be automatically evicted from a cache tier when c...
- [environment]
1. ceph version: Jewel 10.2.6 or Firefly 0.80.7
2. kernel: 3.10.0-229.14.1.el7.x86_64
[procedure to ...
- 02:26 AM Bug #21750: scrub stat mismatch on bytes
- /a/sage-2017-10-10_20:19:10-rados-wip-sage-testing2-2017-10-10-1320-distro-basic-smithi/1723818
rados/thrash/{0-size...
10/10/2017
- 06:17 PM Bug #21407 (Pending Backport): backoff causes out of order op
- 01:50 PM Bug #21750 (Resolved): scrub stat mismatch on bytes
- ...
- 01:32 PM Bug #21744 (Fix Under Review): Core when `ceph-kvstore-tool exists`
- https://github.com/ceph/ceph/pull/16745/commits/46bbd32fad14579f9260765a0cb9bcfe0ba7defa
- 09:10 AM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
- http://pulpito.ceph.com/sage-2017-10-09_22:17:19-rados-wip-sage-testing2-2017-10-09-1528-distro-basic-smithi/1718563/...
10/09/2017
- 09:09 PM Bug #21737 (Fix Under Review): OSDMap cache assert on shutdown
- https://github.com/ceph/ceph/pull/18201
- 08:19 PM Bug #21737 (Resolved): OSDMap cache assert on shutdown
- We don't want users to hit asserts if we've leaked memory references on shutdown. For instance:...
- 08:44 PM Feature #18206 (Resolved): osd: osd_scrub_during_recovery only considers primary, not replicas
- 08:43 PM Backport #21117 (Resolved): jewel: osd: osd_scrub_during_recovery only considers primary, not rep...
- 05:01 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
- 12:25 PM Documentation #21733 (In Progress): OSD-Config-ref(osd max object size) section malformed
- https://github.com/ceph/ceph/pull/18188
- 12:09 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
- Syntax error in
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
at section osd max object...
- 11:21 AM Bug #21717 (Resolved): doc fails build with latest breathe
- 11:21 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
- 06:44 AM Bug #21721 (Can't reproduce): ceph pg force-backfill cmd failed with ENOENT error
- Command failed on mira025 with status 2: u'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage t...
10/08/2017
- 04:31 PM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
- 08:13 AM Backport #21719 (In Progress): luminous: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18167
- 08:11 AM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18167
- 08:15 AM Bug #21717: doc fails build with latest breathe
- recently breathe introduced a change not compatible with old sphinx, see https://github.com/michaeljones/breathe/comm...
- 08:09 AM Bug #21717 (Pending Backport): doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/17025
- 08:09 AM Bug #21717 (Resolved): doc fails build with latest breathe
- ...
- 08:10 AM Backport #21718 (In Progress): jewel: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18166
- 08:09 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
- https://github.com/ceph/ceph/pull/18166
- 07:46 AM Bug #21716 (Fix Under Review): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- -https://github.com/ceph/ceph/pull/17550-
- 07:42 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- https://github.com/ceph/ceph/pull/17313 might be relevant.
- 07:41 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
- ...
- 05:32 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
- I suspected that btrfs somehow failed to handle the ioctl(BTRFS_IOC_CLONE_RANGE) call, but I checked the linux kernel of ...
- 04:20 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
- David, sorry for the latency. Yeah, it is causing test failures. The errno is 95 (Operation not supported), -it's not...
10/06/2017
- 08:18 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- fast-tracking the backport, since it's already open
- 07:49 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Greg Farnum wrote:
> https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks...
- 02:01 AM Bug #20416 (Resolved): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- 07:46 PM Bug #19300 (Can't reproduce): "Segmentation fault ceph_test_objectstore --gtest_filter=-*/3"
- 07:36 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- @sage is this just a matter of executing the "/usr/bin/rbd ls" line at some point in a test? I'd be happy to add this. P...
- 05:15 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- @Yuri, @Sage - I guess the upgrade/kraken-x suite did not catch this because it does not do "/usr/bin/rbd ls" ?
- 01:17 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Much appreciated!
- 12:39 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Sarah, the fix is in the current luminous branch now. Once it builds (~1 hour), you can install the packages from htt...
- 12:39 PM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
- 05:48 PM Feature #21710 (New): add wildcard for namespaces
- Implement a * wildcard to allow access to namespaces starting with a given string (a usage sketch follows after the next entry):
allow rw namespace=cephfs_a*
wo...
- 12:39 PM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
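Illustrating the Feature #21710 request above, a sketch of how the proposed cap might be applied (client name hypothetical; the * wildcard is the behavior being requested, not something that exists yet):
  ceph auth caps client.cephfs_user mon 'allow r' osd 'allow rw namespace=cephfs_a*'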
- 03:22 AM Backport #21692 (In Progress): luminous: Kraken client crash after upgrading cluster from Kraken ...
- 03:18 AM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
- https://github.com/ceph/ceph/pull/18140
- 03:21 AM Backport #21702 (Resolved): luminous: BlueStore::umount will crash when the BlueStore is opened b...
- https://github.com/ceph/ceph/pull/18750
- 03:21 AM Backport #21701 (Resolved): luminous: ceph-kvstore-tool does not call bluestore's umount when exit
- https://github.com/ceph/ceph/pull/18751
- 03:21 AM Bug #21625: ceph-kvstore-tool does not call bluestore's umount when exit
- https://github.com/ceph/ceph/pull/18083
- 03:20 AM Bug #21624: BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
- https://github.com/ceph/ceph/pull/18082
- 03:18 AM Backport #21697 (Resolved): luminous: OSDService::recovery_need_sleep read+updated without locking
- https://github.com/ceph/ceph/pull/18753
- 03:18 AM Backport #21693 (Resolved): luminous: interval_map.h: 161: FAILED assert(len > 0)
- https://github.com/ceph/ceph/pull/18413
- 02:02 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
- 02:00 AM Bug #21686 (Can't reproduce): osd/PrimaryLogPG.cc: 10195: FAILED assert(i->second == obc) in fini...
- ...
10/05/2017
- 10:33 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- https://github.com/ceph/ceph/pull/18140 backport
- 10:30 PM Bug #21660 (Pending Backport): Kraken client crash after upgrading cluster from Kraken to Luminous
- 08:27 PM Bug #21660 (Fix Under Review): Kraken client crash after upgrading cluster from Kraken to Luminous
- ...
- 04:47 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- fc655d9b-16cd-4342-bf4b-689a3c0d2891 generated on a Luminous client.
On the Kraken client, this results in:
<pr...
- 04:08 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Hi Sarah,
Can you 'ceph osd getmap 308 -o 308' and 'ceph-post-file 308'?
- 02:50 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- I wasn't clever enough to save the core file initially, so I've reproduced the issue on a reinstall of Kraken after u...
- 06:01 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Yuri's testing it (it will pass), so I went ahead and created a backport PR: https://github.com/ceph/ceph/pull/18132
- 04:17 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- https://github.com/ceph/ceph/pull/18130
- 03:05 AM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- 11:59 AM Bug #21470 (Pending Backport): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
- https://github.com/ceph/ceph/pull/18127 for the backport
- 03:04 AM Bug #21629 (Pending Backport): interval_map.h: 161: FAILED assert(len > 0)
10/04/2017
- 10:19 PM Bug #21660 (Need More Info): Kraken client crash after upgrading cluster from Kraken to Luminous
- Do you still have the core file? I would be very interested in seeing the epoch for the OSDMap that was being decode...
- 01:10 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- Crash in the messenger layer of librados.
- 09:54 PM Bug #21470 (Fix Under Review): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
- https://github.com/ceph/ceph/pull/18118
Thanks, Bob! Please let me know if you see it fail. This should be inclu...
- 04:56 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Yep, left it running an entire night and wrote 1.5TB without crashing. Seems to be fixed. Thanks!
- 05:52 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- This time I couldn't apply your changes to the original Luminous source release so I pulled the entire Git branch and...
- 07:40 PM Bug #20910 (In Progress): spurious MON_DOWN, apparently slow/laggy mon
- not resolved yet!
- 06:58 PM Bug #21624 (Pending Backport): BlueStore::umount will crash when the BlueStore is opened by start...
- 06:56 PM Bug #21625 (Pending Backport): ceph-kvstore-tool does not call bluestore's umount when exit
- 02:32 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
10/03/2017
- 09:49 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- I've pushed another patch to the same branch.. can you give it a try?
- 09:46 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- From that log I've narrowed the problem down to this line...
- 08:42 PM Bug #21303 (Resolved): rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- Thanks!
- 06:40 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- /a/sage-2017-10-03_12:00:34-rados-wip-sage-testing2-2017-10-02-2121-distro-basic-smithi/1698722
- 02:37 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
- I managed to get some debug symbols working....
- 05:41 AM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
- I'm having some trouble making the debug symbols work (I installed ceph-common-dbg, librbd1-dbg and librados2-dbg to...
- 02:58 AM Backport #21653 (Resolved): luminous: Erasure code recovery should send additional reads if neces...
- https://github.com/ceph/ceph/pull/20081
With http://tracker.ceph.com/issues/22069
- 02:58 AM Backport #21650 (Resolved): luminous: buffer_anon leak during deep scrub (on otherwise idle osd)
- https://github.com/ceph/ceph/pull/18227
- 02:57 AM Backport #21636 (Resolved): luminous: ceph-monstore-tool --readable mode doesn't understand FSMap...
- https://github.com/ceph/ceph/pull/18754
10/02/2017
- 11:14 PM Bug #18162 (In Progress): osd/ReplicatedPG.cc: recover_replicas: object added to missing set for ...
- 09:35 PM Bug #21629 (Fix Under Review): interval_map.h: 161: FAILED assert(len > 0)
- *PR*: https://github.com/ceph/ceph/pull/18088
- 09:34 PM Bug #21629: interval_map.h: 161: FAILED assert(len > 0)
- The compare-extent op was beyond the truncated extent of the object. The EC async read code does not handle zero-leng...
- 07:39 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
- ...
- 04:47 PM Bug #21611 (Closed): rename in BlueFS is not atomic
- ceph-kvstore-tool doesn't call umount() of BlueStore.
- 04:12 PM Bug #21625 (Resolved): ceph-kvstore-tool does not call bluestore's umount when exit
- It will not flush the dirty log to durable storage and will lose some data. For example, a user sets a KV pair with ceph-kvstore-to...
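A sketch of the data-loss scenario described in Bug #21625 (store path, prefix, and key are hypothetical; the bluestore-kv backend is assumed):
  # write a KV pair; without BlueStore's umount the dirty log may never be flushed
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 set P mykey in /tmp/value
  # on a later open the pair can be missing
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 exists P mykey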
- 04:03 PM Bug #21624 (Resolved): BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
- ceph-kvstore-tool uses `start_kv_only` to mount a BlueStore.
- 01:50 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
- 01:50 PM Bug #21243 (Resolved): incorrect erasure-code space in command ceph df
- 01:24 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- https://github.com/ceph/ceph/pull/18079
- 01:21 PM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
- ...
- 12:21 PM Bug #21614 (Fix Under Review): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/s...
- https://github.com/ceph/ceph/pull/18078
- 03:47 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
- http://pulpito.ceph.com/kchai-2017-10-01_17:38:10-rados-wip-kefu-testing-2017-10-01-2202-distro-basic-mira/1692959/
...
- 08:15 AM Backport #21283 (Resolved): luminous: spurious MON_DOWN, apparently slow/laggy mon
- 08:14 AM Backport #21374 (Resolved): luminous: incorrect erasure-code space in command ceph df
- 03:42 AM Bug #21566 (Pending Backport): OSDService::recovery_need_sleep read+updated without locking
10/01/2017
- 09:08 AM Bug #21611 (Closed): rename in BlueFS is not atomic
- I was testing the repair command and found that:
1. rocksdb creates a new MANIFEST file during database repair, and wants t...
- 02:20 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- TSAN unfortunately just caused the OSDs to core dump instantly. I'll see if I can find another way to find threading ...
09/30/2017
- 07:22 AM Bug #21603: rocksdb is using slow crc
- Mark, please let me know if I should update ceph/rocksdb with this fix and pick it up in ceph/ceph if you think we ne...
- 07:20 AM Bug #21603: rocksdb is using slow crc
- https://github.com/facebook/rocksdb/pull/2950
- 06:33 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- While I'm not intimately familiar with threaded programming, I'm okay with general C++. Could you possibly explain wh...
- 03:05 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- No luck. I applied 1918c57c7c6304875501f4f4b04b9c82834395a3 from the aforementioned repo to my copy of the official L...
- 05:31 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- After merging the following patches, the error didn't happen again. You can close the issue. Thanks!
patch list:
h...
- 04:11 AM Bug #21577 (Pending Backport): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
09/29/2017
- 10:36 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks good.
- 09:18 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Ah, found it: https://github.com/ceph/ceph-ci/tree/wip-21470-test
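A sketch of building from that ceph-ci branch (steps assumed, not taken from the thread):
  git clone https://github.com/ceph/ceph-ci.git
  cd ceph-ci && git checkout wip-21470-test
  git submodule update --init --recursive
  ./do_cmake.sh && cd build && make -j$(nproc)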
- 09:12 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- I'm not on a Debian or Red Hat derivative; is there a Git repository I can get the source from or a tarball you can li...
- 06:54 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Ok, that's kind of embarrassing, I think the fix is pretty simple. Can you please test out this branch?
wip-21470-...
- 06:39 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- Can you repeat the fsck with --debug-bluefs 20? (A sketch of one full invocation follows the next entry.)
CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr ...
- 06:11 PM Bug #21382 (Pending Backport): Erasure code recovery should send additional reads if necessary
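A sketch of one full fsck invocation along the lines requested for Bug #21303 above (tool choice and OSD path are assumptions; ceph-bluestore-tool ships with Luminous):
  CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr true" \
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0 2> fsck.log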
- 06:08 PM Bug #21603: rocksdb is using slow crc
- Kefu Chai wrote:
> I set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fa...
- 05:37 PM Bug #21603: rocksdb is using slow crc
- @kefu, that's really elegant work, thanks for the info
Matt
- 04:49 PM Bug #21603: rocksdb is using slow crc
- I set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon; the breakpoint in Fast_CRC32() is always...
- 03:08 PM Bug #21603: rocksdb is using slow crc
- Matt Benjamin wrote:
> Just randomly, is this output just from ceph-osd running under perf?
This is output from m...
- 03:00 PM Bug #21603: rocksdb is using slow crc
- Just randomly, is this output just from ceph-osd running under perf?
Matt
- 02:42 PM Bug #21603 (Resolved): rocksdb is using slow crc
- ...
- 03:00 PM Bug #21249 (Resolved): Client client.admin marked osd.2 out, after it was down for 1504627577 sec...
- 02:58 PM Bug #20944 (Resolved): OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple dep...
- 02:38 PM Bug #21566 (Fix Under Review): OSDService::recovery_need_sleep read+updated without locking
- https://github.com/ceph/ceph/pull/18022 should take care of this.
- 12:11 PM Backport #21307 (Resolved): luminous: Client client.admin marked osd.2 out, after it was down for...
- 12:11 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
- 10:43 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- osd.6 remove object "0#2:c4b0339b:::benchmark_data_mira035.xsky.com_17216_object7868:head#" from backfillinfo.objects...
- 03:53 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- ...
- 01:48 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- ...
- 12:01 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- Is this on master?
Shouldn't osd.7 have the 149'793 log entry for the delete, and thus detect the retry as a dupli...
09/28/2017
- 01:30 PM Bug #21417 (Pending Backport): buffer_anon leak during deep scrub (on otherwise idle osd)
- 01:27 PM Bug #21592 (Resolved): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- ...
09/27/2017
- 09:13 PM Bug #21577 (Fix Under Review): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
- https://github.com/ceph/ceph/pull/18005
Marking for backport -- I consider this a bugfix because the mdsmap dumpin...
- 06:44 PM Bug #21577 (Resolved): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
- Annoying for anyone wanting to inspect these. I never updated it because I don't think I knew it existed :-)
- 07:52 PM Bug #21417: buffer_anon leak during deep scrub (on otherwise idle osd)
- ok, the problem is that as scrub (or whatever) happens, the bluestore cache is populated, but the attrs weren't in th...
- 07:50 PM Bug #21417 (Fix Under Review): buffer_anon leak during deep scrub (on otherwise idle osd)
- https://github.com/ceph/ceph/pull/18001
- 07:41 PM Bug #21580 (Resolved): osd: stalled recovery ends up in recovery_wait
- With https://github.com/ceph/ceph/pull/17839 a stalled recovery (due to remaining unfound objects) goes back into rec...
- 07:28 PM Feature #21579 (Resolved): [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
- [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
Description of problem:
[RFE] Stop OSD's removal...
- 02:56 PM Bug #21573 (Resolved): [upgrade] buffer::list ABI broken in luminous release
- A client application that was compiled against a pre-Luminous librados C++ API and therefore utilizing bufferlist wil...
- 01:22 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Another case: http://tracker.ceph.com/issues/21537
- 06:28 AM Bug #21566 (Resolved): OSDService::recovery_need_sleep read+updated without locking
- Unless I'm misreading this, OSD::do_recovery() is invoked from the ShardedOpQueue without holding any locks on global...
- 03:05 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- @Josh do you have time to look at it?
09/26/2017
- 01:34 PM Bug #21557 (Can't reproduce): osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi1443...
- ...
- 10:37 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- osd7:
91'473 (0'0) modify
151'793 (0'0) error
osd.6
91'473 (0'0) modify
149'793 (91'473) delete
- 09:01 AM Bug #21555 (New): src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
- pg 2.3s0 up/acting is [7,0,2]/[6,0,2]
in backfill_toofull state, osd.6 got a write op; because object > last_backfill, an...
- 03:30 AM Bug #21338 (Resolved): There is a big risk in function bufferlist::claim_prepend()
- 12:24 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Okay. Assuming sortbitwise is just a messaging scheme (I think it is), we should be safe to change the assert to requ...
- 12:10 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Okay, the one I'm looking at is crashing on pg 126.b7, at epoch 5350. Pool 126 does not presently exist; epoch 5350 (...
09/25/2017
- 09:21 PM Backport #21544 (Resolved): luminous: mon osd feature checks for osdmap flags and require-osd-rel...
- https://github.com/ceph/ceph/pull/18364
- 09:21 PM Backport #21543 (Resolved): luminous: bluestore fsck took 224.778802 seconds to complete which ca...
- https://github.com/ceph/ceph/pull/18362
- 05:37 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
- ...although even a slow disk shouldn't be long enough for the heartbeat to time out. :/
- 05:36 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
- It looks like a zillion threads are blocked at...
- 05:25 PM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
- [10:22:39] <@sage> it looks like everyone is waiting for log flush.. which is deep in snprintf in the core. can't te...
- 05:23 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
- The log ends 14 minutes prior to the signal, which I imagine is related to #21507....
- 03:47 AM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
- ...
- 02:19 AM Bug #21471 (Pending Backport): mon osd feature checks for osdmap flags and require-osd-release fa...
- 02:15 AM Bug #21474 (Pending Backport): bluestore fsck took 224.778802 seconds to complete which caused "t...
- 02:13 AM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
09/23/2017
- 04:39 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- With a similar but slightly different setup, this same crash happened to me.
Installed via ceph-deploy install --r...
- 10:15 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- @Daniel,
Yes, no OSD running on xfs shows the problem in question. I think one of the differences between the db based o...
- 02:25 AM Bug #21382: Erasure code recovery should send additional reads if necessary
- https://github.com/ceph/ceph/pull/17920
- 02:25 AM Bug #21382 (Fix Under Review): Erasure code recovery should send additional reads if necessary
09/22/2017
- 09:49 PM Bug #21511 (Fix Under Review): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::m...
- https://github.com/ceph/ceph/pull/17927
- 06:00 PM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
- ...
- 09:04 PM Bug #21408 (Resolved): osd: "fsck error: free extent 0x2000~2000 intersects allocated blocks"
- 08:43 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
- @Sage,
Please, would you have a reproducer for this, so I could give it a try and check it out in my environment? ...
- 05:51 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- @Wei,
Yes, the log file shows the same error with the 12.2.0 build running. I agree with @Josh and you; it seems t...
- 02:45 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- @Sage @Daniel Huang and I use the same cluster. We use xfs instead of bluefs for some OSDs in our cluster; the issue...
- 01:54 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- Sage Weil wrote:
> Can you please upgrade to 12.2.0 (or better yet, latest luminous branch), and then run fsck and a...
- 12:51 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- Daniel Oliveira wrote:
> @Wei,
>
> Please, would you mind describing a bit more your environment? Also, how ofte...
- 11:36 AM Bug #20871 (Resolved): core dump when bluefs's mkdir returns -EEXIST
- 03:31 AM Bug #20759: mon: valgrind detects a few leaks
- /kchai-2017-09-21_06:22:45-rados-wip-kefu-testing-2017-09-21-1013-distro-basic-mira/1654844/remote/mira038/log/valgri...
- 03:13 AM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
- ...
- 03:06 AM Bug #21474 (Fix Under Review): bluestore fsck took 224.778802 seconds to complete which caused "t...
- https://github.com/ceph/ceph/pull/17902
09/21/2017
- 11:00 PM Bug #21382 (In Progress): Erasure code recovery should send additional reads if necessary
- 09:50 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Done. Took less than an hour and happened on two OSDs. Uploaded one of them:
ceph-post-file: 6e0ed6ab-1528-428d-aa...
- 08:26 PM Bug #21470 (Need More Info): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after...
- Okay, thanks for confirming that the #21171 fix is applied. Can you reproduce with debug bluestore = 20, and then ...
- 08:25 PM Bug #21475 (Duplicate): 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropp...
- 08:08 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Got a report of this happening in downstream Red Hat packages at https://bugzilla.redhat.com/show_bug.cgi?id=1494238
...
- 08:02 PM Bug #21496 (Fix Under Review): doc: Manually editing a CRUSH map, Word 'type' missing.
- http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/
In the section "CRUSH map rules", in the overvi...
- 07:59 PM Bug #21303 (Need More Info): rocksdb get a error: "Compaction error: Corruption: block checksum m...
- Can you please upgrade to 12.2.0 (or better yet, latest luminous branch), and then run fsck and attach the output?
...
- 04:59 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- @Wei,
Please, would you mind describing a bit more your environment? Also, how often does it happen? Can we repro...
- 06:11 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- This issue can be reproduced in our cluster; we are willing to give more information if you need it.
- 07:36 PM Bug #20653 (Can't reproduce): bluestore: aios don't complete on very large writes on xenial
- I'm going to assume this was #21171
- 06:48 PM Bug #21417: buffer_anon leak during deep scrub (on otherwise idle osd)
- definitely happens from an ec pool.
- 04:03 PM Bug #21410 (Resolved): pg_upmap_items can duplicate an item
- 02:45 PM Bug #21410 (Pending Backport): pg_upmap_items can duplicate an item
- 04:02 PM Bug #21495 (New): src/osd/OSD.cc: 346: FAILED assert(piter != rev_pending_splits.end())
- ...
- 04:07 AM Backport #21465 (In Progress): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" e...
- 04:05 AM Backport #21438 (In Progress): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
- 04:03 AM Backport #21343 (In Progress): luminous: DNS SRV default service name not used anymore
- 04:01 AM Backport #21307 (In Progress): luminous: Client client.admin marked osd.2 out, after it was down ...
09/20/2017
- 08:37 PM Bug #21428: luminous: osd: does not request latest map from mon
- Fix:
* master https://github.com/ceph/ceph/pull/17828
* luminous https://github.com/ceph/ceph/pull/17829
- 03:15 PM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
- 05:17 AM Bug #21428 (In Progress): luminous: osd: does not request latest map from mon
- fixing bug in the patch
- 04:39 PM Bug #21408 (Fix Under Review): osd: "fsck error: free extent 0x2000~2000 intersects allocated blo...
- https://github.com/ceph/ceph/pull/17845
- 04:03 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
- potentially a bug in bluefs
- 03:53 PM Bug #21407: backoff causes out of order op
- 03:46 PM Bug #20924: osd: leaked Session on osd.7
- /a/yuriw-2017-09-19_19:54:13-rados-wip-yuri-testing3-2017-09-19-1710-distro-basic-smithi/1648800
osd.7 again! weird
- 03:01 PM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
- cool. will update the test.
- 12:14 PM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
- Sigh.. yeah. I can't decide if we should stop doing these fsck's entirely, or reduce the debug level just for fsck, ...
- 05:30 AM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
- Sage, if you believe that it's normal for bluestore to take around 4 minutes to complete a deep fsck, I will prolong ...
- 05:28 AM Bug #21474 (Resolved): bluestore fsck took 224.778802 seconds to complete which caused "timed out...
- /a/kchai-2017-09-19_14:50:44-rados-wip-kefu-testing-2017-09-19-1954-distro-basic-mira/1648644...
- 11:19 AM Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping req...
- Seems it's a duplicate of this tracker: http://tracker.ceph.com/issues/21180. Please verify.
- 11:18 AM Bug #21475 (Duplicate): 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropp...
- ~~~
2017-09-18 14:51:59.895746 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068824 seconds old...
- 10:25 AM Bug #21471 (In Progress): mon osd feature checks for osdmap flags and require-osd-release fail if 0 ...
- https://github.com/ceph/ceph/pull/17831
- 02:29 AM Bug #21471 (Resolved): mon osd feature checks for osdmap flags and require-osd-release fail if 0 ...
- the various checks test get_up_osd_features() but that returns 0 if no osds are up.
needs to be fixed in luminous ... - 02:19 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
- Oh, forgot to add that I've tried the workarounds on the related issues. Adding this to my ceph.conf makes no differe...
- 02:16 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
- This is a copy of http://tracker.ceph.com/issues/21314, which was marked as resolved. It's not resolved after applyin...
09/19/2017
- 11:46 PM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
- backport was https://github.com/ceph/ceph/pull/17796
- 07:24 AM Bug #21428 (Fix Under Review): luminous: osd: does not request latest map from mon
- https://github.com/ceph/ceph/pull/17795
- 02:02 AM Bug #21428 (In Progress): luminous: osd: does not request latest map from mon
- 12:16 AM Bug #21428: luminous: osd: does not request latest map from mon
- I think this is from the fast_dispatch refactor in luminous, and the latest test timing just happened to show it.
- 12:12 AM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
- On the current luminous branch, a couple tests saw slow requests > 1 hour due to ops waiting for maps.
One is /a/y...
- 08:25 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
- https://github.com/ceph/ceph/pull/17865
- 06:01 PM Bug #20944 (Pending Backport): OSD metadata 'backend_filestore_dev_node' is "unknown" even for si...
- 11:36 AM Backport #21438 (Resolved): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
- https://github.com/ceph/ceph/pull/17864
- 08:20 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
- I had to delete the affected pool to reclaim the occupied space, so I am unable to verify any fixes.
- 03:31 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
- duplicate issue: http://tracker.ceph.com/issues/16279