Activity

From 09/19/2017 to 10/18/2017

10/18/2017

11:30 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts

In the context of the newly created PGs:
pg[10.5a5s3( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0...
David Zafman
09:12 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
... Greg Farnum
09:46 PM Bug #21825: OSD won't stay online and crashes with abort
I had a chance to try and rm osd 3 today and replace the hard disk with a new one, no crash so far, it is rebalancing... Jérôme Poulin
06:26 AM Bug #21825: OSD won't stay online and crashes with abort
I think there is more to this, after active+clean, I shutdown osd.3 and then the PG went active+clean+snaptrim then o... Jérôme Poulin
05:11 AM Bug #21825: OSD won't stay online and crashes with abort
After tampering around with OSD kill and starting many, marking lost and unfound, I finally was able to recover all b... Jérôme Poulin
04:26 AM Bug #21825: OSD won't stay online and crashes with abort
You should bump up the OSD logging to see more of what is happening. David Zafman
03:33 AM Bug #21825 (Closed): OSD won't stay online and crashes with abort
I have an issue where 2 OSDs can't stay up at the same time and one will crash the other causing down PGs,
Exporti...
Jérôme Poulin
05:36 PM Bug #20243: Improve size scrub error handling and ignore system attrs in xattr checking

If we wanted to backport to Jewel it would be helpful to include this pull request first.
https://github.com/cep...
David Zafman

10/17/2017

09:28 PM Bug #21823 (Can't reproduce): on_flushed: object ... obc still alive (ec + cache tiering)
... Sage Weil
08:41 PM Bug #21573: [upgrade] buffer::list ABI broken in luminous release
@Kefu can you pls take a look? Yuri Weinstein
08:40 PM Backport #21544 (Fix Under Review): luminous: mon osd feature checks for osdmap flags and require...
Anonymous
08:20 PM Backport #21544 (In Progress): luminous: mon osd feature checks for osdmap flags and require-osd-...
Anonymous
07:03 PM Backport #21543 (Fix Under Review): luminous: bluestore fsck took 224.778802 seconds to complete ...
Anonymous
06:58 PM Backport #21543 (In Progress): luminous: bluestore fsck took 224.778802 seconds to complete which...
Anonymous
06:40 PM Feature #21760: add tools to stress RADOS omap
https://github.com/ceph/ceph/pull/18361 Douglas Fuller
05:29 PM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
Chang Liu
12:41 PM Bug #19198 (Closed): Bluestore doubles mem usage when caching object content
I talked to Igor. It seems this really is a non-bug, as the UTs use the glibc allocator. A follow-up will be to us... Mohamad Gebai
04:56 AM Bug #21818 (Resolved): ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filestore) ...
... Kefu Chai
02:50 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
same problem: http://tracker.ceph.com/issues/21174 huang jun

10/16/2017

11:29 PM Bug #20981: ./run_seed_to_range.sh errored out
I was never able to reproduce this with the following command line test.
rm -rf /tmp/td td ; mkdir /tmp/td td ; cd...
David Zafman
09:12 PM Bug #18162 (Fix Under Review): osd/ReplicatedPG.cc: recover_replicas: object added to missing set...
https://github.com/ceph/ceph/pull/18145 David Zafman
06:41 AM Bug #20053: crush compile / decompile looses precision on weight
Loïc Dachary

10/13/2017

08:48 PM Bug #21750 (In Progress): scrub stat mismatch on bytes
Sage Weil
08:48 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
Sage Weil
05:15 PM Bug #21716 (Pending Backport): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Kefu Chai
05:15 PM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Kefu Chai
12:13 PM Backport #21794 (Resolved): luminous: backoff causes out of order op
Nathan Cutler
12:13 PM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
https://github.com/ceph/ceph/pull/21184 Nathan Cutler
12:13 PM Backport #21785 (Resolved): luminous: OSDMap cache assert on shutdown
Nathan Cutler
12:13 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
https://github.com/ceph/ceph/pull/21158 Nathan Cutler
12:12 PM Backport #21783 (Resolved): luminous: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
https://github.com/ceph/ceph/pull/18398 Nathan Cutler
04:11 AM Bug #21603 (Resolved): rocksdb is using slow crc
Kefu Chai
03:30 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
David Zafman

10/12/2017

08:08 PM Bug #21737 (Pending Backport): OSDMap cache assert on shutdown
Greg Farnum
05:32 PM Feature #21760 (In Progress): add tools to stress RADOS omap
Douglas Fuller
04:16 PM Bug #21750: scrub stat mismatch on bytes
http://pulpito.front.sepia.ceph.com/yuriw-2017-10-11_19:25:41-rados-wip-yuri3-testing-2017-10-11-1645-distro-basic-sm... Sage Weil
08:26 AM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
I have also hit this problem when testing pulling out a disk and reinserting it; ceph version 0.94.5. According to @huang jun's osd lo... mingyue zhao
05:01 AM Bug #21603 (Fix Under Review): rocksdb is using slow crc
https://github.com/ceph/ceph/pull/18262 Kefu Chai
04:43 AM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
... Kefu Chai
12:46 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Kefu, thanks for fixing this. Can you also indicate which of the mentioned PRs need to be backported to fix the test ... Nathan Cutler

10/11/2017

09:49 PM Bug #21766: os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec + compress...
problem seems to be that the unsharing code isn't handling compressed extents properly.
https://github.com/ceph/ce...
Sage Weil
09:47 PM Bug #21766 (Resolved): os/bluestore/bluestore_types.h: 740: FAILED assert(p != extents.end()) (ec...
... Sage Weil
05:20 PM Bug #21331 (Resolved): pg recovery priority inversion
https://github.com/ceph/ceph/pull/18025 is luminous backport
Sage Weil
05:19 PM Bug #21417 (Resolved): buffer_anon leak during deep scrub (on otherwise idle osd)
Sage Weil
01:49 PM Feature #21760: add tools to stress RADOS omap
I had a discussion with Douglas; in the current implementation, we can enhance the following points:
1. Adding --he...
Vikhyat Umrao
01:37 PM Feature #21760 (In Progress): add tools to stress RADOS omap
Add the tools omap_create and omap_delete to stress the RADOS object map directly. Douglas Fuller
01:45 PM Bug #21758 (Pending Backport): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
Sage Weil
09:51 AM Bug #21758 (Fix Under Review): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
https://github.com/ceph/ceph/pull/18242 Kefu Chai
09:49 AM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
... Kefu Chai
09:37 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
https://github.com/ceph/ceph/pull/18241 huang jun
06:13 AM Bug #21756: /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first)...
comment out in ceph.conf
#osd copyfrom max chunk = 524288
if we use this config, it works fine.
but if we comment ...
huang jun
06:01 AM Bug #21756 (New): /usr/src/ceph/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i....
steps to reproduce:... huang jun
08:09 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
https://github.com/ceph/ceph/pull/18240 Kefu Chai
07:50 AM Bug #21757 (New): snapshotted RBD objects can't be automatically evicted from a cache tier when c...
[environment]
1, ceph version:Jewel 10.2.6 or firefly 0.80.7
2, kernel: 3.10.0-229.14.1.el7.x86_64
[procedure to ...
Xiaojun Liao
02:26 AM Bug #21750: scrub stat mismatch on bytes
/a/sage-2017-10-10_20:19:10-rados-wip-sage-testing2-2017-10-10-1320-distro-basic-smithi/1723818
rados/thrash/{0-size...
Sage Weil

10/10/2017

06:17 PM Bug #21407 (Pending Backport): backoff causes out of order op
Sage Weil
01:50 PM Bug #21750 (Resolved): scrub stat mismatch on bytes
... Sage Weil
01:32 PM Bug #21744 (Fix Under Review): Core when `ceph-kvstore-tool exists`
https://github.com/ceph/ceph/pull/16745/commits/46bbd32fad14579f9260765a0cb9bcfe0ba7defa Chang Liu
09:10 AM Bug #21744 (Resolved): Core when `ceph-kvstore-tool exists`
http://pulpito.ceph.com/sage-2017-10-09_22:17:19-rados-wip-sage-testing2-2017-10-09-1528-distro-basic-smithi/1718563/... Chang Liu

10/09/2017

09:09 PM Bug #21737 (Fix Under Review): OSDMap cache assert on shutdown
https://github.com/ceph/ceph/pull/18201 Greg Farnum
08:19 PM Bug #21737 (Resolved): OSDMap cache assert on shutdown
We don't want users to hit asserts if we've leaked memory references on shutdown. For instance:... Greg Farnum
08:44 PM Feature #18206 (Resolved): osd: osd_scrub_during_recovery only considers primary, not replicas
Nathan Cutler
08:43 PM Backport #21117 (Resolved): jewel: osd: osd_scrub_during_recovery only considers primary, not rep...
Nathan Cutler
05:01 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
Kefu Chai
12:25 PM Documentation #21733 (In Progress): OSD-Config-ref(osd max object size) section malformed
https://github.com/ceph/ceph/pull/18188 Jos Collin
12:09 PM Documentation #21733 (Resolved): OSD-Config-ref(osd max object size) section malformed
Syntax error in
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
at section osd max object...
Joshua Schmid
11:21 AM Bug #21717 (Resolved): doc fails build with latest breathe
Kefu Chai
11:21 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
Kefu Chai
06:44 AM Bug #21721 (Can't reproduce): ceph pg force-backfill cmd failed with ENOENT error
Command failed on mira025 with status 2: u'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage t... huang jun

10/08/2017

04:31 PM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
Kefu Chai
08:13 AM Backport #21719 (In Progress): luminous: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18167 Kefu Chai
08:11 AM Backport #21719 (Resolved): luminous: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18167 Kefu Chai
08:15 AM Bug #21717: doc fails build with latest breathe
recently breathe introduced a change not compatible with old sphinx, see https://github.com/michaeljones/breathe/comm... Kefu Chai
08:09 AM Bug #21717 (Pending Backport): doc fails build with latest breathe
https://github.com/ceph/ceph/pull/17025 Kefu Chai
08:09 AM Bug #21717 (Resolved): doc fails build with latest breathe
... Kefu Chai
08:10 AM Backport #21718 (In Progress): jewel: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18166 Kefu Chai
08:09 AM Backport #21718 (Resolved): jewel: doc fails build with latest breathe
https://github.com/ceph/ceph/pull/18166 Kefu Chai
07:46 AM Bug #21716 (Fix Under Review): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
-https://github.com/ceph/ceph/pull/17550- Kefu Chai
07:42 AM Bug #21716: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
https://github.com/ceph/ceph/pull/17313 might be relevant. Kefu Chai
07:41 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
... Kefu Chai
05:32 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
i suspected that btrfs somehow failed to handle the ioctl(BTRFS_IOC_CLONE_RANGE) call. but i checked linux kernel of ... Kefu Chai
04:20 AM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
David, sorry for the latency. yeah, it is causing test failures. the errno is 95 (Operation not supported), -it's not... Kefu Chai

10/06/2017

08:18 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
fast-tracking the backport, since it's already open Nathan Cutler
07:49 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Greg Farnum wrote:
> https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks...
Yuri Weinstein
02:01 AM Bug #20416 (Resolved): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Sage Weil
07:46 PM Bug #19300 (Can't reproduce): "Segmentation fault ceph_test_objectstore --gtest_filter=-*/3"
Sage Weil
07:36 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
@sage is this just a matter of executing the "/usr/bin/rbd ls" line at some point in the tests? I'd be happy to add this. P... Yuri Weinstein
05:15 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
@Yuri, @Sage - I guess the upgrade/kraken-x suite did not catch this because it does not do "/usr/bin/rbd ls" ? Nathan Cutler
01:17 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Much appreciated! Sarah Brofeldt
12:39 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Sarah, the fix is in the current luminous branch now. Once it builds (~1 hrs), you can install the packages from htt... Sage Weil
12:39 PM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
Sage Weil
05:48 PM Feature #21710 (New): add wildcard for namespaces
implement * wildcard to allow access to namespaces starting with a given string
allow rw namespace=cephfs_a*
wo...
Douglas Fuller
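The prefix-wildcard behaviour requested above can be sketched as follows. This is only an illustration of the matching rule described in the request, not Ceph's capability parser; the function name `namespace_allowed` is hypothetical, and the sketch assumes a trailing `*` is the only wildcard form.

```python
def namespace_allowed(pattern: str, namespace: str) -> bool:
    """Return True if `namespace` is covered by the cap `pattern`.

    Hypothetical helper: a trailing "*" means "any namespace starting
    with the text before it"; without it, the match must be exact.
    """
    if pattern.endswith("*"):
        return namespace.startswith(pattern[:-1])
    return namespace == pattern


if __name__ == "__main__":
    # Check a few namespaces against the pattern from the request.
    for ns in ("cephfs_a", "cephfs_a_backup", "cephfs_b"):
        print(ns, namespace_allowed("cephfs_a*", ns))
```

With the pattern `cephfs_a*`, the first two namespaces match and `cephfs_b` does not.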
12:39 PM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
Sage Weil
03:22 AM Backport #21692 (In Progress): luminous: Kraken client crash after upgrading cluster from Kraken ...
Nathan Cutler
03:18 AM Backport #21692 (Resolved): luminous: Kraken client crash after upgrading cluster from Kraken to ...
https://github.com/ceph/ceph/pull/18140 Nathan Cutler
03:21 AM Backport #21702 (Resolved): luminous: BlueStore::umount will crash when the BlueStore is opened b...
https://github.com/ceph/ceph/pull/18750 Nathan Cutler
03:21 AM Backport #21701 (Resolved): luminous: ceph-kvstore-tool does not call bluestore's umount when exit
https://github.com/ceph/ceph/pull/18751 Nathan Cutler
03:21 AM Bug #21625: ceph-kvstore-tool does not call bluestore's umount when exit
https://github.com/ceph/ceph/pull/18083 Nathan Cutler
03:20 AM Bug #21624: BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
https://github.com/ceph/ceph/pull/18082 Nathan Cutler
03:18 AM Backport #21697 (Resolved): luminous: OSDService::recovery_need_sleep read+updated without locking
https://github.com/ceph/ceph/pull/18753 Nathan Cutler
03:18 AM Backport #21693 (Resolved): luminous: interval_map.h: 161: FAILED assert(len > 0)
https://github.com/ceph/ceph/pull/18413 Nathan Cutler
02:02 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
Sage Weil
02:00 AM Bug #21686 (Can't reproduce): osd/PrimaryLogPG.cc: 10195: FAILED assert(i->second == obc) in fini...
... Sage Weil

10/05/2017

10:33 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
https://github.com/ceph/ceph/pull/18140 backport Sage Weil
10:30 PM Bug #21660 (Pending Backport): Kraken client crash after upgrading cluster from Kraken to Luminous
Greg Farnum
08:27 PM Bug #21660 (Fix Under Review): Kraken client crash after upgrading cluster from Kraken to Luminous
... Sage Weil
04:47 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
fc655d9b-16cd-4342-bf4b-689a3c0d2891 generated on a Luminous client.
On the Kraken client, this results in:
<pr...
Sarah Brofeldt
04:08 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Hi Sarah,
Can you 'ceph osd getmap 308 -o 308' and 'ceph-post-file 308'?
Sage Weil
02:50 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
I wasn't clever enough to save the core file initially, so I've reproduced the issue on a reinstall of Kraken after u... Sarah Brofeldt
06:01 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Yuri's testing it (it will pass), so I went ahead and created a backport PR: https://github.com/ceph/ceph/pull/18132 Greg Farnum
04:17 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
https://github.com/ceph/ceph/pull/18130 Sage Weil
03:05 AM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
Sage Weil
11:59 AM Bug #21470 (Pending Backport): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
https://github.com/ceph/ceph/pull/18127 for the backport Sage Weil
03:04 AM Bug #21629 (Pending Backport): interval_map.h: 161: FAILED assert(len > 0)
Sage Weil

10/04/2017

10:19 PM Bug #21660 (Need More Info): Kraken client crash after upgrading cluster from Kraken to Luminous
Do you still have the core file? I would be very interested in seeing the epoch for the OSDMap that was being decode... Sage Weil
01:10 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
Crash in the messenger layer of librados. Jason Dillaman
09:54 PM Bug #21470 (Fix Under Review): Ceph OSDs crashing in BlueStore::queue_transactions() using EC aft...
https://github.com/ceph/ceph/pull/18118
Thanks, Bob! Please let me know if you see it fail. This should be inclu...
Sage Weil
04:56 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Yep, left it running an entire night and wrote 1.5TB without crashing. Seems to be fixed. Thanks! Bob Bobington
05:52 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
This time I couldn't apply your changes to the original Luminous source release so I pulled the entire Git branch and... Bob Bobington
07:40 PM Bug #20910 (In Progress): spurious MON_DOWN, apparently slow/laggy mon
not resolved yet! Sage Weil
06:58 PM Bug #21624 (Pending Backport): BlueStore::umount will crash when the BlueStore is opened by start...
Sage Weil
06:56 PM Bug #21625 (Pending Backport): ceph-kvstore-tool does not call bluestore's umount when exit
Sage Weil
02:32 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
Kefu Chai

10/03/2017

09:49 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
I've pushed another patch to the same branch.. can you give it a try? Sage Weil
09:46 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
From that log I've narrowed the problem down to this line... Sage Weil
08:42 PM Bug #21303 (Resolved): rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Thanks! Sage Weil
06:40 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
/a/sage-2017-10-03_12:00:34-rados-wip-sage-testing2-2017-10-02-2121-distro-basic-smithi/1698722 Sage Weil
02:37 PM Bug #21660: Kraken client crash after upgrading cluster from Kraken to Luminous
I managed to get some debug symbols working.... Sarah Brofeldt
05:41 AM Bug #21660 (Resolved): Kraken client crash after upgrading cluster from Kraken to Luminous
I'm having some trouble making the debug symbols work, (I installed ceph-common-dbg, librbd1-dbg and librados2-dbg to... Sarah Brofeldt
02:58 AM Backport #21653 (Resolved): luminous: Erasure code recovery should send additional reads if neces...
https://github.com/ceph/ceph/pull/20081
With http://tracker.ceph.com/issues/22069
Nathan Cutler
02:58 AM Backport #21650 (Resolved): luminous: buffer_anon leak during deep scrub (on otherwise idle osd)
https://github.com/ceph/ceph/pull/18227 Nathan Cutler
02:57 AM Backport #21636 (Resolved): luminous: ceph-monstore-tool --readable mode doesn't understand FSMap...
https://github.com/ceph/ceph/pull/18754 Nathan Cutler

10/02/2017

11:14 PM Bug #18162 (In Progress): osd/ReplicatedPG.cc: recover_replicas: object added to missing set for ...
David Zafman
09:35 PM Bug #21629 (Fix Under Review): interval_map.h: 161: FAILED assert(len > 0)
*PR*: https://github.com/ceph/ceph/pull/18088 Jason Dillaman
09:34 PM Bug #21629: interval_map.h: 161: FAILED assert(len > 0)
The compare-extent op was beyond the truncated extent of the object. The EC async read code does not handle zero-leng... Jason Dillaman
07:39 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
... Jason Dillaman
04:47 PM Bug #21611 (Closed): rename in BlueFS is not atomic
ceph-kvstore-tool doesn't call umount() of BlueStore. Chang Liu
04:12 PM Bug #21625 (Resolved): ceph-kvstore-tool does not call bluestore's umount when exit
It will not flush the dirty log to durable storage, so some data is lost. For example, a user sets a KV pair with ceph-kvstore-to... Chang Liu
04:03 PM Bug #21624 (Resolved): BlueStore::umount will crash when the BlueStore is opened by start_kv_only()
ceph-kvstore-tool uses `start_kv_only` to mount a BlueStore. Chang Liu
01:50 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
Nathan Cutler
01:50 PM Bug #21243 (Resolved): incorrect erasure-code space in command ceph df
Nathan Cutler
01:24 PM Bug #21618: standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
https://github.com/ceph/ceph/pull/18079 Sage Weil
01:21 PM Bug #21618 (Resolved): standalone/scrub/osd-scrub-repair.sh ambiguous diff failure
... Sage Weil
12:21 PM Bug #21614 (Fix Under Review): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/s...
https://github.com/ceph/ceph/pull/18078 Sage Weil
03:47 AM Bug #21614 (Resolved): "ceph tell osd.* config set osd_recovery_sleep 0" fails in rados/singleton...
http://pulpito.ceph.com/kchai-2017-10-01_17:38:10-rados-wip-kefu-testing-2017-10-01-2202-distro-basic-mira/1692959/
...
Kefu Chai
08:15 AM Backport #21283 (Resolved): luminous: spurious MON_DOWN, apparently slow/laggy mon
Abhishek Lekshmanan
08:14 AM Backport #21374 (Resolved): luminous: incorrect erasure-code space in command ceph df
Abhishek Lekshmanan
03:42 AM Bug #21566 (Pending Backport): OSDService::recovery_need_sleep read+updated without locking
Kefu Chai

10/01/2017

09:08 AM Bug #21611 (Closed): rename in BlueFS is not atomic
I was testing the repair command and found that:
1. rocksdb creates a new MANIFEST file while repairing the database, and wants t...
Chang Liu
02:20 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
TSAN unfortunately just caused the OSDs to core dump instantly. I'll see if I can find another way to find threading ... Bob Bobington

09/30/2017

07:22 AM Bug #21603: rocksdb is using slow crc
Mark, please let me know if i should update ceph/rocksdb with this fix and pick it up in ceph/ceph if you think we ne... Kefu Chai
07:20 AM Bug #21603: rocksdb is using slow crc
https://github.com/facebook/rocksdb/pull/2950 Kefu Chai
06:33 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
While I'm not intimately familiar with threaded programming, I'm okay with general C++. Could you possibly explain wh... Bob Bobington
03:05 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
No luck. I applied 1918c57c7c6304875501f4f4b04b9c82834395a3 from the aforementioned repo to my copy of the official L... Bob Bobington
05:31 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
After merging the following patches, the error didn't happen again. You can close the issue. Thanks!

patch list:
h...
黄 维
04:11 AM Bug #21577 (Pending Backport): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
Kefu Chai

09/29/2017

10:36 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
https://github.com/ceph/ceph/pull/18047 for the fix. I'll backport it to Luminous if that looks good. Greg Farnum
09:18 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Ah, found it: https://github.com/ceph/ceph-ci/tree/wip-21470-test Bob Bobington
09:12 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
I'm not on a Debian or Redhat derivative, is there a Git repository I can get the source from or a tarball you can li... Bob Bobington
06:54 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Ok, that's kind of embarrassing, I think the fix is pretty simple. Can you please test out this branch?
wip-21470-...
Sage Weil
06:39 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Can you repeat the fsck with --debug-bluefs 20?
CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr ...
Sage Weil
06:11 PM Bug #21382 (Pending Backport): Erasure code recovery should send additional reads if necessary
David Zafman
06:08 PM Bug #21603: rocksdb is using slow crc
Kefu Chai wrote:
> i set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fa...
Mark Nelson
05:37 PM Bug #21603: rocksdb is using slow crc
@kefu, that's really elegant work, thanks for the info
Matt
Matt Benjamin
04:49 PM Bug #21603: rocksdb is using slow crc
i set a breakpoint in Fast_CRC32() and Slow_CRC32() when debugging ceph-mon, the breakpoint in Fast_CRC32() is always... Kefu Chai
03:08 PM Bug #21603: rocksdb is using slow crc
Matt Benjamin wrote:
> Just randomly, is this output just from ceph-osd running under perf?
This is output from m...
Mark Nelson
03:00 PM Bug #21603: rocksdb is using slow crc
Just randomly, is this output just from ceph-osd running under perf?
Matt
Matt Benjamin
02:42 PM Bug #21603 (Resolved): rocksdb is using slow crc
... Sage Weil
03:00 PM Bug #21249 (Resolved): Client client.admin marked osd.2 out, after it was down for 1504627577 sec...
Nathan Cutler
02:58 PM Bug #20944 (Resolved): OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple dep...
Nathan Cutler
02:38 PM Bug #21566 (Fix Under Review): OSDService::recovery_need_sleep read+updated without locking
https://github.com/ceph/ceph/pull/18022 should take care of this. Neha Ojha
12:11 PM Backport #21307 (Resolved): luminous: Client client.admin marked osd.2 out, after it was down for...
Sage Weil
12:11 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
Sage Weil
10:43 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
osd.6 remove object "0#2:c4b0339b:::benchmark_data_mira035.xsky.com_17216_object7868:head#" from backfillinfo.objects... huang jun
03:53 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
... huang jun
01:48 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
... huang jun
12:01 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
Is this on master?
Shouldn't osd.7 have the 149'793 log entry for the delete, and thus detect the retry as a dupli...
Josh Durgin

09/28/2017

01:30 PM Bug #21417 (Pending Backport): buffer_anon leak during deep scrub (on otherwise idle osd)
Sage Weil
01:27 PM Bug #21592 (Resolved): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
... Sage Weil

09/27/2017

09:13 PM Bug #21577 (Fix Under Review): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
https://github.com/ceph/ceph/pull/18005
Marking for backport -- I consider this a bugfix because the mdsmap dumpin...
John Spray
06:44 PM Bug #21577 (Resolved): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
Annoying for anyone wanting to inspect these. I never updated it because I don't think I knew it existed :-) John Spray
07:52 PM Bug #21417: buffer_anon leak during deep scrub (on otherwise idle osd)
ok, the problem is that as scrub (or whatever) happens, the bluestore cache is populated, but the attrs weren't in th... Sage Weil
07:50 PM Bug #21417 (Fix Under Review): buffer_anon leak during deep scrub (on otherwise idle osd)
https://github.com/ceph/ceph/pull/18001 Sage Weil
07:41 PM Bug #21580 (Resolved): osd: stalled recovery ends up in recovery_wait
With https://github.com/ceph/ceph/pull/17839 a stalled recovery (due to remaining unfound objects) goes back into rec... Sage Weil
07:28 PM Feature #21579 (Resolved): [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
[RFE] Stop OSD's removal if the OSD's are part of inactive PGs
Description of problem:
[RFE] Stop OSD's removal...
Vikhyat Umrao
02:56 PM Bug #21573 (Resolved): [upgrade] buffer::list ABI broken in luminous release
A client application that was compiled against a pre-Luminous librados C++ API and therefore utilizing bufferlist wil... Jason Dillaman
01:22 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
Another case: http://tracker.ceph.com/issues/21537 Jason Dillaman
06:28 AM Bug #21566 (Resolved): OSDService::recovery_need_sleep read+updated without locking
Unless I'm misreading this, OSD::do_recovery() is invoked from the ShardedOpQueue without holding any locks on global... Greg Farnum
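The general hazard named in the bug title (a shared flag that is read and then updated without a lock) can be illustrated with a small, generic sketch. This is not Ceph code: the class `RecoveryState`, its methods, and `count_sleepers` are hypothetical names, and the semantics of the flag are an assumption made only for the example. It shows one way to make the read-then-update step atomic.

```python
import threading


class RecoveryState:
    """Hypothetical stand-in for a flag shared by several worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._need_sleep = True

    def take_need_sleep(self) -> bool:
        """Read the flag and clear it in one locked step."""
        with self._lock:
            value = self._need_sleep
            self._need_sleep = False
            return value

    def set_need_sleep(self, value: bool) -> None:
        """Update the flag under the same lock used by readers."""
        with self._lock:
            self._need_sleep = value


def count_sleepers(workers: int = 8) -> int:
    """Start `workers` threads and count how many observed the flag as set."""
    state = RecoveryState()
    seen = []

    def worker() -> None:
        if state.take_need_sleep():
            seen.append(1)  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(seen)
```

Because the read and the clear happen under one lock, exactly one worker observes the flag as set; without the lock, two workers could both read `True` before either clears it.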
03:05 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
@Josh do you have time to look at it? huang jun

09/26/2017

01:34 PM Bug #21557 (Can't reproduce): osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi1443...
... Sage Weil
10:37 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
osd7:
91'473 (0'0) modify
151'793 (0'0) error
osd.6
91'473 (0'0) modify
149'793 (91'473) delete
huang jun
09:01 AM Bug #21555 (New): src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
pg 2.3s0 up/acting is [7,0,2]/[6,0,2]
in backfill_toofull state, osd.6 got write op, bc object > last_backfill, an...
huang jun
03:30 AM Bug #21338 (Resolved): There is a big risk in function bufferlist::claim_prepend()
Kefu Chai
12:24 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Okay. Assuming sortbitwise is just a messaging scheme (I think it is), we should be safe to change the assert to requ... Greg Farnum
12:10 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Okay, the one I'm looking at is crashing on pg 126.b7, at epoch 5350. Pool 126 does not presently exist; epoch 5350 (... Greg Farnum

09/25/2017

09:21 PM Backport #21544 (Resolved): luminous: mon osd feature checks for osdmap flags and require-osd-rel...
https://github.com/ceph/ceph/pull/18364 Nathan Cutler
09:21 PM Backport #21543 (Resolved): luminous: bluestore fsck took 224.778802 seconds to complete which ca...
https://github.com/ceph/ceph/pull/18362 Nathan Cutler
05:37 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
...although even a slow disk shouldn't be long enough for the heartbeat to time out. :/ Sage Weil
05:36 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
It looks like a zillion threads are blocked at... Sage Weil
05:25 PM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
[10:22:39] <@sage> it looks like everyone is waiting for log flush.. which is deep in snprintf in the core. can't te... Greg Farnum
05:23 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
The log ends 14 minutes prior to the signal, which I imagine is related to #21507.... Greg Farnum
03:47 AM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
... Patrick Donnelly
02:19 AM Bug #21471 (Pending Backport): mon osd feature checks for osdmap flags and require-osd-release fa...
Sage Weil
02:15 AM Bug #21474 (Pending Backport): bluestore fsck took 224.778802 seconds to complete which caused "t...
Sage Weil
02:13 AM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
Sage Weil

09/23/2017

04:39 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
With a similar but slightly different setup, this same crash happened to me.
Installed via ceph-deploy install --r...
Roy Hooper
10:15 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Daniel,
Yes, no OSD running on xfs shows the problem in question. I think one of the differences between the db based o...
wei qiaomiao
02:25 AM Bug #21382: Erasure code recovery should send additional reads if necessary
https://github.com/ceph/ceph/pull/17920 David Zafman
02:25 AM Bug #21382 (Fix Under Review): Erasure code recovery should send additional reads if necessary
David Zafman

09/22/2017

09:49 PM Bug #21511 (Fix Under Review): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::m...
https://github.com/ceph/ceph/pull/17927 Sage Weil
06:00 PM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
... Sage Weil
09:04 PM Bug #21408 (Resolved): osd: "fsck error: free extent 0x2000~2000 intersects allocated blocks"
Sage Weil
08:43 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
@Sage,
Pls, would you have the reproducer for this, so I could give it a try and check it out in my environment? ...
Daniel Oliveira
05:51 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Wei,
Yes, the log file shows the same error with 12.2.0 build running on. I agree with @Josh and you, it seems t...
Daniel Oliveira
02:45 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Sage @Daniel, Huang and I use the same cluster. We use xfs instead of bluefs for some osds in our cluster, the issue... wei qiaomiao
01:54 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Sage Weil wrote:
> Can you please upgrade to 12.2.0 (or better yet, latest luminous branch), and then run fsck and a...
黄 维
12:51 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Daniel Oliveira wrote:
> @Wei,
>
> Please, would you mind describing your environment a bit more? Also, how ofte...
wei qiaomiao
11:36 AM Bug #20871 (Resolved): core dump when bluefs's mkdir returns -EEXIST
Chang Liu
03:31 AM Bug #20759: mon: valgrind detects a few leaks
/kchai-2017-09-21_06:22:45-rados-wip-kefu-testing-2017-09-21-1013-distro-basic-mira/1654844/remote/mira038/log/valgri... Kefu Chai
03:13 AM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
... Kefu Chai
03:06 AM Bug #21474 (Fix Under Review): bluestore fsck took 224.778802 seconds to complete which caused "t...
https://github.com/ceph/ceph/pull/17902 Kefu Chai

09/21/2017

11:00 PM Bug #21382 (In Progress): Erasure code recovery should send additional reads if necessary
David Zafman
09:50 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Done. Took less than an hour and happened on two OSDs. Uploaded one of them:
ceph-post-file: 6e0ed6ab-1528-428d-aa...
Bob Bobington
08:26 PM Bug #21470 (Need More Info): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after...
Okay, thanks for confirming that the #21171 fix is applied. Can you reproduce with debug bluestore = 20, and then ... Sage Weil
08:25 PM Bug #21475 (Duplicate): 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropp...
Sage Weil
08:08 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Got a report of this happening in downstream Red Hat packages at https://bugzilla.redhat.com/show_bug.cgi?id=1494238
...
Greg Farnum
08:02 PM Bug #21496 (Fix Under Review): doc: Manually editing a CRUSH map, Word 'type' missing.
http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/
In the section "CRUSH map rules", in the overvi...
Anonymous
07:59 PM Bug #21303 (Need More Info): rocksdb get a error: "Compaction error: Corruption: block checksum m...
Can you please upgrade to 12.2.0 (or better yet, latest luminous branch), and then run fsck and attach the output?
...
Sage Weil
04:59 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Wei,
Please, would you mind describing your environment a bit more? Also, how often does it happen? Can we repro...
Daniel Oliveira
06:11 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
This issue can be reproduced in our cluster; we are willing to provide more information if you need it. wei qiaomiao
07:36 PM Bug #20653 (Can't reproduce): bluestore: aios don't complete on very large writes on xenial
I'm going to assume this was #21171 Sage Weil
06:48 PM Bug #21417: buffer_anon leak during deep scrub (on otherwise idle osd)
definitely happens from an ec pool. Sage Weil
04:03 PM Bug #21410 (Resolved): pg_upmap_items can duplicate an item
Sage Weil
02:45 PM Bug #21410 (Pending Backport): pg_upmap_items can duplicate an item
Sage Weil
04:02 PM Bug #21495 (New): src/osd/OSD.cc: 346: FAILED assert(piter != rev_pending_splits.end())
... Sage Weil
04:07 AM Backport #21465 (In Progress): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" e...
Nathan Cutler
04:05 AM Backport #21438 (In Progress): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
Nathan Cutler
04:03 AM Backport #21343 (In Progress): luminous: DNS SRV default service name not used anymore
Nathan Cutler
04:01 AM Backport #21307 (In Progress): luminous: Client client.admin marked osd.2 out, after it was down ...
Nathan Cutler

09/20/2017

08:37 PM Bug #21428: luminous: osd: does not request latest map from mon
Fix:
* master https://github.com/ceph/ceph/pull/17828
* luminous https://github.com/ceph/ceph/pull/17829
Nathan Cutler
03:15 PM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
Josh Durgin
05:17 AM Bug #21428 (In Progress): luminous: osd: does not request latest map from mon
fixing bug in the patch Josh Durgin
04:39 PM Bug #21408 (Fix Under Review): osd: "fsck error: free extent 0x2000~2000 intersects allocated blo...
https://github.com/ceph/ceph/pull/17845 Sage Weil
04:03 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
potentially a bug in bluefs Josh Durgin
03:53 PM Bug #21407: backoff causes out of order op
Josh Durgin
03:46 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2017-09-19_19:54:13-rados-wip-yuri-testing3-2017-09-19-1710-distro-basic-smithi/1648800
osd.7 again! weird
Sage Weil
03:01 PM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
cool. will update the test. Kefu Chai
12:14 PM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
Sigh.. yeah. I can't decide if we should stop doing these fsck's entirely, or reduce the debug level just for fsck, ... Sage Weil
05:30 AM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
Sage, if you believe that it's normal for bluestore to take around 4 minutes to complete a deep fsck, I will prolong ... Kefu Chai
05:28 AM Bug #21474 (Resolved): bluestore fsck took 224.778802 seconds to complete which caused "timed out...
/a/kchai-2017-09-19_14:50:44-rados-wip-kefu-testing-2017-09-19-1954-distro-basic-mira/1648644... Kefu Chai
11:19 AM Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping req...
It seems this is a duplicate of this tracker: http://tracker.ceph.com/issues/21180. Please verify. Nokia ceph-users
11:18 AM Bug #21475 (Duplicate): 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropp...
~~~
2017-09-18 14:51:59.895746 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068824 seconds old...
Nokia ceph-users
10:25 AM Bug #21471 (In Progress): mon osd feature checks for osdmap flags and require-osd-release fail if...
https://github.com/ceph/ceph/pull/17831 Brad Hubbard
02:29 AM Bug #21471 (Resolved): mon osd feature checks for osdmap flags and require-osd-release fail if 0 ...
the various checks test get_up_osd_features() but that returns 0 if no osds are up.
needs to be fixed in luminous ...
Sage Weil
02:19 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Oh, forgot to add that I've tried the workarounds on the related issues. Adding this to my ceph.conf makes no differe... Bob Bobington
02:16 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
This is a copy of http://tracker.ceph.com/issues/21314, which was marked as resolved. It's not resolved after applyin... Bob Bobington

09/19/2017

11:46 PM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
backport was https://github.com/ceph/ceph/pull/17796 Josh Durgin
07:24 AM Bug #21428 (Fix Under Review): luminous: osd: does not request latest map from mon
https://github.com/ceph/ceph/pull/17795 Josh Durgin
02:02 AM Bug #21428 (In Progress): luminous: osd: does not request latest map from mon
Josh Durgin
12:16 AM Bug #21428: luminous: osd: does not request latest map from mon
I think this is from the fast_dispatch refactor in luminous, and the latest test timing just happened to show it. Josh Durgin
12:12 AM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
On the current luminous branch, a couple tests saw slow requests > 1 hour due to ops waiting for maps.
One is /a/y...
Josh Durgin
08:25 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
https://github.com/ceph/ceph/pull/17865 Nathan Cutler
06:01 PM Bug #20944 (Pending Backport): OSD metadata 'backend_filestore_dev_node' is "unknown" even for si...
Sage Weil
11:36 AM Backport #21438 (Resolved): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
https://github.com/ceph/ceph/pull/17864 Nathan Cutler
08:20 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
I had to delete the affected pool to reclaim the occupied space, so I am unable to verify any fixes. Henrik Korkuc
03:31 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
duplicate issue: http://tracker.ceph.com/issues/16279 huang jun
 
