Activity

From 08/28/2017 to 09/26/2017

09/26/2017

01:34 PM Bug #21557 (Can't reproduce): osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi1443...
... Sage Weil
10:37 AM Bug #21555: src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
osd.7:
91'473 (0'0) modify
151'793 (0'0) error
osd.6
91'473 (0'0) modify
149'793 (91'473) delete
huang jun
09:01 AM Bug #21555 (New): src/osd/PGLog.h: 1455: FAILED assert(miter != missing.get_items().end())
pg 2.3s0 up/acting is [7,0,2]/[6,0,2]
in backfill_toofull state, osd.6 got a write op; because object > last_backfill, an...
huang jun
03:30 AM Bug #21338 (Resolved): There is a big risk in function bufferlist::claim_prepend()
Kefu Chai
12:24 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Okay. Assuming sortbitwise is just a messaging scheme (I think it is), we should be safe to change the assert to requ... Greg Farnum
12:10 AM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Okay, the one I'm looking at is crashing on pg 126.b7, at epoch 5350. Pool 126 does not presently exist; epoch 5350 (... Greg Farnum

09/25/2017

09:21 PM Backport #21544 (Resolved): luminous: mon osd feature checks for osdmap flags and require-osd-rel...
https://github.com/ceph/ceph/pull/18364 Nathan Cutler
09:21 PM Backport #21543 (Resolved): luminous: bluestore fsck took 224.778802 seconds to complete which ca...
https://github.com/ceph/ceph/pull/18362 Nathan Cutler
05:37 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
...although even a slow disk shouldn't be long enough for the heartbeat to time out. :/ Sage Weil
05:36 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
It looks like a zillion threads are blocked at... Sage Weil
05:25 PM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
[10:22:39] <@sage> it looks like everyone is waiting for log flush.. which is deep in snprintf in the core. can't te... Greg Farnum
05:23 PM Bug #21532: osd: Abort in thread_name:tp_osd_tp
The log ends 14 minutes prior to the signal, which I imagine is related to #21507.... Greg Farnum
03:47 AM Bug #21532 (Need More Info): osd: Abort in thread_name:tp_osd_tp
... Patrick Donnelly
02:19 AM Bug #21471 (Pending Backport): mon osd feature checks for osdmap flags and require-osd-release fa...
Sage Weil
02:15 AM Bug #21474 (Pending Backport): bluestore fsck took 224.778802 seconds to complete which caused "t...
Sage Weil
02:13 AM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
Sage Weil

09/23/2017

04:39 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
With a similar but slightly different setup, this same crash happened to me.
Installed via ceph-deploy install --r...
Roy Hooper
10:15 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Daniel,
Yes. No OSD running on xfs shows the problem in question. I think one of the differences between the db based o...
wei qiaomiao
02:25 AM Bug #21382: Erasure code recovery should send additional reads if necessary
https://github.com/ceph/ceph/pull/17920 David Zafman
02:25 AM Bug #21382 (Fix Under Review): Erasure code recovery should send additional reads if necessary
David Zafman

09/22/2017

09:49 PM Bug #21511 (Fix Under Review): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::m...
https://github.com/ceph/ceph/pull/17927 Sage Weil
06:00 PM Bug #21511 (Resolved): rados/standalone/scrub.yaml: can't decode 'snapset' attr buffer::malformed...
... Sage Weil
09:04 PM Bug #21408 (Resolved): osd: "fsck error: free extent 0x2000~2000 intersects allocated blocks"
Sage Weil
08:43 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
@Sage,
Pls, would you have the reproducer for this, so I could give it a try and check it out in my environment? ...
Daniel Oliveira
05:51 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Wei,
Yes, the log file shows the same error with 12.2.0 build running on. I agree with @Josh and you, it seems t...
Daniel Oliveira
02:45 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Sage @Daniel Huang and I use the same cluster. We use xfs instead of bluefs for some osds in our cluster, the issue... wei qiaomiao
01:54 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Sage Weil wrote:
> Can you please upgrade to 12.2.0 (or better yet, latest luminous branch), and then run fsck and a...
黄 维
12:51 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
Daniel Oliveira wrote:
> @Wei,
>
> Please, would you mind describing a bit more your environment? Also, how ofte...
wei qiaomiao
11:36 AM Bug #20871 (Resolved): core dump when bluefs's mkdir returns -EEXIST
Chang Liu
03:31 AM Bug #20759: mon: valgrind detects a few leaks
/kchai-2017-09-21_06:22:45-rados-wip-kefu-testing-2017-09-21-1013-distro-basic-mira/1654844/remote/mira038/log/valgri... Kefu Chai
03:13 AM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
... Kefu Chai
03:06 AM Bug #21474 (Fix Under Review): bluestore fsck took 224.778802 seconds to complete which caused "t...
https://github.com/ceph/ceph/pull/17902 Kefu Chai

09/21/2017

11:00 PM Bug #21382 (In Progress): Erasure code recovery should send additional reads if necessary
David Zafman
09:50 PM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Done. Took less than an hour and happened on two OSDs. Uploaded one of them:
ceph-post-file: 6e0ed6ab-1528-428d-aa...
Bob Bobington
08:26 PM Bug #21470 (Need More Info): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after...
Okay, thanks for confirming that the #21171 fix is applied. Can you reproduce with debug bluestore = 20, and then ... Sage Weil
08:25 PM Bug #21475 (Duplicate): 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropp...
Sage Weil
08:08 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
Got a report of this happening in downstream Red Hat packages at https://bugzilla.redhat.com/show_bug.cgi?id=1494238
...
Greg Farnum
08:02 PM Bug #21496 (Fix Under Review): doc: Manually editing a CRUSH map, Word 'type' missing.
http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/
In the section "CRUSH map rules", in the overvi...
Anonymous
07:59 PM Bug #21303 (Need More Info): rocksdb get a error: "Compaction error: Corruption: block checksum m...
Can you please upgrade to 12.2.0 (or better yet, latest luminous branch), and then run fsck and attach the output?
...
Sage Weil
04:59 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
@Wei,
Please, would you mind describing a bit more your environment? Also, how often does it happen? Can we repro...
Daniel Oliveira
06:11 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
This issue can be reproduced in our cluster; we are willing to give more information if you need it. wei qiaomiao
07:36 PM Bug #20653 (Can't reproduce): bluestore: aios don't complete on very large writes on xenial
I'm going to assume this was #21171 Sage Weil
06:48 PM Bug #21417: buffer_anon leak during deep scrub (on otherwise idle osd)
definitely happens from an ec pool. Sage Weil
04:03 PM Bug #21410 (Resolved): pg_upmap_items can duplicate an item
Sage Weil
02:45 PM Bug #21410 (Pending Backport): pg_upmap_items can duplicate an item
Sage Weil
04:02 PM Bug #21495 (New): src/osd/OSD.cc: 346: FAILED assert(piter != rev_pending_splits.end())
... Sage Weil
04:07 AM Backport #21465 (In Progress): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" e...
Nathan Cutler
04:05 AM Backport #21438 (In Progress): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
Nathan Cutler
04:03 AM Backport #21343 (In Progress): luminous: DNS SRV default service name not used anymore
Nathan Cutler
04:01 AM Backport #21307 (In Progress): luminous: Client client.admin marked osd.2 out, after it was down ...
Nathan Cutler

09/20/2017

08:37 PM Bug #21428: luminous: osd: does not request latest map from mon
Fix:
* master https://github.com/ceph/ceph/pull/17828
* luminous https://github.com/ceph/ceph/pull/17829
Nathan Cutler
03:15 PM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
Josh Durgin
05:17 AM Bug #21428 (In Progress): luminous: osd: does not request latest map from mon
fixing bug in the patch Josh Durgin
04:39 PM Bug #21408 (Fix Under Review): osd: "fsck error: free extent 0x2000~2000 intersects allocated blo...
https://github.com/ceph/ceph/pull/17845 Sage Weil
04:03 PM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
potentially a bug in bluefs Josh Durgin
03:53 PM Bug #21407: backoff causes out of order op
Josh Durgin
03:46 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2017-09-19_19:54:13-rados-wip-yuri-testing3-2017-09-19-1710-distro-basic-smithi/1648800
osd.7 again! weird
Sage Weil
03:01 PM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
cool. will update the test. Kefu Chai
12:14 PM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
Sigh.. yeah. I can't decide if we should stop doing these fsck's entirely, or reduce the debug level just for fsck, ... Sage Weil
05:30 AM Bug #21474: bluestore fsck took 224.778802 seconds to complete which caused "timed out waiting fo...
Sage, if you believe that it's normal for bluestore to take around 4 minutes to complete a deep fsck, I will prolong ... Kefu Chai
05:28 AM Bug #21474 (Resolved): bluestore fsck took 224.778802 seconds to complete which caused "timed out...
/a/kchai-2017-09-19_14:50:44-rados-wip-kefu-testing-2017-09-19-1954-distro-basic-mira/1648644... Kefu Chai
11:19 AM Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping req...
Seems it's a duplicate of this tracker: http://tracker.ceph.com/issues/21180. Please verify. Nokia ceph-users
11:18 AM Bug #21475 (Duplicate): 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropp...
~~~
2017-09-18 14:51:59.895746 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068824 seconds old...
Nokia ceph-users
10:25 AM Bug #21471 (In Progress): mon osd feature checks for osdmap flags and require-osd-release fail if...
https://github.com/ceph/ceph/pull/17831 Brad Hubbard
02:29 AM Bug #21471 (Resolved): mon osd feature checks for osdmap flags and require-osd-release fail if 0 ...
The various checks test get_up_osd_features(), but that returns 0 if no OSDs are up.
Needs to be fixed in luminous ...
Sage Weil
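The failure mode above can be sketched with a toy model (MiniOSDMap and up_osds_have_feature are invented names for illustration, not the real Ceph code): when the feature mask is derived only from up OSDs, it degenerates to 0 with no OSDs up, and an unguarded feature check then fails spuriously.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the bug described above.
struct MiniOSDMap {
  std::vector<uint64_t> up_osd_features;  // feature mask per up OSD

  uint64_t get_up_osd_features() const {
    if (up_osd_features.empty())
      return 0;                 // the degenerate case behind the bug
    uint64_t f = ~0ull;
    for (uint64_t x : up_osd_features)
      f &= x;                   // features common to all up OSDs
    return f;
  }

  // Guarded check: with zero OSDs up there is nothing to test against,
  // so skip the check instead of failing spuriously on a 0 mask.
  bool up_osds_have_feature(uint64_t feature) const {
    if (up_osd_features.empty())
      return true;
    return (get_up_osd_features() & feature) == feature;
  }
};
```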
02:19 AM Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix
Oh, forgot to add that I've tried the workarounds on the related issues. Adding this to my ceph.conf makes no differe... Bob Bobington
02:16 AM Bug #21470 (Resolved): Ceph OSDs crashing in BlueStore::queue_transactions() using EC after apply...
This is a copy of http://tracker.ceph.com/issues/21314, which was marked as resolved. It's not resolved after applyin... Bob Bobington

09/19/2017

11:46 PM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
backport was https://github.com/ceph/ceph/pull/17796 Josh Durgin
07:24 AM Bug #21428 (Fix Under Review): luminous: osd: does not request latest map from mon
https://github.com/ceph/ceph/pull/17795 Josh Durgin
02:02 AM Bug #21428 (In Progress): luminous: osd: does not request latest map from mon
Josh Durgin
12:16 AM Bug #21428: luminous: osd: does not request latest map from mon
I think this is from the fast_dispatch refactor in luminous, and the latest test timing just happened to show it. Josh Durgin
12:12 AM Bug #21428 (Resolved): luminous: osd: does not request latest map from mon
On the current luminous branch, a couple tests saw slow requests > 1 hour due to ops waiting for maps.
One is /a/y...
Josh Durgin
08:25 PM Backport #21465 (Resolved): luminous: OSD metadata 'backend_filestore_dev_node' is "unknown" even...
https://github.com/ceph/ceph/pull/17865 Nathan Cutler
06:01 PM Bug #20944 (Pending Backport): OSD metadata 'backend_filestore_dev_node' is "unknown" even for si...
Sage Weil
11:36 AM Backport #21438 (Resolved): luminous: Daemons(OSD, Mon...) exit abnormally at injectargs command
https://github.com/ceph/ceph/pull/17864 Nathan Cutler
08:20 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
I had to delete the affected pool to reclaim the occupied space, so I am unable to verify any fixes. Henrik Korkuc
03:31 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
duplicate issue: http://tracker.ceph.com/issues/16279 huang jun

09/18/2017

08:57 PM Bug #19790 (Resolved): rados ls on pool with no access returns no error
Nathan Cutler
08:57 PM Backport #20723 (Resolved): jewel: rados ls on pool with no access returns no error
Nathan Cutler
02:45 PM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
with https://github.com/ceph/ceph/pull/17179, we also met this error:
mon.b@0(leader).osd e15 tester.test_with_fork...
huang jun
02:53 AM Bug #21171: bluestore: aio submission deadlock
Since my issue (http://tracker.ceph.com/issues/21314) was marked as a dupe of this and I haven't received a response ... Bob Bobington
01:42 AM Bug #21417 (Resolved): buffer_anon leak during deep scrub (on otherwise idle osd)
observed gobs of ram (11gb rss) and most of it buffer_anon (~8gb) on a basically idle cluster with replication, ec, a... Sage Weil

09/16/2017

05:59 PM Bug #21409 (Resolved): per-pool full flags set incorrectly?
Sage Weil
05:46 AM Bug #21409 (Fix Under Review): per-pool full flags set incorrectly?
https://github.com/ceph/ceph/pull/17763 xie xingguo

09/15/2017

09:54 PM Bug #21408 (In Progress): osd: "fsck error: free extent 0x2000~2000 intersects allocated blocks"
Sage Weil
07:50 PM Bug #21408 (Resolved): osd: "fsck error: free extent 0x2000~2000 intersects allocated blocks"
Run: http://pulpito.ceph.com/teuthology-2017-09-15_17:30:33-upgrade:luminous-x-master-distro-basic-smithi/
Jobs: man...
Yuri Weinstein
08:57 PM Bug #21410 (Fix Under Review): pg_upmap_items can duplicate an item
https://github.com/ceph/ceph/pull/17760 Sage Weil
08:43 PM Bug #21410 (Resolved): pg_upmap_items can duplicate an item
... Sage Weil
08:34 PM Bug #21309 (Resolved): mon/OSDMonitor: deleting pool while pgs are being created leads to assert(...
Nathan Cutler
08:34 PM Backport #21341 (Resolved): luminous: mon/OSDMonitor: deleting pool while pgs are being created l...
Nathan Cutler
08:32 PM Bug #21409 (Resolved): per-pool full flags set incorrectly?
http://pulpito.ceph.com/sage-2017-09-15_15:50:19-rados-wip-sage-testing2-2017-09-14-1256-distro-basic-smithi/1635852
...
Sage Weil
08:28 PM Bug #21407: backoff causes out of order op
https://github.com/ceph/ceph/pull/17759 Sage Weil
07:44 PM Bug #21407: backoff causes out of order op
problem seems to be that we are requeueing waiting_for_peered before we are actually peered. that happens from on_fl... Sage Weil
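The out-of-order symptom can be illustrated with a toy requeue (requeue_front is an invented helper, not the actual PG code): if waiting ops are requeued behind newly arrived ops, later arrivals jump ahead of earlier ones; splicing the waiting list at the front of the queue preserves arrival order.

```cpp
#include <cassert>
#include <list>

// Toy illustration only: ops 1 and 2 were queued waiting for the PG
// to peer, then op 3 arrived.  Requeueing the waiting ops *behind*
// op 3 would reorder them; splicing them at the front keeps the
// original arrival order 1, 2, 3.
std::list<int> requeue_front(std::list<int> waiting, std::list<int> incoming) {
  incoming.splice(incoming.begin(), waiting);  // waiting ops go first
  return incoming;
}
```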
07:19 PM Bug #21407 (Resolved): backoff causes out of order op
- receive op a and b... Sage Weil
11:53 AM Bug #21365 (Pending Backport): Daemons(OSD, Mon...) exit abnormally at injectargs command
https://github.com/ceph/ceph/pull/17664 Kefu Chai
08:11 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
We did an scp of the OSD object file then a rados put and deep-scrub got it back to OK.
Thanks for your quick answer!
Laurent GUERBY

09/14/2017

10:20 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I forgot to mention that on get you have to prevent the code from checking the digest by doing reads that are smalle... David Zafman
10:18 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...

After the deep-scrub, "rados get" gave us some errors:
# rados list-inconsistent-obj 58.6c1 --format=json-pretty...
Mehdi Abaakouk
03:35 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...

Doing the following should produce list-inconsistent-obj information:
$ ceph pg deep-scrub 58.6c1
(Wait for scr...
David Zafman
01:37 PM Bug #21388 (Duplicate): inconsistent pg but repair does nothing reporting head data_digest != dat...
ceph pg repair is currently not fixing three "inconsistent" objects
on one of our pg on a replica 3 pool.
For al...
Laurent GUERBY
10:18 PM Backport #21117 (In Progress): jewel: osd: osd_scrub_during_recovery only considers primary, not ...
This requires merge resolution which I've begun looking at. David Zafman
12:25 PM Bug #20924: osd: leaked Session on osd.7
/a/sage-2017-09-13_13:31:57-rados-wip-sage-testing-2017-09-12-1750-distro-basic-smithi/1627916
is it just me or is...
Sage Weil
12:22 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
another run, eio injection led to a read_log_and_missing assert:... Sage Weil
12:15 PM Bug #21387 (Can't reproduce): mark_unfound_lost hangs
... Sage Weil
10:51 AM Bug #21354: Possible bug in interval_set.intersect_of()
Great, I was searching for this ticket in "My Page" on the tracker =D Kefu Chai
09:29 AM Backport #21374 (In Progress): luminous: incorrect erasure-code space in command ceph df
Abhishek Lekshmanan
09:22 AM Documentation #21386: rados: manpage missing import/export
Also there seem to be hidden secret options like '--workers', nowhere to be found (not even in rados --help) but acce... Peter Gervai
09:17 AM Documentation #21386 (New): rados: manpage missing import/export
IMPORT AND EXPORT
export [filename]
Serialize pool contents to a file or standard out.
import [--dry-...
Peter Gervai

09/13/2017

11:59 PM Bug #21382 (Resolved): Erasure code recovery should send additional reads if necessary

We don't send additional reads when recovery experiences errors on some of the shards. For recovery we send k read...
David Zafman
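A toy model of the behavior being requested (ShardReader and read_for_recovery are invented names, not the actual ECBackend code): a k-of-n read that, on a shard error, issues an additional read to the next healthy shard rather than failing recovery outright.

```cpp
#include <cassert>
#include <set>

// Toy model only; not the real erasure-code backend.
struct ShardReader {
  int k, n;            // need k of n shards to reconstruct
  std::set<int> bad;   // shards whose reads will error out

  // Read shards until k succeed, sending an additional read to the
  // next shard whenever one errors.  Returns the shards read, or an
  // empty set if fewer than k healthy shards exist.
  std::set<int> read_for_recovery() const {
    std::set<int> got;
    for (int s = 0; s < n && (int)got.size() < k; ++s) {
      if (!bad.count(s))
        got.insert(s);   // success
      // on error the loop continues: this is the "additional read"
    }
    if ((int)got.size() < k)
      got.clear();       // not enough shards: recovery must fail
    return got;
  }
};
```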
09:54 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
https://github.com/ceph/ceph/pull/17707 Sage Weil

09/12/2017

06:59 PM Feature #21084 (Fix Under Review): auth: add osd auth caps based on pool metadata
https://github.com/ceph/ceph/pull/17678 Douglas Fuller
05:19 PM Bug #20981: ./run_seed_to_range.sh errored out
I'm lowering the priority because it appears to me to be a test code issue. The test doesn't detect the failure it i... David Zafman
02:17 PM Backport #21374 (Resolved): luminous: incorrect erasure-code space in command ceph df
https://github.com/ceph/ceph/pull/17724 Nathan Cutler
01:44 PM Bug #21243 (Pending Backport): incorrect erasure-code space in command ceph df
Kefu Chai
12:32 PM Bug #21243 (Resolved): incorrect erasure-code space in command ceph df
Chang Liu
12:31 PM Bug #21258 (Closed): "ceph df"'s MAX AVAIL is not correct
Chang Liu
11:04 AM Bug #21354 (Closed): Possible bug in interval_set.intersect_of()
Closing, as the real reason for the issue was a git-merge that went wrong, leaving extra "insert(start, en-start);" c... Piotr Dalek
04:21 AM Bug #21354: Possible bug in interval_set.intersect_of()
I have tried to reproduce this problem myself, but I got the same results with and without the intersection_size_asym... Zac Medico
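For readers unfamiliar with the class under discussion, interval intersection over a start-to-length map can be sketched like this (a simplified stand-in, not the real interval_set implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-in for interval_set: maps interval start -> length.
using IntervalSet = std::map<uint64_t, uint64_t>;

IntervalSet intersect(const IntervalSet& a, const IntervalSet& b) {
  IntervalSet out;
  auto ai = a.begin(), bi = b.begin();
  while (ai != a.end() && bi != b.end()) {
    uint64_t lo = std::max(ai->first, bi->first);
    uint64_t ahi = ai->first + ai->second;
    uint64_t bhi = bi->first + bi->second;
    uint64_t hi = std::min(ahi, bhi);
    if (lo < hi)
      out[lo] = hi - lo;   // overlapping piece
    // advance whichever interval ends first
    if (ahi < bhi) ++ai; else ++bi;
  }
  return out;
}
```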
07:01 AM Feature #21366: tools/ceph-objectstore-tool: split filestore directories offline to target hash l...
https://github.com/ceph/ceph/pull/17666 Zhi Zhang
07:00 AM Feature #21366 (Resolved): tools/ceph-objectstore-tool: split filestore directories offline to ta...
Currently ceph-objectstore-tool can only split dirs that already meet the usual object number criteria. It won't redu... Zhi Zhang
06:21 AM Bug #21338 (Fix Under Review): There is a big risk in function bufferlist::claim_prepend()
https://github.com/ceph/ceph/pull/17661 Kefu Chai
04:13 AM Bug #21365 (Resolved): Daemons(OSD, Mon...) exit abnormally at injectargs command
Use tell injectargs command to adjust log level of osd, get the following error:... Yan Jun

09/11/2017

11:16 PM Bug #20981: ./run_seed_to_range.sh errored out

This bug was filed because the ceph_test_filestore_idempotent_sequence wasn't completing the _exit() in _inject_fai...
David Zafman
03:44 PM Bug #21354 (Closed): Possible bug in interval_set.intersect_of()
I've been working on different kind of optimization of pg_pool_t::build_removed_snaps (that gets rid of intersect int... Piotr Dalek
09:39 AM Backport #21341 (In Progress): luminous: mon/OSDMonitor: deleting pool while pgs are being create...
Nathan Cutler
09:37 AM Backport #21341 (Resolved): luminous: mon/OSDMonitor: deleting pool while pgs are being created l...
https://github.com/ceph/ceph/pull/17634 Nathan Cutler
09:38 AM Backport #21343 (Resolved): luminous: DNS SRV default service name not used anymore
https://github.com/ceph/ceph/pull/17863 Nathan Cutler
08:17 AM Bug #21338 (Resolved): There is a big risk in function bufferlist::claim_prepend()
Recently I found a design flaw while studying bufferlist. There is a big risk if we call buffer::list::claim_pre... Ivan Guan
08:10 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
[root@ceph241 hw]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 fsck
action fsck
2017-09-11 15:37:35.698119 ...
黄 维
04:06 AM Bug #18749: OSD: allow EC PGs to do recovery below min_size
https://github.com/ceph/ceph/pull/17619
Greg Farnum, would you mind taking a look?
Chang Liu

09/10/2017

07:17 PM Bug #21309 (Pending Backport): mon/OSDMonitor: deleting pool while pgs are being created leads to...
Sage Weil
07:15 PM Bug #20924: osd: leaked Session on osd.7
/a/sage-2017-09-10_02:50:18-rados-wip-sage-testing-2017-09-08-1434-distro-basic-smithi/1615133 Sage Weil
06:58 PM Bug #21180 (Resolved): Bluestore throttler causes down OSD
Pretty sure this was #21171; fix merged to master and luminous, will be in 12.2.1. Sage Weil
06:57 PM Bug #21246 (Resolved): bluestore: hang while replaying deferred ios from journal
Pretty sure this was #21171. Fix is merged to master and luminous branch, will be in v12.2.1. Sage Weil
06:57 PM Backport #21325 (Resolved): luminous: bluestore: aio submission deadlock
Sage Weil
06:57 PM Bug #21171 (Resolved): bluestore: aio submission deadlock
Sage Weil
12:41 AM Bug #21331: pg recovery priority inversion
... Sage Weil

09/09/2017

08:38 PM Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
I also tried the workaround in http://tracker.ceph.com/issues/21180 by adding these to ceph.conf but no luck:
<pre...
Bob Bobington
07:19 PM Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
After a few crashes the OSDs become permanently lost, consistently displaying errors like this upon startup:... Bob Bobington
05:31 PM Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
I've applied the changes in the Git pull request referenced in that issue and the issue still persists:... Bob Bobington
05:51 AM Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Hmm, I found another log file and came across this:... Bob Bobington
07:42 PM Bug #21331: pg recovery priority inversion
Actually, this isn't quite right.
The real problem is that the *primary* has an ancient last_complete, because it ...
Sage Weil
06:53 PM Bug #21331: pg recovery priority inversion
it looks like peer_last_commit_ondisk for osd.26 isn't getting updated since it is not in acting (it's backfill target... Sage Weil
06:21 PM Bug #21331 (Resolved): pg recovery priority inversion
... Sage Weil
08:45 AM Bug #21303: rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
The gdb info may be helpful. It returns null when rocksdb reads metadata from the sst file
(gdb) n
rocksdb::ReadBlockC...
黄 维
04:08 AM Bug #21204 (Pending Backport): DNS SRV default service name not used anymore
Kefu Chai

09/08/2017

08:21 PM Backport #21325 (In Progress): luminous: bluestore: aio submission deadlock
Nathan Cutler
08:20 PM Backport #21325 (Resolved): luminous: bluestore: aio submission deadlock
https://github.com/ceph/ceph/pull/17601 Nathan Cutler
08:18 PM Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
There are no log entries regarding failed heartbeat checks on the failing OSDs, only on the other OSDs witnessing the... Bob Bobington
07:37 PM Bug #21314 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
It is hard to tell because the lines preceding the snippet are missing, but I'm pretty sure this is a dup of #21171, ... Sage Weil
06:22 PM Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
... Greg Farnum
03:44 PM Bug #21314 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Log is attached. 3 of my 4 OSDs have crashed in a similar manner at different times. I'm running Ceph on a single nod... Bob Bobington
06:37 PM Bug #21250 (Resolved): os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnode.extents.empty())
Nathan Cutler
06:36 PM Backport #21276 (Resolved): luminous: os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnod...
Nathan Cutler
03:48 PM Backport #21276: luminous: os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnode.extents.e...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/17562
merged
Yuri Weinstein
05:51 PM Bug #21123 (Resolved): osd/PrimaryLogPG: sparse read won't trigger repair correctly
Nathan Cutler
05:50 PM Bug #21162 (Resolved): 'osd crush rule rename' not idempotent
Nathan Cutler
05:50 PM Bug #21207 (Resolved): bluestore: async deferred_try_submit deadlock
Nathan Cutler
05:13 PM Bug #19605 (Resolved): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
Nathan Cutler
05:12 PM Bug #20888 (Resolved): "Health check update" log spam
Nathan Cutler
03:57 PM Backport #21133 (Resolved): luminous: osd/PrimaryLogPG: sparse read won't trigger repair correctly
Sage Weil
03:56 PM Backport #21234 (Resolved): luminous: bluestore: async deferred_try_submit deadlock
Sage Weil
03:56 PM Backport #21182 (Resolved): luminous: 'osd crush rule rename' not idempotent
Sage Weil
03:41 PM Bug #20370: leaked MOSDOp via PrimaryLogPG::_copy_some and PrimaryLogPG::do_proxy_write
/a/yuriw-2017-09-07_19:30:56-rados-wip-yuri-testing4-2017-09-07-1811-distro-basic-smithi/1607597 Sage Weil
02:42 PM Backport #21308: jewel: pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout i...
Nathan, thanks for creating this ticket! Kefu Chai
08:16 AM Backport #21308 (In Progress): jewel: pre-luminous: aio_read returns erroneous data when rados_os...
Nathan Cutler
08:15 AM Backport #21308 (Resolved): jewel: pre-luminous: aio_read returns erroneous data when rados_osd_o...
https://github.com/ceph/ceph/pull/17594 Nathan Cutler
02:26 PM Backport #21242 (Resolved): luminous: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue...
Sage Weil
02:25 PM Backport #21240 (Resolved): luminous: "Health check update" log spam
Sage Weil
02:24 PM Backport #21238 (Resolved): luminous: test_health_warnings.sh can fail
Sage Weil
12:20 PM Bug #21293 (Resolved): bluestore: spanning blob doesn't match expected ref_map
Sage Weil
12:18 PM Bug #21171 (Pending Backport): bluestore: aio submission deadlock
https://github.com/ceph/ceph/pull/17601 is the backport Sage Weil
12:05 PM Bug #21309 (Fix Under Review): mon/OSDMonitor: deleting pool while pgs are being created leads to...
https://github.com/ceph/ceph/pull/17600 Joao Eduardo Luis
11:56 AM Bug #21309 (In Progress): mon/OSDMonitor: deleting pool while pgs are being created leads to asse...
Joao Eduardo Luis
11:55 AM Bug #21309 (Resolved): mon/OSDMonitor: deleting pool while pgs are being created leads to assert(...
ceph version 13.0.0-429-gbc5fe2e (bc5fe2e9099dbb560c2153d3ac85f38b46593a77) mimic (dev)
Easily reproducible on a v...
Joao Eduardo Luis
08:14 AM Backport #21307 (Resolved): luminous: Client client.admin marked osd.2 out, after it was down for...
https://github.com/ceph/ceph/pull/17862 Nathan Cutler
08:14 AM Bug #20616 (Pending Backport): pre-luminous: aio_read returns erroneous data when rados_osd_op_ti...
Fixed in Infernalis by https://github.com/ceph/ceph/commit/64bca33ae76646879e6801c45e6d91852e488f8b
Needs backport...
Nathan Cutler
07:32 AM Bug #20616 (Fix Under Review): pre-luminous: aio_read returns erroneous data when rados_osd_op_ti...
this only happens if "rados_osd_op_timeout > 0", where the rx_buffer optimization is disabled, due to #9582. in that ... Kefu Chai
06:04 AM Bug #21303 (Resolved): rocksdb get a error: "Compaction error: Corruption: block checksum mismatch"
ceph --version
ceph version 12.1.0.5 (27f32562975c5fd3b785a124c818599c677b3f67) luminous (dev)
osd log:
2017-09-...
黄 维

09/07/2017

09:23 PM Bug #21249 (Pending Backport): Client client.admin marked osd.2 out, after it was down for 150462...
Sage Weil
08:47 PM Bug #21171: bluestore: aio submission deadlock
There was also an aio submission bug that dropped IOs on the floor. It was consistently reproducible with... Sage Weil
06:42 PM Bug #20910 (In Progress): spurious MON_DOWN, apparently slow/laggy mon
Ok, this is still happening.. and it correlated with (1) bluestore and (2) bluestore fsck on mount, which spews an un... Sage Weil
01:57 AM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
*master PR for backport*: https://github.com/ceph/ceph/pull/17505 Nathan Cutler
01:29 PM Bug #21293: bluestore: spanning blob doesn't match expected ref_map
... Sage Weil
01:29 PM Bug #21293 (Fix Under Review): bluestore: spanning blob doesn't match expected ref_map
https://github.com/ceph/ceph/pull/17569 Sage Weil
01:11 PM Bug #21293 (Resolved): bluestore: spanning blob doesn't match expected ref_map
... Sage Weil
01:02 PM Backport #21283 (In Progress): luminous: spurious MON_DOWN, apparently slow/laggy mon
Abhishek Lekshmanan
07:36 AM Backport #21283 (Resolved): luminous: spurious MON_DOWN, apparently slow/laggy mon
https://github.com/ceph/ceph/pull/17564 Nathan Cutler
01:00 PM Backport #21276 (In Progress): luminous: os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->f...
Abhishek Lekshmanan
07:35 AM Backport #21276 (Resolved): luminous: os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnod...
https://github.com/ceph/ceph/pull/17562 Nathan Cutler
10:49 AM Bug #20616: pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is set but no...
I am able to reproduce this issue with the latest jewel, but not master.
reverting 126d0b30e990519b8f845f99ba893fdcd...
Kefu Chai
09:01 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
btw down pg is 1.1735.
Starting OSD 381 crashes 65, 133 and 118. Stopping 65 allows the remaining OSDs to start, start... Henrik Korkuc
Henrik Korkuc
08:14 AM Bug #21287 (Duplicate): 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->i...
One PG went down for me during a large rebalance (I added racks to OSD placement, almost all data had to be shuffled). ... Henrik Korkuc
08:16 AM Bug #21180: Bluestore throttler causes down OSD
pool used for this workload is blocked by down PG (#21287), but I'll try to replicate on same cluster with newly crea... Henrik Korkuc
05:27 AM Bug #21204 (Fix Under Review): DNS SRV default service name not used anymore
https://github.com/ceph/ceph/pull/17539 Kefu Chai
02:44 AM Bug #21258: "ceph df"'s MAX AVAIL is not correct
Josh Durgin wrote:
> What is your crushmap and device sizes? It looks like you may have different roots, hence diffe...
Chang Liu
01:31 AM Bug #21262: cephfs ec data pool, many osds marked down
Yes, the log is not about only one issue. The issues are as follows:
1. slow request, osd marked down, osd op suicide ca...
Yong Wang

09/06/2017

09:03 PM Bug #20910 (Pending Backport): spurious MON_DOWN, apparently slow/laggy mon
Sage Weil
09:02 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
the problem is that bluestore logs so freaking much at debug bluestore = 30 that the mon gets all laggy. Sage Weil
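The workaround implied above is to keep bluestore's debug level at its default rather than 30. A hedged ceph.conf sketch (the 1/5 default and section name are the usual ones; this is an illustration, not a prescribed fix):

```ini
[osd]
# debug bluestore = 30 floods the cluster log enough to make the mon
# laggy; the default 1/5 (log level / in-memory gather level) avoids it
debug bluestore = 1/5
```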
08:52 PM Bug #21250 (Pending Backport): os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnode.exten...
Sage Weil
05:03 PM Bug #21249 (Fix Under Review): Client client.admin marked osd.2 out, after it was down for 150462...
https://github.com/ceph/ceph/pull/17525 John Spray
04:35 PM Bug #21262 (Need More Info): cephfs ec data pool, many osds marked down
Sage Weil
03:44 PM Bug #21262: cephfs ec data pool, many osds marked down
You're hitting a variety of issues there - some suggesting on-disk corruption, the unexpected error indicating a like... Josh Durgin
02:26 PM Bug #21262: cephfs ec data pool, many osds marked down
related error
ceph-osd.22.log:/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AV...
Yong Wang
02:16 PM Bug #21262 (Need More Info): cephfs ec data pool, many osds marked down
cephfs ec data pool, many osds marked down:
slow requests, blocked IO flow, blocked op handling, etc.
Yong Wang
04:34 PM Bug #21180 (Need More Info): Bluestore throttler causes down OSD
Sage Weil
04:34 PM Bug #21180: Bluestore throttler causes down OSD
Can you try setting bluestore_deferred_throttle_bytes = 0 along with bluestore_throttle_bytes = 0 and see if that res... Sage Weil
04:32 PM Bug #21246: bluestore: hang while replaying deferred ios from journal
This looks like it might be the same as #21171, or one of the related bugs I am currently working on. As soon as I h... Sage Weil
03:18 PM Bug #21258 (Fix Under Review): "ceph df"'s MAX AVAIL is not correct
Ah I see your PR now: https://github.com/ceph/ceph/pull/17513 Josh Durgin
03:16 PM Bug #21258: "ceph df"'s MAX AVAIL is not correct
What is your crushmap and device sizes? It looks like you may have different roots, hence different space available i... Josh Durgin
03:45 AM Bug #21258 (Closed): "ceph df"'s MAX AVAIL is not correct
... Chang Liu
03:18 PM Bug #21263: when disk error happens, osd reports assertion failure without any error information
Will fix it in this PR:
https://github.com/ceph/ceph/pull/17522
Pan Liu
02:38 PM Bug #21263 (Resolved): when disk error happens, osd reports assertion failure without any error i...
I used fio+librbd to test one osd (bluestore) built on an NVMe SSD. After I unplugged this SSD, the osd reported an asse... Pan Liu
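The reproduction setup can be sketched as a fio job file along these lines (a guess at the reporter's configuration; the pool, image, and client names are placeholders):

```ini
; hypothetical fio job driving librbd against the single-OSD pool
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
rw=randwrite
bs=4k
iodepth=32
direct=1

[rbd-write]
time_based=1
runtime=300
```

Pulling the SSD mid-run should then surface the disk-error path in the OSD.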
11:40 AM Bug #21143: bad RESETSESSION between OSDs?
@yuri, this PR is not merged. Or did I misunderstand your comment here? Kefu Chai
07:44 AM Bug #21243: incorrect erasure-code space in command ceph df
https://github.com/ceph/ceph/pull/17513 Chang Liu
05:56 AM Feature #21198: Monitors don't handle incomplete network splits
the same case:
https://marc.info/?l=ceph-devel&w=2&r=1&s=ceph-mon+leader+election+problem&q=b
zhiang li

09/05/2017

08:49 PM Bug #20041 (Resolved): ceph-osd: PGs getting stuck in scrub state, stalling RBD
Nathan Cutler
08:49 PM Backport #20780 (Resolved): jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Nathan Cutler
08:47 PM Bug #20464 (Resolved): cache tier osd memory high memory consumption
Nathan Cutler
08:47 PM Backport #20511 (Resolved): jewel: cache tier osd memory high memory consumption
Nathan Cutler
08:46 PM Bug #20375 (Resolved): osd: omap threadpool heartbeat is only reset every 100 values
Nathan Cutler
08:46 PM Backport #20492 (Resolved): jewel: osd: omap threadpool heartbeat is only reset every 100 values
Nathan Cutler
07:02 PM Bug #21250 (Fix Under Review): os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnode.exten...
https://github.com/ceph/ceph/pull/17503 Sage Weil
06:53 PM Bug #21250: os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnode.extents.empty())
looks like two concurrent threads trying to compact_log_async:... Sage Weil
06:51 PM Bug #21250 (Resolved): os/bluestore/BlueFS.cc: 1255: FAILED assert(!log_file->fnode.extents.empty())
... Sage Weil
04:49 PM Bug #21249 (Resolved): Client client.admin marked osd.2 out, after it was down for 1504627577 sec...
... Sage Weil
03:30 PM Bug #20843 (Resolved): assert(i->prior_version == last) when a MODIFY entry follows an ERROR entry
Nathan Cutler
03:30 PM Backport #20930 (Rejected): kraken: assert(i->prior_version == last) when a MODIFY entry follows ...
Kraken is EOL. Nathan Cutler
03:30 PM Backport #20722 (Rejected): kraken: rados ls on pool with no access returns no error
Kraken is EOL. Nathan Cutler
03:29 PM Backport #20493 (Rejected): kraken: osd: omap threadpool heartbeat is only reset every 100 values
Kraken is EOL. Nathan Cutler
03:21 PM Backport #21242 (In Progress): luminous: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_qu...
Nathan Cutler
09:10 AM Backport #21242 (Resolved): luminous: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue...
https://github.com/ceph/ceph/pull/17501 Nathan Cutler
03:20 PM Backport #21240 (In Progress): luminous: "Health check update" log spam
Nathan Cutler
09:09 AM Backport #21240 (Resolved): luminous: "Health check update" log spam
https://github.com/ceph/ceph/pull/17500 Nathan Cutler
03:18 PM Backport #21238 (In Progress): luminous: test_health_warnings.sh can fail
Nathan Cutler
09:09 AM Backport #21238 (Resolved): luminous: test_health_warnings.sh can fail
https://github.com/ceph/ceph/pull/17498 Nathan Cutler
03:15 PM Backport #21236 (In Progress): luminous: build_initial_pg_history doesn't update up/acting/etc
Nathan Cutler
09:09 AM Backport #21236 (Resolved): luminous: build_initial_pg_history doesn't update up/acting/etc
https://github.com/ceph/ceph/pull/17496
https://github.com/ceph/ceph/pull/17622
Nathan Cutler
03:13 PM Backport #21235 (In Progress): luminous: thrashosds read error injection doesn't take live_osds i...
Nathan Cutler
09:09 AM Backport #21235 (Resolved): luminous: thrashosds read error injection doesn't take live_osds into...
https://github.com/ceph/ceph/pull/17495 Nathan Cutler
03:12 PM Backport #21234 (In Progress): luminous: bluestore: async deferred_try_submit deadlock
Nathan Cutler
09:09 AM Backport #21234 (Resolved): luminous: bluestore: async deferred_try_submit deadlock
https://github.com/ceph/ceph/pull/17494 Nathan Cutler
01:02 PM Bug #21243: incorrect erasure-code space in command ceph df
Not only the ISA plugin; it's a common problem.... Chang Liu
01:00 PM Bug #21243: incorrect erasure-code space in command ceph df
... Chang Liu
11:09 AM Bug #21243 (Resolved): incorrect erasure-code space in command ceph df


ceph osd erasure-code-profile set ISA plugin=isa k=2 m=2 crush-failure-domain=host crush-device-c...
Petr Malkov
12:56 PM Bug #21246 (Resolved): bluestore: hang while replaying deferred ios from journal
Running ceph-osd-11.2.0-0.el7.x86_64 from ceph-stable's CentOS repository, I hit the following problem. The cluster (... Tobias Florek
10:22 AM Bug #21180: Bluestore throttler causes down OSD
just an update - sometimes even with bluestore_throttle_bytes set to 0 I get down OSDs, but it is much more rare and ... Henrik Korkuc
09:51 AM Backport #21182 (In Progress): luminous: 'osd crush rule rename' not idempotent
Nathan Cutler
09:39 AM Backport #21133 (In Progress): luminous: osd/PrimaryLogPG: sparse read won't trigger repair corre...
Nathan Cutler
09:38 AM Backport #21132 (Resolved): luminous: qa/standalone/scrub/osd-scrub-repair.sh timeout
Nathan Cutler
09:09 AM Backport #21239 (Resolved): jewel: test_health_warnings.sh can fail
https://github.com/ceph/ceph/pull/20289 Nathan Cutler

09/04/2017

08:36 PM Bug #20785 (Resolved): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool...
Nathan Cutler
01:43 PM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
thanks Joao, i am commenting on https://github.com/ceph/ceph/pull/17191 so it references https://github.com/ceph/ceph... Kefu Chai
12:57 PM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
doh. I missed the needs-backport tag on the pr :( Joao Eduardo Luis
12:14 PM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
Joao, I changed status to "Pending Backport" but the PR is also has the "needs-backport" label, which is perhaps enou... Nathan Cutler
12:13 PM Bug #20785 (Pending Backport): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(p...
Nathan Cutler
11:19 AM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
I may be wrong, but it looks like the commit fixing this is only present in current master. I was under the impressio... Joao Eduardo Luis
04:15 PM Bug #21227 (New): [osd]default mkfs.xfs option may make some problem
the default mkfs.xfs for an osd runs with -i size=2048
xfs=[
# xfs insists on not overwriting previous fs; even if...
peng zhang
10:39 AM Bug #21171: bluestore: aio submission deadlock
Sage, is there an identifiable behavior when this happens? Do the osds die, or is IO simply forever blocked? Joao Eduardo Luis
09:33 AM Backport #20781 (Rejected): kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD
Kraken is EOL. Nathan Cutler
06:46 AM Bug #21207 (Pending Backport): bluestore: async deferred_try_submit deadlock
xie xingguo

09/02/2017

06:36 PM Bug #20888 (Pending Backport): "Health check update" log spam
Sage Weil
06:35 PM Bug #21206 (Pending Backport): thrashosds read error injection doesn't take live_osds into account
Sage Weil
06:34 PM Bug #21203 (Pending Backport): build_initial_pg_history doesn't update up/acting/etc
Sage Weil
04:15 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
I've got exactly the same problem with the kernel client, but the fuse client seems fine with an EC pool on CephFS. George Zhao
01:04 AM Bug #20981: ./run_seed_to_range.sh errored out
one more here http://qa-proxy.ceph.com/teuthology/yuriw-2017-09-01_23:34:11-rados-wip-yuri-testing-2017-08-31-2109-di... Yuri Weinstein

09/01/2017

09:25 PM Bug #21218 (Resolved): thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing...
... Sage Weil
02:02 PM Bug #21203: build_initial_pg_history doesn't update up/acting/etc
https://github.com/ceph/ceph/pull/17423 Sage Weil
01:30 PM Backport #20512 (Rejected): kraken: cache tier osd memory high memory consumption
Kraken is EOL. Nathan Cutler
06:53 AM Bug #21211: 12.2.0,cephfs(meta replica 2, data ec 2+1),ceph-osd coredump
12.2.0
created cephfs
meta pool: replica 2
data pool: ec 2+1
ceph-osd coredump after r...
Yong Wang
06:49 AM Bug #21211 (Need More Info): 12.2.0,cephfs(meta replica 2, data ec 2+1),ceph-osd coredump
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
1: (()+0xa23b21) [0x7fe4a148bb21]
2...
Yong Wang

08/31/2017

09:38 PM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
Hi,
My institute has a large cluster running Kraken 11.2.1-0, using EC 8+3, and we believe we have run into this bug...
Alastair Dewhurst
08:45 PM Bug #21121 (Pending Backport): test_health_warnings.sh can fail
Sage Weil
08:44 PM Bug #21207 (Fix Under Review): bluestore: async deferred_try_submit deadlock
https://github.com/ceph/ceph/pull/17409 Sage Weil
08:39 PM Bug #21207 (Resolved): bluestore: async deferred_try_submit deadlock
In deferred_aio_finish we may need to requeue pending deferred via a finisher. Currently we reuse finishers[0], but ... Sage Weil
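The hazard described above can be sketched abstractly: a callback running on a single-threaded finisher must not synchronously wait on work it queued back onto that same finisher, because the only worker thread is busy running the callback itself. An illustrative Python sketch (the names `deferred_aio_finish`, `requeue_pending`, and the finishers are stand-ins, not Ceph code):

```python
from concurrent.futures import ThreadPoolExecutor

finisher = ThreadPoolExecutor(max_workers=1)          # stands in for finishers[0]
requeue_finisher = ThreadPoolExecutor(max_workers=1)  # dedicated finisher

def requeue_pending():
    return "requeued"

def deferred_aio_finish():
    # Safe: hand the requeue to a *different* finisher and wait on that.
    # Submitting back to `finisher` and calling .result() here would
    # never return: its single worker is occupied by this very call.
    fut = requeue_finisher.submit(requeue_pending)
    return fut.result()

result = finisher.submit(deferred_aio_finish).result()
print(result)  # requeued
```

Using a dedicated finisher for the requeue breaks the self-wait cycle; that is the general shape of the fix, not necessarily the exact one merged.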
06:56 PM Bug #21206 (Fix Under Review): thrashosds read error injection doesn't take live_osds into account
https://github.com/ceph/ceph/pull/17406 Sage Weil
06:54 PM Bug #21206 (Resolved): thrashosds read error injection doesn't take live_osds into account
... Sage Weil
04:31 PM Bug #20981: ./run_seed_to_range.sh errored out
David, the first dead job to appear was http://pulpito.ceph.com/smithfarm-2017-08-21_19:38:42-rados-wip-jewel-backpor... Nathan Cutler
03:21 AM Bug #20981: ./run_seed_to_range.sh errored out
Are we sure it isn't http://tracker.ceph.com/issues/20613#note-24 ? Because the dead runs here http://pulpito.ceph.c... David Zafman
03:14 AM Bug #20981: ./run_seed_to_range.sh errored out
I looked at https://github.com/ceph/ceph/pull/15050 and don't see anything that would cause this issue. David Zafman
04:11 PM Bug #21204 (Resolved): DNS SRV default service name not used anymore
Hi,
I am in the process of upgrading from Kraken to Luminous.
I am using DNS SRV records to lookup MON servers.
...
Lionel BEARD
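For context, mon discovery via DNS is driven by SRV records under a service name, `ceph-mon` by default, which is what the report says stopped being honored. A sketch of the expected records (zone-file syntax; the domain, TTL, and mon hostnames are placeholders):

```
; SRV records Ceph's mon lookup queries: _<service>._tcp.<domain>
_ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon1.example.com.
_ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon2.example.com.
```

With a non-default service name, `mon_dns_srv_name` in ceph.conf selects which records are queried.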
01:56 PM Bug #19605 (Pending Backport): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front(...
Kefu Chai
01:52 PM Bug #21203 (Resolved): build_initial_pg_history doesn't update up/acting/etc
The loop doesn't update up/acting/etc values, which means the result is incorrect when there are multiple intervals s... Sage Weil
11:00 AM Bug #20933: All mon nodes down when i use ceph-disk prepare a new osd.
I think I've hit a similar issue. It occurred with 12.1.2 when I tried to add a host/osd (ceph-deploy osd prepare --dmcry... Denis Zadonskii
09:27 AM Feature #21198 (New): Monitors don't handle incomplete network splits
When the network between monitors (the minimum-rank and the maximum-rank ones) disconnects, the node with the maximum rank always k... zhiang li
03:30 AM Bug #21194 (New): mon clock skew test is fragile
The original observed problem is that it failed to detect clock skew in run /a/sage-2017-08-27_02:16:57-rados-wip-sa... Sage Weil

08/30/2017

06:10 PM Bug #20981: ./run_seed_to_range.sh errored out
My money is on https://github.com/ceph/ceph/pull/15050 Nathan Cutler
06:07 PM Bug #20981: ./run_seed_to_range.sh errored out
David, the jewel failure started occurring in the integration branch that included the following PRs: http://tracker.... Nathan Cutler
04:56 PM Bug #20981: ./run_seed_to_range.sh errored out
I reverted https://github.com/ceph/ceph/pull/15947 to see if that would fix it and it did NOT. David Zafman
03:57 PM Backport #21182 (Resolved): luminous: 'osd crush rule rename' not idempotent
https://github.com/ceph/ceph/pull/17481 Nathan Cutler
03:28 PM Bug #21180 (Resolved): Bluestore throttler causes down OSD
Writing large amount of data to EC RBD pool via NBD causes down OSDs, PGs and drop in traffic due to unhealthy cluste... Henrik Korkuc
02:12 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
To clarify then: I have not tested this with a replicated cephfs data pool. Only tested with ec data pool as per my 4... Martin Millnert
01:19 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Martin: just to confirm, you were seeing this crash while you had EC pools involved, and when you do not have any EC ... John Spray
11:25 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
... John Spray
06:12 AM Bug #21174 (Rejected): OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_up...
I've setup a cephfs erasure coded pool on a small cluster consisting of 5 bluestore OSDs.
The pools were created as ...
Martin Millnert
01:33 PM Bug #20871 (Fix Under Review): core dump when bluefs's mkdir returns -EEXIST
Chang Liu
01:33 PM Bug #20871: core dump when bluefs's mkdir returns -EEXIST
https://github.com/ceph/ceph/pull/17357 Chang Liu

08/29/2017

09:43 PM Bug #21171 (Fix Under Review): bluestore: aio submission deadlock
https://github.com/ceph/ceph/pull/17352 Sage Weil
02:47 PM Bug #21171 (Resolved): bluestore: aio submission deadlock
- thread a holds deferred_submit_lock, blocks on aio submission (queue is full)
- thread b holds deferred_lock, bloc...
Sage Weil
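The two-thread cycle above is a classic lock-ordering deadlock; the usual fix is to make every path acquire the locks in one global order, so no thread can hold one lock while blocking on the other. A minimal Python sketch of the safe pattern (the lock names mirror the report, but this is an illustration, not Ceph source):

```python
import threading

deferred_lock = threading.Lock()
deferred_submit_lock = threading.Lock()

def submit_deferred(results):
    # Fixed pattern: every path takes deferred_lock before
    # deferred_submit_lock, so the a/b cycle cannot form.
    with deferred_lock:
        with deferred_submit_lock:
            results.append("submitted")

def finish_deferred(results):
    with deferred_lock:              # same global order as submit_deferred
        with deferred_submit_lock:
            results.append("finished")

results = []
threads = [threading.Thread(target=submit_deferred, args=(results,)),
           threading.Thread(target=finish_deferred, args=(results,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['finished', 'submitted']
```

The actual fix may instead avoid blocking on aio submission while a lock is held; consistent ordering is just the textbook resolution of the cycle described.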
08:58 PM Bug #21162 (Pending Backport): 'osd crush rule rename' not idempotent
Sage Weil
10:46 AM Bug #21162 (Fix Under Review): 'osd crush rule rename' not idempotent
https://github.com/ceph/ceph/pull/17329 xie xingguo
07:38 PM Documentation #20486 (Resolved): Document how to use bluestore compression
Sage Weil
04:11 PM Bug #21143: bad RESETSESSION between OSDs?
Haomai Wang wrote:
> https://github.com/ceph/ceph/pull/16009
>
> this pr gives a brief about reason. it's really ...
Yuri Weinstein
03:08 PM Bug #21092: OSD sporadically starts reading at 100% of ssd bandwidth
Seems that this is a side effect of too small a value for bluestore_cache_size.
We set it to 50M to reduce osd memory consump...
Aleksei Gutikov
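The reporter's setting, as a ceph.conf sketch (value in bytes; that 50M is the actual trigger of the 100%-bandwidth reads is the reporter's hypothesis, not a confirmed cause):

```ini
[osd]
# 50 MB cache; a cache this small can force constant re-reads of
# rocksdb/onode metadata from disk when the working set is larger
bluestore cache size = 52428800
```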
07:36 AM Bug #20981: ./run_seed_to_range.sh errored out
This is occurring in the current jewel branch now too:
https://github.com/ceph/ceph/pull/17317#issuecomment-325580432
Josh Durgin
07:06 AM Backport #16239 (Resolved): 'ceph tell osd.0 flush_pg_stats' fails in rados qa run
h3. description... Nathan Cutler
03:00 AM Bug #21165 (Can't reproduce): 2 pgs stuck in unknown during thrashing
... Sage Weil

08/28/2017

10:20 PM Bug #21162 (Resolved): 'osd crush rule rename' not idempotent
... Sage Weil
06:15 PM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
Is this causing job failures? I'm having trouble finding anything indicating this would be fatal without an actual I... David Galloway
08:06 AM Backport #21150 (Resolved): jewel: tests: btrfs copy_clone returns errno 95 (Operation not suppor...
https://github.com/ceph/ceph/pull/18165 Kefu Chai
01:55 AM Bug #21016 (Resolved): CRUSH crash on bad memory handling
xie xingguo
01:54 AM Backport #21106 (Resolved): luminous: CRUSH crash on bad memory handling
https://github.com/ceph/ceph/pull/17214 xie xingguo
 
