Activity
From 07/30/2017 to 08/28/2017
08/28/2017
- 10:20 PM Bug #21162 (Resolved): 'osd crush rule rename' not idempotent
- ...
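For context (not from the ticket), "ceph osd crush rule rename" is the luminous mon command involved; a minimal sketch of how the non-idempotence would show up, with placeholder rule names:
  ceph osd crush rule create-replicated myrule default host
  ceph osd crush rule rename myrule myrule-new    # first run renames the rule
  ceph osd crush rule rename myrule myrule-new    # re-run presumably fails with an error instead of
                                                  # succeeding, because 'myrule' no longer exists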
- 06:15 PM Backport #21150: jewel: tests: btrfs copy_clone returns errno 95 (Operation not supported)
- Is this causing job failures? I'm having trouble finding anything indicating this would be fatal without an actual I...
- 08:06 AM Backport #21150 (Resolved): jewel: tests: btrfs copy_clone returns errno 95 (Operation not suppor...
- https://github.com/ceph/ceph/pull/18165
- 01:55 AM Bug #21016 (Resolved): CRUSH crash on bad memory handling
- 01:54 AM Backport #21106 (Resolved): luminous: CRUSH crash on bad memory handling
- https://github.com/ceph/ceph/pull/17214
08/27/2017
- 05:59 PM Bug #21147 (Resolved): Manager daemon x is unresponsive. No standby daemons available
- /a/sage-2017-08-26_20:38:41-rados-luminous-distro-basic-smithi/1567938
The last time I looked this appeared to be ...
- 04:04 PM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2017-08-26_20:38:41-rados-luminous-distro-basic-smithi/1568055
- 04:30 AM Bug #21143: bad RESETSESSION between OSDs?
- https://github.com/ceph/ceph/pull/16009
this PR gives a brief explanation of the reason. It's really rare, so I don't do it imm...
- 02:16 AM Backport #21076 (Resolved): luminous: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools()....
- 02:15 AM Backport #21095 (Resolved): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
08/26/2017
- 06:14 PM Bug #20785 (Resolved): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool...
- 06:13 PM Bug #20913 (Resolved): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
- 06:08 PM Bug #21144 (Resolved): daemon-helper: command crashed with signal 1
- ...
- 05:56 PM Bug #21143 (Duplicate): bad RESETSESSION between OSDs?
- osd.5...
- 12:11 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- I uploaded more logs and info files with ceph-post-file
f27fb8a5-baae-4f04-8353-d3b2b314c61a
- 11:56 AM Bug #21142 (Won't Fix): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- after upgrading to luminous 12.1.4 rc
we saw several osds crashing with below logs.
the cluster was unhealthy when ...
- 01:06 AM Bug #20981: ./run_seed_to_range.sh errored out
- Stack trace from core dump doesn't include a stack with _inject_failure() in it.
For core dump in /a/kchai-2017-08...
08/25/2017
- 08:02 PM Backport #21133 (Resolved): luminous: osd/PrimaryLogPG: sparse read won't trigger repair correctly
- https://github.com/ceph/ceph/pull/17475
- 08:02 PM Backport #21132 (Resolved): luminous: qa/standalone/scrub/osd-scrub-repair.sh timeout
- https://github.com/ceph/ceph/pull/17264
- 07:46 PM Bug #21127: qa/standalone/scrub/osd-scrub-repair.sh timeout
- https://github.com/ceph/ceph/pull/17264
- 07:44 PM Bug #21127 (Pending Backport): qa/standalone/scrub/osd-scrub-repair.sh timeout
- 03:01 PM Bug #21127: qa/standalone/scrub/osd-scrub-repair.sh timeout
- We need to backport fe81b7e3a5034ce855303f93f3e413f3f2dc74a8 and this change together to luminous.
- 02:59 PM Bug #21127: qa/standalone/scrub/osd-scrub-repair.sh timeout
- Caused by:
commit fe81b7e3a5034ce855303f93f3e413f3f2dc74a8
Author: huanwen ren <ren.huanwen@zte.com.cn>
Date: ...
- 01:46 PM Bug #21127 (Fix Under Review): qa/standalone/scrub/osd-scrub-repair.sh timeout
- https://github.com/ceph/ceph/pull/17258
- 01:44 PM Bug #21127 (Resolved): qa/standalone/scrub/osd-scrub-repair.sh timeout
- ...
- 03:44 PM Bug #21130 (Can't reproduce): "FAILED assert(bh->last_write_tid > tid)" in powercycle-master-test...
- Run: http://pulpito.ceph.com/yuriw-2017-08-24_22:38:48-powercycle-master-testing-basic-smithi/
Job: 1560682
Logs: h...
- 03:34 PM Backport #20781 (Fix Under Review): kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 03:33 PM Backport #20781: kraken: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- https://github.com/ceph/ceph/pull/17261
- 03:22 PM Backport #20780 (Fix Under Review): jewel: ceph-osd: PGs getting stuck in scrub state, stalling RBD
- 03:09 PM Bug #21123 (Pending Backport): osd/PrimaryLogPG: sparse read won't trigger repair correctly
- 03:08 PM Bug #21129 (New): 'ceph -s' hang
- ...
- 12:11 PM Backport #21076 (In Progress): luminous: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools...
- https://github.com/ceph/ceph/pull/17257
- 10:28 AM Bug #21092: OSD sporadically starts reading at 100% of ssd bandwidth
- Another stack trace that leads to pread same size and same offset:...
- 09:20 AM Bug #21092: OSD sporadically starts reading at 100% of ssd bandwidth
- Stacktrace of thread performing reads of 2445312 bytes from offset 96117329920 ...
- 10:19 AM Bug #20188 (New): filestore: os/filestore/FileStore.h: 357: FAILED assert(q.empty()) from ceph_te...
- /a//kchai-2017-08-25_08:38:31-rados-wip-kefu-testing-distro-basic-smithi/1561884...
- 06:35 AM Bug #20785 (Fix Under Review): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(p...
- /a//joshd-2017-08-25_00:03:46-rados-wip-dup-perf-distro-basic-smithi/1560728/ mon.c
- 02:40 AM Backport #21095: osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
- should backport https://github.com/ceph/ceph/pull/17246 also.
- 02:38 AM Bug #20913: osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
- https://github.com/ceph/ceph/pull/17246
- 02:09 AM Bug #20876: BADAUTHORIZER on mgr, hung ceph tell mon.*
- /a/sage-2017-08-24_17:38:40-rados-wip-sage-testing2-luminous-20170824a-distro-basic-smithi/1560473
08/24/2017
- 11:57 PM Bug #21123 (Resolved): osd/PrimaryLogPG: sparse read won't trigger repair correctly
- master PR: https://github.com/ceph/ceph/pull/17221
- 09:59 PM Bug #21121 (Fix Under Review): test_health_warnings.sh can fail
- https://github.com/ceph/ceph/pull/17244
- 09:55 PM Bug #21121: test_health_warnings.sh can fail
- I believe the fix is to subscribe to osdmaps when in the waiting for healthy state. if we are unhealthy because we a...
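A rough reproduction sketch of the scenario described in this entry, with placeholder OSD ids; the OSDs are marked down while noup is set, then noup is cleared and the downed OSDs need an osdmap update to rejoin:
  ceph osd set noup
  for id in 0 1 2; do ceph osd down $id; done    # mark all but the last OSD down
  ceph osd unset noup                            # downed OSDs should now come back up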
- 09:54 PM Bug #21121 (Resolved): test_health_warnings.sh can fail
- - test_mark_all_but_last_osds_down marks all but one osd down
- clears noup
- osd.1 fails the is_healthy check beca...
- 07:25 PM Bug #20770: test_pidfile.sh test is failing 2 places
- This problem still hasn't been solved. The test is disabled, so moving back to verified.
- 07:23 PM Bug #20770 (Resolved): test_pidfile.sh test is failing 2 places
- luminous backport rejected because the test continued to fail
- 07:22 PM Bug #20975 (Resolved): test_pidfile.sh is flaky
- luminous backport: https://github.com/ceph/ceph/pull/17241
- 05:50 PM Feature #18206: osd: osd_scrub_during_recovery only considers primary, not replicas
- Nathan Cutler wrote:
> @Vikhyat, I think Abhi just created the luminous backport tracker manually. The jewel one wil...
- 05:22 PM Feature #18206: osd: osd_scrub_during_recovery only considers primary, not replicas
- @Vikhyat, I think Abhi just created the luminous backport tracker manually. The jewel one will be created automagical...
- 04:36 PM Feature #18206: osd: osd_scrub_during_recovery only considers primary, not replicas
- Thanks Nathan. I think there was some issue and it did not create a tracker for the jewel backport, so I removed luminous so it can ...
- 03:56 PM Feature #18206: osd: osd_scrub_during_recovery only considers primary, not replicas
- Verified that both commits from https://github.com/ceph/ceph/pull/17039 were cherry-picked to luminous.
- 05:25 PM Backport #21117 (Resolved): jewel: osd: osd_scrub_during_recovery only considers primary, not rep...
- https://github.com/ceph/ceph/pull/17815
- 05:23 PM Bug #21092: OSD sporadically starts reading at 100% of ssd bandwidth
- 59.log more obviously shows the issue with repeating part:...
- 10:10 AM Bug #21092 (New): OSD sporadically starts reading at 100% of ssd bandwidth
- luminous v12.1.4
bluestore
Periodically (10 mins) some osd starts reading ssd disk at maximum available speed (45...
- 05:22 PM Backport #21106 (Resolved): luminous: CRUSH crash on bad memory handling
- 03:54 PM Bug #21096 (New): osd-scrub-repair.sh:381: unfound_erasure_coded: return 1
- ...
- 03:31 PM Backport #21095 (In Progress): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_del...
- https://github.com/ceph/ceph/pull/17233
- 03:30 PM Backport #21095 (Resolved): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
- ...
- 03:14 PM Bug #20913 (Pending Backport): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_del...
- 08:12 AM Bug #19605 (Fix Under Review): OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front(...
- https://github.com/ceph/ceph/pull/17217
- 06:58 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- although all ops in repop_queue are canceled upon pg reset (change), and pg discards messages from down OSDs accordin...
- 03:46 AM Bug #20785 (Resolved): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool...
- 03:46 AM Backport #21090 (Resolved): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid...
- https://github.com/ceph/ceph/pull/17191
- 03:44 AM Feature #20956 (Resolved): Include front/back interface names in OSD metadata
- 03:36 AM Bug #20970 (Resolved): bug in funciton reweight_by_utilization
- 03:13 AM Feature #21073: mgr: ceph/rgw: show hostnames and ports in ceph -s status output
- ...
- 03:04 AM Backport #21076 (Resolved): luminous: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools()....
- 03:03 AM Backport #21048 (Resolved): luminous: Include front/back interface names in OSD metadata
- 03:02 AM Backport #21077 (Resolved): luminous: osd: osd_scrub_during_recovery only considers primary, not ...
- 03:02 AM Backport #21079 (Resolved): bug in funciton reweight_by_utilization
- 12:30 AM Bug #21016 (Pending Backport): CRUSH crash on bad memory handling
08/23/2017
- 11:02 PM Bug #20730: need new OSD_SKEWED_USAGE implementation
- I've created 2 pull requests for Jewel and Kraken to disable this now.
Jewel: https://github.com/ceph/ceph/pull/172...
- 08:52 PM Bug #14115: crypto: race in nss init
- Still seeing this in Jewel 10.2.7, Ubuntu 16.04.2 running an application using ceph under Apache:...
- 06:33 PM Bug #21016: CRUSH crash on bad memory handling
- 05:27 PM Bug #18209 (Resolved): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- 05:00 PM Backport #20965 (Resolved): luminous: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= l...
- 01:46 PM Backport #20965 (In Progress): luminous: src/common/LogClient.cc: 310: FAILED assert(num_unsent <...
- 04:09 PM Feature #21084 (Resolved): auth: add osd auth caps based on pool metadata
- Add pool-metadata based auth caps. The initial use case is CephFS; if pools are tagged based on filesystem, then auth...
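As a loose illustration only (the final cap grammar was still being designed; names are invented), the idea is that an OSD cap could match a pool tag instead of a pool name:
  ceph auth get-or-create client.foo \
      mon 'allow r' \
      osd 'allow rw tag cephfs data=myfs'    # hypothetical: rw on any pool tagged for filesystem "myfs"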
- 01:48 PM Backport #21079 (In Progress): bug in funciton reweight_by_utilization
- 01:47 PM Backport #21079 (Resolved): bug in funciton reweight_by_utilization
- https://github.com/ceph/ceph/pull/17198
- 01:37 PM Backport #21051 (In Progress): luminous: Improve size scrub error handling and ignore system attr...
- 01:30 PM Backport #21077 (In Progress): luminous: osd: osd_scrub_during_recovery only considers primary, n...
- 01:27 PM Backport #21077 (Resolved): luminous: osd: osd_scrub_during_recovery only considers primary, not ...
- https://github.com/ceph/ceph/pull/17195
- 01:26 PM Backport #21048 (In Progress): luminous: Include front/back interface names in OSD metadata
- 01:02 PM Backport #21076 (In Progress): luminous: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools...
- https://github.com/ceph/ceph/pull/17191
- 12:59 PM Backport #21076 (Resolved): luminous: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools()....
- https://github.com/ceph/ceph/pull/17191
- 10:20 AM Bug #16553: Removing Writeback Cache Tier Does not clean up Incomplete_Clones
- It looks like I hit the same issue on 10.2.9.
- 08:37 AM Bug #20913 (Fix Under Review): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_del...
- https://github.com/ceph/ceph/pull/17183
- 08:23 AM Feature #21073 (Resolved): mgr: ceph/rgw: show hostnames and ports in ceph -s status output
- Similar to the way we do mds and mgr statuses, we could display the rgw endpoints in ceph status as well, the informa...
- 05:26 AM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- see also https://github.com/ceph/ceph/pull/17179
08/22/2017
- 11:30 PM Bug #20909 (Fix Under Review): Error ETIMEDOUT: crush test failed with -110: timed out during smo...
- https://github.com/ceph/ceph/pull/17169
- 04:43 PM Bug #20770: test_pidfile.sh test is failing 2 places
- Another change is needed too. I've requested that in the pull request.
https://github.com/ceph/ceph/pull/17052 sh...
- 04:21 PM Bug #20770: test_pidfile.sh test is failing 2 places
- David Zafman wrote:
> To backport all the test-pidfile.sh cherry-pick 4 pull requests using the sha1s in this order:...
- 04:26 PM Bug #20981: ./run_seed_to_range.sh errored out
- See also here =>
http://qa-proxy.ceph.com/teuthology/yuriw-2017-08-22_14:54:54-rados-wip-yuri-testing_2017_8_22-di...
- 03:18 PM Bug #20981: ./run_seed_to_range.sh errored out
- David, can you take a look? This seems to be showing up pretty consistently in rados runs.
- 04:23 PM Bug #20975 (Duplicate): test_pidfile.sh is flaky
- 03:39 PM Bug #20785 (Pending Backport): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(p...
- 02:50 PM Feature #18206 (Pending Backport): osd: osd_scrub_during_recovery only considers primary, not rep...
- 01:18 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- ...
- 01:17 PM Bug #21016 (Fix Under Review): CRUSH crash on bad memory handling
- 06:09 AM Bug #20970 (Pending Backport): bug in funciton reweight_by_utilization
08/21/2017
- 11:53 PM Bug #15741: librados get_last_version() doesn't return correct result after aio completion
- This bug still exists.
- 10:34 PM Bug #19487 (Closed): "GLOBAL %RAW USED" of "ceph df" is not consistent with check_full_status
- Reopen this if the issue hasn't been fixed in the latest code, with the understanding that each OSD has its own fullness d...
- 04:14 PM Backport #21051 (Resolved): luminous: Improve size scrub error handling and ignore system attrs i...
- https://github.com/ceph/ceph/pull/17196
- 04:13 PM Backport #21048 (Resolved): luminous: Include front/back interface names in OSD metadata
- https://github.com/ceph/ceph/pull/17193
- 03:54 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- # osd.1 sent failure report of osd.0
# osd.1 sent repop 5386 to osd.0
# mon.a marked osd.0 down in osdmap.27
# osd...
- 03:52 PM Bug #17138 (Resolved): crush: inconsistent ruleset/ruled_id are difficult to figure out
- 07:44 AM Bug #20981: ./run_seed_to_range.sh errored out
- /a//kchai-2017-08-21_01:51:35-rados-master-distro-basic-smithi/1545907/teuthology.log has debug heartbeatmap = 20.
<...
- 03:57 AM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Hi, everyone.
I've found that the reason that clone overlap modifications should pass "is_present_clone" condition...
- 02:04 AM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- /a//kchai-2017-08-20_09:42:12-rados-wip-kefu-testing-distro-basic-mira/1545387/
08/18/2017
- 11:20 PM Bug #20770: test_pidfile.sh test is failing 2 places
- To backport all the test-pidfile.sh cherry-pick 4 pull requests using the sha1s in this order:
https://github.co...
- 11:08 PM Feature #18206: osd: osd_scrub_during_recovery only considers primary, not replicas
- https://github.com/ceph/ceph/pull/17039
- 09:34 AM Bug #20981: ./run_seed_to_range.sh errored out
- /a/kchai-2017-08-18_03:03:28-rados-master-distro-basic-mira/1537335...
- 03:12 AM Bug #20243 (Pending Backport): Improve size scrub error handling and ignore system attrs in xattr...
- https://github.com/ceph/ceph/pull/16407
08/17/2017
- 09:47 PM Bug #20332 (Won't Fix): rados bench seq option doesn't work
- 06:01 PM Feature #18206 (Fix Under Review): osd: osd_scrub_during_recovery only considers primary, not rep...
- 02:55 PM Bug #20970 (Fix Under Review): bug in funciton reweight_by_utilization
- 11:19 AM Bug #20970: bug in funciton reweight_by_utilization
- https://github.com/ceph/ceph/pull/17064
- 12:14 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- excerpt of osd.0.log...
- 11:31 AM Bug #20785 (Fix Under Review): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(p...
- https://github.com/ceph/ceph/pull/17065
- 07:52 AM Bug #21016: CRUSH crash on bad memory handling
- I believe this should be fixed by https://github.com/ceph/ceph/pull/17014/commits/6252068ec08c66513e5394188b786978236...
08/16/2017
- 10:34 PM Bug #21016: CRUSH crash on bad memory handling
- ...and this was also responsible for at least a couple failures that got detected as such.
- 10:15 PM Bug #21016 (Resolved): CRUSH crash on bad memory handling
- ...
- 12:04 PM Feature #18206: osd: osd_scrub_during_recovery only considers primary, not replicas
- david, i just read your inquiry over IRC. what would you want me to review for this ticket? do we have a PR for it al...
- 01:48 AM Bug #21005 (New): mon: mon_osd_down_out interval can prompt osdmap creation when nothing is happe...
- I saw a cluster where we had the whole gamut of no* flags set in an attempt to stop it creating maps.
Unfortunatel...
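For reference, the "gamut of no* flags" presumably means cluster flags along these lines:
  ceph osd set noout
  ceph osd set nodown
  ceph osd set noup
  ceph osd set nobackfill
  ceph osd set norecover
  ceph osd set norebalance
  ceph osd set noscrub
  ceph osd set nodeep-scrub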
08/15/2017
- 03:40 PM Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
- Hello,
sorry for the delay
Yes, it appears under flags....
- 01:22 AM Bug #20770 (Pending Backport): test_pidfile.sh test is failing 2 places
08/14/2017
- 10:14 PM Feature #18206 (In Progress): osd: osd_scrub_during_recovery only considers primary, not replicas
- 09:00 PM Bug #20999 (New): rados python library does not document omap API
- The omap API can be fairly important for RADOS applications but it is not documented in the expected location http://...
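For orientation (not from the ticket), the same omap operations are also exposed by the rados CLI, which is handy while the python calls remain undocumented; pool and object names below are placeholders:
  rados -p mypool setomapval myobj mykey myvalue
  rados -p mypool listomapkeys myobj
  rados -p mypool listomapvals myobj
  rados -p mypool rmomapkey myobj mykey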
- 08:32 PM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Note: bug is not present in master, as demonstrated by https://github.com/ceph/ceph/pull/17017
- 08:31 PM Backport #17445 (In Progress): jewel: list-snap cache tier missing promotion logic (was: rbd cli ...
- h3. description
In our ceph cluster some rbd images (created by OpenStack) make rbd segfault. This is on an Ubuntu 1...
- 10:48 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli ...
- The pull request https://github.com/ceph/ceph/pull/17017
- 10:46 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
- Hi, everyone.
I've just added a new list-snaps test, #17017, which can test whether this problem exists in master br...
- 07:40 PM Bug #20770 (Fix Under Review): test_pidfile.sh test is failing 2 places
- 01:55 PM Bug #20985 (Resolved): PG which marks divergent_priors causes crash on startup
- Several other confirmations and a healthy test run later, all merged!
08/13/2017
- 07:20 PM Feature #14527: Lookup monitors through DNS
- The recent code doesn't support IPv6, apparently. Maybe we can choose among ns_t_a and ns_t_aaaa according to conf->m...
- 07:01 PM Bug #20939 (Resolved): crush weight-set + rm-device-class segv
- 06:59 PM Bug #20876: BADAUTHORIZER on mgr, hung ceph tell mon.*
- /a/sage-2017-08-12_21:09:40-rados-wip-sage-testing-20170812a-distro-basic-smithi/1518429...
- 09:17 AM Bug #20985: PG which marks divergent_priors causes crash on startup
- Stephan Hohn wrote:
> I can confirm that this build worked on my test cluster. It's back to HEALTH_OK and all OSDs a...
- 09:17 AM Bug #20985: PG which marks divergent_priors causes crash on startup
- I can confirm that this build worked on my test cluster. It's back to HEALTH_OK and all OSDs are up.
08/12/2017
- 06:08 PM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
- /a/sage-2017-08-11_21:54:20-rados-luminous-distro-basic-smithi/1512264
I'm going to whitelist this on luminous bra...
- 05:31 PM Bug #20985: PG which marks divergent_priors causes crash on startup
- If anyone wants to validate that the fix packages at https://shaman.ceph.com/repos/ceph/wip-20985-divergent-handling-...
- 09:19 AM Bug #20985: PG which marks divergent_priors causes crash on startup
- Facing the same issue upgrading from jewel 10.2.9 -> luminous 12.1.3 (RC)
- 02:55 AM Bug #20923 (Resolved): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- 02:35 AM Bug #20983 (Resolved): bluestore: failure to dirty src onode on clone with 1-byte logical extent
08/11/2017
- 10:49 PM Bug #20986 (Can't reproduce): segv in crush_destroy_bucket_straw2 on rados/standalone/misc.yaml
- ...
- 10:45 PM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
- ...
- 10:43 PM Bug #20985: PG which marks divergent_priors causes crash on startup
- Luminous at https://github.com/ceph/ceph/pull/17001
- 10:20 PM Bug #20985: PG which marks divergent_priors causes crash on startup
- https://github.com/ceph/ceph/pull/17000
Still compiling, testing, etc
- 10:16 PM Bug #20985 (Resolved): PG which marks divergent_priors causes crash on startup
- This was noticed in the course of somebody upgrading from 12.1.1 to 12.1.2:...
- 10:14 PM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
- /a/sage-2017-08-11_17:22:37-rados-wip-sage-testing-20170811a-distro-basic-smithi/1511996
- 10:12 PM Bug #20959: cephfs application metdata not set by ceph.py
- https://github.com/ceph/ceph/pull/16954
- 02:29 AM Bug #20959 (Resolved): cephfs application metdata not set by ceph.py
- 05:36 PM Bug #20770: test_pidfile.sh test is failing 2 places
- 05:34 AM Bug #20770 (In Progress): test_pidfile.sh test is failing 2 places
- 04:46 PM Bug #20983: bluestore: failure to dirty src onode on clone with 1-byte logical extent
- https://github.com/ceph/ceph/pull/16994
- 04:45 PM Bug #20983 (Resolved): bluestore: failure to dirty src onode on clone with 1-byte logical extent
- symptom is...
- 04:27 PM Bug #20981: ./run_seed_to_range.sh errored out
- Super weird.. looks like a race between heartbeat timeout and a failure injection maybe?...
- 01:26 PM Bug #20981 (Can't reproduce): ./run_seed_to_range.sh errored out
- ...
- 01:00 PM Bug #20974 (Fix Under Review): osd/PG.cc: 3377: FAILED assert(r == 0) (update_snap_map remove fails)
- https://github.com/ceph/ceph/pull/16982
08/10/2017
- 07:59 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Yes, but osd.0 doing that is very incorrect. We've had some problems in this area before with marking stuff down not ...
- 10:20 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- greg, osd.0 failed to send the reply of tid 5386 over the wire because it was disconnected. but it managed to send th...
- 07:41 PM Bug #20975: test_pidfile.sh is flaky
- https://github.com/ceph/ceph/pull/16977
- 07:41 PM Bug #20975 (Resolved): test_pidfile.sh is flaky
- fails regularly on make check. disabling it for now.
- 04:41 PM Bug #20939: crush weight-set + rm-device-class segv
- 04:15 PM Feature #20956 (Pending Backport): Include front/back interface names in OSD metadata
- 04:12 PM Bug #20949 (Resolved): mon: quorum incorrectly believes mon has kraken (not jewel) features
- 03:49 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Moving this back to RADOS -- changing librbd to force a full object diff if an object exists in the cache tier seems ...
- 02:16 PM Bug #20974 (Can't reproduce): osd/PG.cc: 3377: FAILED assert(r == 0) (update_snap_map remove fails)
- ...
- 01:33 PM Bug #20958 (Resolved): missing set lost during upgrade
- also backported
- 01:23 PM Bug #20973 (Can't reproduce): src/osdc/ Objecter.cc: 3106: FAILED assert(check_latest_map_ops.fin...
- ...
- 07:04 AM Bug #20970 (Resolved): bug in funciton reweight_by_utilization
- There is one bug in function OSDMonitor::reweight_by_utilization ...
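OSDMonitor::reweight_by_utilization backs the mon commands below; a quick way to exercise it (the threshold value is just an example):
  ceph osd test-reweight-by-utilization 120    # dry run: report what would be reweighted
  ceph osd reweight-by-utilization 120         # apply the reweight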
08/09/2017
- 09:34 PM Bug #20798 (Need More Info): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- Logs from the ClsLock unittest clearly show that there is a race in the test and it tries to take the lock again befo...
- 09:15 PM Bug #20959 (In Progress): cephfs application metdata not set by ceph.py
- So far I've identified three problems in the source:
1) we don't check that we're in luminous mode before the MDS se...
- 07:57 PM Bug #20959: cephfs application metdata not set by ceph.py
- As I reported in #20891 I am seeing this on fresh luminous clusters.
- 07:56 PM Bug #20959: cephfs application metdata not set by ceph.py
- Okay, unlike the previous log I looked at, the "fs new" command is clearly *not* triggering a new osd map commit. We ...
- 07:53 PM Bug #20959: cephfs application metdata not set by ceph.py
- Hmm, this still doesn't make sense. The cluster started out as luminous and so the maps would always have the luminou...
- 04:19 PM Bug #20959: cephfs application metdata not set by ceph.py
- The bug I hit before was doing the right checks on encoding, *but* the pending_inc was applied to the in-memory mon c...
- 03:29 PM Bug #20959: cephfs application metdata not set by ceph.py
- We're encoding with the quorum features, though, so I don't think that could actually cause a problem. Maybe, though.
- 03:23 PM Bug #20959: cephfs application metdata not set by ceph.py
- Sage was right, the MDSMonitor unconditionally calls do_application_enable() and that unconditionally sets applicatio...
- 03:06 PM Bug #20959 (Resolved): cephfs application metdata not set by ceph.py
- "2017-08-09 06:52:11.115593 mon.a mon.0 172.21.15.12:6789/0 154 : cluster [WRN] Health check failed: application not ...
- 07:54 PM Bug #20920 (Resolved): pg dump fails during point-to-point upgrade
- 07:26 PM Bug #20920: pg dump fails during point-to-point upgrade
- https://github.com/ceph/ceph/pull/16871
- 07:54 PM Backport #20963 (Resolved): luminous: pg dump fails during point-to-point upgrade
- Manually cherry-picked to luminous ahead of the 12.2.0 release.
- 06:32 PM Backport #20963 (Resolved): luminous: pg dump fails during point-to-point upgrade
- 07:33 PM Bug #20960: ceph_test_rados: mismatched version (due to pg import/export)
- I'm not really sure how we could reasonably handle this scenario on the Ceph side. Seems like we should adjust the te...
- 07:06 PM Bug #20960: ceph_test_rados: mismatched version (due to pg import/export)
- meanwhile on osd.2, start is...
- 06:46 PM Bug #20960: ceph_test_rados: mismatched version (due to pg import/export)
- second write to the object sets uv482...
- 06:09 PM Bug #20960 (Can't reproduce): ceph_test_rados: mismatched version (due to pg import/export)
- ...
- 07:20 PM Bug #20947 (Resolved): OSD and mon scrub cluster log messages are too verbose
- 09:48 AM Bug #20947 (Pending Backport): OSD and mon scrub cluster log messages are too verbose
- 07:20 PM Backport #20961 (Resolved): luminous: OSD and mon scrub cluster log messages are too verbose
- Manually cherry-picked to luminous branch.
- 06:32 PM Backport #20961 (Resolved): luminous: OSD and mon scrub cluster log messages are too verbose
- 06:34 PM Backport #20965 (Resolved): luminous: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= l...
- https://github.com/ceph/ceph/pull/17197
- 06:19 PM Bug #20958: missing set lost during upgrade
- 06:14 PM Bug #20958: missing set lost during upgrade
- 05:47 PM Bug #20958: missing set lost during upgrade
- 04:17 PM Bug #20958: missing set lost during upgrade
- It looks like a bug in the jewel->luminous conversion:
* jewel doesn't save the missing set
* luminous detects th...
- 02:12 PM Bug #20958: missing set lost during upgrade
- osd.3 sends empty missing to primary at...
- 01:50 PM Bug #20958 (Resolved): missing set lost during upgrade
- pg 4.3...
- 05:46 PM Bug #18209 (Pending Backport): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queu...
- 12:00 PM Bug #20888 (Fix Under Review): "Health check update" log spam
- https://github.com/ceph/ceph/pull/16942
- 11:54 AM Feature #20956: Include front/back interface names in OSD metadata
- https://github.com/ceph/ceph/pull/16941
- 11:52 AM Feature #20956 (Resolved): Include front/back interface names in OSD metadata
- This information is needed by anyone who has a TSDB/dashboard that wants to correlate their NIC statistics with the u...
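The metadata is visible per OSD; a sketch of how a dashboard could pull it (the exact names of the new front/back interface keys are an assumption here):
  ceph osd metadata 0 | grep iface    # e.g. front_iface / back_iface, assuming those key names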
- 05:28 AM Bug #20952 (Can't reproduce): Glitchy monitor quorum causes spurious test failure
- qa/standalone/mon/misc.sh failed in TEST_mon_features()
http://qa-proxy.ceph.com/teuthology/dzafman-2017-08-08_1...
- 02:34 AM Bug #20925 (Resolved): bluestore: bad csum during fsck
08/08/2017
- 10:43 PM Bug #20949 (Resolved): mon: quorum incorrectly believes mon has kraken (not jewel) features
- mon.2 is the last mon to restart:...
- 10:13 PM Bug #20923 (Fix Under Review): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(las...
- https://github.com/ceph/ceph/pull/16924
- 09:10 PM Bug #20863 (Duplicate): CRC error does not mark PG as inconsistent or queue for repair
- 06:37 PM Bug #20863: CRC error does not mark PG as inconsistent or queue for repair
- This will be available in Luminous, see http://tracker.ceph.com/issues/19657
- 06:57 PM Bug #20947: OSD and mon scrub cluster log messages are too verbose
- https://github.com/ceph/ceph/pull/16916
- 06:56 PM Bug #20947 (Resolved): OSD and mon scrub cluster log messages are too verbose
- ...
- 06:43 PM Bug #20875 (Duplicate): mon segv during shutdown
- 06:16 PM Bug #20645: bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- 06:00 PM Bug #20944 (Fix Under Review): OSD metadata 'backend_filestore_dev_node' is "unknown" even for si...
- https://github.com/ceph/ceph/pull/16913
- 01:17 PM Bug #20944: OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple deployment
- Should have also said: bluestore was populating its bluestore_bdev_dev_node correctly on the same server and drive --...
- 01:16 PM Bug #20944 (Resolved): OSD metadata 'backend_filestore_dev_node' is "unknown" even for simple dep...
- OSD created using ceph-deploy "ceph-deploy osd create --filestore", metadata after starting up is:...
- 03:41 PM Bug #19881 (Can't reproduce): ceph-osd: pg_update_log_missing(1.20 epoch 66/11 rep_tid 1493 entri...
- 03:39 PM Bug #20116 (Can't reproduce): osds abort on shutdown with assert(ceph/src/osd/OSD.cc: 4324: FAILE...
- 03:39 PM Bug #20188 (Can't reproduce): filestore: os/filestore/FileStore.h: 357: FAILED assert(q.empty()) ...
- 03:39 PM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
- 03:35 PM Bug #20543: osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry found") in PGLog::re...
- Probably the incorrectly-assessed "out-of-order" op numbers.
- 03:35 PM Bug #20543 (Can't reproduce): osd/PGLog.h: 1257: FAILED assert(0 == "invalid missing set entry fo...
- 03:33 PM Bug #20626 (Can't reproduce): failed to become clean before timeout expired, pgs stuck unknown
- 01:58 PM Bug #20925: bluestore: bad csum during fsck
- https://github.com/ceph/ceph/pull/16900
- 01:19 PM Bug #20925: bluestore: bad csum during fsck
- deferred writes are completing out of order. this is fallout from ca32d575eb2673737198a63643d5d1923151eba3.
08/07/2017
- 10:43 PM Bug #20919 (Fix Under Review): osd: replica read can trigger cache promotion
- https://github.com/ceph/ceph/pull/16884
- 10:32 PM Bug #20939 (Fix Under Review): crush weight-set + rm-device-class segv
- https://github.com/ceph/ceph/pull/16883
- 08:49 PM Bug #20939 (Resolved): crush weight-set + rm-device-class segv
- Although that is probably just one of many problems; weight-set and device classes don't play well together.
- 07:49 PM Bug #20920 (Pending Backport): pg dump fails during point-to-point upgrade
- 07:02 PM Bug #20933 (Closed): All mon nodes down when i use ceph-disk prepare a new osd.
- Sage thinks this has been fixed ("[12:02:12] <sage> oh, it was a problem with the reusing osd ids"). Please update t...
- 07:00 PM Bug #20933: All mon nodes down when i use ceph-disk prepare a new osd.
- Apparently this is the result of a typo: https://www.spinics.net/lists/ceph-users/msg37317.html
But I'm not sure t...
- 09:07 AM Bug #20933 (Closed): All mon nodes down when i use ceph-disk prepare a new osd.
- ceph version 12.1.0 (262617c9f16c55e863693258061c5b25dea5b086) luminous (dev)
when "ceph-disk prepare --bluestore ... - 04:51 PM Bug #20923: ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- Sage Weil wrote:
> [...]
> This object is larger than 32bits (4gb), which bluestore does not allow/support. Why ar...
- 04:36 PM Bug #20923: ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- ...
- 01:44 PM Bug #20923: ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- Sage Weil wrote:
> can you reproduce with debug bluestore = 1/30 and attach the resulting log?
Here it comes (obj...
- 01:21 AM Bug #20923 (Need More Info): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last ...
- can you reproduce with debug bluestore = 1/30 and attach the resulting log?
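A sketch of how that debug level can be set, either in ceph.conf ([osd] debug bluestore = 1/30, as quoted above) or at runtime with a placeholder osd id:
  ceph tell osd.12 injectargs '--debug-bluestore 1/30'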
- 03:19 PM Bug #20922: misdirected op with localize_reads set
- Well, the issue is not immediately apparent, but _calc_target() is pretty complicated and we're feeding in a not-tota...
- 02:28 PM Bug #20475 (Resolved): EPERM: cannot set require_min_compat_client to luminous: 6 connected clien...
- 02:27 PM Backport #20639 (Resolved): jewel: EPERM: cannot set require_min_compat_client to luminous: 6 con...
- 08:22 AM Tasks #20932 (New): run rocksdb's env_test with our BlueRocksEnv
- 07:41 AM Backport #20930 (Rejected): kraken: assert(i->prior_version == last) when a MODIFY entry follows ...
- 01:16 AM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-08-06_16:51:13-rados-wip-sage-testing2-20170806a-distro-basic-smithi/1490528
08/06/2017
- 07:08 PM Bug #19191 (Resolved): osd/ReplicatedBackend.cc: 1109: FAILED assert(!parent->get_log().get_missi...
- 07:06 PM Bug #20925 (Resolved): bluestore: bad csum during fsck
- ...
- 07:05 PM Bug #20924 (Resolved): osd: leaked Session on osd.7
- ...
- 07:03 PM Bug #20910: spurious MON_DOWN, apparently slow/laggy mon
- /a/sage-2017-08-06_13:59:55-rados-wip-sage-testing-20170805a-distro-basic-smithi/1490103
seeing a lot of these.
- 09:36 AM Bug #20923 (Resolved): ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)
- Running 12.1.1 RC1 OSDs, currently doing inline migration to BlueStore (ceph osd destroy procedure). Getting these a...
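The "ceph osd destroy procedure" referred to is roughly the luminous destroy-and-reprovision flow; a sketch with placeholder id and device:
  ceph osd destroy 12 --yes-i-really-mean-it    # keeps the OSD id and CRUSH entry
  ceph-disk prepare --bluestore /dev/sdX        # re-provision the same device as bluestore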
08/05/2017
- 06:23 PM Bug #20922 (New): misdirected op with localize_reads set
- ...
- 05:47 PM Bug #20770: test_pidfile.sh test is failing 2 places
- This is still failing sometimes in TEST_without_pidfile() even after adding a sleep 1.
- 03:32 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I did another test: I did some writes to an object "rbd_data.1ebc6238e1f29.0000000000000000" to raise its "HEAD" obje...
- 03:30 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I did another test: I did some writes to an object "rbd_data.1ebc6238e1f29.0000000000000000" to raise its "HEAD" obje...
- 03:34 AM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
- This may be a bluestore bug - the log is so large from bluestore debugging that I haven't had time to properly read i...
- 02:32 AM Bug #20843 (Pending Backport): assert(i->prior_version == last) when a MODIFY entry follows an ER...
- Backport only needed for kraken, jewel does not have error log entries.
- 12:03 AM Bug #20920: pg dump fails during point-to-point upgrade
- Do we have a "legacy" command map that matches the pre-luminous ones? I think we just need to use that for the comman...
08/04/2017
- 10:25 PM Bug #20920 (Resolved): pg dump fails during point-to-point upgrade
- Command failed on smithi021 with status 22: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage...
- 09:03 PM Bug #20919: osd: replica read can trigger cache promotion
- a replica was servicing a read and tried to do a cache promotion:...
- 08:53 PM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
- ...
- 07:23 PM Bug #20561 (Can't reproduce): bluestore: segv in _deferred_submit_unlock from deferred_try_submit...
- 06:20 PM Bug #20904 (Resolved): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on lost-unf...
- 06:40 AM Bug #20904 (Fix Under Review): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on ...
- https://github.com/ceph/ceph/pull/16809
- 12:40 AM Bug #20904 (In Progress): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on lost-...
- Think I found the problem, testing a fix.
- 06:17 PM Bug #20913 (Resolved): osd: leak from osd/PGBackend.cc:136 PGBackend::handle_recovery_delete()
- ...
- 06:00 PM Bug #18209 (Fix Under Review): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queu...
- https://github.com/ceph/ceph/pull/16828
- 03:56 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- /a/sage-2017-08-04_13:49:55-rbd:singleton-bluestore-wip-sage-testing2-20170803b-distro-basic-mira/1482623...
- 04:04 PM Bug #20295 (Resolved): bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool ...
- 01:59 PM Bug #20910 (Resolved): spurious MON_DOWN, apparently slow/laggy mon
- mon shows very slow progress for ~10 seconds, failing to send lease renewals etc, and triggering an election...
- 01:50 PM Bug #20845 (Resolved): Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- 01:46 PM Bug #20909 (Can't reproduce): Error ETIMEDOUT: crush test failed with -110: timed out during smok...
- ...
- 01:37 PM Bug #20908 (Resolved): qa/standalone/misc failure in TEST_mon_features
- ...
- 01:35 PM Bug #20133: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksdb+librados
- /a/sage-2017-08-04_05:23:06-rados-wip-sage-testing-20170803-distro-basic-smithi/1481973
- 08:41 AM Bug #20227: os/bluestore/BlueStore.cc: 2617: FAILED assert(0 == "can't mark unloaded shard dirty")
- Hit the same assert in http://qa-proxy.ceph.com/teuthology/joshd-2017-08-04_06:16:52-rados-wip-20904-distro-basic-smi...
- 07:15 AM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I mean I think it's the condition check "is_present_clone" that
prevents the clone overlap from recording the client write...
- 04:54 AM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Hi, grep:-)
I finally got what you mean in https://github.com/ceph/ceph/pull/16790..
I agree with you in that "...
- 12:58 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- osd.1 in the posted log has pg 1.4 in epoch 26 from the time it first dequeues those operations right up until it cra...
08/03/2017
- 11:52 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- from irc:
<joshd>:
> I'd suggest making rbd diff conservative when it's used with cache pools (if necessary, repo...
- 11:40 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- > the reason we are submitting the PR is that, when we do export-diff to an rbd image in a pool with a cache tier poo...
- 11:31 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- The reason we are submitting the PR is that, when we do export-diff to an rbd image in a pool with a cache tier pool,...
- 03:00 PM Bug #20896: export_diff relies on clone_overlap, which is lost when cache tier is enabled
- I submitted a pr for this: https://github.com/ceph/ceph/pull/16790
- 02:46 PM Bug #20896 (New): export_diff relies on clone_overlap, which is lost when cache tier is enabled
- Recently, we find that, under some circumstance, in the cache tier, the "HEAD" object's clone_overlap can lose some O...
- 11:44 PM Bug #20798 (In Progress): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- 08:47 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- ...
- 11:28 PM Bug #20871 (In Progress): core dump when bluefs's mkdir returns -EEXIST
- 02:42 PM Bug #20871: core dump when bluefs's mkdir returns -EEXIST
- https://github.com/ceph/ceph/pull/16745/commits/6bb89702c1cae44558480f72c2723f564308f822
- 06:57 PM Bug #20904 (Resolved): cluster [ERR] 2.e shard 2 missing 2:70b3bf12:::existing_4:head on lost-unf...
- ...
- 06:22 PM Bug #20810 (Resolved): fsck finish with 29 errors in 47.732275 seconds
- 06:22 PM Bug #20844 (Resolved): peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-ove...
- 02:49 PM Bug #20844 (Fix Under Review): peering_blocked_by_history_les_bound on workloads/ec-snaps-few-obj...
- https://github.com/ceph/ceph/pull/16789
- 01:51 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- This appears to be a test problem:
- the thrashosds has 'chance_test_map_discontinuity: 0.5', which will mark an o...
- 09:59 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- mon.a.log...
- 09:42 AM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- ...
- 09:05 AM Documentation #20894 (Resolved): rados manpage does not document "cleanup"
- A user writes:...
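"cleanup" removes the objects left behind by a "rados bench" run; a minimal example with a placeholder pool name:
  rados -p mypool bench 60 write --no-cleanup
  rados -p mypool cleanup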
- 02:46 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- https://github.com/ceph/ceph/pull/16769
08/02/2017
- 10:46 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- txn Z queues deferred io,...
- 09:46 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- Kefu Chai wrote:
> checked the actingset and actingbackfill of the PG of the crashed osd using gdb, they are not cha...
- 06:23 PM Bug #20888 (Resolved): "Health check update" log spam
- (We've known about this for a while, just need to fix it!)
The health checks for PG related stuff get updated when...
- 03:32 PM Bug #20301 (Can't reproduce): "/src/osd/SnapMapper.cc: 231: FAILED assert(r == -2)" in rados
- 03:31 PM Bug #20416 (Need More Info): "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgrade...
- 03:29 PM Bug #20616: pre-luminous: aio_read returns erroneous data when rados_osd_op_timeout is set but no...
- 03:28 PM Bug #20690 (Need More Info): Cluster status is HEALTH_OK even though PGs are in unknown state
- why can't cephfs be mounted when pgs are unknown?
- 03:25 PM Bug #20791 (Duplicate): crash in operator<< in PrimaryLogPG::finish_copyfrom
- 03:21 PM Bug #20843 (Fix Under Review): assert(i->prior_version == last) when a MODIFY entry follows an ER...
- https://github.com/ceph/ceph/pull/16675
- 03:14 PM Bug #20551 (Duplicate): LOST_REVERT assert during rados bench+thrash in ReplicatedBackend::prepar...
- 03:12 PM Bug #20545 (Duplicate): erasure coding = crashes
- I think this is the same as #20295, which we can now reproduce.
- 03:02 PM Bug #20785 (Need More Info): osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgi...
- 02:40 PM Bug #18595 (Resolved): bluestore: allocator fails for 0x80000000 allocations
- 02:31 PM Bug #18595 (Pending Backport): bluestore: allocator fails for 0x80000000 allocations
- 02:40 PM Backport #20884 (Resolved): kraken: bluestore: allocator fails for 0x80000000 allocations
- 02:33 PM Backport #20884 (Resolved): kraken: bluestore: allocator fails for 0x80000000 allocations
- https://github.com/ceph/ceph/pull/13011
- 02:11 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- /a/sage-2017-08-02_01:58:49-rados-wip-sage-testing-distro-basic-smithi/1470073
pg 2.d on [5,1,4]
- 01:57 PM Bug #20876: BADAUTHORIZER on mgr, hung ceph tell mon.*
- /a/sage-2017-08-02_01:58:49-rados-wip-sage-testing-distro-basic-smithi/1469949
- 01:57 PM Bug #20876 (Can't reproduce): BADAUTHORIZER on mgr, hung ceph tell mon.*
- ...
- 01:18 PM Bug #20875 (Duplicate): mon segv during shutdown
- ...
- 01:14 PM Bug #20874 (Can't reproduce): osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end()...
- ...
08/01/2017
- 07:47 PM Bug #20810 (Fix Under Review): fsck finish with 29 errors in 47.732275 seconds
- https://github.com/ceph/ceph/pull/16738
- 07:14 PM Bug #20793 (Resolved): osd: segv in CopyFromFinisher::execute in ec cache tiering test
- 07:13 PM Bug #20803 (Resolved): ceph tell osd.N config set osd_max_backfill does not work
- 07:12 PM Bug #20850 (Resolved): osd: luminous osd crashes when older monitor doesn't support set-device-class
- 07:11 PM Bug #20808 (Resolved): osd deadlock: forced recovery
- 07:03 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- ...
- 07:02 PM Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml
- /a/sage-2017-08-01_15:32:10-rados-wip-sage-testing-distro-basic-smithi/1469176
rados/thrash-erasure-code/{ceph.yam...
- 03:03 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- New (hopefully more "mergeable") reproducer: https://github.com/ceph/ceph/pull/16731
- 02:02 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- This job reproduces the issue: http://pulpito.ceph.com/smithfarm-2017-08-01_13:28:09-rbd:singleton-master-distro-basi...
- 01:41 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- Nathan has a teuthology unit to, hopefully, flush this out: https://github.com/ceph/ceph/pull/16728
He also has a ...
- 01:38 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- As far as I can tell, the differences seem to simply be the `--io-total`, and in most cases the `--io-size` or number...
- 01:16 PM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- Any idea how your test case varies from what's in the rbd suite?
- 11:35 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- For clarity's sake: the previous comment lacked the version. This is a recent master build (fa70335); from yesterday,...
- 11:26 AM Bug #20295: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites
- We've been reproducing this reliably on one of our test clusters.
This is a cluster composed of mostly hdds, 32G R...
- 02:53 PM Bug #20845 (In Progress): Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- 02:39 PM Bug #20871 (Resolved): core dump when bluefs's mkdir returns -EEXIST
- ...
- 02:13 PM Bug #19605: OSD crash: PrimaryLogPG.cc: 8396: FAILED assert(repop_queue.front() == repop)
- if osd.1 is down, osd.2 should have started a peering. and repop_queue should be flushed by on_change() in start_peer...
- 12:44 PM Documentation #20867 (Closed): OSD::build_past_intervals_parallel()'s comment is stale
- PG::generate_past_intervals() was removed in 065bb89ca6d85cdab49db1d06c858456c9bbd2c8
- 12:14 PM Backport #20638 (Resolved): kraken: EPERM: cannot set require_min_compat_client to luminous: 6 co...
- 02:35 AM Bug #20242 (Resolved): Make osd-scrub-repair.sh unit test run faster
- https://github.com/ceph/ceph/pull/16513
Moved long running tests into qa/standalone to be run by teuthology instea...
07/31/2017
- 11:18 PM Bug #20784 (Duplicate): rados/standalone/erasure-code.yaml failure
- 09:47 PM Bug #20808 (Fix Under Review): osd deadlock: forced recovery
- https://github.com/ceph/ceph/pull/16712
- 09:03 PM Bug #20808: osd deadlock: forced recovery
- We're holding the pg_map_lock the whole time too, which I don't think is gonna work either (we certainly want to avoi...
- 03:50 PM Bug #20808: osd deadlock: forced recovery
- We use the pg_lock to protect the state field - so looking at this code more closely, the pg lock should be taken in ...
- 07:20 AM Bug #20808: osd deadlock: forced recovery
- Possible fix: https://github.com/ovh/ceph/commit/d92ce63b0f1953852bd1d520f6ad55acc6ce1c07
Does it look reasonable? I...
- 08:54 PM Bug #20854 (Duplicate): (small-scoped) recovery_lock being blocked by pg lock holders
- 08:43 PM Bug #20854: (small-scoped) recovery_lock being blocked by pg lock holders
- That's from https://github.com/ceph/ceph/pull/13723, which was 7 days ago.
- 08:43 PM Bug #20854: (small-scoped) recovery_lock being blocked by pg lock holders
- Naively this looks like something else was blocked while holding the recovery_lock, which is a bit scary since that s...
- 03:48 PM Bug #20863 (Duplicate): CRC error does not mark PG as inconsistent or queue for repair
- While testing bitrot detection it was found that even when OSD process has detected CRC mismatch and returned an erro...
- 03:32 PM Bug #20845: Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- http://qa-proxy.ceph.com/teuthology/kchai-2017-07-31_14:22:05-rados-wip-kefu-testing-distro-basic-mira/1465207/teutho...
- 01:22 PM Bug #20845: Error ENOENT: cannot link item id -16 name 'host2' to location {root=bar}
- https://github.com/ceph/ceph/pull/16805
- 01:29 PM Bug #20803 (Fix Under Review): ceph tell osd.N config set osd_max_backfill does not work
- https://github.com/ceph/ceph/pull/16700
- 09:37 AM Bug #20803 (In Progress): ceph tell osd.N config set osd_max_backfill does not work
- OK, looks like this is setting the option (visible in "config show") but not calling the handlers properly (not refle...
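A reproduction sketch based on the ticket title (osd id and value are arbitrary); the reported behaviour is that the value changes in "config show" but the backfill throttle does not:
  ceph tell osd.0 config set osd_max_backfills 10
  ceph daemon osd.0 config show | grep osd_max_backfills    # shows 10, but the change handler isn't run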
- 07:18 AM Bug #19512: Sparse file info in filestore not propagated to other OSDs
- Enabled FIEMAP/SEEK_HOLE in QA here: https://github.com/ceph/ceph/pull/15939
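Outside of QA these code paths are gated by filestore options; presumably something like this in ceph.conf enables them:
  [osd]
  filestore fiemap = true
  filestore seek data hole = true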
- 02:26 AM Bug #20785: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
- https://github.com/ceph/ceph/pull/16677 is posted to help debug this issue.
07/30/2017