Activity

From 06/29/2018 to 07/28/2018

07/28/2018

07:55 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/sage-2018-07-27_22:50:28-rados-wip-sage-testing-2018-07-27-0744-distro-basic-smithi/2826326 Sage Weil
02:46 PM Bug #25146 (Resolved): "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:paralle...
This is on the new mimic-x suite https://github.com/ceph/ceph/pull/23292
Run: http://pulpito.ceph.com/yuriw-2018-07-27_2...
Yuri Weinstein
09:12 AM Backport #25143 (In Progress): luminous: mimic selinux denials comm="tp_fstore_op / comm="ceph-o...
Nathan Cutler
09:11 AM Backport #25143 (Resolved): luminous: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd ...
https://github.com/ceph/ceph/pull/23296 Nathan Cutler
09:11 AM Backport #25142 (In Progress): mimic: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd ...
Nathan Cutler
09:10 AM Backport #25142 (Resolved): mimic: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev...
https://github.com/ceph/ceph/pull/23295 Nathan Cutler
09:11 AM Backport #25145 (Resolved): luminous: Automatically set expected_num_objects for new pools with >...
https://github.com/ceph/ceph/pull/24395 Nathan Cutler
09:11 AM Backport #25144 (Resolved): mimic: Automatically set expected_num_objects for new pools with >=10...
https://github.com/ceph/ceph/pull/23860 Nathan Cutler

07/27/2018

11:55 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
Mimic back-port:
https://github.com/ceph/ceph/pull/23295
Luminous back-port:
https://github.com/ceph/ceph/pu...
Boris Ranto
10:18 PM Bug #24785 (Pending Backport): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-...
Boris Ranto
07:01 AM Bug #24785 (Fix Under Review): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-...
Boris Ranto
07:00 AM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
The manual testing suggests this should fix this issue:
https://github.com/ceph/ceph/pull/23278
Boris Ranto
03:03 AM Bug #23352: osd: segfaults under normal operation
Dan van der Ster wrote:
> Can we see that state from the coredump somehow? Basically none of our clusters should hav...
Brad Hubbard

07/26/2018

07:35 PM Backport #23670 (Need More Info): luminous: auth: ceph auth add does not sanity-check caps
non-trivial backport. One attempt was already made - https://github.com/ceph/ceph/pull/21361 - but it was implicated ... Nathan Cutler
07:31 PM Backport #23670 (Rejected): luminous: auth: ceph auth add does not sanity-check caps
see discussion in https://github.com/ceph/ceph/pull/21361 Nathan Cutler
07:17 PM Feature #24949: luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
https://github.com/ceph/ceph/pull/23236
Includes backport from master of https://github.com/ceph/ceph/pull/23217
David Zafman
07:15 PM Backport #25128 (Resolved): mimic: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
https://github.com/ceph/ceph/pull/23272
(includes backport of https://tracker.ceph.com/issues/25085 from master)
David Zafman
07:12 PM Backport #25127 (Resolved): luminous: Allow repair of an object with a bad data_digest in object_...
https://github.com/ceph/ceph/pull/23236 David Zafman
07:11 PM Backport #25126 (Resolved): mimic: Allow repair of an object with a bad data_digest in object_inf...
https://github.com/ceph/ceph/pull/23272 David Zafman
05:47 PM Bug #24687 (Pending Backport): Automatically set expected_num_objects for new pools with >=100 PG...
Douglas Fuller
05:17 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
Removed pgcalc message while pgcalc updates are considered Douglas Fuller
05:19 PM Cleanup #25124 (New): Add message to consult pgcalc for expected_num_objects
Currently we warn the user when attempting to create a filestore pool that appears to be intended to store a large nu... Douglas Fuller
03:58 PM Bug #25106: Ceph-osd coredumps on launch
Either the patch here: https://github.com/ceph/ceph/pull/22954
Doesn't fix the bug, or this is not a duplicate iss...
Michael Jones
03:58 AM Bug #25108: object errors found in be_select_auth_object() aren't logged the same

I ran a subtest of osd-scrub-repair based on pull request https://github.com/ceph/ceph/pull/23217. I also added a ...
David Zafman
03:18 AM Bug #24664: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
Need help with the luminous backport, which is needed to fix a failure in upgrade/luminous-x. Nathan Cutler
12:40 AM Bug #25112 (Fix Under Review): osd,mon: increase mon_max_pg_per_osd to 250
https://github.com/ceph/ceph/pull/23251 Neha Ojha
12:19 AM Bug #25112 (Resolved): osd,mon: increase mon_max_pg_per_osd to 250
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1603615 Neha Ojha
12:21 AM Bug #25076: MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fuse task
It appears the crash can be explained on the same basis as in the case of "bug #24664":https://tracker.ceph.com/issue... Radoslaw Zarzynski

07/25/2018

10:41 PM Bug #25106 (Duplicate): Ceph-osd coredumps on launch
this will be fixed in 13.2.1 Josh Durgin
06:10 PM Bug #25106 (Duplicate): Ceph-osd coredumps on launch
See https://tracker.ceph.com/issues/24993
The problem:
ceph-volume lvm create --bluestore --data /dev/sda <- wor...
Michael Jones
10:39 PM Bug #24667: osd: SIGSEGV in MMgrReport::encode_payload
downgrading due to lack of recurrence Josh Durgin
07:59 PM Bug #25108 (Resolved): object errors found in be_select_auth_object() aren't logged the same

object errors found in be_select_auth_object() aren't logged the same as errors found in be_compare_scrub_objects()...
David Zafman
01:59 PM Backport #25101 (In Progress): mimic: jewel->luminous: osdmap crc mismatch
Nathan Cutler
01:58 PM Backport #25101 (Resolved): mimic: jewel->luminous: osdmap crc mismatch
Nathan Cutler
01:57 PM Backport #25101 (Resolved): mimic: jewel->luminous: osdmap crc mismatch
https://github.com/ceph/ceph/pull/23226 Nathan Cutler
01:58 PM Backport #25100 (Resolved): luminous: jewel->luminous: osdmap crc mismatch
Nathan Cutler
01:57 PM Backport #25100 (In Progress): luminous: jewel->luminous: osdmap crc mismatch
Nathan Cutler
01:56 PM Backport #25100 (Resolved): luminous: jewel->luminous: osdmap crc mismatch
https://github.com/ceph/ceph/pull/23227 Nathan Cutler
12:27 PM Bug #25057: jewel->luminous: osdmap crc mismatch
luminous: https://github.com/ceph/ceph/pull/23227
mimic: https://github.com/ceph/ceph/pull/23226
Sage Weil
12:07 PM Bug #25057: jewel->luminous: osdmap crc mismatch
The problem was that CRUSH_TUNABLES5 was associated with kraken instead of jewel in 0ceb5c0, backported to luminous ... Sage Weil
12:00 PM Bug #25057 (Pending Backport): jewel->luminous: osdmap crc mismatch
https://github.com/ceph/ceph/pull/23220 Sage Weil
09:01 AM Bug #23352: osd: segfaults under normal operation
Dan van der Ster wrote:
> * The OSD health metric changes sure are a juicy candidate to be the root cause -- but we ...
Dan van der Ster
08:11 AM Bug #23352: osd: segfaults under normal operation
Brad Hubbard wrote:
> I was also thinking that, since the OSDHealthMetric related code only triggers when there are ...
Dan van der Ster
02:37 AM Bug #23352: osd: segfaults under normal operation
Thanks Roberto,
Your core, as well as the last one uploaded by Alex, shows the now familiar corruption to the vtable of ...
Brad Hubbard

07/24/2018

08:54 PM Backport #24988: luminous: Limit pg log length during recovery/backfill so that we don't run out ...
https://github.com/ceph/ceph/pull/23211 Neha Ojha
06:34 PM Backport #24988 (In Progress): luminous: Limit pg log length during recovery/backfill so that we ...
Neha Ojha
08:52 PM Feature #25085 (In Progress): Allow repair of an object with a bad data_digest in object_info on ...
David Zafman
08:51 PM Feature #25085: Allow repair of an object with a bad data_digest in object_info on all replicas
https://github.com/ceph/ceph/pull/23217 David Zafman
08:46 PM Feature #25085 (Resolved): Allow repair of an object with a bad data_digest in object_info on all...

We've seen this due to a bug in Luminous 12.2.6, but it may have been seen in other cases.
David Zafman
08:44 PM Bug #25084 (Resolved): Attempt to read object that can't be repaired loops forever

If all replicas of an object are bad, this causes a loop of continuous recovery and calls to rep_repair_primary_objec...
David Zafman
07:18 PM Bug #25057 (In Progress): jewel->luminous: osdmap crc mismatch
Sage Weil
06:41 PM Bug #25057: jewel->luminous: osdmap crc mismatch
/a/teuthology-2018-07-20_04:23:01-upgrade:jewel-x-luminous-distro-basic-smithi/2799173
is an instance where the mo...
Sage Weil
11:07 AM Bug #25076 (Duplicate): MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fu...
Teuthology log: http://qa-proxy.ceph.com/teuthology/smithfarm-2018-07-24_02:10:24-upgrade:luminous-x-mimic-distro-bas... Nathan Cutler
09:27 AM Bug #23352: osd: segfaults under normal operation
Hi Brad,
We spotted this issue again in one of our clusters, just 2 hours after we upgraded from 12.2.5 -> 12.2.7...
Roberto Valverde
09:10 AM Bug #23352: osd: segfaults under normal operation
Thanks Alex,
I'll check out the core tomorrow and let you know.
I have been working on instrumenting the ceph-o...
Brad Hubbard
02:06 AM Documentation #4640 (Resolved): rados.8 should document import/export
https://github.com/ceph/ceph/pull/23186 Nathan Cutler

07/23/2018

09:52 PM Bug #24909: RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as an op)
Jason Dillaman wrote:
> https://github.com/ceph/ceph/pull/23029
merged
Yuri Weinstein
09:40 PM Bug #21496 (Fix Under Review): doc: Manually editing a CRUSH map, Word 'type' missing.
https://github.com/ceph/ceph/pull/23192 Jos Collin
07:37 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
Jos Collin wrote:
> Remy, Please create a PR.
Done.
Anonymous
05:16 PM Feature #1203: osd: priority or fairness osd operations
https://github.com/ceph/dmclock Patrick Donnelly
04:52 PM Support #24980: Pg Inconsistent - failed to pick suitable auth object
Patrick Donnelly wrote:
> Please seek assistance for these kinds of issues on ceph-users mailing list.
Hi Patrick...
Alon Avrahami
04:46 PM Support #24980 (Rejected): Pg Inconsistent - failed to pick suitable auth object
Please seek assistance for these kinds of issues on ceph-users mailing list. Patrick Donnelly
02:27 PM Bug #23352: osd: segfaults under normal operation
After the upgrade to 12.2.7 I am still seeing crashes on OSDs. Please check and advise if a separate tracker should b... Alex Gorbachev
10:28 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Robert Sander wrote:
> On the production cluster the RBD pool is affected. Do I really need to stop the VMs and do...
Brad Hubbard
09:54 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Brad Hubbard wrote:
> For the data_digest_mismatch_info error with client activity stopped, read the data from thi...
Robert Sander
09:18 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Oops, my mistake, terribly sorry. I gave you the procedure for an omap_digest_mismatch_info error.
For the data_di...
Brad Hubbard
08:10 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Brad Hubbard wrote:
> 1. rados -p [name_of_pool_2] setomapval rbd_data.4048d8238e1f29.00000000000002e6 temporary-k...
Robert Sander
10:09 AM Bug #24835: osd daemon spontaneous segfault
After spending a week trying to get Ubuntu/systemd to allow a core dump to be created, we finally have two different ... Christian Schlittchen

07/22/2018

10:09 PM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
In the case of pg 2.34 above where the only error is "data_digest_mismatch_info" and all the data digests except the ... Brad Hubbard
12:55 PM Bug #25057 (Resolved): jewel->luminous: osdmap crc mismatch
The upgrade/jewel-x runs for 12.2.6 and 12.2.7 threw osdmap crc mismatch errors. Sage Weil

07/21/2018

06:32 PM Backport #25055 (In Progress): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-st...
Nathan Cutler
06:26 PM Backport #25055 (Resolved): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
https://github.com/ceph/ceph/pull/23163 Nathan Cutler
04:12 PM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
I have the same issue Anton Neubauer
12:08 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
Remy, Please create a PR. Jos Collin
11:56 AM Bug #24923 (Pending Backport): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
https://github.com/ceph/ceph/pull/21520 Jos Collin

07/20/2018

11:42 PM Bug #24304 (Resolved): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
Josh Durgin
04:43 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
Running here: http://pulpito.ceph.com/vasu-2018-07-20_16:43:09-ceph-deploy-mimic-distro-basic-ovh/ Vasu Kulkarni
03:03 PM Bug #25017 (Duplicate): log [ERR] : 1.3 past_intervals [182,196) start interval does not contain ...
Josh Durgin
12:38 PM Bug #25017 (Duplicate): log [ERR] : 1.3 past_intervals [182,196) start interval does not contain ...
... Sage Weil
11:56 AM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
This sounds familiar: http://tracker.ceph.com/issues/16211
We used the workaround to set and rm a dummy key/val and ...
Dan van der Ster
08:34 AM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
Sorry that was just a single example to keep it short. listomapkeys doesn't return any data for any bucket in this cl... Magnus Grönlund
07:31 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
... Robert Sander
01:39 AM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Can you post the output of 'rados list-inconsistent-obj 2.53 --format=json-pretty' ? Brad Hubbard
02:14 AM Bug #25011 (New): competing scrubs stuck reserving local -> remote
In this run: http://pulpito.ceph.com/yuriw-2018-07-18_20:14:43-rados-mimic-distro-basic-smithi/2794751/
osd.0 and ...
Josh Durgin
01:06 AM Backport #24068 (In Progress): luminous: osd sends op_reply out of order
Nathan Cutler
01:06 AM Backport #25010 (In Progress): mimic: osd sends op_reply out of order
Nathan Cutler
12:59 AM Backport #25010 (Resolved): mimic: osd sends op_reply out of order
https://github.com/ceph/ceph/pull/23136 Nathan Cutler

07/19/2018

09:59 PM Bug #23827: osd sends op_reply out of order
http://pulpito.ceph.com/yuriw-2018-07-18_21:37:13-powercycle-mimic-distro-basic-smithi/2796128/ indicates that this n... Neha Ojha
12:57 PM Bug #24994: active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
I have now added "osd skip data digest = true" as per release notes and restarted all OSDs.
I still have inconsist...
Robert Sander
08:36 AM Bug #24994 (New): active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Hi,
a deep scrub revealed 59 active+clean+inconsistent PGs at one customer's cluster and 50 active+clean+inconsist...
Robert Sander
06:11 AM Backport #24989 (Need More Info): mimic: Limit pg log length during recovery/backfill so that we ...
Nathan Cutler
06:10 AM Backport #24988 (Need More Info): luminous: Limit pg log length during recovery/backfill so that ...
Nathan Cutler
06:10 AM Backport #24992 (Resolved): mimic: valgrind-leaks.yaml: expected valgrind issues and found none
https://github.com/ceph/ceph/pull/23744 Nathan Cutler

07/18/2018

09:42 PM Backport #24989: mimic: Limit pg log length during recovery/backfill so that we don't run out of ...
We can hold off on this backport for now. Need to let this bake in master for a while. Neha Ojha
08:00 PM Backport #24989 (Resolved): mimic: Limit pg log length during recovery/backfill so that we don't ...
https://github.com/ceph/ceph/pull/23403 Nathan Cutler
09:42 PM Backport #24988: luminous: Limit pg log length during recovery/backfill so that we don't run out ...
We can hold off on this backport for now. Need to let this bake in master for a while.
Also, this backport is going ...
Neha Ojha
08:00 PM Backport #24988 (Resolved): luminous: Limit pg log length during recovery/backfill so that we don...
https://github.com/ceph/ceph/pull/23211 Nathan Cutler
09:38 PM Bug #24975 (Pending Backport): valgrind-leaks.yaml: expected valgrind issues and found none
This issue has been fixed in master by https://github.com/ceph/ceph/pull/22261
Needs to be backported to mimic.
Neha Ojha
09:14 PM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
This appears to be another instance of #23352. Josh Durgin
09:12 PM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
Did you check that this bucket actually has any entries? These commands are tested in our suite. Greg Farnum
08:46 PM Bug #24990 (Resolved): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
... Neha Ojha
06:10 PM Feature #23979 (Pending Backport): Limit pg log length during recovery/backfill so that we don't ...
Josh Durgin
04:15 PM Support #24980: Pg Inconsistent - failed to pick suitable auth object
Alon Avrahami wrote:
> Hi,
>
>
> We have ceph cluster installed with Luminous 12.2.2 using bluestore.
> All no...
Alon Avrahami
01:24 PM Support #24980 (Rejected): Pg Inconsistent - failed to pick suitable auth object
Hi,
We have ceph cluster installed with Luminous 12.2.2 using bluestore.
All nodes are Intel servers with 1.6TB...
Alon Avrahami
03:42 PM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
Nathan Cutler
02:32 PM Backport #24472: mimic: Ceph-osd crash when activate SPDK
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22684
merged
Yuri Weinstein
03:36 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal
Nathan Cutler
03:35 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
Nathan Cutler
02:20 PM Backport #24865: mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-...
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/23024
merged
Yuri Weinstein
03:14 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
David Zafman
02:24 PM Backport #24951: mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
David Zafman wrote:
> https://github.com/ceph/ceph/pull/23084
merged
Yuri Weinstein
02:22 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
https://github.com/ceph/ceph/pull/23096 merged Yuri Weinstein
11:20 AM Documentation #20894 (Resolved): rados manpage does not document "cleanup"
https://github.com/ceph/ceph/pull/16777 Nathan Cutler

07/17/2018

10:48 PM Bug #24975 (Resolved): valgrind-leaks.yaml: expected valgrind issues and found none
... Neha Ojha
10:43 PM Bug #24974 (New): Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
... Neha Ojha
08:32 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
Nathan Cutler
08:10 PM Backport #24583: mimic: osdc: wrong offset in BufferHead
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22869
merged
Yuri Weinstein
06:21 PM Feature #23979 (Fix Under Review): Limit pg log length during recovery/backfill so that we don't ...
https://github.com/ceph/ceph/pull/23098 Neha Ojha
05:39 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
Neha Ojha
01:37 PM Bug #20645 (Closed): bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
Igor Fedotov
09:58 AM Bug #24956 (Resolved): osd: parent process need to restart log service after fork, or ceph-osd wi...
The ceph-osd parent process needs to restart the log service after fork, or ceph-osd will not work correctly when the option l... mingshuai wang

07/16/2018

09:18 PM Bug #24950: Running osd_skip_data_digest in a mixed cluster is not ideal
https://github.com/ceph/ceph/pull/23083 David Zafman
09:14 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal

Using osd_skip_data_digest in a mixed BlueStore/FileStore cluster is dangerous because we lose data_digest integrity ...
David Zafman
09:17 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
https://github.com/ceph/ceph/pull/23084 David Zafman
09:08 PM Feature #24949 (Resolved): luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest

I'm thinking that while osd_distrust_data_digest=true we should automatically ignore data_digest errors and repair ...
David Zafman
07:36 PM Bug #23352: osd: segfaults under normal operation
We actually got one on July 15: Jul 14 23:54:42 roc04r-sc3a080 kernel: [6988357.283555] safe_timer[19917]: segfault a... Alex Gorbachev
03:54 AM Bug #23352: osd: segfaults under normal operation
The latest core uploaded by Dan in comment 66 is slightly different to the others we've seen so far.
Once again th...
Brad Hubbard
02:24 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
https://github.com/ceph/ceph/pull/23072 Douglas Fuller
02:24 PM Bug #24687 (Fix Under Review): Automatically set expected_num_objects for new pools with >=100 PG...
Because a value for expected_num_objects is too difficult to determine automatically, instead we print a suggestion t... Douglas Fuller
11:16 AM Bug #24938 (New): luminous: rados listomapkeys & listomapvals don't return data.
Hi,
rados listomapkeys & rados listomapvals don't return data when running Luminous, tested on 12.2.4 and 12.2.6:
...
Magnus Grönlund
08:52 AM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
My environment :
[root@gz-ceph-52-203 log]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@gz-...
伟杰 谭
12:57 AM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
Noting the same issue, per ceph-users list post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028...
David Young

07/15/2018

05:46 AM Documentation #24924 (Resolved): doc: typo in crush-map docs
Each time the OSD starts, it verifies it is in the correct location in the CRUSH map and, if it is not, it moved its... Michael Jones

07/14/2018

09:04 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
Undersized
The placement group fewer copies than the configured pool replication level.
Missing "has"
Michael Jones
07:57 PM Bug #23871: luminous->mimic: missing primary copy of xxx, wil try copies on 3, then full-object r...
For the luminous regression, this will reproduce the issue:... Sage Weil

07/13/2018

11:02 PM Feature #24917 (New): Gracefully deal with upgrades when bluestore skipping of data_digest become...

Once the data_digest is no longer being used, but is still set from an earlier version, we can get EIO from read bu...
David Zafman
09:26 PM Backport #24083 (In Progress): luminous: rados: not all exceptions accept keyargs
PR: https://github.com/ceph/ceph/pull/22979 Victor Denisov
03:52 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
Nathan Cutler
05:09 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Could cephfs trigger this issue? There have been two reports of cephfs_metadata pool crc errors on the users ML this ... Dan van der Ster
03:51 PM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
Nathan Cutler
03:18 PM Backport #24891: mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22997
merged
Yuri Weinstein
03:00 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
FTR, this crc issue is probably due to an incomplete backport to 12.2.6 of the skip_digest changes for bluestore:
...
Dan van der Ster
01:55 PM Bug #24909 (Fix Under Review): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
https://github.com/ceph/ceph/pull/23029 Jason Dillaman
01:47 PM Bug #24909 (In Progress): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints ...
Jason Dillaman
01:47 PM Bug #24909 (Resolved): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as ...
While running performance testing with Ceph metrics gathering statistics on the cluster, I noticed that while my RBD ... Jason Dillaman
12:58 PM Backport #24908 (In Progress): luminous: luminous->mimic: missing primary copy of xxx, wil try co...
Nathan Cutler
12:57 PM Backport #24908 (Resolved): luminous: luminous->mimic: missing primary copy of xxx, wil try copie...
https://github.com/ceph/ceph/pull/23028 Nathan Cutler
12:26 PM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
Nathan Cutler
12:26 PM Bug #23871: luminous->mimic: missing primary copy of xxx, wil try copies on 3, then full-object r...
original fix is fe5038c7f9577327f82913b4565712c53903ee48
luminous backport https://github.com/ceph/ceph/pull/23028
Sage Weil
12:06 PM Bug #23871 (Pending Backport): luminous->mimic: missing primary copy of xxx, wil try copies on 3,...
Sage Weil
11:31 AM Backport #24888 (Need More Info): luminous: osd: crash in OpTracker::unregister_inflight_op via O...
non-trivial backport. There are two conflicts. The first conflict can be resolved by cherry-picking 17a192ba5cdbe2129... Nathan Cutler
11:23 AM Backport #24889 (In Progress): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
Nathan Cutler
11:22 AM Backport #24864 (In Progress): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-c...
Nathan Cutler
11:20 AM Backport #24865 (In Progress): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code...
Nathan Cutler

07/12/2018

11:56 PM Bug #24801 (In Progress): PG num_bytes becomes huge
David Zafman
07:38 PM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
Nathan Cutler
07:38 PM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
Nathan Cutler
04:36 PM Backport #24617: mimic: ValueError: too many values to unpack due to lack of subdir
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22888
merged
Yuri Weinstein
02:05 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
Is this the relevant fix? https://github.com/ceph/ceph/commit/4667280f8afe6cd68dfffea61d7530581f3dd0eb
Alessandro'...
Dan van der Ster
12:27 PM Backport #24890 (In Progress): luminous: FAILED assert(0 == "ERROR: source must exist") in FileSt...
Nathan Cutler
10:18 AM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
https://github.com/ceph/ceph/pull/22976 Nathan Cutler
11:03 AM Backport #24891 (In Progress): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore...
Nathan Cutler
10:18 AM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
https://github.com/ceph/ceph/pull/22997 Nathan Cutler
10:50 AM Bug #24150 (Resolved): LibRadosMiscPool.PoolCreationRace segv
Nathan Cutler
10:50 AM Backport #24204 (Resolved): mimic: LibRadosMiscPool.PoolCreationRace segv
Nathan Cutler
12:06 AM Backport #24204: mimic: LibRadosMiscPool.PoolCreationRace segv
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22291
merged
Yuri Weinstein
10:50 AM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
Nathan Cutler
10:49 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
Nathan Cutler
12:05 AM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22492
merged
Yuri Weinstein
10:48 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
Nathan Cutler
12:03 AM Backport #24747: mimic: change default filestore_merge_threshold to -10
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22813
merged
Yuri Weinstein
10:48 AM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
Nathan Cutler
10:47 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
Nathan Cutler
12:03 AM Backport #24473: mimic: cosbench stuck at booting cosbench driver
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22887
merged
Yuri Weinstein
10:46 AM Bug #24487 (Resolved): osd: choose_acting loop
Nathan Cutler
10:46 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
Nathan Cutler
12:02 AM Backport #24618: mimic: osd: choose_acting loop
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged
Yuri Weinstein
10:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
Nathan Cutler
10:46 AM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
Nathan Cutler
12:02 AM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged
Yuri Weinstein
10:45 AM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
Nathan Cutler
12:00 AM Backport #24805: mimic: rgw workload makes osd memory explode
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22960
merged
Yuri Weinstein
10:36 AM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
Nathan Cutler
10:18 AM Backport #24889 (Resolved): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_h...
https://github.com/ceph/ceph/pull/23026 Nathan Cutler
10:18 AM Backport #24888 (Rejected): luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
Nathan Cutler
03:03 AM Bug #24664 (Pending Backport): osd: crash in OpTracker::unregister_inflight_op via OSD::get_healt...
Sage Weil
03:01 AM Bug #24597 (Pending Backport): FAILED assert(0 == "ERROR: source must exist") in FileStore::_coll...
Sage Weil

07/11/2018

11:48 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
... Josh Durgin
11:47 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
Happened again in 12.2.4:... Josh Durgin
11:33 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
/a/nojha-2018-07-06_23:31:26-rados-wip-23979-2018-07-06-distro-basic-smithi/2744661/ Neha Ojha
11:24 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
Cool, I will pickup and run your test, atm the load on workers is high, should have the results tomorrow eod. Vasu Kulkarni
10:25 AM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
OK, it looks like we missed this in the previous tracker issue that mentioned it (it was actually a three part fix an... Boris Ranto
11:23 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
Josh Durgin
11:21 PM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
Does this show up in the monitor's log in /var/log/ceph/ ? Josh Durgin
11:15 PM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails
https://github.com/ceph/ceph/pull/22771 Josh Durgin
11:13 PM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED
Looks the same as #24640 Josh Durgin
11:11 PM Bug #24835 (Need More Info): osd daemon spontaneous segfault
Unfortunately there's not much to go on - if this happens again perhaps you can grab a core file or a crash dump will... Josh Durgin
10:09 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
mimic backport: https://github.com/ceph/ceph/pull/22997 Sage Weil
03:54 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Factors leading to this:
- ec pool (e.g., rgw workload)
- rados ops that result in pg log 'error' entries (e.g., ...
Sage Weil
12:37 PM Bug #24597 (In Progress): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collectio...
https://github.com/ceph/ceph/pull/22974 Sage Weil
01:16 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Aha, in that case wip-24192 should fix it. Running it through testing again... Josh Durgin
12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Sage Weil
12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
I believe this is caused by b50186bfe6c8981700e33c8a62850e21779d67d5, which does... Sage Weil
09:38 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
Ah, the error was reported on luminous, which doesn't do the repair, and I guess I missed it on master. Sorry for the... Greg Farnum
09:01 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors

The do_sparse_read() path doesn't attempt to repair a checksum error. Could that be the real issue?
The do_read...
David Zafman
08:25 PM Bug #24875 (Resolved): OSD: still returning EIO instead of recovering objects on checksum errors
A report came in on the mailing list of an MDS journal which couldn't be read and was throwing errors:... Greg Farnum
08:31 PM Bug #24876 (New): snaptrim_error state cannot be cleared without a new snaptrim
A user on the list reported they had PGs in state "active+clean+snaptrim_error". Investigating, I found that the only... Greg Farnum
08:11 PM Backport #24771: mimic: osd: may get empty info at recovery
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22861
merged
Reviewed-by: Sage Weil <sage@redhat.com>
Yuri Weinstein
07:27 PM Bug #24874 (New): ec fast reads can trigger read errors in log
fast read finishes...... Sage Weil
04:11 PM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
This looks like #24597 for the 12.2.5 case, at least. I wonder if the original 12.2.3 is something else (time warp d... Sage Weil
03:51 PM Bug #24192 (Duplicate): cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-582...
Josh Durgin

07/10/2018

10:10 PM Bug #24866 (Resolved): FAILED assert(0 == "past_interval start interval mismatch") in check_past_...
... Neha Ojha
08:30 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
https://github.com/ceph/ceph/pull/23024 Patrick Donnelly
08:29 PM Backport #24864 (Resolved): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-code...
https://github.com/ceph/ceph/pull/23025 Patrick Donnelly
04:51 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
This was a ceph-volume test with rbd workload, no upgrades, just fresh install, full logs at
http://pulpito.ceph.c...
Vasu Kulkarni
02:41 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
This points to a deeper issue. The target context seems to always be 'unlabeled_t'. That context means something like... Boris Ranto
12:23 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
Filing under RADOS because it appears to be OSD specific. John Spray
01:42 PM Bug #23492 (Pending Backport): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-e...
Sage Weil
12:46 PM Bug #24850 (New): IPv6 scoped address not parseable by entity_addr_t
An IPv6 link-local scoped address is not currently parseable since it contains a "%<interface name>" suffix in the ad... Jason Dillaman
12:14 PM Bug #24835: osd daemon spontaneous segfault
The log (attached) does not contain any information on the crash. It shows only the automatic restart of the crashed ... Christian Schlittchen
09:54 AM Backport #24847 (In Progress): jewel: rgw workload makes osd memory explode
Nathan Cutler
09:54 AM Backport #24847 (Resolved): jewel: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22959 Nathan Cutler
09:48 AM Backport #24806 (In Progress): luminous: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22962 Prashant D
09:42 AM Backport #24805 (In Progress): mimic: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22960 Prashant D
09:41 AM Bug #23352: osd: segfaults under normal operation
We see periodically with osd_enable_op_tracker = false
Last time ...
Serg D
12:55 AM Bug #23352: osd: segfaults under normal operation
That is correct, Brad. No crashes for 7 days now. Alex Gorbachev
09:33 AM Bug #24768: rgw workload makes osd memory explode
jewel backport: https://github.com/ceph/ceph/pull/22959
I know that jewel is (almost) EOL; just in case anyone is ...
Kefu Chai
04:01 AM Backport #24845 (Resolved): luminous: tools/ceph-objectstore-tool: split filestore directories of...
https://github.com/ceph/ceph/pull/23418 Nathan Cutler

07/09/2018

10:43 PM Bug #23352: osd: segfaults under normal operation
Alex, so that's a week without issue, when previously you were getting a crash every 3-4 days, right? Brad Hubbard
01:36 PM Bug #23352: osd: segfaults under normal operation
No issues so far since injecting osd_enable_op_tracker=false Alex Gorbachev
08:40 PM Feature #21366 (Pending Backport): tools/ceph-objectstore-tool: split filestore directories offli...
Josh Durgin
06:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
https://github.com/ceph/ceph/pull/22954 Sage Weil
06:02 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
The problem is that int global_init_shutdown_stderr(CephContext *cct) is not being run at a time in the process lifec... Sage Weil
05:02 PM Bug #24835: osd daemon spontaneous segfault
Can you provide the backtrace out of the OSD log? Or even the whole log? Greg Farnum
02:13 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
We experience spontaneous segmentation faults of osd daemons in our mimic production cluster:... Christian Schlittchen
04:36 PM Bug #24838 (Resolved): mon: auth checks not correct for pool ops
The mon was not enforcing caps for pool ops correctly (which are used for managing unmanaged snapshots or even pool d... Sage Weil
04:32 PM Bug #24837 (Resolved): auth: cephx signature check is weak/broken
The signature check code was validating only the first (32-byte) of two blocks, and thus did not cover all of the crc... Sage Weil
04:30 PM Bug #24836 (Resolved): auth: cephx authorizer subject to replay
The cephx authorizer does not have any challenge or nonce, and thus (if sniffed) can be reused by another session.
...
Sage Weil
04:09 PM Bug #24368: osd: should not restart on permanent failures
I don't think the issue has moved beyond the PR linked above to change the systemd settings. I sent this out to one o... Greg Farnum
08:42 AM Bug #24368: osd: should not restart on permanent failures
guotao Yao wrote:
> I've had a similar problem recently. One OSD crashed and exited, and the OSD process was restarted quick...
guotao Yao
08:12 AM Bug #24368: osd: should not restart on permanent failures
I've had a similar problem recently. One OSD crashed and exited, and the OSD process was restarted quickly by systemd. It cau... guotao Yao

07/06/2018

09:55 PM Bug #24322 (Resolved): slow mon ops from osd_failure
Nathan Cutler
09:55 PM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
Nathan Cutler
09:50 PM Backport #24350: mimic: slow mon ops from osd_failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22297
merged
Yuri Weinstein
09:54 PM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
Nathan Cutler
09:54 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
Nathan Cutler
09:49 PM Backport #24246: mimic: Manager daemon y is unresponsive during teuthology cluster teardown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22333
merged
Yuri Weinstein
09:54 PM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
Nathan Cutler
09:49 PM Backport #24375: mimic: mon: auto compaction on rocksdb should kick in more often
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22361
merged
Yuri Weinstein
09:52 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
Nathan Cutler
09:51 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
Nathan Cutler
09:51 PM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
Nathan Cutler
09:42 PM Backport #24468: mimic: tell ... config rm <foo> not idempotent
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22552
merged
Yuri Weinstein
09:50 PM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
Nathan Cutler
09:42 PM Backport #24332: mimic: local_reserver double-reservation of backfilled pg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22559
merged
Yuri Weinstein
09:49 PM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler
09:49 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler
09:40 PM Backport #24599: mimic: failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22651
merged
Yuri Weinstein
09:48 PM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
Nathan Cutler
09:39 PM Backport #24494: mimic: osd: segv in Session::have_backoff
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22730
merged
Yuri Weinstein
09:47 PM Bug #24199 (Resolved): common: JSON output from rados bench write has typo in max_latency key
Nathan Cutler
09:47 PM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
Nathan Cutler
09:45 PM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
Nathan Cutler
09:44 PM Backport #24292: mimic: common: JSON output from rados bench write has typo in max_latency key
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22406
merged
Yuri Weinstein
09:06 PM Backport #24806 (Resolved): luminous: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22962 Nathan Cutler
09:06 PM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22960 Nathan Cutler
06:44 PM Bug #24768 (Pending Backport): rgw workload makes osd memory explode
Sage Weil
06:12 PM Bug #24801: PG num_bytes becomes huge

The OSD logs and this bug point to a slight flaw in https://github.com/ceph/ceph/pull/22797. I add the adjustment ...
David Zafman
05:57 PM Bug #24801 (Resolved): PG num_bytes becomes huge

dzafman-2018-07-05_12:45:56-rados-wip-19753-distro-basic-smithi/2739140
description: rados/thrash/{0-size-min-si...
David Zafman
04:45 PM Backport #23772 (In Progress): luminous: ceph status shows wrong number of objects
Nathan Cutler
01:39 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
... David Zafman
01:28 AM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED

dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
2732821
2732693
2732523...
David Zafman
01:01 AM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails

http://pulpito.ceph.com/dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
Multiple jobs
2732818
...
David Zafman

07/05/2018

10:40 PM Bug #24785 (Resolved): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
... Vasu Kulkarni
09:33 PM Bug #24664 (In Progress): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_met...
https://github.com/ceph/ceph/pull/22877 Brad Hubbard
08:39 PM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
Ganging up with another backport to prevent merge conflicts. Nathan Cutler
08:27 PM Backport #24618 (In Progress): mimic: osd: choose_acting loop
Nathan Cutler
08:22 PM Backport #24617 (In Progress): mimic: ValueError: too many values to unpack due to lack of subdir
Nathan Cutler
08:15 PM Backport #24473 (In Progress): mimic: cosbench stuck at booting cosbench driver
Nathan Cutler
12:44 PM Bug #24768 (Fix Under Review): rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22858 Sage Weil
09:17 AM Backport #24583 (In Progress): mimic: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22869 Prashant D
07:25 AM Backport #24584 (In Progress): luminous: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22865 Prashant D

07/04/2018

11:11 PM Backport #24772 (In Progress): luminous: osd: may get empty info at recovery
Nathan Cutler
10:52 PM Backport #24772 (Resolved): luminous: osd: may get empty info at recovery
https://github.com/ceph/ceph/pull/22862 Nathan Cutler
11:03 PM Backport #24771 (In Progress): mimic: osd: may get empty info at recovery
Nathan Cutler
10:52 PM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
https://github.com/ceph/ceph/pull/22861 Nathan Cutler
07:24 PM Bug #24588: osd: may get empty info at recovery
https://github.com/ceph/ceph/pull/22704 is the fix Sage Weil
07:23 PM Bug #24588 (Pending Backport): osd: may get empty info at recovery
Sage Weil
07:17 PM Bug #24768 (Resolved): rgw workload makes osd memory explode
From ML,... Sage Weil
12:47 PM Bug #23352: osd: segfaults under normal operation
Brad Hubbard wrote:
> Having reviewed the code in question again I was afraid that may be the case. If you can provi...
Dan van der Ster
09:38 AM Bug #23352: osd: segfaults under normal operation
Having reviewed the code in question again I was afraid that may be the case. If you can provide the crash dump Dan, ... Brad Hubbard
07:36 AM Bug #23352: osd: segfaults under normal operation
I *injected* osd_enable_op_tracker=false yesterday ... Dan van der Ster
07:40 AM Bug #24123 (Resolved): "process (unknown)" in ceph logs
Nathan Cutler
07:39 AM Backport #24215 (Resolved): mimic: "process (unknown)" in ceph logs
Nathan Cutler
07:38 AM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
Nathan Cutler
07:38 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler
07:37 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
Nathan Cutler
07:31 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Hi Martin,
Have you tried my workaround above?
Best regards,
Lazuardi Nasution
06:18 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Hi everyone,
What’s the workaround for this issue? Not being able to add new osds is getting more and more urgent...
Martin Overgaard Hansen
01:10 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Final bisect results:
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
2e476...
David Zafman

07/03/2018

10:51 AM Backport #24748 (In Progress): luminous: change default filestore_merge_threshold to -10
Nathan Cutler
07:55 AM Backport #24748 (Resolved): luminous: change default filestore_merge_threshold to -10
https://github.com/ceph/ceph/pull/22814 Nathan Cutler
10:47 AM Backport #24747 (In Progress): mimic: change default filestore_merge_threshold to -10
Nathan Cutler
07:55 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
https://github.com/ceph/ceph/pull/22813 Nathan Cutler
10:47 AM Bug #24686: change default filestore_merge_threshold to -10
*master PR*: https://github.com/ceph/ceph/pull/22761 Nathan Cutler
12:36 AM Bug #24686 (Pending Backport): change default filestore_merge_threshold to -10
Josh Durgin
10:24 AM Feature #13507: scrub APIs to read replica
Update backport field? Nathan Cutler
04:16 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Hi Sergey,
Have you tried that after "ceph osd require-osd-release mimic"?
My workaround is below.
1. Build pa...
Lazuardi Nasution
03:08 AM Bug #23352: osd: segfaults under normal operation
Thanks Alex! Brad Hubbard
01:50 AM Bug #23352: osd: segfaults under normal operation
I set it, Brad, and am watching the status. We normally get one failure in 3-4 days. Alex Gorbachev
01:03 AM Bug #23352: osd: segfaults under normal operation
We are investigating the potential race between get_health_metrics and the op_tracker code.
In the meantime, for ...
Brad Hubbard

07/02/2018

10:47 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I found a workaround for adding a new osd to a "mimic" cluster:
1. Purge osd from cluster which displayed as "dow...
Sergey Ponomarev
04:52 PM Backport #24215: mimic: "process (unknown)" in ceph logs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22311
merged
Yuri Weinstein
04:52 PM Backport #24500: mimic: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22545
merged
Yuri Weinstein
04:51 PM Backport #24355: mimic: osd: pg hard limit too easy to hit
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22621
merged
Yuri Weinstein
06:12 AM Bug #23352: osd: segfaults under normal operation
Pretty sure this all revolves around the racy code highlighted in #24037 and, unfortunately, the PR does *not* fix al... Brad Hubbard

06/29/2018

11:27 PM Bug #23875 (In Progress): Removal of snapshot with corrupt replica crashes osd
Tentative pull request https://github.com/ceph/ceph/pull/22476 is an improvement but doesn't address comment 3 David Zafman
11:25 PM Bug #19753 (In Progress): Deny reservation if expected backfill size would put us over backfill_f...
David Zafman
05:59 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
Also include >1024 PGs overall Douglas Fuller
09:59 AM Bug #23145: OSD crashes during recovery of EC pg
@sage weil,
Thanks. Because the environment no longer exists, I couldn't get the logs with debug_osd=20.
from the previou...
Yong Wang
 
