Activity

From 11/10/2014 to 12/09/2014

12/09/2014

10:24 PM CephFS Bug #10288: ceph fs ls fails to list newly created fs
This is probably going to be something obvious in the MDSMonitor. Greg Farnum
09:38 PM CephFS Bug #10288 (Resolved): ceph fs ls fails to list newly created fs
Hi!
After upgrading from .6 to .8 (giant current from ceph ubuntu packages), I wanted to play with CephFS. I foll...
Steve H.
10:24 PM Bug #10287: ceph v0.80.7 ceph-mon --mkfs crash
change to leveldb 1.12, everything works fine. Please close it. wei li
08:25 PM Bug #10287: ceph v0.80.7 ceph-mon --mkfs crash
ceph.conf file... wei li
08:24 PM Bug #10287 (Resolved): ceph v0.80.7 ceph-mon --mkfs crash
ceph version v0.80.7 a new install machine "CentOS Linux release 7.0.1406 (Core)" run, the rpm build in same OS platf... wei li
06:14 PM CephFS Feature #1398: qa: multiclient file io test
Currently I am testing with the following yaml file.... Anonymous
04:35 PM Feature #10198: PG removal occupy the disk thread several hours
In that case, two things:
1) move scrubbing into the OpWQ (I'm working on that one)
2) restructure pg removal to on...
Samuel Just
03:23 PM Bug #10281 (Fix Under Review): firefly: make check fails on fedora 20
https://github.com/ceph/ceph/pull/3128 Loïc Dachary
02:45 PM Bug #10281 (Resolved): firefly: make check fails on fedora 20
http://paste.ubuntu.com/9447409/ Loïc Dachary
03:16 PM Bug #10282 (Resolved): gf-complete: missing .gitignore entry for .dirstamp
upstream Greg's fix https://github.com/ceph/gf-complete/pull/2 :
* -https://github.com/ceph/gf-complete/pull/3-
* ...
Loïc Dachary
01:49 PM CephFS Bug #10248: messenger: failed Pipe;:connect::assert(m) in Hadoop client
Hmm, the client only calls _closed_mds_session if:
1) it gets back a session close
2) the session goes stale
2a)...
Greg Farnum
01:11 PM Feature #7862: allow backfill/recovery while below min_size
Samuel Just
01:10 PM Feature #8635 (In Progress): add scrub, snap trimming, should be items in the OpWQ with cost/prio...
Samuel Just
01:06 PM Feature #7861: osd: allow writes on degraded objects
Samuel Just
01:06 PM Feature #9781 (In Progress): ceph_objectstore_tool: On import handle splits
David Zafman
01:05 PM Feature #9780: ceph_objectstore_tool: Add OSDMap information to pg export
David Zafman
12:05 PM Fix #9566: osd: prioritize recovery of OSDs with most work to do
Loïc Dachary
11:32 AM Bug #9503: Dumpling: removing many snapshots in a short time makes OSDs go berserk
Sam, is it correct to assume that this was fixed for dumpling in commit:1be9476afb9f715502a14749dd44e08371535b54, and... Florian Haas
06:49 AM rgw Bug #10268: s3tests.functional.test_s3.test_bucket_create_exists fails with 'S3CreateError not ra...
ubuntu@teuthology:/a/teuthology-2014-12-08_02:35:02-smoke-master-distro-basic-multi/642112 Sage Weil
02:43 AM Bug #10272: objects misplaced after reweight
Of course... thanks for explaining Loïc Dachary
01:05 AM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
So there is no OSD superblock issue, only the EC+KV problem that #9978 mentioned?
Haomai Wang
12:36 AM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
This issue has something to do with down time. On KV OSDs I've checked 'superblock' files and found that they are OK ... Dmitry Smirnov
12:39 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Can this bug get a little attention please? It has profound effect on my crippled cluster and I'm talking about files... Dmitry Smirnov

12/08/2014

11:12 PM CephFS Bug #10277 (Fix Under Review): ceph-fuse: Consistent pjd failure in getcwd
Zheng Yan
02:58 PM CephFS Bug #10277 (Resolved): ceph-fuse: Consistent pjd failure in getcwd
"job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or... Greg Farnum
09:39 PM Bug #10010 (Fix Under Review): ceph_osd.cc calls global_init_shutdown_stderr even when running wi...
Seems pretty simple; just check g_conf->daemonize, and don't close if not set. Dan Mick
09:08 PM Bug #10242: FAILED assert(backfill_targets.empty() || backfill_targets == want_backfill)
This is from an online production environment, and it never happened again. I also cannot reproduce it in my test/staging ... Wang Qiang
02:53 PM Bug #10242: FAILED assert(backfill_targets.empty() || backfill_targets == want_backfill)
Mmm, that assert is essentially saying that choose_acting is only called in two situations:
1) On a new interval. I...
Samuel Just
08:56 PM Bug #10171 (Fix Under Review): DBObjectMap: ghobject_t header key excludes hash for EC pools
Sage Weil
08:55 PM Bug #10272 (Rejected): objects misplaced after reweight
problem is the (post-crush) reweights. you're rejecting almost all osds with 80% probability. eventually crush will... Sage Weil
02:54 PM Bug #10272: objects misplaced after reweight
This is a problem with the crush rule. Crush retried a bunch of times, but was unable to get 3 replicas for that pg. Samuel Just
10:35 AM Bug #10272 (Rejected): objects misplaced after reweight
Steps to reproduce, after compiling from sources:... Loïc Dachary
04:13 PM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
It's waiting on https://github.com/ceph/ceph-qa-suite/pull/250 Loïc Dachary
03:30 PM Bug #10018 (In Progress): OSD assertion failure if the hinfo_key xattr is not there (corrupted?) ...
Loic: are the tests for this in the regression suite yet? Samuel Just
03:23 PM Bug #10018 (Resolved): OSD assertion failure if the hinfo_key xattr is not there (corrupted?) dur...
Samuel Just
03:33 PM rbd Bug #10180 (Resolved): qemu tests crash host kernel
Yuri Weinstein
03:26 PM Bug #10262: osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
Unfortunately I doubt it. From what I have read, cranking up the logs so much would extremely quickly eat up availabl... Daniel Schneller
03:20 PM Bug #10262: osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
Can you reproduce? The logs don't have much information, I need it reproduced with
debug osd = 20
debug filestor...
Samuel Just
09:48 AM Bug #10262: osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
Replicated Pool. No cache tiering. Daniel Schneller
09:47 AM Bug #10262: osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
is this an erasure or replicated pool? are you using cache tiering? Sage Weil
03:25 PM Bug #9503 (Resolved): Dumpling: removing many snapshots in a short time makes OSDs go berserk
Samuel Just
03:03 PM CephFS Bug #10263 (Resolved): [ERR] bad backtrace on dir ino 600
They're all happy now, merged everything in. Greg Farnum
12:36 PM CephFS Bug #10263: [ERR] bad backtrace on dir ino 600
Merged in the patch for Giant as of commit:247a6fac54854e92a7df0e651e248a262d3efa05.
The others are a little unhap...
Greg Farnum
02:05 PM CephFS Bug #10248: messenger: failed Pipe;:connect::assert(m) in Hadoop client
the new assert for wip-10057 would trigger this.
this looks like a corner case in the session close + reopen seque...
Sage Weil
11:59 AM Bug #10250: PG stuck incomplete after interrupted backfill.
Awesome, looks like it worked, it started backfilling right away and now my vms are unfrozen. Thanks a lot! Aaron Bassett
11:30 AM Bug #10250: PG stuck incomplete after interrupted backfill.
install wip-last_epoch_started, set osd_find_best_info_ignore_history_les = true in ceph.conf, and restart the primar... Sage Weil
10:59 AM Bug #10250: PG stuck incomplete after interrupted backfill.
... Sage Weil
10:21 AM Bug #10250: PG stuck incomplete after interrupted backfill.
I actually had already marked 11 as lost a few days ago. Just this morning I re-activated the disk and it came up as ... Aaron Bassett
08:52 AM Bug #10250: PG stuck incomplete after interrupted backfill.
Can you try 'ceph osd lost 11' ? (I take it osd.11 is the one that you wiped and removed?)
If you can capture the...
Sage Weil
06:48 AM Bug #10250: PG stuck incomplete after interrupted backfill.
I found more info in the log relating to this pg. It looks like it's kicking it, but still hanging requests. At this ... Aaron Bassett
11:48 AM Bug #8935: operations not idempotent when enabling cache
ubuntu@teuthology:/a/samuelj-2014-12-05_23:56:18-rados-wip-sam-firefly-testing-wip-testing-vanilla-fixes-basic-multi/... Samuel Just
11:47 AM Bug #8797: "ceph status" do not exit with python_2.7.8
I think the right fix for this is to remove Rados.__del__. I'll come up with a pull request unless you want to, Joe. Dan Mick
10:33 AM rgw Bug #10066 (In Progress): rgw: failed md5sum on s3tests-test-readwrite
Alfredo Deza
10:19 AM Bug #10241 (Need More Info): Incorrect OSD mapping with EC 6+2 setup in Giant
need osdmap or crushmap that triggers the failed mapping Sage Weil
09:47 AM Bug #10258 (Duplicate): ceph health reporting blocked op indefinitely
#10259 Sage Weil
09:45 AM rgw Bug #10271 (Resolved): Radosgw urlencode
When performing a multipart upload using AWS SDK JS v.2.0.29 I can see that the uploadId always starts with "2/" whic... Georgios Dimitrakakis
09:28 AM devops Bug #10266: Can't kill runs on magan002 "AuthenticationException: Authentication failed."
What even caused this? I can't of course log in to the machines to check it out. Zack Cerza
09:12 AM rbd Bug #10270 (Resolved): "[ FAILED ] LibRBD.ListChildren" in upgrade:firefly-x-giant-distro-basic...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-07_18:13:01-upgrade:firefly-x-giant-distro-basic-m... Yuri Weinstein
08:40 AM Feature #10192 (Resolved): ceph_objectstore_tool object lookup
Loïc Dachary
08:18 AM Bug #10215 (Resolved): vstart_wrapper.sh kills daemons that do not belong to it
Loïc Dachary
06:56 AM devops Bug #10200: tgtd error: undefined symbol: rbd_discard
tgt_1.0.38-48.bf6981.precise.ceph_amd64.deb from ceph-extras depends on rbd_discard, which is not available in stock ... Jason Dillaman
05:43 AM Bug #9916: osd: crash in check_ops_in_flight
Sorry for the repetition in #5 and #6, caused by a network problem. Wenjun Huang
05:41 AM Bug #9916: osd: crash in check_ops_in_flight
In file osd/OSD.cc:
OSD::_dispatch(Message *m) method:...
Wenjun Huang
05:01 AM Fix #9566: osd: prioritize recovery of OSDs with most work to do
... Loïc Dachary
02:50 AM Fix #9566 (In Progress): osd: prioritize recovery of OSDs with most work to do
Loïc Dachary
04:30 AM Linux kernel client Feature #5109: libceph: implement message signatures
Zheng Yan
04:04 AM Messengers Feature #10029: Retry binding on IPv6 address if not available
Logs I'm seeing on a monitor when it boots:... Wido den Hollander
02:49 AM Cleanup #10253 (Closed): gf-complete dead code
False positive according to Kevin & Janne. Loïc Dachary
02:03 AM Bug #9485: Monitor crash due to wrong crush rule set
But you understand that when CRUSH can not find enough racks using the indep mode, things go wrong and the wrong rule... Dong Lei
01:41 AM Bug #9485: Monitor crash due to wrong crush rule set
Although I've marked the issue as verified, I did not actually get to reproduce it. I meant to a number of times usin... Loïc Dachary

12/07/2014

07:46 PM Feature #10193 (Fix Under Review): Perf counter for WBThrottle
https://github.com/ceph/ceph/pull/3111 Haomai Wang
06:40 PM Bug #9485: Monitor crash due to wrong crush rule set
Hi sage:
According to my test earlier, crushtool may not be able to make it crash. I remember that crushtool will ...
Dong Lei
10:11 AM Bug #9485: Monitor crash due to wrong crush rule set
Just to offer some debriefing on the issue.
After installing the patch, I managed to get the monitor up and runnin...
Panayiotis Gotsis
05:50 PM CephFS Bug #10263 (Fix Under Review): [ERR] bad backtrace on dir ino 600
It's introduced by the 'verify backtrace on fetching dirfrag' patch. Stray directories of an old fs have no backtrace, th... Zheng Yan
10:06 AM rgw Bug #10268 (Resolved): s3tests.functional.test_s3.test_bucket_create_exists fails with 'S3CreateE...
... Sage Weil
08:44 AM devops Bug #10266 (Resolved): Can't kill runs on magan002 "AuthenticationException: Authentication failed."
... Yuri Weinstein
05:47 AM devops Bug #10148: Giant/Wheezy SysV: /etc/init.d/ceph -a start shifts crushmap to executing host
Duplicate of #9407 Andrey Korolyov
05:00 AM Fix #10264 (Fix Under Review): docker-test-helper fails on detached head
https://github.com/ceph/ceph/pull/3104 Loïc Dachary
03:19 AM Fix #10264 (Resolved): docker-test-helper fails on detached head
If the working tree is on a detached head (i.e. the commit may be unreachable from any git refs), docker-test-he... Loïc Dachary
04:59 AM Documentation #10265: building from source should be a onliner
https://github.com/ceph/ceph/pull/3104 Loïc Dachary
04:46 AM Documentation #10265 (Resolved): building from source should be a onliner
Building Ceph from sources is documented as multiple steps although it could be a oneliner grouping... Loïc Dachary
12:31 AM Feature #9888 (Resolved): AsyncMessenger: Async event threads can shared by all AsyncMessenger
Haomai Wang

12/06/2014

06:42 PM Bug #10125 (Resolved): radosgw is being started as root not apache with systemd
Sage Weil
05:36 PM CephFS Bug #10263 (Resolved): [ERR] bad backtrace on dir ino 600
ubuntu@teuthology:/a/sage-bug-10171-base/639742
and the other runs in this set. It's an upgrade test:...
Sage Weil
05:34 PM Bug #9485: Monitor crash due to wrong crush rule set
I've fixed Panayiotis's issue, but it is different than the original bug.
Dong Lei, I've tried to reproduce this b...
Sage Weil
12:22 PM Bug #9485: Monitor crash due to wrong crush rule set
https://github.com/ceph/ceph/commit/wip-9485 Loïc Dachary
11:17 AM Bug #9485: Monitor crash due to wrong crush rule set
For the attached link, this is the result of the command (crushtool, as compiled from git tree with --with-debug --... Panayiotis Gotsis
11:16 AM Bug #9485: Monitor crash due to wrong crush rule set
For the attached link, this is the result of the command (crushtool, as supplied by debian packages -- 0.80)
htt...
Panayiotis Gotsis
11:09 AM Bug #9485: Monitor crash due to wrong crush rule set
This is the crashing crushmap
https://www.dropbox.com/s/gbusu8jf2ku6k62/crushmap.orig?dl=0
Panayiotis Gotsis
05:28 PM rbd Bug #10180: qemu tests crash host kernel
For the kernel fixes see https://github.com/ceph/teuthology/pull/380
Rbd suite run - http://pulpito.front.sepia.ce...
Yuri Weinstein
11:08 AM Bug #10063 (Resolved): ceph_objectstore_tool does not support getting attributes for erasure code...
Loïc Dachary
11:02 AM Feature #9420 (Resolved): erasure-code: tools and archive to check for non regression of encoding
Loïc Dachary
06:26 AM RADOS Feature #6114: Complete python binding interfaces for librados
* lock support https://github.com/ceph/ceph/pull/3099 Loïc Dachary
06:04 AM Bug #10262 (Resolved): osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
During the night of 2014-12-06 our cluster (4 nodes, 12x4TB spinning disks, Firefly 0.80.7.1 on Ubuntu 14.04.1) suffe... Daniel Schneller

12/05/2014

09:56 PM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
Sage Weil wrote:
> Just a reminder that the "_dev" in "keyvaluestore_dev" means "experimental! danger! danger!". Th...
Dmitry Smirnov
07:01 AM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
Just a reminder that the "_dev" in "keyvaluestore_dev" means "experimental! danger! danger!". This code is not well-... Sage Weil
05:48 PM CephFS Feature #1398: qa: multiclient file io test
The problem, I believe, is that we need to install ceph and make sure that we have some mount points before we run the ... Anonymous
05:11 PM Bug #9485: Monitor crash due to wrong crush rule set
Hello, I can verify that I am facing the same problem.
After trying to edit the crushmap in order to separate grou...
Panayiotis Gotsis
03:26 PM Bug #10259 (Resolved): mon health stuck with phantom hung requests
commit:1ac17c0a662e6079c2c57edde2b4dc947f547f57
(03:22:47 PM) sjust: sage: osd_stat_t
(03:22:59 PM) sjust: does n...
Samuel Just
02:36 PM Bug #10258 (Duplicate): ceph health reporting blocked op indefinitely
On the performance test cluster, when creating an EC pool, ceph health reports that an op is blocked many hours after... Mark Nelson
12:08 PM Bug #10257 (Resolved): Ceph df doesn't report MAX AVAIL correctly when using rulesets and OSD in ...
In our setup we have two rulesets, one for SSDs and another one for HDDs. Ceph df normally reports the MAX AVAIL spac... Xavier Trilla
11:58 AM devops Bug #10252 (Resolved): apt-mirror having issues
This was being caused by a proxy issue (needed another reload) which is used to access apt-mirror from the redhat net... Sandon Van Ness
09:37 AM devops Bug #10252: apt-mirror having issues
From magna002 curl sees the same:... Alfredo Deza
08:25 AM devops Bug #10252 (Resolved): apt-mirror having issues
That prevents installation on some machines:... Alfredo Deza
10:16 AM Feature #10231 (Resolved): gperftools headers have moved
Sage Weil
09:32 AM Bug #9785 (Fix Under Review): /etc/ceph/dmcrypt-keys and key contents are created world-readable
* giant backport https://github.com/ceph/ceph/pull/3095
* firefly backport https://github.com/ceph/ceph/pull/3096
Loïc Dachary
08:38 AM Bug #9785 (Pending Backport): /etc/ceph/dmcrypt-keys and key contents are created world-readable
Sage Weil
09:25 AM Feature #10254 (New): mon,osd: long-term non-clean PGs prevent osdmap trimming
sometimes clusters have pgs that are degraded for long periods of time. this forces the mon to retain lots of old os... Sage Weil
08:58 AM Cleanup #10253 (Closed): gf-complete dead code
... Loïc Dachary
08:12 AM rgw Bug #10251 (Resolved): "Segmentation fault" (radosgw()) in upgrade:dumpling-firefly-x:parallel-ne...
Run http://pulpito.ceph.com/teuthology-2014-12-04_17:15:01-upgrade:dumpling-firefly-x:parallel-next-distro-basic-vps/... Yuri Weinstein
07:42 AM Bug #9844: "initiating reconnect" (log) race; crash of multiple OSDs (domino effect)
I have had this same issue on my cluster as well.
My cluster originally had 4 nodes, with 7 osds on each node, 28 ...
Jake Young
06:37 AM Bug #10250: PG stuck incomplete after interrupted backfill.
Oh I forgot to mention, min_size is 1 on this pool. Aaron Bassett
05:57 AM Bug #10250 (Closed): PG stuck incomplete after interrupted backfill.
Ceph version: 0.87
OS: Ubuntu 14.04
Cluster: 3x osd nodes with ~24 osds each
Issue: I had a pool accidentally se...
Aaron Bassett
03:55 AM Bug #9916: osd: crash in check_ops_in_flight
Sage Weil wrote:
> how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops sho...
Wenjun Huang
03:06 AM Bug #10246 (Resolved): add el7 to the list of supported version for centos in ceph-deploy install...
Loïc Dachary
01:15 AM rbd Feature #10226: Add pool quota reporting for Libvirt and other clients
Assigning this one to me.
Need to figure a way out to fetch the pool's quota from the cluster first.
Wido den Hollander
01:12 AM Fix #9566 (Fix Under Review): osd: prioritize recovery of OSDs with most work to do
Loïc Dachary
01:10 AM Bug #10018 (Fix Under Review): OSD assertion failure if the hinfo_key xattr is not there (corrupt...
Loïc Dachary

12/04/2014

07:36 PM CephFS Bug #10229 (Resolved): Filer: lock inversion with Objecter
Zheng Yan
06:06 PM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
I'm sorry, I forgot to use the correct format before, so some characters were omitted:... Wenjun Huang
06:04 PM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
Wenjun Huang wrote:
> Sorry for my carelessness, I meant:
> @if (r == -ENOENT || r == -ENOATTR)
> continue;@
...
Wenjun Huang
05:26 PM CephFS Feature #1398: qa: multiclient file io test
... Anonymous
03:49 PM devops Feature #10046 (In Progress): run make check on every pull request
https://github.com/ceph/ceph-build/pull/35 Loïc Dachary
01:58 PM Bug #10063 (Fix Under Review): ceph_objectstore_tool does not support getting attributes for eras...
* firefly backport https://github.com/ceph/ceph/pull/3089
* giant backport https://github.com/ceph/ceph/pull/3088
Loïc Dachary
01:29 PM Bug #10125 (Fix Under Review): radosgw is being started as root not apache with systemd
Loïc Dachary
12:07 PM Bug #10125: radosgw is being started as root not apache with systemd
* giant backport https://github.com/ceph/ceph/pull/3085
* firefly backport https://github.com/ceph/ceph/pull/3086
Loïc Dachary
10:03 AM Bug #10125 (Pending Backport): radosgw is being started as root not apache with systemd
Sage Weil
09:58 AM Bug #10125 (Fix Under Review): radosgw is being started as root not apache with systemd
... Loïc Dachary
04:45 AM Bug #10125: radosgw is being started as root not apache with systemd
I would like to test it manually but I don't know how to get a centos7 VPS. Help ? Loïc Dachary
01:23 PM Fix #9566 (In Progress): osd: prioritize recovery of OSDs with most work to do
Loïc Dachary
01:22 PM Bug #9785 (Fix Under Review): /etc/ceph/dmcrypt-keys and key contents are created world-readable
https://github.com/ceph/ceph/pull/3087 Loïc Dachary
12:44 PM Bug #10246 (Fix Under Review): add el7 to the list of supported version for centos in ceph-deploy...
Loïc Dachary
08:06 AM Bug #10246: add el7 to the list of supported version for centos in ceph-deploy installation instr...
https://github.com/ceph/ceph/pull/3081 Loïc Dachary
08:00 AM Bug #10246 (Resolved): add el7 to the list of supported version for centos in ceph-deploy install...
http://ceph.com/docs/master/start/quick-start-preflight/#red-hat-package-manager-rpm Loïc Dachary
11:31 AM rbd Bug #10030 (Resolved): Crash when attempting to open non-existent parent image
Josh Durgin
10:37 AM CephFS Bug #10248 (New): messenger: failed Pipe;:connect::assert(m) in Hadoop client
We have logs and a core dump from the QA run: http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-30_23:12:01-hado... Greg Farnum
10:05 AM Bug #10211 (Fix Under Review): gf-complete exit(1) because of misaligned structure
giant backport https://github.com/ceph/ceph/pull/3083 Loïc Dachary
10:00 AM Bug #10211 (Pending Backport): gf-complete exit(1) because of misaligned structure
Loïc Dachary
09:25 AM Feature #9728 (Resolved): erasure-code: jerasure support for NEON
Samuel Just
09:24 AM Documentation #10247 (Resolved): Alpha sort os-recommendataions
Sage Weil
08:14 AM Documentation #10247 (Resolved): Alpha sort os-recommendataions
to remove implied bias. Patrick McGarry
09:24 AM Bug #10209 (Resolved): osd/OSD.cc: 5410: FAILED assert(session)
Samuel Just
09:11 AM Documentation #10245: RPM quick start for RHEL should explain where to get tcmalloc & python-flask
CentOS7 needs... Loïc Dachary
07:55 AM Documentation #10245: RPM quick start for RHEL should explain where to get tcmalloc & python-flask
For RHEL, we should cover the basics of "Check if you're subscribed to Red Hat with @subscription-manager@ ?" and "ru... Ken Dreyer
07:53 AM Documentation #10245: RPM quick start for RHEL should explain where to get tcmalloc & python-flask
Bumping the priority so that it gets triaged Loïc Dachary
07:47 AM Documentation #10245 (Resolved): RPM quick start for RHEL should explain where to get tcmalloc & ...
http://ceph.com/docs/master/start/quick-start-preflight/#red-hat-package-manager-rpm... Loïc Dachary
08:59 AM rgw Bug #10066: rgw: failed md5sum on s3tests-test-readwrite
That's the conf file I'm using for reproduction:... Yehuda Sadeh
07:12 AM Fix #10244 (New): double resource for setting up ceph-deploy
And one of them is missing instructions on installing via RPM.
This causes issues with users that find the incompl...
Alfredo Deza
06:56 AM Linux kernel client Bug #4553 (Can't reproduce): kclient: lockdep report, crash involving ceph fs and libceph
Sage Weil
06:36 AM rgw Bug #10243: civetweb is hitting a limit (number of threads 1024)
The problem is in [1] as:
#define MAX_WORKER_THREADS 1024
I changed this to "MAX_WORKER_THREADS 20480" and it worke...
Mustafa Muhammad
04:00 AM rgw Bug #10243 (Resolved): civetweb is hitting a limit (number of threads 1024)
When setting "rgw thread pool size" to a number higher than 1024 and enabling civetweb 'rgw_frontends="civetweb port=... Mustafa Muhammad
06:30 AM rbd Bug #10123 (Resolved): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
Jason Dillaman
06:21 AM Bug #10067 (Can't reproduce): ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
Loïc Dachary
06:04 AM Bug #10067: ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
The ... Loïc Dachary
05:37 AM Bug #10067: ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
... Loïc Dachary
05:21 AM Bug #10067: ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
It's a stress split test therefore no erasure code involved after upgrading to firefly. Loïc Dachary
04:54 AM Bug #10042 (Duplicate): OSD crash doing object recovery with EC pool
http://tracker.ceph.com/issues/8588 Loïc Dachary
04:48 AM Bug #10065 (Duplicate): hung ec-lost-unfound.yaml, failed of osd.{0,2,3}
http://tracker.ceph.com/issues/10211 Loïc Dachary
03:51 AM Bug #8588 (In Progress): In the erasure-coded pool, primary OSD will crash at decoding if any dat...
Loïc Dachary
03:42 AM Bug #10017 (Fix Under Review): OSD wrongly marks object as unfound if only the primary is corrupt...
https://github.com/ceph/ceph/pull/3034 ready for review Loïc Dachary
12:33 AM Bug #10242: FAILED assert(backfill_targets.empty() || backfill_targets == want_backfill)
Since the osd went down, please raise the priority and severity. Wang Qiang
12:19 AM Bug #10242 (Can't reproduce): FAILED assert(backfill_targets.empty() || backfill_targets == want_...
version: 0.80.7
config:
osd max backfills = 1
osd recovery max active = 1
One osd went down after hitting...
Wang Qiang

12/03/2014

09:31 PM CephFS Bug #10229: Filer: lock inversion with Objecter
Zheng Yan
10:40 AM CephFS Bug #10229 (Resolved): Filer: lock inversion with Objecter
Saw this on a next test (http://qa-proxy.ceph.com/teuthology/sage-2014-12-01_11:11:17-fs-next-distro-basic-multi/6289... Greg Farnum
09:02 PM Bug #10241 (Resolved): Incorrect OSD mapping with EC 6+2 setup in Giant
Hit this on the performance test cluster during nightly giant testing. Notice the very incorrect mapping.... Mark Nelson
07:11 PM Feature #10198: PG removal occupy the disk thread several hours
Samuel Just wrote:
> Radosgw creates unbounded size index objects, will change eventually.
Hi Sam,
It is more like...
Guang Yang
05:56 PM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
Sorry for my carelessness, I meant:
@if (r == -ENOENT || r == -ENOATTR)
continue;@
Wenjun Huang
04:10 AM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
Samuel Just wrote:
> Right, the deleting the object portion is what I was talking about. I think that's the right w...
Wenjun Huang
05:55 PM CephFS Feature #1398: qa: multiclient file io test
Note to self:
Try: rbd import to create an image name, rbd resize the image, make sure reads return EOF at right...
Anonymous
05:43 PM Bug #10176: Segmentation fault in upgrade:firefly:singleton-firefly-distro-basic-vps run
See PR - https://github.com/ceph/ceph-qa-suite/pull/253 Yuri Weinstein
04:24 PM rgw Bug #10195: s3 java jdk conn.getobject(...) (get s3 object) method fails with latest version of a...
Both test runs used the exact same code, same endpoint, same authentication keys, and same GET request. The only thin... Selwyn Jacobs
12:52 PM Feature #10231: gperftools headers have moved
Work-in-progress pushed to https://github.com/ceph/ceph/tree/wip-10231-gperftools-location, submitted for review at h... Ken Dreyer
12:44 PM Feature #10231 (Resolved): gperftools headers have moved
The @google/@ headers location has been deprecated as of gperftools 2.0. As of gperftools 2.2rc, the @google/@ header... Ken Dreyer
10:18 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
... Loïc Dachary
10:10 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Here is a tentative approach. The idea is to accumulate authoritative peers instead of just keeping the last one. For... Loïc Dachary
08:56 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Exploring two options:
* changing PG::scrub_compare_maps to collect all shards for a given missing object so tha...
Loïc Dachary
08:49 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
When the primary shard is lost in k=2, m=2, the PG has an unfound object (because, as explained in the description) t... Loïc Dachary
08:45 AM rgw Bug #10227 (Duplicate): "Segmentation fault" radosgw() in smoke-master-distro-basic-multi suite
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-30_02:35:03-smoke-master-distro-basic-multi/626596... Yuri Weinstein
08:35 AM rbd Feature #10226 (New): Add pool quota reporting for Libvirt and other clients
Currently when adding a Ceph RBD pool into Libvirt it will set the pool size as the maximum capacity of the entire cl... L B
08:20 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
The initial run for this report passed - http://pulpito.front.sepia.ceph.com/teuthology-2014-12-02_17:00:03-upgrade:f... Yuri Weinstein
06:56 AM CephFS Fix #10135 (Resolved): OSDMonitor: allow adding cache pools to cephfs pools already in use
26e8cf174b8e76b4282ce9d9c1af6ff12f5565a9 Greg Farnum
05:25 AM Bug #10211: gf-complete exit(1) because of misaligned structure
For the record, strace on the process shows:... Loïc Dachary
05:16 AM CephFS Bug #10164 (Fix Under Review): Dirfrag objects for deleted dir not purged until MDS restart
https://github.com/ceph/ceph/pull/3071 Zheng Yan
02:33 AM Bug #10202 (Can't reproduce): ceph_objecstore_tool.py : OSD has the store locked
Thanks for trying, since armv8 based machines & the ubuntu distribution matching are not out yet, let's close this. Loïc Dachary

12/02/2014

11:20 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
I reported KV OSD init problem as #10225. Dmitry Smirnov
11:18 PM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
Oh yeah, another thing: some filestore based OSDs crash at the end of boot sequence so I didn't bother to start them ... Dmitry Smirnov
11:15 PM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
I didn't do anything to Ceph because the cluster was down, so I didn't even start those OSDs. No upgrades to Ceph were depl... Dmitry Smirnov
11:01 PM Bug #10225: keyvaluestore: OSDs do not start after few weeks of downtime (osd init failed / unabl...
Hmm, I'm not sure why this happens. It seems keyvaluestore lost "osd_superblock"?
Did you upgrade ceph?
Haomai Wang
10:45 PM Bug #10225 (Closed): keyvaluestore: OSDs do not start after few weeks of downtime (osd init faile...
On "Giant" I've created seven KV OSDs (on 4 or 5 different hosts) before cluster went down due to cascade of OSD cras... Dmitry Smirnov
10:54 PM Bug #10202: ceph_objecstore_tool.py : OSD has the store locked

On a virtual machine.
$ cat /etc/issue
Ubuntu 14.04 LTS \n \l
$ uname -a
Linux ubuntu 3.13.0-24-generic #46-U...
David Zafman
03:09 PM Bug #10202: ceph_objecstore_tool.py : OSD has the store locked
It is ubuntu 14.04 on ARMv8 Loïc Dachary
12:57 PM Bug #10202: ceph_objecstore_tool.py : OSD has the store locked

Could this be a platform specific bug using init-ceph to kill daemons? Do we know what platform they are running on?
David Zafman
09:32 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
https://github.com/ceph/ceph/pull/3070 Greg Farnum
03:41 PM Bug #10153: Rados.shutdown() dies with Illegal instruction (core dumped)
This is not specific to rados.py, of course. Dan Mick
01:15 PM Bug #10153: Rados.shutdown() dies with Illegal instruction (core dumped)
This was fixed by the application of commit:92615ea and commit:cf2104d in master. Please backport to firefly. Joe Julian
01:47 PM Bug #9487 (Resolved): dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim...
Samuel Just
01:43 PM Bug #9891 (Rejected): "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x...
Samuel Just
01:40 PM Bug #10085 (Resolved): dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
Sage Weil
01:39 PM Bug #10157: PGLog::(read|write)_log don't write out rollback_info_trimmed_to
Samuel Just
01:37 PM Bug #9459 (Can't reproduce): osd: blocked request
Sage Weil
01:36 PM Bug #8595 (Resolved): osd: client op blocks until backfill starts (dumpling)
Sage Weil
01:35 PM Bug #10126 (Rejected): "Exception: timed out waiting for admin_socket" in upgrade:giant-x-next-di...

The second time this was reproduced, a misc.log was available. The load average was 17 on the node with osd.13, which must n...
David Zafman
01:34 PM Bug #9806 (Pending Backport): Objecter: resend linger ops on split
Sage Weil
01:34 PM Bug #9806 (Resolved): Objecter: resend linger ops on split
Sage Weil
01:34 PM Bug #8885 (Can't reproduce): SIGABRT in TrackedOp::dump() via dump_ops_in_flight()
Samuel Just
01:33 PM Bug #9731 (Can't reproduce): Ceph 0.80.6 OSD crashes
Samuel Just
01:32 PM Messengers Bug #9898 (Pending Backport): osd: fast dispatch deadlock in mark_down (giant)
Sage Weil
01:32 PM Messengers Bug #9898 (Resolved): osd: fast dispatch deadlock in mark_down (giant)
Samuel Just
01:30 PM Bug #10058 (Can't reproduce): next stuck in recovery, no progress
Samuel Just
01:28 PM Bug #10105 (Can't reproduce): crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
Samuel Just
01:27 PM Bug #10138 (Need More Info): osd: crash in SnapSet::from_snap_set
Sage Weil
01:19 PM Bug #8797: "ceph status" do not exit with python_2.7.8
The SIGILL was cured in master with the application of 92615ea and cf2104d. I've tested backporting these to firefly ... Joe Julian
01:18 PM Bug #10209: osd/OSD.cc: 5410: FAILED assert(session)
Samuel Just
01:18 PM Bug #10178 (Resolved): mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Sage Weil
01:17 PM Bug #9939 (Resolved): "giant" no longer log scrub errors
commit:d392f44891a064e08f28244673c43a869e1f6014 Sage Weil
01:14 PM Bug #10109 (Duplicate): "LibRadosTwoPoolsECPP.PromoteSnap" test failed in upgrade:dumpling-firefl...
Sage Weil
01:13 PM Bug #10113 (Duplicate): --log-to-stderr with -f/-d sends a lot of things to logfile
#9180 Sage Weil
01:13 PM Bug #10124 (Rejected): monitor receives bus error signal
leveldb bug Sage Weil
01:11 PM Bug #10146 (In Progress): ceph-disk: sometimes the journal symlink is not created
Still open, needs tests. Loïc Dachary
01:10 PM Bug #10146 (Resolved): ceph-disk: sometimes the journal symlink is not created
Sage Weil
01:10 PM Bug #10118 (Need More Info): messenger drops messages between osds
Samuel Just
01:08 PM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
Right, the portion about deleting the object is what I was talking about. I think that's the right way. Samuel Just
01:07 PM Bug #10173 (Resolved): autogen.sh will fail if submodule URL changes
Loïc Dachary
01:06 PM Bug #10175 (Resolved): deps.deb.txt is obsolete
Loïc Dachary
01:06 PM Feature #10198: PG removal occupy the disk thread several hours
Radosgw creates index objects of unbounded size; this will change eventually. Samuel Just
12:41 PM rgw Bug #10015 (Fix Under Review): rgw sync agent: 403 when syncing object that has tilde in its name
PR opened https://github.com/ceph/radosgw-agent/pull/12 Alfredo Deza
11:37 AM rbd Bug #10122: "LibRBD.TestClone" FAILED in upgrade:dumpling-x-firefly-distro-basic-vps run
Josh, can you take a look? Not sure if the project has to change to rbd or not, though. Yuri Weinstein
10:57 AM Bug #9788: "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
One more in run http://pulpito.ceph.com/teuthology-2014-12-01_18:18:01-upgrade:firefly-x-giant-distro-basic-vps/
L...
Yuri Weinstein
10:56 AM rgw Bug #10066: rgw: failed md5sum on s3tests-test-readwrite
The test fails and can be reproduced with specific random seeds and a specific object size (at the larger side of ... Yehuda Sadeh
10:53 AM rgw Bug #10221 (Resolved): Crash in "radosgw-admin" in upgrade:firefly:singleton-firefly-distro-basic...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-01_17:05:01-upgrade:firefly:singleton-firefly-dist... Yuri Weinstein
10:27 AM rgw Bug #10062: s3-test failures using keystone authentication
Hi Yehuda, Sage
the patch addressed only the first 5 or so failures as mentioned.
The post_object* tests were s...
Abhishek Lekshmanan
09:38 AM rgw Bug #10062 (Resolved): s3-test failures using keystone authentication
Yehuda Sadeh
09:38 AM rgw Bug #10062: s3-test failures using keystone authentication
Fix merged into master. Yehuda Sadeh
10:12 AM rbd Bug #9513 (Resolved): rbd_cache=true default setting is degrading librbd performance ~10X in Giant
Reverted in master commit:b808cdfaa8823f0747f78938f3ed9a7a75e9bed1
Reverted in Giant commit:3b1eafcabb6139133b5ff0bd...
Jason Dillaman
10:11 AM Linux kernel client Bug #9896: krbd: EPERM from map-snapshot-io.sh
/a/teuthology-2014-10-24_23:06:01-krbd-giant-testing-basic-multi/570830... Ilya Dryomov
10:09 AM rbd Bug #9936 (Resolved): Exporting images larger than 2GB fails
Jason Dillaman
10:09 AM rbd Bug #9936: Exporting images larger than 2GB fails
Master: commit:4b87a81c86db06f6fe2bee440c65fc05cd4c23ce
Giant: commit:65c565701eb6851f4ed4d2dbc1c7136dfaad6bcb
Jason Dillaman
09:50 AM rgw Bug #10162 (Duplicate): s3tests-test-readwrite failure
A duplicate of #10082 Yehuda Sadeh
09:34 AM rgw Bug #10188: Can not create new rgw user when specifying an email already assigned to a user
That's by design. Users cannot share the same email, as S3 permissions can be granted by email address, so email need... Yehuda Sadeh
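The design constraint described here can be sketched as a small uniqueness check. This is a minimal illustration in Python, not radosgw's implementation; the `UserRegistry` class, its method names, the example address, and the case-insensitive comparison are all assumptions made for the example.

```python
class DuplicateEmailError(ValueError):
    """Raised when a new user reuses an email that is already assigned."""


class UserRegistry:
    """Hypothetical in-memory user store; not the radosgw data model."""

    def __init__(self):
        # email -> uid, so that a grant made by email resolves to exactly one user
        self._uid_by_email = {}

    def create_user(self, uid, email=None):
        if email is not None:
            key = email.lower()  # assumption: emails compare case-insensitively
            if key in self._uid_by_email:
                owner = self._uid_by_email[key]
                raise DuplicateEmailError(
                    f"email {email!r} is already assigned to user {owner!r}"
                )
            self._uid_by_email[key] = uid
        return uid

    def resolve_grantee(self, email):
        """Return the single uid an email-based grant would apply to, or None."""
        return self._uid_by_email.get(email.lower())


registry = UserRegistry()
registry.create_user("jj1", "jj@example.com")
try:
    registry.create_user("jj2", "jj@example.com")
except DuplicateEmailError as err:
    print(err)
```

If two users could share an address, `resolve_grantee` would have no single answer, which is the ambiguity the uniqueness rule avoids.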
09:32 AM rgw Bug #10195 (Need More Info): s3 java jdk conn.getobject(...) (get s3 object) method fails with la...
Can you provide rgw log (debug rgw = 20), for such a failed request? Yehuda Sadeh
09:29 AM rgw Bug #10106: rgw acl response should start with <?xml version="1.0" ?>
Note that Amazon's API definition does not specify this:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGE...
Yehuda Sadeh
09:28 AM rbd Bug #10116 (Need More Info): Ceph vm guest disk lockup when using fio
Warren, can you provide stack backtraces from when you encountered the issue and a list of any OSD ops-in-flight? Jason Dillaman
09:24 AM rgw Bug #10108 (Duplicate): s3tests fail in upgrade:dumpling-firefly-x:parallel-next-distro-basic-mul...
Duplicate of #10082 Yehuda Sadeh
09:24 AM rbd Bug #9078 (Rejected): Removing an RBD is very slow whenever there is write's in other RBD which a...
Sage Weil
09:23 AM rbd Bug #9742 (Resolved): `rbd map lun` fails with: (2) No such file or directory on kernel 3.14.14 w...
Sage Weil
09:22 AM rgw Bug #10121 (Duplicate): "test.functional.tests.TestAccountUTF8" error in upgrade:dumpling-x-firef...
Duplicate of #10082 Yehuda Sadeh
09:20 AM rbd Bug #8329 (Won't Fix): qemu-img rpm provided breaks snapshooting functionality on centos
Sage Weil
09:18 AM rgw Bug #9886 (Resolved): rgw: apache 2.4 does not send http status reason string
Sage Weil
09:11 AM Bug #10125: radosgw is being started as root not apache with systemd
https://github.com/ceph/ceph/pull/3059 Loïc Dachary
09:11 AM rgw Bug #10145 (Can't reproduce): rgw swift functional test: testChunkedPut (test.functional.tests.Te...
Sage Weil
09:11 AM rgw Bug #10144 (Can't reproduce): rgw swift functional test: testChunkedPut (test.functional.tests.Te...
Sage Weil
09:08 AM rgw Bug #9899 (Resolved): Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-d...
Sage Weil
09:05 AM Bug #10220 (Resolved): "mon/Paxos.cc: 1033: FAILED assert(mon->is_leader())" in upgrade:dumpling-...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-01_18:25:01-upgrade:dumpling-firefly-x:stress-spli... Yuri Weinstein
09:04 AM rgw Bug #10219 (Resolved): s3-tests failing to clone
http://pulpito.ceph.com/sage-2014-12-01_11:07:39-rgw-next-distro-basic-multi Sage Weil
08:01 AM devops Bug #10218 (Rejected): "Gem::DependencyError" error in upgrade:firefly-x-next-distro-basic-vps run
In run http://pulpito.front.sepia.ceph.com/teuthology-2014-12-01_17:18:01-upgrade:firefly-x-next-distro-basic-vps/
...
Yuri Weinstein
06:57 AM CephFS Bug #9997 (Resolved): test_client_pin case is failing
Merged to next (https://github.com/ceph/ceph/pull/3056) John Spray
06:53 AM CephFS Bug #10217 (Resolved): old fuse should warn on flock
This works in master. Greg Farnum
06:19 AM CephFS Bug #10217: old fuse should warn on flock
Yes, we need a recent version of ceph-fuse and the MDS. Old versions do not support interrupting flock. Zheng Yan
03:37 AM CephFS Bug #10217 (Resolved): old fuse should warn on flock

Test failure: test_filelock (tasks.mds_client_recovery.TestClientRecovery):
http://pulpito.front.sepia.ceph.com/sa...
John Spray
05:40 AM devops Tasks #8366: Update ceph.com/docs to default to the latest major release (0.80)
Alfredo Deza
03:46 AM CephFS Fix #10135 (Fix Under Review): OSDMonitor: allow adding cache pools to cephfs pools already in use
giant backport PR: https://github.com/ceph/ceph/pull/3055 John Spray
03:35 AM CephFS Bug #10151 (Resolved): mds client cache pressure health warning oscillates on/off
The version on next has a pass on client-limits (the one that exercises health): http://pulpito.front.sepia.ceph.com/... John Spray
12:14 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Fixed #10211 that showed up while experimenting Loïc Dachary

12/01/2014

11:27 PM Bug #10216 (Resolved): gf-complete and jerasure call exit(1)
On error, gf-complete and jerasure call exit(1) instead of assert. This causes the OSD to disappear instead of display... Loïc Dachary
11:06 PM Bug #10211 (Fix Under Review): gf-complete exit(1) because of misaligned structure
Loïc Dachary
04:26 PM Bug #10211 (In Progress): gf-complete exit(1) because of misaligned structure
https://github.com/ceph/ceph/pull/3049 Loïc Dachary
01:00 PM Bug #10211: gf-complete exit(1) because of misaligned structure
https://github.com/ceph/gf-complete/pull/1 Loïc Dachary
01:00 PM Bug #10211 (Resolved): gf-complete exit(1) because of misaligned structure
Steps to reproduce (this is fragile because it depends on the version of the allocator):
* rm -fr dev out ; mkdir...
Loïc Dachary
10:42 PM Bug #10215 (Fix Under Review): vstart_wrapper.sh kills daemons that do not belong to it
https://github.com/ceph/ceph/pull/3054 Loïc Dachary
10:28 PM Bug #10215 (Resolved): vstart_wrapper.sh kills daemons that do not belong to it
When vstart_wrapper.sh "calls vstart.sh":https://github.com/ceph/ceph/blob/master/src/test/vstart_wrapper.sh#L32 it u... Loïc Dachary
06:05 PM CephFS Fix #10135 (Pending Backport): OSDMonitor: allow adding cache pools to cephfs pools already in use
merged to next in commit:25fc21b837ba74bab2f6bc921c78fb3c43993cf5
This also should go into giant (I think Firefly ...
Greg Farnum
05:58 PM CephFS Bug #10011 (Resolved): Journaler: failed on shutdown or EBLACKLISTED
giant commit:65f6814847fe8644f5d77a9021fbf13043b76dbe Greg Farnum
06:37 AM CephFS Bug #10011 (Fix Under Review): Journaler: failed on shutdown or EBLACKLISTED
Haven't seen any failures around this, let's backport to giant: https://github.com/ceph/ceph/pull/3047 John Spray
05:47 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
I'll turn that fpaste into a real patch and get Sam or somebody to put it in some testing so we should at least see i... Greg Farnum
05:37 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
If a connection gets marked down, we *cannot* reconnect to that endpoint again; it needs to recycle itself to a new e... Greg Farnum
05:29 PM Bug #10213 (Resolved): Some inappropriate consts
Thanks! Greg Farnum
05:21 PM Bug #10213 (Fix Under Review): Some inappropriate consts
My apologies for not being careful enough on this review. https://github.com/ceph/ceph/pull/3050 Loïc Dachary
02:15 PM Bug #10213 (Resolved): Some inappropriate consts
https://github.com/ceph/ceph/pull/3011/files
https://github.com/ceph/ceph/pull/3037/files
These added some consts...
Greg Farnum
05:16 PM Bug #10214 (Resolved): crush: straw buckets do not have expected/desired properties
two issues:
- straw scaling factors calculated for straw buckets do not produce the correct distribution when the...
Sage Weil
12:18 PM rgw Bug #10195: s3 java jdk conn.getobject(...) (get s3 object) method fails with latest version of a...
typo... in bug title/description anywhere you see "jdk" I meant "sdk"... it was late before thanksgiving when I compo... Selwyn Jacobs
10:28 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Modified vps.yaml:... Yuri Weinstein
06:35 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
mon_lease_ack_timeout: 25... Sage Weil
09:30 AM Bug #10209 (Fix Under Review): osd/OSD.cc: 5410: FAILED assert(session)
https://github.com/ceph/ceph/pull/3048 Sage Weil
09:23 AM Bug #10209 (In Progress): osd/OSD.cc: 5410: FAILED assert(session)
Sage Weil
08:50 AM Bug #10209 (Resolved): osd/OSD.cc: 5410: FAILED assert(session)
This was with wip-sam-testing, but I don't think it's related to any of the patches.
ubuntu@teuthology:/a/samuelj-...
Samuel Just
09:22 AM Bug #9921 (Pending Backport): msgr/osd/pg dead lock giant
Sage Weil
09:04 AM Bug #9321 (Resolved): pgmap updates from OSDMap can be delayed indefinitely
Samuel Just
08:53 AM Bug #10210 (Closed): "Caught signal" in upgrade:dumpling-x-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-29_19:13:03-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
06:59 AM CephFS Bug #10164 (In Progress): Dirfrag objects for deleted dir not purged until MDS restart
Zheng: assigning to you since you mentioned you were working on it John Spray
06:34 AM CephFS Bug #9997 (Fix Under Review): test_client_pin case is failing
https://github.com/ceph/ceph/pull/3045 John Spray
04:42 AM CephFS Bug #9994: ceph-qa-suite: nfs mount timeouts
http://pulpito.ceph.com/teuthology-2014-11-23_23:10:01-knfs-next-testing-basic-multi/617093/
http://pulpito.ceph.com...
John Spray
04:20 AM CephFS Feature #9881 (Resolved): mds: admin command to flush the mds journal
Merged to master (forgot the Fixes:, doh)... John Spray
01:57 AM RADOS Bug #9523: Both op threads and dispatcher threads could be stuck at acquiring the budget of files...
> I am wondering if it makes sense to add a new parameter named *should_take_filestore_budget* to dispatch_context an... Guang Yang

11/30/2014

11:35 PM RADOS Bug #9523: Both op threads and dispatcher threads could be stuck at acquiring the budget of files...
There seem to be two problems here:
# Dispatcher thread hang due to filestore throttling
# Op thread hang due to filesto...
Guang Yang
01:35 PM Linux kernel client Bug #10208: libceph: intermittent hangs under memory pressure
The kern.log is attached, with the data captured shortly after running the following command:
time dd if=/dev/zero of=4G00...
Andrei Mikhailovsky
11:15 AM devops Bug #9665 (Resolved): ceph-disk zap should call partprobe
Loïc Dachary
05:00 AM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Haomai Wang wrote:
> A related bug is fixed, but I'm not fully sure whether it fixes this problem.
I'm not sure which...
Dmitry Smirnov

11/29/2014

11:45 PM Linux kernel client Bug #10208 (Resolved): libceph: intermittent hangs under memory pressure
Ilya Dryomov

11/28/2014

01:01 PM Documentation #10207 (Resolved): documentation: auth service required needs clarification
In http://ceph.com/docs/master/rados/configuration/auth-config-ref/#configuration-settings the distinction between *a... Loïc Dachary
12:49 PM Documentation #10206 (Resolved): documentation: Network Configuration Reference duplicate string
The string *You may configure this range at your discretion.* shows twice in http://ceph.com/docs/master/rados/config... Loïc Dachary
12:41 PM Documentation #10205 (Resolved): documentation: reference to ceph-deploy should be a link
In http://ceph.com/docs/master/rados/configuration/ceph-conf/#running-multiple-clusters the phrase *See ceph-deploy n... Loïc Dachary
12:36 PM Documentation #10204 (Resolved): documentation: mon should be listed before osd
When deploying a Ceph cluster, the mon must be run first. In the list shown at http://ceph.com/docs/master/rados/conf... Loïc Dachary
12:28 PM Documentation #10203 (Resolved): documentation: explain the term MON
http://ceph.com/docs/master/rados/ should read *and a Ceph Monitor (MON) maintains* instead of *and a Ceph Monitor maintains*... Loïc Dachary
12:08 PM Feature #9815 (Resolved): run make check in parallel
Loïc Dachary
12:07 PM Bug #10201 (Fix Under Review): tests must use ceph_objectstore_tool
https://github.com/ceph/ceph/pull/3033 Loïc Dachary
07:55 AM Bug #10201 (Resolved): tests must use ceph_objectstore_tool
Tests such as "osd-scrub-repair":https://github.com/ceph/ceph/blob/giant/src/test/osd/osd-scrub-repair.sh#L64 must no... Loïc Dachary
12:00 PM Feature #9403: Make rados import/export fully functional and re-enable

Created wip-9403 to preserve a change needed to make this feature work. Also, the existing code sort of supports x...
David Zafman
10:20 AM Bug #10197: arch detection on armv8 must check asimd
Janne, I would very much appreciate a run of src/unittest_arch on ARMv8 if you can spare the time ( the branch is htt... Loïc Dachary
10:17 AM Bug #10197 (Fix Under Review): arch detection on armv8 must check asimd
https://github.com/ceph/ceph/pull/3035 Loïc Dachary
09:44 AM Bug #10202 (Can't reproduce): ceph_objecstore_tool.py : OSD has the store locked
Janne Grunau (jannau irc) can reproduce this reliably. I suspect it is because it does not wait long enough for the o... Loïc Dachary
09:25 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
https://github.com/ceph/ceph/pull/3034 Loïc Dachary
06:12 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
The same problem shows up when two OSDs are missing (k=2, m=2). Loïc Dachary
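For context on the (k=2, m=2) case: an erasure-coded object is split into k data chunks plus m coding chunks, and a maximum-distance-separable code such as Reed-Solomon can rebuild the object from any k surviving chunks. A minimal sketch of that arithmetic, assuming an MDS code (the function name is hypothetical and this is not Ceph code):

```python
def can_reconstruct(k, m, missing):
    """True if an MDS erasure code with k data chunks and m coding chunks
    can still rebuild the object after `missing` chunks are lost."""
    if not 0 <= missing <= k + m:
        raise ValueError("missing must be between 0 and k + m")
    surviving = k + m - missing
    return surviving >= k  # equivalent to missing <= m


print(can_reconstruct(2, 2, 2))  # two of four chunks lost
print(can_reconstruct(2, 2, 3))  # three of four chunks lost
```

Under that assumption, losing two chunks with k=2, m=2 still leaves enough data to rebuild the object, which is presumably why marking it unfound is treated as a bug here.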
08:59 AM Bug #10199 (Fix Under Review): ceph --format xml daemon {daemon}.{id} config get is not valid XML
https://github.com/ceph/ceph/pull/3031 Loïc Dachary
03:03 AM Bug #10199 (Resolved): ceph --format xml daemon {daemon}.{id} config get is not valid XML
... Loïc Dachary
05:11 AM devops Bug #10200 (Rejected): tgtd error: undefined symbol: rbd_discard
Saw this error in syslog from an unrelated test. Presumably this is a bug somewhere?
/a/teuthology-2014-11-23_23:...
John Spray
12:58 AM Feature #10198: PG removal occupy the disk thread several hours
... Zhi Zhang
12:56 AM Feature #10198 (Resolved): PG removal occupy the disk thread several hours
We found an issue after we enabled scrubbing/deep-scrubbing during recovery. The symptom is that all the rado... Zhi Zhang
12:57 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
Pull request - https://github.com/ceph/ceph/pull/3029 Guang Yang

11/27/2014

06:27 PM Bug #10163: rados bench parameter -b producing wrong values when different blocksize used in writes
From "rados --help":
-b op_size
set the size of write ops for put or benchmarking
We can see -b only for w...
jianpeng ma
05:12 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
Guang Yang wrote:
> Update more logs from the *crashed* OSD:
> [...]
>
> It seems that the peer OSD was marked d...
Guang Yang
03:41 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
Update more logs from the *crashed* OSD:... Guang Yang
01:16 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
AFAR, if the server rebinds and the client tries to connect, the client won't get the CEPH_MSGR_TAG_SEQ tag from the server because no repla... Haomai Wang
12:02 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
> The next question is, why B's in_seq is a very large number even after rebinding?
After a deeper dive, I think th...
Guang Yang
02:33 PM Bug #10197 (Resolved): arch detection on armv8 must check asimd
instead of neon in https://github.com/ceph/ceph/blob/master/src/test/test_arch.cc... Loïc Dachary
09:17 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-26_09:31:02-upgrade:giant-x-next-distro-basic-vps/... Yuri Weinstein
06:04 AM Bug #10196: [RHEL7] Modification of ~/.ssh/config in admin node restricts ssh to other nodes.
You are welcome! :) Nilamdyuti Goswami
05:50 AM Bug #10196 (Rejected): [RHEL7] Modification of ~/.ssh/config in admin node restricts ssh to other...
Thanks for the update :-) Loïc Dachary
03:15 AM Bug #10196: [RHEL7] Modification of ~/.ssh/config in admin node restricts ssh to other nodes.
The output of ssh in verbose mode just showed the error "Bad owner or permissions in /home/ceph/.ssh/config". So, the... Nilamdyuti Goswami
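The error quoted in this comment is what ssh reports when the per-user config file fails its ownership and permission check. As a rough sketch of that check in Python, assuming the usual OpenSSH rule that the file must be owned by the user (or root) and must not be writable by group or others (the helper name is hypothetical):

```python
import os
import stat


def ssh_config_is_acceptable(path):
    """Approximate the check ssh applies to a per-user config file
    (assumed rule): owned by the current user or root, and not
    writable by group or others."""
    st = os.stat(path)
    owner_ok = st.st_uid in (0, os.getuid())
    mode_ok = (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0
    return owner_ok and mode_ok


if __name__ == "__main__":
    config = os.path.expanduser("~/.ssh/config")
    if os.path.exists(config):
        verdict = "ok" if ssh_config_is_acceptable(config) else "bad owner or permissions"
        print(config, verdict)
```

When the check fails, tightening the mode (for example `chmod 600 ~/.ssh/config`) is the usual remedy.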
02:06 AM Bug #10196: [RHEL7] Modification of ~/.ssh/config in admin node restricts ssh to other nodes.
See also https://github.com/ceph/ceph/pull/3007 Loïc Dachary
01:59 AM Bug #10196: [RHEL7] Modification of ~/.ssh/config in admin node restricts ssh to other nodes.
Could you please attach the output of ssh in verbose mode when trying to connect to the remote host ? That will show ... Loïc Dachary
01:19 AM Bug #10196 (Rejected): [RHEL7] Modification of ~/.ssh/config in admin node restricts ssh to other...
While setting up a Ceph cluster using RHEL7 VMs, I found that modifying the ~/.ssh/config file on the admin node with details i... Nilamdyuti Goswami

11/26/2014

11:34 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
Adding some of the peer's logs to show the two-way connect:... Guang Yang
11:14 PM Feature #9420 (Fix Under Review): erasure-code: tools and archive to check for non regression of ...
the "non regression tests":https://github.com/ceph/ceph/blob/master/qa/workunits/erasure-code/encode-decode-non-regre... Loïc Dachary
10:26 PM CephFS Bug #9997: test_client_pin case is failing
For 3.18+ kernels, I think we can iterate over all the dir inodes and invalidate dentries one by one. Zheng Yan
12:19 AM CephFS Bug #9997: test_client_pin case is failing
Yes, I think it is caused by the d_invalidate change. In the 3.18-rc kernel, d_invalidate() unhashes the dentry regardless of whether the... Zheng Yan
07:49 PM Feature #9951: librados, osd: per-object scrub operation
Hi Sage:
I'm interested in this feature (from that I'll learn the process of scrub). Is there somebody who already did...
jianpeng ma
06:16 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Hi Dmitry,
A related bug is fixed, but I'm not fully sure whether it fixes this problem. So could you give a crash keyv...
Haomai Wang
05:01 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
It's been another week -- is there any chance to get this fixed please? Dmitry Smirnov
03:56 PM rgw Bug #10195 (Closed): s3 java jdk conn.getobject(...) (get s3 object) method fails with latest ver...
For instance,
in the Java example ( http://docs.ceph.com/docs/master/radosgw/s3/java/ )
The example: ...
Selwyn Jacobs
03:09 PM rgw Bug #10194 (Resolved): rgw: fcgi connections are not closed when using mod-proxy-fcgi
Yehuda Sadeh
09:06 AM Feature #10192 (Fix Under Review): ceph_objectstore_tool object lookup
https://github.com/ceph/ceph/pull/3020 Loïc Dachary
05:04 AM Feature #10192 (Resolved): ceph_objectstore_tool object lookup
It would be convenient for test purposes to have... Loïc Dachary
07:22 AM Feature #10193 (Resolved): Perf counter for WBThrottle
Since the sync thread causes an unstable IOPS and latency performance curve, we may want WBThread to do more (or moderate... Haomai Wang
04:40 AM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
Wenjun Huang wrote:
> Samuel Just wrote:
> > This should probably be a feature request for the backlog. We need a ...
Wenjun Huang
02:18 AM devops Bug #9665: ceph-disk zap should call partprobe
* firefly backport https://github.com/ceph/ceph/pull/3014
* dumpling backport https://github.com/ceph/ceph/pull/3015
Loïc Dachary

11/25/2014

04:07 PM rgw Bug #8233: Installation & Documentation broken for Ubuntu Trusty 14.04 - rgw
The 100-Continue stuff was all in fastcgi, not httpd. So you can use Ubuntu's httpd 2.4 if you want. Here's the patch ... Ken Dreyer
03:09 PM rgw Documentation #10142 (Resolved): Update S3 compatibility table to reflect bucket location support
Neil Levine
03:06 PM rgw Feature #10191 (Resolved): rgw: object versioning multi-zone support
Neil Levine
03:05 PM rgw Feature #8216 (Fix Under Review): rgw: object versioning objclass support
Yehuda Sadeh
03:05 PM rgw Feature #8218 (Fix Under Review): rgw: object versioning manifest changes
Yehuda Sadeh
03:05 PM rgw Feature #8217 (Fix Under Review): rgw: object versioning object overwrite / delete changes
Neil Levine
03:03 PM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
I think https://github.com/ceph/ceph-qa-suite/pull/250 reproduces the problem reliably. ... Loïc Dachary
02:47 PM Bug #10017 (In Progress): OSD wrongly marks object as unfound if only the primary is corrupted fo...
Loïc Dachary
03:03 PM Bug #10126 (New): "Exception: timed out waiting for admin_socket" in upgrade:giant-x-next-distro-...
Yuri Weinstein
03:02 PM Bug #10126: "Exception: timed out waiting for admin_socket" in upgrade:giant-x-next-distro-basic-...
Still an issue, not sure what to make of it.
Run http://pulpito.ceph.com/teuthology-2014-11-24_17:05:01-upgrade:gian...
Yuri Weinstein
01:23 PM Bug #9788 (New): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" is...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-24_17:18:03-upgrade:firefly-x-next-distro-basic-vp... Yuri Weinstein
12:13 PM Documentation #6465: admin/build-doc should have some kind of build check for broken links
No one else agrees this is urgent, so, dropping pri Dan Mick
10:52 AM rgw Bug #10188 (Won't Fix): Can not create new rgw user when specifying an email already assigned to ...
The error can be seen in the output of the commands below. In the first command we create a user "jj1" with an email ... JuanJose Galvez
10:25 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
* firefly backport https://github.com/ceph/ceph/pull/3009
* giant backport https://github.com/ceph/ceph/pull/3010
Loïc Dachary
10:03 AM Bug #10018 (Pending Backport): OSD assertion failure if the hinfo_key xattr is not there (corrupt...
Loïc Dachary
09:27 AM CephFS Fix #10135 (Fix Under Review): OSDMonitor: allow adding cache pools to cephfs pools already in use
https://github.com/ceph/ceph/pull/3008 John Spray
08:44 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Committed vps.yaml on master, giant and next
Fixed syntax for
@mon lease = 15@
to
@mon lease: 15@
Yuri Weinstein
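The syntax fix matters because YAML only treats `key: value` as a mapping entry; a line such as `mon lease = 15` is read as a plain string or rejected as a parse error, depending on where it sits. A deliberately simplified illustration in Python (this is not a YAML parser, and the function name is hypothetical):

```python
def split_mapping_line(line):
    """Return (key, value) if `line` looks like a simple `key: value`
    YAML mapping entry, else None.

    Simplified on purpose: ignores quoting, comments, and nesting.
    """
    key, sep, value = line.strip().partition(": ")
    if not sep or not key:
        return None
    return key, value.strip()


for candidate in ("mon lease = 15", "mon lease: 15"):
    print(repr(candidate), "->", split_mapping_line(candidate))
```

Only the second form yields a key and a value, so only that form would be picked up as a setting.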
06:55 AM Bug #9487 (Pending Backport): dumpling: snaptrimmer causes slow requests while backfilling. osd_s...
oops, still need firefly Sage Weil
06:19 AM devops Bug #9665 (Fix Under Review): ceph-disk zap should call partprobe
Loïc Dachary
06:19 AM devops Bug #9665: ceph-disk zap should call partprobe
* giant backport https://github.com/ceph/ceph/pull/3005 Loïc Dachary
06:17 AM Bug #10183: OSD dispatcher thread hangs several seconds due to contention for osd_lock
https://github.com/ceph/ceph/pull/3004 Loïc Dachary
05:51 AM Bug #9073 (Resolved): OSD with device/partition journals down after fresh deploy or upgrade to 0.83
Loïc Dachary
05:46 AM Feature #9728: erasure-code: jerasure support for NEON
Loïc Dachary
05:13 AM Bug #10185 (Resolved): neon runtime detection is always false
Loïc Dachary
02:01 AM Bug #10185 (Fix Under Review): neon runtime detection is always false
https://github.com/ceph/ceph/pull/3003 Loïc Dachary
04:42 AM CephFS Bug #9997: test_client_pin case is failing
After much head scratching and log examination, this appears to be a kernel regression (assuming our behaviour was va... John Spray
04:00 AM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
Samuel Just wrote:
> This should probably be a feature request for the backlog. We need a test reproducing it and s...
Wenjun Huang

11/24/2014

11:31 PM Bug #10185 (Resolved): neon runtime detection is always false
The neon CPU feature detection function should test if the number of elements returned is 1 "instead of the size of t... Loïc Dachary
10:21 PM Bug #10166 (Fix Under Review): fiemap or FileStore::do_sparse_copy_range bug: osd/ReplicatedPG.cc...
https://github.com/ceph/ceph/pull/3000
I'm not fully sure that this issue is caused by inconsistent size. Maybe...
Haomai Wang
12:14 PM Bug #10166: fiemap or FileStore::do_sparse_copy_range bug: osd/ReplicatedPG.cc: 8706: FAILED asse...
Hmm, this might be as simple as truncating out to the full copy_range size. Samuel Just
12:11 PM Bug #10166: fiemap or FileStore::do_sparse_copy_range bug: osd/ReplicatedPG.cc: 8706: FAILED asse...
Going back to the full log, it appears to be related to _do_sparse_copy_range and therefore fiemap:
2014-11-23 19:...
Samuel Just
11:38 AM Bug #10166: fiemap or FileStore::do_sparse_copy_range bug: osd/ReplicatedPG.cc: 8706: FAILED asse...
/a/teuthology-2014-11-23_18:13:01-upgrade:firefly-x-giant-distro-basic-multi/615700
2014-11-23 19:18:14.224698 7f4...
Samuel Just
07:56 PM rgw Documentation #10184 (Resolved): rgw: document swift temp url functionality
Swift-temp-url functionality seems to have been merged from v0.78 or so. (Feature #3454) This needs to be documented.... Abhishek Lekshmanan
06:41 PM Bug #10183 (Resolved): OSD dispatcher thread hangs several seconds due to contention for osd_lock
Recently when investigating the long tail latency during backfilling/recovering, I found on some OSDs, the dispatcher... Guang Yang
06:21 PM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Sage, OK
I changed vps.yaml on teuthology to:...
Yuri Weinstein
05:36 PM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Yuri, new plan: let's just add 'mon lease = 15' to vps.yaml and see if this comes up again. Sage Weil
12:04 PM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
on giant fix - https://github.com/ceph/ceph-qa-suite/pull/252 Yuri Weinstein
11:24 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
RE: https://github.com/ceph/ceph-qa-suite/pull/251... Yuri Weinstein
11:21 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
ok, new plan. instead of changing mon behavior, make the tests more resilient.
if we upgrade all mons, then resta...
Sage Weil
11:02 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
https://github.com/ceph/ceph/pull/2999 Sage Weil
11:00 AM Bug #10178 (Fix Under Review): mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Sage Weil
10:46 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
... Sage Weil
10:34 AM Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
... Sage Weil
10:29 AM Bug #10178 (Resolved): mon rejects peer during election based on OSD_SET_ALLOC_HINT feature?
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-22_17:00:03-upgrade:firefly:newer-firefly-distro-b... Yuri Weinstein
05:19 PM CephFS Bug #10151 (Pending Backport): mds client cache pressure health warning oscillates on/off
Merged to master as of commit:aa4d1478647ce416e9cf4e8fcd32411230639f40. I like to let things go through testing befor... Greg Farnum
09:20 AM CephFS Bug #10151: mds client cache pressure health warning oscillates on/off
Opened PR against master instead of next by mistake. Next PR is https://github.com/ceph/ceph/pull/2996 John Spray
03:16 AM CephFS Bug #10151 (Fix Under Review): mds client cache pressure health warning oscillates on/off
master: https://github.com/ceph/ceph/pull/2989
giant: https://github.com/ceph/ceph/pull/2990
John Spray
12:11 PM Bug #10165 (Duplicate): ceph_test_rados got short read
Samuel Just
12:10 PM Bug #10165: ceph_test_rados got short read
I think this also is due to enabling fiemap in the nightlies, teuthology commit:
0f97481ce44e0487ac6cffa051a05590f...
Samuel Just
10:34 AM Bug #10165: ceph_test_rados got short read
Repeated issue in run http://pulpito.ceph.com/teuthology-2014-11-22_17:05:01-upgrade:giant-x-next-distro-basic-multi/... Yuri Weinstein
08:36 AM Bug #10165: ceph_test_rados got short read
Same problem in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-23_09:54:59-powercycle-giant-distro-basic-... Yuri Weinstein
11:28 AM rbd Bug #10180 (Resolved): qemu tests crash host kernel
... Sage Weil
11:16 AM Bug #10176: Segmentation fault in upgrade:firefly:singleton-firefly-distro-basic-vps run
I think we should remove import_export.sh from these tag-based upgrades, where we're hitting issues that are fixed la... Josh Durgin
09:12 AM Bug #10176 (Resolved): Segmentation fault in upgrade:firefly:singleton-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-22_17:05:01-upgrade:firefly:singleton-firefly-distr... Yuri Weinstein
11:04 AM rbd Bug #10123 (Pending Backport): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vp...
Josh Durgin
10:49 AM Bug #8204: "timed out waiting for admin_socket to appear after osd.5 restart" in upgrade:dumpling...
Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-21_17:18:01-upgrade:firefly-x-next-distro-basic-... Yuri Weinstein
10:18 AM Bug #9487 (Resolved): dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim...
Sage Weil
10:17 AM Bug #9113 (Resolved): osd: snap trimming eats memory, linearly
Sage Weil
09:47 AM Bug #10097: failed: mon_thrash
Same issue in run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-22_17:15:01-upgrade:giant-giant-distro-basic... Yuri Weinstein
09:41 AM rgw Bug #10177 (Can't reproduce): test_multipart_upload failed in upgrade:dumpling-firefly-x:parallel...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-22_18:15:02-upgrade:dumpling-firefly-x:parallel-gi... Yuri Weinstein
09:25 AM Bug #9920: admin socket check hang, osd appears fine
Same issue in run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-22_17:18:02-upgrade:firefly-x-next-distro-ba... Yuri Weinstein
09:19 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Same issue on next in run http://pulpito.ceph.com/teuthology-2014-11-22_17:18:02-upgrade:firefly-x-next-distro-basic-... Yuri Weinstein
09:07 AM CephFS Bug #9997 (In Progress): test_client_pin case is failing
John Spray
07:40 AM devops Feature #10046: run make check on every pull request
http://tracker.ceph.com/issues/10175 will make it possible to rely on the content of deps.deb.txt to install the requ... Loïc Dachary
07:33 AM Feature #9817 (Resolved): display X.XX deep-scrub starts
Loïc Dachary
06:51 AM Bug #10175 (Fix Under Review): deps.deb.txt is obsolete
https://github.com/ceph/ceph/pull/2994 Loïc Dachary
05:18 AM Bug #10175 (Resolved): deps.deb.txt is obsolete
It is not consistently maintained because it is not tested Loïc Dachary
04:30 AM Feature #9728 (In Progress): erasure-code: jerasure support for NEON
Loïc Dachary
04:29 AM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Loïc Dachary
03:41 AM Bug #10173 (Fix Under Review): autogen.sh will fail if submodule URL changes
https://github.com/ceph/ceph/pull/2992 Loïc Dachary
03:30 AM Bug #10173 (Resolved): autogen.sh will fail if submodule URL changes
After an initial "git submodule update":https://github.com/ceph/ceph/blob/master/autogen.sh#L32, if the URL of a subm... Loïc Dachary
02:29 AM Linux kernel client Bug #10141: rbd_img_obj_request_fill+0x81/0x200
For the record, since the stack trace doesn't explain anything, this was the following BUG_ON in osd_req_op_extent_in... Ilya Dryomov
02:21 AM Linux kernel client Bug #10141 (Resolved): rbd_img_obj_request_fill+0x81/0x200
Fixed with "rbd: don't treat CEPH_OSD_OP_DELETE as extent op". I rebased it into testing before "libceph: add CREATE... Ilya Dryomov
12:59 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
I am wondering if the following race occurred:
Let us assume A and B are two OSDs having the connection (pipe) bet...
Guang Yang

11/23/2014

11:32 PM Bug #10017 (In Progress): OSD wrongly marks object as unfound if only the primary is corrupted fo...
Loïc Dachary
11:29 PM Bug #10018 (Fix Under Review): OSD assertion failure if the hinfo_key xattr is not there (corrupt...
Loïc Dachary
11:08 PM Feature #10172 (Resolved): AsyncMessenger: Bind async thread to special cpu core
Now, 1-2 async op threads can fully meet an OSD's network demand with an SSD backend. So maybe we can bind limited thread ... Haomai Wang
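The idea of binding a small number of worker threads to fixed CPU cores can be sketched as follows. This is only an illustration in Python on Linux, not the AsyncMessenger code; the helper names `pin_current_thread` and `run_pinned_workers` are hypothetical, and the choice of cores (round-robin over the cores the process is already allowed to use) is an assumption.

```python
import os
import threading


def pin_current_thread(cpu):
    """Pin the calling thread to a single CPU core (Linux only).

    Returns the thread's affinity mask after the change.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the caller"
    return os.sched_getaffinity(0)


def run_pinned_workers(num_workers):
    """Start num_workers threads, each pinned to one allowed core."""
    allowed = sorted(os.sched_getaffinity(0))  # cores this process may use
    results = {}

    def worker(index):
        cpu = allowed[index % len(allowed)]  # round-robin core assignment
        results[index] = (cpu, pin_current_thread(cpu))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for index, (cpu, mask) in sorted(run_pinned_workers(2).items()):
        print("worker", index, "pinned to core", cpu, "mask", mask)
```

Each worker narrows only its own affinity mask, so the thread that spawned it keeps its original mask.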
10:56 PM rgw Bug #10145: rgw swift functional test: testChunkedPut (test.functional.tests.TestFileUTF8)
I am using Ubuntu 14.04
And these are my Apache and FastCGI versions:...
Shambhu Rajak
08:31 PM Bug #10166: fiemap or FileStore::do_sparse_copy_range bug: osd/ReplicatedPG.cc: 8706: FAILED asse...
ubuntu@teuthology:/a/teuthology-2014-11-23_18:13:01-upgrade:firefly-x-giant-distro-basic-multi/615700 Sage Weil
08:30 PM CephFS Bug #9997: test_client_pin case is failing
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-16_23:04:01-fs-next-testing-basic-multi/603971/ Greg Farnum
06:53 PM Bug #9998 (Fix Under Review): Replaced OSD weight below 0
https://github.com/ceph/ceph/pull/2986 Sage Weil

11/22/2014

09:55 PM Bug #10171 (Resolved): DBObjectMap: ghobject_t header key excludes hash for EC pools
... Sage Weil
09:16 PM rbd Bug #9771 (Won't Fix): Segmentation fault after upgrade v0.80.5 -> v0.80.6
Sage Weil
09:12 PM rbd Feature #6228 (Resolved): image name metavariable
Sage Weil
11:55 AM Bug #10085 (In Progress): dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
It looks to me like this is a result of our naughty rwlock handling: https://github.com/ceph/ceph/pull/2937
There'...
Greg Farnum
10:13 AM Bug #9998: Replaced OSD weight below 0
I've just reproduced this on my test cluster. I'm using Ceph v0.87 (@c51c8f9d80fa4e0168aa52685b8de40e42758578@) with ... Pawel Sadowski
09:21 AM Bug #9998: Replaced OSD weight below 0
I'm still having trouble reproducing this. :( Maybe you can attach a copy of an osdmap just prior to adding the osd?... Sage Weil
04:04 AM Bug #9998: Replaced OSD weight below 0
Dan van der Ster wrote:
> In our case we sometimes get -3.052e-05 as the first weight of a new osd that has been add...
Pawel Sadowski
03:58 AM Bug #9998: Replaced OSD weight below 0
In our case we sometimes get -3.052e-05 as the first weight of a new osd that has been added to the crush map by the ... Dan van der Ster
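For context on the value quoted above: CRUSH keeps weights as 16.16 fixed-point integers, where 0x10000 stands for 1.0. One reading of -3.052e-05 (an assumption, not something the ticket confirms) is a raw weight that ended up two fixed-point units below zero, since -2/65536 is about -3.052e-05. A minimal sketch of the conversion, with hypothetical helper names:

```python
# CRUSH stores weights as 16.16 fixed-point integers: 0x10000 represents 1.0.
CRUSH_WEIGHT_ONE = 0x10000


def crush_weight_to_float(raw_weight):
    """Convert a raw 16.16 fixed-point weight to the float shown to users."""
    return raw_weight / CRUSH_WEIGHT_ONE


def float_to_crush_weight(weight):
    """Convert a float weight to its raw fixed-point form (truncating)."""
    return int(weight * CRUSH_WEIGHT_ONE)


if __name__ == "__main__":
    # Hypothetical raw value: two fixed-point units below zero.
    print("%.4g" % crush_weight_to_float(-2))
    # A weight of 1.0 round-trips exactly.
    print(float_to_crush_weight(1.0) == CRUSH_WEIGHT_ONE)
```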
09:17 AM rgw Bug #10103 (Pending Backport): swift tests failing
Sage Weil

11/21/2014

11:53 PM Bug #9998: Replaced OSD weight below 0
I'm changing weights manually by editing CRUSH map. Pawel Sadowski
05:48 PM Bug #9998: Replaced OSD weight below 0
wip-9998 has 2 fixes, but i'm convinced they are the same bug... Sage Weil
05:39 PM Bug #9998 (Need More Info): Replaced OSD weight below 0
Can you clarify how you are doing this?
"change host weight while not changing OSD weights (i.e. sum(weight(osd)...
Sage Weil
09:22 PM Bug #10052 (Pending Backport): LibRadosTwoPools[EC]PP.PromoteSnap failure
Sage Weil
07:52 PM Messengers Bug #10022 (Resolved): AsyncMessenger: Wrong newly_acked_seq when replacing existing connection
Haomai Wang
05:31 PM Bug #10004: ceph osd find does not correctly report crush locations
What version is this? I can't reproduce it on giant. Sage Weil
05:28 PM Bug #10165: ceph_test_rados got short read
2014-11-20T21:38:21.790 INFO:tasks.rados.rados.0.vpm200.stdout:only read 3829760 out of size 3832037
Sage Weil
07:51 AM Bug #10165 (Duplicate): ceph_test_rados got short read
Runs:
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-20_17:05:01-upgrade:giant-x-next-distro-basic-vps/
Job...
Yuri Weinstein
04:23 PM Bug #10168: dumpling: Assertion: osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_log().empty())
that was in suite:upgrade:dumpling-x Yuri Weinstein
04:18 PM Bug #10168 (Resolved): dumpling: Assertion: osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_...
Samuel Just
03:00 PM Bug #10168: dumpling: Assertion: osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_log().empty())
Samuel Just
03:00 PM Bug #10168: dumpling: Assertion: osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_log().empty())
have branch, testing build on wip-sam-dumpling-testing. Caused by the backport, 03c5344f74991ec351cdc8a55f6495d49647... Samuel Just
02:50 PM Bug #10168 (Resolved): dumpling: Assertion: osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_...
Assertion: osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_log().empty())
ceph version 0.67.11-42-g103c6a0 (103...
Samuel Just
03:01 PM Bug #10167 (Duplicate): osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_log().empty()) in up...
Samuel Just
01:42 PM Bug #10167 (Duplicate): osd/ReplicatedPG.cc: 7573: FAILED assert(!pg_log.get_log().empty()) in up...
ubuntu@teuthology:/a/teuthology-2014-11-20_19:13:02-upgrade:dumpling-x-firefly-distro-basic-vps/611779... Sage Weil
02:30 PM rgw Bug #9206: rgw: cross rgw message headers filtered by apache 2.4
Because this came up on ceph-users recently: this is fixed in master with this commit:... Ken Dreyer
12:34 PM CephFS Bug #9674 (Resolved): nightly failed multiple_rsync.sh
I haven't seen this fail since then, hurray. Greg Farnum
11:30 AM Bug #10163: rados bench parameter -b producing wrong values when different blocksize used in writes
We should at least fix up the output to warn about this and not lie, even if we don't respect the requested block siz... Greg Farnum
01:13 AM Bug #10163 (Resolved): rados bench parameter -b producing wrong values when different blocksize u...
The -b (blocksize) parameter used in rados bench does produce wrong measurements iff a preceding rados bench write w... René Gallati
10:19 AM rgw Bug #10162: s3tests-test-readwrite failure
This appears to be happening on the overnight tests: See #10108 Anonymous
10:00 AM Bug #6003: journal Unable to read past sequence 406 ...
ubuntu@teuthology:/a/sage-2014-11-20_17:03:30-rados:thrash-wip-watch-notify-distro-basic-multi/611427 Sage Weil
09:10 AM rbd Bug #10122 (New): "LibRBD.TestClone" FAILED in upgrade:dumpling-x-firefly-distro-basic-vps run
Jason Dillaman
06:06 AM rbd Bug #10122: "LibRBD.TestClone" FAILED in upgrade:dumpling-x-firefly-distro-basic-vps run
The explanation is that there was a race condition between deleting a pool and unprotecting the snapshot. When unpro... Jason Dillaman
05:06 AM rbd Bug #10122 (In Progress): "LibRBD.TestClone" FAILED in upgrade:dumpling-x-firefly-distro-basic-vp...
Jason Dillaman
08:54 AM Linux kernel client Bug #10141 (In Progress): rbd_img_obj_request_fill+0x81/0x200
Ilya Dryomov
08:23 AM devops Fix #5900: Create a Python package for ceph Python bindings
Discussed in IRC today with Alfredo and others: We're going to keep the pyceph modules as individual Python packages ... Ken Dreyer
08:03 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
Today I restarted every mon and osd on the test cluster (again) and confirmed it is all running 0.67.11-4-g496e561. N... Dan van der Ster
07:55 AM Bug #10166 (Resolved): fiemap or FileStore::do_sparse_copy_range bug: osd/ReplicatedPG.cc: 8706: ...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-20_17:05:01-upgrade:giant-x-next-distro-basic-vps/... Yuri Weinstein
07:10 AM Documentation #9867: PGs per OSD documentation needs clarification
It should also be noted that the PG per Pool distribution should be directly proportional to the overall distribution... Michael Kidd
06:46 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
I've been seeing similar issues using Ubuntu 14.04 as a guest VM. RT throttling occurs, so I tried disabling all the ... Warren Wang
06:39 AM CephFS Bug #10151 (In Progress): mds client cache pressure health warning oscillates on/off
Reproduced this locally by just allowing 3 mons in a vstart cluster and following the procedure from the mds_client_l... John Spray
12:50 AM CephFS Bug #10151: mds client cache pressure health warning oscillates on/off
Yes -- the leader is reporting the health warning but the peons are not.
The warning is "Client 2922132 failing to...
John Spray
06:34 AM CephFS Fix #10135: OSDMonitor: allow adding cache pools to cephfs pools already in use
Yeah, we didn't think about this first time around because the focus was on cache tiers to EC pools, but it would mak... John Spray
03:58 AM CephFS Bug #10164: Dirfrag objects for deleted dir not purged until MDS restart
Alternatively, a less contrived way to see the issue: just do a loop of "cp -r /etc . ; rm -rf ./etc" in a filesystem mo... John Spray
03:14 AM CephFS Bug #10164 (Resolved): Dirfrag objects for deleted dir not purged until MDS restart
Seen while playing with the #9881 flush functionality: the dirfrag objects for deleted directories are never cleane...
John Spray
01:24 AM Bug #10085: dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
PPS Bug info: https://sourceware.org/bugzilla/show_bug.cgi?id=17561
My patch posted here https://bugs.gentoo.org/sho...
Denis kaganovich

11/20/2014

11:20 PM Bug #10119 (Resolved): 0.88 EC+ KV OSDs crashing
Haomai Wang
09:50 PM Bug #10085: dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
PS: The bug is filed at (https://bugs.gentoo.org/show_bug.cgi?id=529076 ), but I think this was a near-"vanilla" case. Denis kaganovich
09:41 PM Bug #10085: dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
"Own packages"? It is Gentoo.
Denis kaganovich
06:35 PM rbd Bug #10123 (Fix Under Review): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vp...
Jason Dillaman
05:46 PM rgw Bug #10162 (Duplicate): s3tests-test-readwrite failure
Running teuthology using the following yaml file failed:... Anonymous
04:42 PM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
https://github.com/ceph/ceph-qa-suite/pull/250 teuthology tests Loïc Dachary
02:44 PM Bug #10018 (In Progress): OSD assertion failure if the hinfo_key xattr is not there (corrupted?) ...
Loïc Dachary
03:47 PM Bug #10028 (Duplicate): ec_lost_unfound failing on giant
#10065 Sage Weil
02:42 PM Bug #10028: ec_lost_unfound failing on giant
Loïc Dachary
03:35 PM rgw Feature #10159 (New): rgw: sync agent support for object versioning
Yehuda Sadeh
03:35 PM rgw Feature #10158 (Closed): rgw: sync agent support for bucket sharding
Yehuda Sadeh
03:17 PM Bug #10157 (Resolved): PGLog::(read|write)_log don't write out rollback_info_trimmed_to
In practice, this means that replicated pgs will scan their log on the first operations after boot needlessly. EC pg... Samuel Just
02:51 PM Bug #10150: osd/ReplicatedPG.cc: 10853: FAILED assert(r >= 0) (in _scan_range)
Samuel Just
02:46 PM Bug #10150: osd/ReplicatedPG.cc: 10853: FAILED assert(r >= 0) (in _scan_range)
Hah, that assert isn't valid. The object in question might be in the process of being removed *if* it is at the star... Samuel Just
08:34 AM Bug #10150 (Resolved): osd/ReplicatedPG.cc: 10853: FAILED assert(r >= 0) (in _scan_range)
... Sage Weil
02:43 PM Bug #9810: dout_emergency is silenced in ceph-osd
Loïc Dachary
02:42 PM Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool
Loïc Dachary
02:42 PM Feature #9728: erasure-code: jerasure support for NEON
Loïc Dachary
02:42 PM Bug #9485: Monitor crash due to wrong crush rule set
Loïc Dachary
02:42 PM Bug #8741: osd: ec plugin leak
Loïc Dachary
02:41 PM Bug #10065: hung ec-lost-unfound.yaml, failed of osd.{0,2,3}
Loïc Dachary
08:43 AM Bug #10065: hung ec-lost-unfound.yaml, failed of osd.{0,2,3}
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-17_02:32:01-rados-giant-distro-basic-multi/604495
...
Sage Weil
02:41 PM Bug #9785: /etc/ceph/dmcrypt-keys and key contents are created world-readable
Loïc Dachary
02:34 PM Bug #8011: osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
Samuel Just
02:20 PM Bug #8011: osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
Urgh, non-blocking flushes do not cause scrub to pause. I think the simplest solution is to fail a non-blocking scru... Samuel Just
08:38 AM Bug #8011: osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
this popped up again: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-17_02:32:01-rados-giant-distr... Sage Weil
02:29 PM Feature #10156 (Rejected): Backport cluster_fingerprint feature to Dumpling
Sometimes the subject says it all. Neil Levine
01:18 PM rbd Feature #10154 (Resolved): librbd: use early snapshot context for copyup operations so snapshots ...
If we send the copyup operation with a snapshot context with an empty list of snap ids and a snap seq before the earl... Josh Durgin
12:09 PM rgw Bug #10015 (In Progress): rgw sync agent: 403 when syncing object that has tilde in its name
Confirmed that requests is doing the unquoting of a quoted url with the tilde char:... Alfredo Deza
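To illustrate the kind of normalization described above: RFC 3986 treats the tilde as an unreserved character, so an HTTP client may legitimately rewrite `%7E` as `~` before sending. The sketch below is a standalone, standard-library re-creation of that behavior, not the code from requests, and the object path is hypothetical. That the rewritten path then fails request signing on the gateway side (hence the 403) is an assumption, not something this entry states.

```python
import string

# RFC 3986 section 2.3: these characters never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def unquote_unreserved(path):
    """Decode only the percent-escapes that stand for unreserved characters."""
    parts = path.split("%")
    out = [parts[0]]
    for part in parts[1:]:
        hex_pair = part[:2]
        try:
            char = chr(int(hex_pair, 16)) if len(hex_pair) == 2 else None
        except ValueError:
            char = None  # not a valid escape, keep it as-is
        if char is not None and char in UNRESERVED:
            out.append(char + part[2:])   # e.g. "%7E" becomes "~"
        else:
            out.append("%" + part)        # e.g. "%2F" stays quoted
    return "".join(out)


if __name__ == "__main__":
    sent = "/bucket/file%7Ename"  # hypothetical path, tilde quoted by the caller
    print(sent, "->", unquote_unreserved(sent))
```

If the caller computes anything over the quoted form (a signature, a cache key), the normalized form on the wire no longer matches it.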
12:09 PM Bug #10153 (Resolved): Rados.shutdown() dies with Illegal instruction (core dumped)
In rados.py, Rados.shutdown() produces "Illegal instruction (core dumped)" when called.
To test, try applying the ...
Joe Julian
11:52 AM devops Bug #10152: drop tiobench references
This involves deleting the associated Jenkins task as well: http://jenkins.ceph.com/job/tiobench Ken Dreyer
11:50 AM devops Bug #10152 (Rejected): drop tiobench references
Both Fedora and Debian have dropped their tiobench packages from their distros because tiobench failed to build from ... Ken Dreyer
11:46 AM rbd Bug #10149 (Duplicate): Giant: data corruption with console rbd export
This is a duplicate of #9936 and will be backported to Giant soon. In the meantime, as a workaround you can export t... Jason Dillaman
07:42 AM rbd Bug #10149 (Duplicate): Giant: data corruption with console rbd export
- borrow large non-zeroed image, ten gig in my case
- upload it via cli, format 2 is set
- try to download the imag...
Andrey Korolyov
11:46 AM devops Bug #9793: Fedora 20 ceph-extras Repo missing
Both Fedora and Debian have dropped their tiobench packages from their distros because tiobench failed to build from ... Ken Dreyer
11:41 AM Bug #8797: "ceph status" do not exit with python_2.7.8
In order to get the exit code, I tried this:... Joe Julian
10:19 AM Bug #9921: msgr/osd/pg dead lock giant
passed a smoke test, ready for rados run Sage Weil
10:08 AM Bug #8978: ceph ping not working as expected
Can you give me an example of how you are running the ceph ping command and the output you are seeing? I just tested ... Eric Eastman
09:59 AM rgw Bug #10145: rgw swift functional test: testChunkedPut (test.functional.tests.TestFileUTF8)
Chunked put can fail if running the wrong fastcgi module. Yehuda Sadeh
09:59 AM CephFS Bug #10151 (Resolved): mds client cache pressure health warning oscillates on/off
seeing this on lab cluster. not sure if it is a problem in the mds health reporting or the mon, but it goes on and o... Sage Weil
07:30 AM devops Bug #10148 (Rejected): Giant/Wheezy SysV: /etc/init.d/ceph -a start shifts crushmap to executing ...
Got:... Andrey Korolyov
07:17 AM Messengers Feature #10147 (Resolved): Add unittest for Messenger
Haomai Wang
06:50 AM Bug #10146: ceph-disk: sometimes the journal symlink is not created
I've pushed the alternative fix in the same pull req. Dan van der Ster
02:15 AM Bug #10146 (In Progress): ceph-disk: sometimes the journal symlink is not created
I like the idea of not changing the uuid Loïc Dachary
01:35 AM Bug #10146 (Resolved): ceph-disk: sometimes the journal symlink is not created
Hi,
We observed in practise that sometimes the journal symlink is not created during a ceph-disk prepare run.
En...
Dan van der Ster

11/19/2014

10:29 PM Linux kernel client Feature #9906: Inline data support
Zheng Yan
09:43 PM rgw Bug #10145 (Can't reproduce): rgw swift functional test: testChunkedPut (test.functional.tests.Te...
... Shambhu Rajak
09:36 PM rgw Bug #10144 (Can't reproduce): rgw swift functional test: testChunkedPut (test.functional.tests.Te...
... Shambhu Rajak
06:20 PM Bug #8797: "ceph status" do not exit with python_2.7.8
This works around the problem, while also destroying the exit code from the ceph program, so if you rely on that, thi... Dan Mick
05:32 PM CephFS Bug #10131 (Resolved): kclient: dentry still in use on umount
Zheng Yan
02:20 PM rgw Bug #10099 (Duplicate): radosgw-agent - error geting op state: list index out of range
Brian Andrus
12:37 PM rgw Bug #10102 (Fix Under Review): sync agent: does not handle gracefully transient errors
PR opened: https://github.com/ceph/radosgw-agent/pull/11 Alfredo Deza
12:21 PM Bug #10138: osd: crash in SnapSet::from_snap_set
Whoops! I misread the version. Do you have a core file? Sage Weil
12:09 PM Bug #10138: osd: crash in SnapSet::from_snap_set
Sage Weil wrote:
> Please try using the latest firefly release. 0.87 is comparatively old.. many bugs were fixed in...
Paul Emmerich
10:45 AM Bug #10138 (Rejected): osd: crash in SnapSet::from_snap_set
Please try using the latest firefly release. 0.87 is comparatively old.. many bugs were fixed in 0.80.5 and again in... Sage Weil
04:48 AM Bug #10138: osd: crash in SnapSet::from_snap_set
Sorry for the formatting fail. The crash log was supposed to look like this:... Paul Emmerich
04:46 AM Bug #10138 (Can't reproduce): osd: crash in SnapSet::from_snap_set
We are running ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) in a 3-server setup with 18 OSDs (4 HDDs ... Paul Emmerich
12:07 PM Bug #10128 (Resolved): ceph_objectstore_tool --op export to stdout broken
Backport to giant as 6cb9a2499cac2645e2cc6903ab29dfd95aac26c7 David Zafman
12:05 PM rgw Documentation #10142 (Resolved): Update S3 compatibility table to reflect bucket location support
Table at http://ceph.com/docs/master/radosgw/s3/ Neil Levine
12:05 PM Bug #9439 (Resolved): pg_op_must_wait() not checking FILTER variants
David Zafman
05:18 AM Bug #9439: pg_op_must_wait() not checking FILTER variants
https://github.com/ceph/ceph/pull/2962 Loïc Dachary
12:05 PM Bug #10077 (Resolved): ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
David Zafman
11:13 AM Linux kernel client Bug #10141 (Resolved): rbd_img_obj_request_fill+0x81/0x200
... Sage Weil
09:43 AM rgw Feature #9932 (Fix Under Review): rgw: map swift X-Storage-Policy header to rgw pools
Yehuda Sadeh
05:50 AM rbd Bug #10139 (New): librbd cpu usage 4x higher than krbd
librbd cpu usage is quite huge currently, around 4-5x higher than krbd.
(Tested with fio+krbd vs fio+librbd, rando...
alexandre derumier

11/18/2014

11:51 PM CephFS Bug #10131: kclient: dentry still in use on umount
it's a VFS bug. fixed by... Zheng Yan
11:04 PM CephFS Bug #10131 (In Progress): kclient: dentry still in use on umount
Zheng Yan
09:20 AM CephFS Bug #10131 (Resolved): kclient: dentry still in use on umount
... Greg Farnum
11:31 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Here are two new logs -- only filestore OSDs are up, all KV OSDs are down.
Dmitry Smirnov
11:07 PM Bug #10119 (Fix Under Review): 0.88 EC+ KV OSDs crashing
https://github.com/ceph/ceph/pull/2966 Haomai Wang
08:06 AM Bug #10119: 0.88 EC+ KV OSDs crashing
Thank you, I have started a 3-OSD keyvaluestore cluster to run a benchmark and try to trigger the crash. Haomai Wang
02:53 AM Bug #10119: 0.88 EC+ KV OSDs crashing
I added the debug_keyvaluestore logging, and restarted them. The OSDs started to crash immediately again, but there ... Kenneth Waegeman
05:55 PM devops Bug #9665 (Pending Backport): ceph-disk zap should call partprobe
let's wait a week or two before backporting Loïc Dachary
05:54 PM devops Bug #9665 (Resolved): ceph-disk zap should call partprobe
Loïc Dachary
07:05 AM devops Bug #9665: ceph-disk zap should call partprobe
... Loïc Dachary
05:44 PM Bug #10114 (Resolved): assembly files need annotation to assert that stack should not be executable
https://github.com/ceph/ceph/pull/2961 Loïc Dachary
05:40 PM Bug #10114: assembly files need annotation to assert that stack should not be executable
https://github.com/ceph/ceph/pull/2963 Loïc Dachary
01:16 PM Bug #10114: assembly files need annotation to assert that stack should not be executable
https://github.com/ceph/ceph/pull/2946 Loïc Dachary
01:15 PM Bug #10114: assembly files need annotation to assert that stack should not be executable
Looks like it's merged, does this need to be backported? Samuel Just
05:04 PM Bug #10118: messenger drops messages between osds
Samuel Just wrote:
> If you can reproduce with logs, that would help. The repops are supposed to complete in strict...
Guang Yang
01:12 PM Bug #10118: messenger drops messages between osds
If you can reproduce with logs, that would help. The repops are supposed to complete in strict order, this could be ... Samuel Just
04:29 PM devops Bug #9793: Fedora 20 ceph-extras Repo missing
This must've been an oversight when we started shipping Fedora 20 packages. It led me to dig into why we're still shi... Ken Dreyer
03:40 PM CephFS Fix #10135 (Resolved): OSDMonitor: allow adding cache pools to cephfs pools already in use
Right now we disallow this with _check_remove_tier(), I believe because we were worried about coordinating the switch... Greg Farnum
03:12 PM rgw Bug #10103: swift tests failing
Seems to be the same problem in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-17_17:15:01-upgrade:dumplin... Yuri Weinstein
03:11 PM Bug #10107 (Duplicate): Coredump in upgrade:giant-x-next-distro-basic-multi run
I think this is caused by the same thing as 10059, duplicate marking. Samuel Just
03:10 PM Bug #7996: 0.78: OSD is not suspend-friendly (unresponsive cluster on OSD crash)
"Won't fix" should normally be accompanied by an explanation... Dmitry Smirnov
01:53 PM Bug #7996 (Won't Fix): 0.78: OSD is not suspend-friendly (unresponsive cluster on OSD crash)
Samuel Just
03:06 PM Bug #9788 (Rejected): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeou...
I think this one is the giant messenger deadlock, #9921, updated 9921, closing this ticket again. Samuel Just
03:05 PM Bug #9921: msgr/osd/pg dead lock giant
I think this is another instance:
ubuntu@teuthology:/a/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-...
Samuel Just
02:37 PM CephFS Feature #1398: qa: multiclient file io test
Answering my own question: Item 2 above. It looks like this can all be done from python. Anonymous
02:21 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
As per jdillaman's suggestion on IRC, I have backed off to the PVE 2.6.32-34-pve kernel from 3.10.0-5-pve and can no ... Brad House
12:49 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
second batch dump from same locked process as requested Brad House
12:43 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
and a 3rd for good measure Brad House
12:41 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
another attempt and trace Brad House
12:30 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Attached gdb output with libc and qemu debug symbols. Brad House
09:31 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Greg, incidentally several of the attached backtraces show the Pipe reader thread waiting on the pipe lock:... Jason Dillaman
09:03 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Jason, exactly what information is making you think the Pipe is hung waiting on a lock? And what version is in use ri... Greg Farnum
07:48 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Logs show that the pipe reader to osd.0 is hung waiting for the pipe lock. The last message from that thread is:
<p...
Jason Dillaman
05:56 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Thanks, I'll start reviewing these this morning. Jason Dillaman
05:27 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Attached is the blktrace of the latest lockup.
Then the qemu output exceeded your max file size (by a couple of KB),...
Brad House
04:12 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Sure, I'll do that this morning first. Then I found the repo that proxmox is using to build qemu-kvm, so I'll rebuil... Brad House
02:01 PM Bug #10104 (Fix Under Review): rados.py: wait_for_* don't wait; should have poll, wait, and wait+...
Dan Mick
01:57 PM Bug #8978 (Can't reproduce): ceph ping not working as expected
Joao Eduardo Luis
01:52 PM Bug #9369 (Can't reproduce): init: ceph-osd (...) main process (...) killed by ABRT signal
Samuel Just
01:50 PM Bug #9439 (Fix Under Review): pg_op_must_wait() not checking FILTER variants
David Zafman
01:48 PM Bug #9438 (Resolved): librados API generated doc broken
Samuel Just
01:45 PM Bug #9738: rados cli: objects not present in a snapshot are listed anyway
Ugh, need to look at the object info to do this right. Should either fix or change docs or remove. Samuel Just
01:45 PM Feature #9720: erasure-code: non regression should test jerasure variants
Loïc Dachary
01:43 PM Bug #9748 (Rejected): Dead jobs in upgrade:dumpling-x-firefly-distro-basic-multi run
Probably some kind of networking issue, closing until we get more intel. Samuel Just
01:42 PM Bug #9784: All tools should be named consistently and argument parsing should be better
any tool that a user uses should have -'s. the source files should always use _'s.. just the final executable uses -... Sage Weil
01:41 PM Bug #9784: All tools should be named consistently and argument parsing should be better
ceph-objectstore-tool. Generally, dashes for things people actually use, underscores for tests. Samuel Just
01:42 PM Bug #9751 (Rejected): ceph tell osd.6 version hangs
Samuel Just
01:38 PM Bug #9801: ceph 0.80.7 build rpm packages in centos 7 error
mkcephfs is removed post-firefly anyway, ignore the warning Sage Weil
01:38 PM Bug #9801 (Won't Fix): ceph 0.80.7 build rpm packages in centos 7 error
Samuel Just
01:37 PM Feature #7104 (New): rest-api: support commands requiring 'w' cap without 'rw' cap
the 'mds set' command is 'rw'. confused what the bug is... pls reopen and clarify if this is still an issue Sage Weil
01:37 PM Feature #7104 (Rejected): rest-api: support commands requiring 'w' cap without 'rw' cap
Samuel Just
01:37 PM Bug #9818 (Resolved): ENXIO qa/workunits/cephtool/test.sh:test_osd_bench
did not happen for a long time, looks like it's stable at last Loïc Dachary
01:36 PM Bug #10132 (Resolved): osd: tries to set ioprio when the config option is blank
Saw this in a log:... Greg Farnum
01:35 PM Bug #9761 (Rejected): ceph-osd: segfault at 654c30 ip 00007f00dc5f1f07 sp 00007f00c5642e00 error ...
Samuel Just
01:33 PM Bug #9941 (Rejected): rados command line crashes when trying to copy pool snapshot
The correct answer here will be to deprecate this command. We are talking about a more sophisticated import/export t... Samuel Just
01:32 PM Bug #9971 (Rejected): OSD crashes again after restarting due to op thread time out at writing pg ...
Sounds like the disk is too slow for the timeouts, you'll have to increase them. Samuel Just
01:27 PM Bug #10126 (Rejected): "Exception: timed out waiting for admin_socket" in upgrade:giant-x-next-di...
I suspect this is slowness in VM machines. There are no core files and nothing I could see of interest in osd.10 log... David Zafman
01:25 PM Bug #10008 (Resolved): "obsolete rollback obj" error in upgrade:firefly-x-giant-distro-basic-vps run
Samuel Just
01:21 PM Bug #10013 (Rejected): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
upgrading the libraries -> crash Samuel Just
01:20 PM Bug #10067: ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
client.admin.* is normal. Crash probably is not. Samuel Just
01:20 PM Bug #10069 (Rejected): SyncEntryTimeout::finish() timeout
probably a slow vm Samuel Just
01:19 PM rgw Bug #10102: sync agent: does not handle gracefully transient errors
Updated the description, and the RGW is not returning a 400 but a 500. The agent should probably get updated to under... Alfredo Deza
07:12 AM rgw Bug #10102 (In Progress): sync agent: does not handle gracefully transient errors
Alfredo Deza
01:19 PM Bug #10085 (Rejected): dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
whatever the platform is, you'll have to build your own packages, I guess. Samuel Just
01:14 PM Bug #10117: OSD crashes if xattr "_" is absent for the file when doing backfill scanning (Replica...
This should probably be a feature request for the backlog. We need a test reproducing it and some code to tolerate i... Samuel Just
01:07 PM Bug #10129 (Pending Backport): Bad locking in the trunc method of libradosstriper
Samuel Just
03:03 AM Bug #10129: Bad locking in the trunc method of libradosstriper
giant backport https://github.com/ceph/ceph/pull/2954 Loïc Dachary
02:49 AM Bug #10129: Bad locking in the trunc method of libradosstriper
https://github.com/ceph/ceph/pull/2951 testing Loïc Dachary
02:44 AM Bug #10129: Bad locking in the trunc method of libradosstriper
Loïc Dachary
02:22 AM Bug #10129 (Resolved): Bad locking in the trunc method of libradosstriper
A badly positioned catch makes the locking ineffective and can lead to race conditions. Sebastien Ponce
01:06 PM Bug #9970 (Resolved): document erasure coded pool simple operations
Loïc Dachary
01:00 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
David Zafman
12:59 PM Feature #10064 (Resolved): add ceph_objectstore_tool tests to make check
Loïc Dachary
10:25 AM Bug #10128 (Pending Backport): ceph_objectstore_tool --op export to stdout broken
David Zafman
01:44 AM Bug #10128 (Resolved): ceph_objectstore_tool --op export to stdout broken
https://github.com/ceph/ceph/pull/2950 Loïc Dachary
09:07 AM rbd Bug #10123: "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
LibRBD.ListChildren was the last logged test Josh Durgin
07:53 AM devops Feature #10046: run make check on every pull request
https://github.com/ceph/ceph/pull/2956 Loïc Dachary
06:37 AM devops Bug #8896 (Rejected): missing i386 packages for Trusty
we no longer build i386 and I don't think there are plans to add them back.
Alfredo Deza
01:49 AM Feature #9943: osd: mark pg and use replica on EIO from client read
Submitted a pull request.
https://github.com/ceph/ceph/pull/2952
Wei Luo

11/17/2014

11:04 PM Bug #10128: ceph_objectstore_tool --op export to stdout broken
It would be nice to fix the unit test to use all variants of export and import. David Zafman
10:59 PM Bug #10128 (Resolved): ceph_objectstore_tool --op export to stdout broken

The change a2bd2aa7 broke --op export to stdout. It is writing text using out.
I want to backport this fix to g...
David Zafman
07:45 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Yep, if you have time, please paste crash logs with debug_keyvaluestore=20/20. Haomai Wang
02:51 PM Bug #9978: keyvaluestore: void ECBackend::handle_sub_read
Do you need any additional "debug_keyvaluestore=20/20" logs? It's been another week... Is there any progress? Any hop... Dmitry Smirnov
07:26 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
I can see four outstanding read requests in the last set of logs that you provided. Any chance you can re-run the sa... Jason Dillaman
02:26 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Using krbd instead of librbd with qemu doesn't hang, however, in the guest with dd, the total sequential performance ... Brad House
11:05 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
blktrace and qemu log attached as requested. I could not gracefully kill blktrace as the vm hardlocked so hopefully ... Brad House
10:33 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
None of the Ceph threads in the provided backtraces appeared to be deadlocked. It's possible an IO completion is bein... Jason Dillaman
09:40 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
logs from 3 runs back-to-back, forcibly killing the vm and restarting it between each attempt Brad House
09:28 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Brad, it would be helpful to see a few back-to-back GDB backtraces. In the full backtrace above, all blocked threads... Jason Dillaman
09:22 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
CPU usage is 0 when the lock occurs, so I don't think it is due to excess cpu usage.
I can definitely try those ...
Brad House
08:35 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
alexandre derumier wrote:
> Hi,
>
> >>kernel:BUG: soft lockup - CPU#0 stuck for 23s!
>
> by default share 1thr...
alexandre derumier
08:31 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Hi,
>>kernel:BUG: soft lockup - CPU#0 stuck for 23s!
by default share 1thread for many things (clock,io access,...
alexandre derumier
08:28 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
During lockup:... Brad House
07:10 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
I should also mention I am brad_mssw in the #ceph IRC channel on oftc if there are any suggestions or things to try. Brad House
05:47 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
Realized I was missing the librados debug symbols, here it is again, and also backtraced all threads:... Brad House
05:36 AM rbd Bug #10116: Ceph vm guest disk lockup when using fio
What is more interesting to me is if I break into it with GDB when it is hung, then tell it to continue, I get notifi... Brad House
05:55 PM devops Bug #8896: missing i386 packages for Trusty
Would it be possible to have this fixed soon? I'm running into the problem since I can't upgrade 1 of my 3 servers ... Jean-Sébastien Frerot
03:43 PM Bug #10096 (Resolved): ceph-disk prepare fails to unmount temp file successfully
Sage Weil
02:58 PM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
Oddly, I'm able to reproduce it easily on v0.67.11, but not wip-9113-9487-dumpling (496e561d81f2dd1bf92d588fc3afc2431... Samuel Just
02:49 PM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
This test cluster is currently running 0.67.11-4-g496e561, mons and osds.
On our prod cluster we still run ceph-0....
Dan van der Ster
10:53 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
All other osds are running that branch, right? Also, which sha1 was it which you thought was working (the branches h... Samuel Just
03:17 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
Well the PG isn't empty -- I've been writing a bunch of data to it using rados bench. Basically, I'm having trouble g... Dan van der Ster
02:55 PM Bug #10126 (Rejected): "Exception: timed out waiting for admin_socket" in upgrade:giant-x-next-di...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-17_08:56:42-upgrade:giant-x-next-distro-basic-vps/... Yuri Weinstein
02:00 PM Bug #10125 (Resolved): radosgw is being started as root not apache with systemd
On RHEL 7 when radosgw is started with systemd it runs as root not apache which causes problems with the s3gw.fcgi is... Sheldon Mustard
12:57 PM Bug #10124: monitor recieves bus error signal
if reproducible
<joao> 'mon_debug_dump_transactions = true' and 'mon_debug_dump_location = /path'
Noah Watkins
12:37 PM Bug #10124: monitor recieves bus error signal
Oh, it looks like this has been reported by Joao before to leveldb list: https://groups.google.com/forum/#!topic/leve... Noah Watkins
12:01 PM Bug #10124 (Rejected): monitor recieves bus error signal
This happened in the latest giant version. A bus error suggests something is wrong with the hardware, but the issue suspiciou... Noah Watkins
10:15 AM Bug #10119: 0.88 EC+ KV OSDs crashing
Hmm, it's strange because I already fixed this bug previously. Maybe it's a different one?
Could you run crashed OSD agai...
Haomai Wang
06:53 AM Bug #10119 (Resolved): 0.88 EC+ KV OSDs crashing
Hi,
I am further testing the EC+ KV setup, and the OSDs were crashing again, so I updated ticket #9727.
But after ...
Kenneth Waegeman
10:02 AM Bug #10093 (Resolved): ceph-monstore-tool: FAILED assert(!is_open)
Joao Eduardo Luis
09:49 AM rbd Bug #10123 (Resolved): "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-16_19:13:03-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
09:48 AM rbd Bug #10122 (Resolved): "LibRBD.TestClone" FAILED in upgrade:dumpling-x-firefly-distro-basic-vps run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-16_19:13:03-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
09:39 AM rgw Bug #10121 (Duplicate): "test.functional.tests.TestAccountUTF8" error in upgrade:dumpling-x-firef...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-16_19:13:03-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
09:02 AM Bug #10063 (Pending Backport): ceph_objectstore_tool does not support getting attributes for eras...
Loïc Dachary
08:05 AM Bug #9913 (Pending Backport): mon: audit log entires for forwarded requests lack info
Sage Weil
07:38 AM Bug #9913 (Fix Under Review): mon: audit log entires for forwarded requests lack info
https://github.com/ceph/ceph/pull/2944 Joao Eduardo Luis
08:02 AM devops Bug #10120 (New): "Assertion: os/FileStore.cc" in upgrade:firefly-x-next-distro-basic-multi run
Sandon, that was mira034:... Yuri Weinstein
07:53 AM devops Bug #10120 (Rejected): "Assertion: os/FileStore.cc" in upgrade:firefly-x-next-distro-basic-multi run
that fail_eio assert means we got EIO back from the fs, which means there is a bad disk Sage Weil
07:34 AM devops Bug #10120 (Rejected): "Assertion: os/FileStore.cc" in upgrade:firefly-x-next-distro-basic-multi run
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-15_17:13:01-upgrade:firefly-x-next-distro-basic-mu... Yuri Weinstein
07:52 AM rgw Bug #10103: swift tests failing
Also in runs:
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-14_02:35:01-smoke-master-distro-basic-multi/
...
Yuri Weinstein
02:40 AM Bug #10118 (Can't reproduce): messenger drops messages between osds
Log snippets before the daemon crash:... Guang Yang
01:53 AM Bug #9285: osd: promoted object can get evicted before promotion completes
We probably have this issue on our ceph cluster (0.80.7 on commodity PC hardware + 10G ethernet) and that this is blo... Laurent GUERBY

11/16/2014

09:41 PM Bug #10117 (Won't Fix): OSD crashes if xattr "_" is absent for the file when doing backfill scann...
We observed an OSD crash pattern caused by the xattr "_" being absent for the file (on the filesystem), which results in an a... Guang Yang
06:20 PM rbd Bug #10116: Ceph vm guest disk lockup when using fio
I wonder if this isn't the issue from #9854.
fio gets through writing the test files, and the lock occurs during t...
Brad House
02:31 PM rbd Bug #10116 (Closed): Ceph vm guest disk lockup when using fio
When running a disk benchmark within a guest, I'm getting a disk lockup that doesn't ever appear to resolve itself. ... Brad House
02:09 AM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
Samuel Just wrote:
> This is almost certainly unrelated to those two bugs. This is a specific edge case in divergen...
Dmitry Smirnov

11/14/2014

10:46 PM Bug #10115: mon not running. osd is dead
My ceph version is 0.80.1. I installed it on Ubuntu 12.04.4.
uname -a : Linux controller 3.11.0-26-generic #45~preci...
? ??
10:22 PM Bug #10115: mon not running. osd is dead
This is the log file from one of my ceph nodes. ? ??
10:10 PM Bug #10115 (Can't reproduce): mon not running. osd is dead
My ceph cluster does not have cephx configured. I solved one problem before as described in this issue: http://tracker.ceph.com/issues/8851.
...
? ??
06:05 PM Bug #10114 (Fix Under Review): assembly files need annotation to assert that stack should not be ...
seeming workaround in wip-execstack
Dan Mick
05:58 PM Bug #10114: assembly files need annotation to assert that stack should not be executable

References:
https://bugzilla.redhat.com/show_bug.cgi?id=1118504 the original bug that noticed the problem on Fe...
Dan Mick
05:30 PM Bug #10114 (Resolved): assembly files need annotation to assert that stack should not be executable
Dan Mick
05:10 PM Bug #10113: --log-to-stderr with -f/-d sends a lot of things to logfile
on a vstart cluster with 3 osds, if I stop osd.2 and restart like:
./ceph-osd -i 2 -c ./ceph.conf --log-to-stderr ...
Dan Mick
05:10 PM Bug #10113 (Duplicate): --log-to-stderr with -f/-d sends a lot of things to logfile
Dan Mick
03:45 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
Samuel Just
03:12 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
This is almost certainly unrelated to those two bugs. This is a specific edge case in divergent write recovery. Samuel Just
11:43 AM devops Cleanup #7722 (Resolved): Make /admin/build-doc distro independent
John Wilkins
11:41 AM devops Cleanup #7722: Make /admin/build-doc distro independent
Updated the procedure doc with all dependencies. John Wilkins
11:43 AM Bug #9788 (New): "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" is...
Logs are in http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-basic-vps/... Yuri Weinstein
10:22 AM Cleanup #10110 (New): librados: mark old objects_begin interface deprecated
There is some minor refactoring needed since the new methods call the old ones when ns == "". The fix is probably to... Sage Weil
10:18 AM devops Tasks #8366: Update ceph.com/docs to default to the latest major release (0.80)
Can we update it to the latest major release with the backports--e.g., v0.80.7? I finally have someone to help with t... John Wilkins
10:12 AM Bug #9487: dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not ...
I think that's an annoying special case for snaps purged on an empty pg. Both the old primary which did the trim and... Samuel Just
08:09 AM Bug #10107: Coredump in upgrade:giant-x-next-distro-basic-multi run
... Sage Weil
07:40 AM Bug #10107 (Duplicate): Coredump in upgrade:giant-x-next-distro-basic-multi run
(Maybe related to #8733)
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-13_17:04:11-upgrade:gi...
Yuri Weinstein
08:03 AM Bug #10109 (Duplicate): "LibRadosTwoPoolsECPP.PromoteSnap" test failed in upgrade:dumpling-firefl...
3 tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:15:02-upgrade:dumpling-firefly-x:p... Yuri Weinstein
07:55 AM rgw Bug #10108 (Duplicate): s3tests fail in upgrade:dumpling-firefly-x:parallel-next-distro-basic-mul...
All tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-13_17:10:02-upgrade:dumpling-firefly-x... Yuri Weinstein
07:47 AM Bug #10105: crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
the upgrade from 0.80.1 to 0.80.7 case was a bad disk. Sage Weil
07:32 AM Bug #9727: 0.86 EC+ KV OSDs crashing
Hi,
I tried this again on the new 0.88 release.
After about 30 minutes of testing, the EC-KV OSDs started crashin...
Kenneth Waegeman
04:51 AM Messengers Feature #10029: Retry binding on IPv6 address if not available
I started playing with this a bit (no commits yet), I simply loop in SimpleMessenger's Accepter.cc and retry to bind ... Wido den Hollander
03:26 AM Feature #9979 (In Progress): osd: cache: proxy reads (instead of redirect)
https://github.com/ceph/ceph/pull/2927 Loïc Dachary
02:17 AM rgw Bug #10106 (Resolved): rgw acl response should start with <?xml version="1.0" ?>
I encountered some surprising behaviour when playing with radosgw and s3cmd.
You can probably make a convincing case...
Jon Kåre Hellan
02:10 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
Loïc Dachary

11/13/2014

10:32 PM Bug #10052 (Fix Under Review): LibRadosTwoPools[EC]PP.PromoteSnap failure
https://github.com/ceph/ceph/pull/2926 Sage Weil
10:19 PM Bug #10052 (In Progress): LibRadosTwoPools[EC]PP.PromoteSnap failure
// read baz
{
    bufferlist bl;
    ASSERT_EQ(-ENOENT, ioctx.read("baz", bl, 1, 0));
}
I think this usu...
Sage Weil
05:44 PM Bug #10052: LibRadosTwoPools[EC]PP.PromoteSnap failure
ubuntu@teuthology:/a/sage-2014-11-12_13:30:37-smoke-wip-warn-max-pg-distro-basic-multi/598501 Sage Weil
08:49 PM Bug #10105 (Can't reproduce): crash in PG::peek_map_epoch on upgrade from 0.80.4 to 0.80.7
... Sage Weil
05:48 PM Bug #10104 (Resolved): rados.py: wait_for_* don't wait; should have poll, wait, and wait+cb versions
Completion.wait_for_{safe, complete} are using the poll functions "is_{safe,complete}"; the comments indicate that's ... Dan Mick
05:47 PM rgw Bug #10103 (Resolved): swift tests failing
ubuntu@teuthology:/a/dzafman-2014-11-13_10:42:58-rgw-wip-10082-testing-basic-multi$ teuthology-ls . | grep FAIL
5996...
Sage Weil
05:02 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
Any progress? Dmitry Smirnov
04:36 PM rgw Bug #10082 (Resolved): Segmentation fault in upgrade:dumpling-firefly-x:parallel-next-distro-basi...
Sage Weil
04:28 PM Feature #10064 (Fix Under Review): add ceph_objectstore_tool tests to make check
https://github.com/ceph/ceph/pull/2915 Loïc Dachary
04:28 PM Bug #10063 (Fix Under Review): ceph_objectstore_tool does not support getting attributes for eras...
https://github.com/ceph/ceph/pull/2915 Loïc Dachary
03:48 PM rgw Bug #10102 (Resolved): sync agent: does not handle gracefully transient errors
on a copy operation, rgw sent back 400 and the sync agent got stuck in the following loop:... Yehuda Sadeh
12:58 PM rgw Bug #9587 (Pending Backport): ceph-radosgw sysvinit script on EL6 cannot set ulimit
Loïc Dachary
12:25 PM rgw Bug #10099 (Duplicate): radosgw-agent - error geting op state: list index out of range
radosgw-agent logs the following, and objects are not synced to the secondary gateway.
INFO:urllib3.connectionpool...
Brian Andrus
12:25 PM Bug #10096: ceph-disk prepare fails to unmount temp file successfully
Notes:
- Issuing a short delay before 'umount' fixes the issue - this is a terrible workaround
- Issuing 'sync' b...
Blaine Gardner
07:52 AM Bug #10096 (Resolved): ceph-disk prepare fails to unmount temp file successfully
I have been testing on a virtual machine for ease of testing, and 'ceph-disk prepare' kept forwarding an error from '... Blaine Gardner
11:07 AM Bug #10095 (Resolved): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
Sage Weil
11:02 AM Bug #10095 (Fix Under Review): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
https://github.com/ceph/ceph/pull/2920 Sage Weil
07:37 AM Bug #10095 (Resolved): (crush_bucket_adjust_item_weight()+0) [0x7d1540] crash
ubuntu@teuthology:/a/samuelj-2014-11-11_22:08:30-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/597458
...
Samuel Just
10:36 AM Bug #9835 (Resolved): osd: bug in misdirected op checks (firefly)
Sage Weil
10:25 AM Messengers Feature #10079 (Resolved): AsyncMessenger: Support select for other OS
Haomai Wang
09:49 AM Feature #10098 (Resolved): wanted: command to clear 'incomplete' PGs
Hello,
Please create a command that would clear 'incomplete' PGs.
Perhaps ceph pg force_create_pg could be extend...
c sights
08:32 AM rbd Bug #9854 (Pending Backport): librbd: reads contending for cache space can cause livelock
Jason Dillaman
08:28 AM Bug #10097 (Resolved): failed: mon_thrash
debian 7.0
logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-12_17:15:01-upgrade:giant-giant-dist...
Yuri Weinstein
07:17 AM Support #10024: Cluster unreachable after restart
Hi,
Have I missed anything?
Did I do something wrong?
Because I didn't get any answer after more than 1 week.
Thank...
Luca Mazzaferro
06:59 AM Cleanup #10094 (New): Create new git repo for json_spirit
json_spirit is currently part of the code tree of ceph, but it's external code. There was also no update within a long... Danny Al-Gaaf
06:58 AM CephFS Bug #10092 (Resolved): multiple_rsync.sh + ceph-fuse timing out on firefly
greg is right, these time out semi-regularly. increased the timeout on master, giant, firefly. Sage Weil
06:38 AM Bug #10093 (Fix Under Review): ceph-monstore-tool: FAILED assert(!is_open)
https://github.com/ceph/ceph/pull/2914 Loïc Dachary
06:35 AM Bug #10093 (Resolved): ceph-monstore-tool: FAILED assert(!is_open)
Using a vstart cluster + stop.sh:... Loïc Dachary
04:17 AM Bug #9916: osd: crash in check_ops_in_flight
Hi Yehuda,
After taking a look at the rgw code, I failed to find which (http) request would need CEPH_OSD_OP_SRC_CMP...
Guang Yang
12:14 AM Feature #9943 (In Progress): osd: mark pg and use replica on EIO from client read
Currently the OSD checks the PG map, gets only k items, and sends sub-read requests. So if one read fails, it asserts and core du... Wei Luo

11/12/2014

09:21 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
How do we tell the difference between (2) and (3)? In both cases, ceph_objectstore_tool will see there is no SHARDS ... Sage Weil
09:06 PM Bug #10077: ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to

I see from the code that there are a couple of scenarios that need to be handled or at least documented:
1. Expo...
David Zafman
08:59 PM CephFS Bug #10092 (Resolved): multiple_rsync.sh + ceph-fuse timing out on firefly
teuthology-2014-11-11_23:04:01-fs-firefly-distro-basic-multi/598145
teuthology-2014-11-11_23:04:01-fs-firefly-distro...
Sage Weil
08:25 PM Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size...
Wei is working on this along with http://tracker.ceph.com/issues/9943 . Guang Yang
06:52 PM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
Greg Farnum wrote:
> What version are you running? This looks like one of a couple of bugs that have been resolved i...
Wenjun Huang
10:47 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
What version are you running? This looks like one of a couple of bugs that have been resolved in the latest point rel... Greg Farnum
04:26 AM Messengers Bug #10080: Pipe::connect() cause osd crash when osd reconnect to its peer
And the peer OSD's log is as below:... Wenjun Huang
03:40 AM Messengers Bug #10080 (Resolved): Pipe::connect() cause osd crash when osd reconnect to its peer
When our cluster load is heavy, the osd sometimes crashes. The critical log is as below:
-278> 2014-08-20 11:04:28...
Wenjun Huang
05:15 PM rbd Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
Jason Dillaman
05:13 PM rbd Bug #9771: Segmentation fault after upgrade v0.80.5 -> v0.80.6
Commit b75f85a2 added new elements to the _Thread_ class, breaking ABI. In this (and several other upgrade tests fro... Jason Dillaman
05:08 PM Feature #9957: librados: add fadvise op
See the pull request: https://github.com/ceph/ceph/pull/2905 jianpeng ma
04:09 PM rgw Bug #10090 (Resolved): ceph_objectstore_tool import broken
Sage Weil
03:27 PM rgw Bug #10090 (Fix Under Review): ceph_objectstore_tool import broken
David Zafman
02:15 PM rgw Bug #10090 (Resolved): ceph_objectstore_tool import broken

The tool can't import because it finds that the recently removed collection still exists.
It may be because fini...
David Zafman
12:37 PM rbd Bug #10002 (Resolved): Errors during import_export test in upgrade:firefly-x-next-distro-basic-vp...
commit:e94d3c11edb9c9cbcf108463fdff8404df79be33 Josh Durgin
11:38 AM Bug #10083 (Resolved): cephtool/test.sh: osd create w/o uuid test is noisy
Sage Weil
10:09 AM Bug #10083: cephtool/test.sh: osd create w/o uuid test is noisy
Verified to work with... Loïc Dachary
09:53 AM Bug #10083 (Fix Under Review): cephtool/test.sh: osd create w/o uuid test is noisy
https://github.com/ceph/ceph/pull/2902 Loïc Dachary
09:29 AM Bug #10083 (Resolved): cephtool/test.sh: osd create w/o uuid test is noisy
... Sage Weil
10:56 AM Bug #10085 (Resolved): dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
After upgrading to glibc 2.20, the "ceph" and "rbd" commands exit with an "Illegal instruction" message and a non-zero exit cod... Denis kaganovich
10:00 AM Feature #9598 (Pending Backport): re-enable Objecter fast dispatch
sage-2014-11-11_08:26:01-rados-wip-sage-testing-distro-basic-multi Sage Weil
08:42 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Same issue http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-11_17:03:01-upgrade:firefly:older-firefly-distro-ba... Yuri Weinstein
08:29 AM rgw Bug #10082 (Resolved): Segmentation fault in upgrade:dumpling-firefly-x:parallel-next-distro-basi...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-11_17:10:01-upgrade:dumpling-firefly-x:parallel-ne... Yuri Weinstein
06:53 AM rbd Feature #2467 (Resolved): qemu: implement bdrv_invalidate_cache
Merged upstream: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=be21788495fdc8251b04dd4bfd0cdce95c49d75b Jason Dillaman
01:23 AM Messengers Feature #10079 (Resolved): AsyncMessenger: Support select for other OS
AsyncMessenger already supports epoll and kqueue, but for other legacy OSes or Windows, we need to use select for the wo... Haomai Wang

11/11/2014

06:17 PM rbd Bug #10002 (Fix Under Review): Errors during import_export test in upgrade:firefly-x-next-distro-...
https://github.com/ceph/ceph/pull/2899 Josh Durgin
08:23 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:15:02-upgrade:dumpling-firefly-x:parallel-... Yuri Weinstein
08:17 AM rbd Bug #10002: Errors during import_export test in upgrade:firefly-x-next-distro-basic-vps run
Seems to be a similar issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_17:05:02-upgrade:firefly:singleton-f... Yuri Weinstein
05:20 PM Bug #10052: LibRadosTwoPools[EC]PP.PromoteSnap failure
ubuntu@teuthology:/a/sage-2014-11-11_14:57:42-smoke-wip-warn-max-pg-distro-basic-multi/596722 Sage Weil
02:59 PM CephFS Bug #8090: multimds: mds crash in check_rstats
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-10_23:18:02-multimds-giant-testing-basic-multi/595393 Sage Weil
02:54 PM Bug #10077 (Resolved): ceph_objectstore_tool: sets SHARDS feature on export it doesn't need to
user on 0.87 exported a replicated pg and couldn't import it because the shards feature wasn't set on the osd.
w...
Sage Weil
02:14 PM rgw Feature #9933: rgw: implement S3 RR (reduced redundancy) API
Hmm, was looking just now at the S3 api, and it seems that you can set RR per object, not per bucket. This complicate... Yehuda Sadeh
11:01 AM Bug #10069 (Rejected): SyncEntryTimeout::finish() timeout

The ceph_objectstore_tool aborted in FileStore code.
On my wip-9780 branch which is rebased on current master ru...
David Zafman
10:31 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
Replying to my own post for posterity:
I figured out why those Git hashes don't align. It's a bug in log.cgi. Appare...
Ken Dreyer
08:50 AM devops Bug #10049 (Resolved): "Failed to fetch package" "rhel7_0-x86_64-basic"
Looks fixed Yuri Weinstein
09:53 AM Bug #10067 (Can't reproduce): ::posix_memalign abort ceph::buffer::create_page_aligned in 0.80.7
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_19:13:02-upgrade:dumpling-x-firefly-distro-basi... Yuri Weinstein
09:01 AM rgw Feature #9013 (Resolved): rgw: set civetweb as a default frontend
Sage Weil
08:48 AM rgw Bug #10066: rgw: failed md5sum on s3tests-test-readwrite
Same problem in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-10_18:11:17-upgrade:firefly:newer-firefly-dist... Yuri Weinstein
07:22 AM rgw Bug #10066 (Resolved): rgw: failed md5sum on s3tests-test-readwrite
... Sage Weil
08:16 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Same issues in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-10_17:18:01-upgrade:firefly-x-next-distro-b... Yuri Weinstein
08:02 AM Bug #10016 (Resolved): "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
tests passed. Yuri Weinstein
07:25 AM rgw Bug #9917 (Won't Fix): RADOSGW: Not able to create Swift objects with erasure coded pool
Sage Weil
03:51 AM rgw Bug #9917: RADOSGW: Not able to create Swift objects with erasure coded pool
OK, I was not aware of this; it seems like sane behaviour to me. pushpesh sharma
07:21 AM rgw Bug #10062: s3-test failures using keystone authentication
It looks like a few of them, e.g. the date ones, occur because radosgw doesn't consider checking the date head... Abhishek Lekshmanan
05:02 AM rgw Bug #10062 (Resolved): s3-test failures using keystone authentication
* "rgw: check for timestamp for s3 keystone auth":https://github.com/ceph/ceph/pull/2993
* "wip: rgw: check keystone...
Abhishek Lekshmanan
07:20 AM Bug #10065 (Duplicate): hung ec-lost-unfound.yaml, failed of osd.{0,2,3}
this pattern keeps popping up:... Sage Weil
07:16 AM Bug #7995: osd shutdown: ./common/shared_cache.hpp: 93: FAILED assert(weak_refs.empty())
ubuntu@teuthology:/a/teuthology-2014-11-10_02:32:01-rados-giant-distro-basic-multi/594038 Sage Weil
06:40 AM Feature #10064 (Resolved): add ceph_objectstore_tool tests to make check
The "ceph_objectstore_tool.py":https://github.com/ceph/ceph/blob/giant/src/test/ceph_objectstore_tool.py tests can be... Loïc Dachary
06:35 AM Bug #10063: ceph_objectstore_tool does not support getting attributes for erasure coded objects
... Loïc Dachary
06:33 AM Bug #10063 (Resolved): ceph_objectstore_tool does not support getting attributes for erasure code...
... Loïc Dachary
04:37 AM Bug #9554: "FAILED assert(0 == "hit suicide timeout")" in upgrade:firefly-firefly-testing-basic-v...
Yes it reproduced in giant too. Sahana Lokeshappa

11/10/2014

11:46 PM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
Just wanted to add that the lack of a timeout causes havoc all over the place... Autofs, backup scripts mounting CephFS on d... Dmitry Smirnov
04:05 PM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
Although it terminates on "Ctrl+C", a timeout would be _very_ useful because it would prevent the system from hanging on b... Dmitry Smirnov
11:11 AM CephFS Bug #10041: ceph-fuse: never exit when no MDS server is available
Was it blocking in the foreground? Did SIGINT (i.e., control-C) work on it?
We can add a configurable timeout but I...
Greg Farnum
01:07 AM CephFS Bug #10041 (Resolved): ceph-fuse: never exit when no MDS server is available
I'm attempting to mount CephFS using the FUSE client (i.e. _ceph-fuse_), which does not exit if all MDS servers are down (I ... Dmitry Smirnov
10:57 PM CephFS Bug #10061 (New): uclient: MDS: output cap data in messages
MClientCaps messages don't dump the caps they're updating, and generally neither does anything else. We need to optio... Greg Farnum
10:55 PM CephFS Feature #10060 (New): uclient: warn about stuck cap flushes
It can be hard to diagnose issues that involve cap state. To help with that, the client should keep track of its cap ... Greg Farnum
10:40 PM CephFS Bug #9977 (Resolved): cephfs-journal-tool falsely reports invalid start_ptr
In next branch as commit:65c33503c83ff8d88781c5c3ae81d88d84c8b3e4 and in giant as commit:fc5354dec55248724f8f6b795e3a... Greg Farnum
09:36 PM CephFS Bug #9341: MDS: very slow rejoin
Thanks. Dmitry Smirnov
09:27 PM CephFS Bug #9341 (Resolved): MDS: very slow rejoin
This is backported to giant as of commit:97e423f52155e2902bf265bac0b1b9ed137f8aa0. The test for it also got backporte... Greg Farnum
09:26 PM CephFS Bug #9800 (Resolved): client-limits test is not passing
Backported in commit:387efc5fe1fb148ec135a6d8585a3b8f8d97dbf8 Greg Farnum
06:15 PM Bug #10042: OSD crash doing object recovery with EC pool
I'm not sure either, investigating. Loïc Dachary
05:15 PM Bug #10042: OSD crash doing object recovery with EC pool
Hi Loic,
I am still a little bit confused in terms of what happened behind the crash (and what is the relation betwe...
Guang Yang
05:30 AM Bug #10042: OSD crash doing object recovery with EC pool
Loïc Dachary
03:49 AM Bug #10042 (Duplicate): OSD crash doing object recovery with EC pool
We observed one OSD crash with the following assertion failure:... Guang Yang
06:10 PM rbd Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
Sage Weil
06:45 AM rbd Bug #10045 (Resolved): common/Cond.h: 52: FAILED assert(mutex.is_locked()) in close_image()
... Sage Weil
05:44 PM Bug #9921: msgr/osd/pg dead lock giant
Giving Sage this ticket since he took the PR. Greg Farnum
05:35 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
testing this PR https://github.com/ceph/ceph-qa-suite/pull/233
http://pulpito.front.sepia.ceph.com/teuthology-2014...
Yuri Weinstein
03:06 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
- install.upgrade:
all:
branch: giant
is upgrading all roles
Sage Weil
02:29 PM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
Still failed - http://pulpito.front.sepia.ceph.com/teuthology-2014-11-10_10:56:16-upgrade:giant-giant-distro-basic-mu... Yuri Weinstein
10:48 AM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
Moved client.0 to a separate node, testing now
https://github.com/ceph/ceph-qa-suite/pull/232
Yuri Weinstein
09:57 AM Bug #10016: "Segmentation fault" in upgrade:giant-giant-distro-basic-multi run
... Sage Weil
05:20 PM CephFS Bug #10025 (Resolved): Journal undump causes MDS to crash when start pos is not on object boundary
Merged into next in commit:69be8e9b30c18e47c17ff7dafc4ac8fbe00d48e7, and the appropriate backport bits were merged la... Greg Farnum
04:34 PM rgw Feature #9359 (Resolved): rgw: Export user stats in get-user-info Adminops API
Yehuda Sadeh
04:21 PM rgw Bug #9907 (Pending Backport): radosgw-admin: can't disable max_size quota
Sage Weil
04:13 PM rgw Feature #8911 (Pending Backport): RGW doesn't return 'x-timestamp' in header which is used by 'Vi...
Sage Weil
04:09 PM Bug #10059: osd/ECBackend.cc: 876: FAILED assert(0)
This bug makes me cry as it is the reason for my cluster to be _completely down_ for over 10 days now... Duplicate ad... Dmitry Smirnov
03:20 PM Bug #10059 (Resolved): osd/ECBackend.cc: 876: FAILED assert(0)
-1> 2014-11-09 14:13:01.334410 7f8b93c8b700 10 filestore(/var/lib/ceph/osd/ceph-3) FileStore::read(1.1ds0_head/78... Samuel Just
03:59 PM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
When I look at the log for http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel7-amd64-basic/log.cgi?log=6977d02... Ken Dreyer
03:29 PM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
Disk space looks ok to me:... Ken Dreyer
10:28 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
From http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel7-amd64-basic/log.cgi?log=6977d02f0d31c453cdf554a8f1796... Ken Dreyer
10:03 AM devops Bug #10049: "Failed to fetch package" "rhel7_0-x86_64-basic"
Needs a link:
http://pulpito.front.sepia.ceph.com/teuthology-2014-11-09_17:18:01-upgrade:firefly-x-next-distro-basic...
Zack Cerza
09:12 AM devops Bug #10049 (Resolved): "Failed to fetch package" "rhel7_0-x86_64-basic"
Seems widespread on next run using RHEL 7
Run teuthology-2014-11-09_17:18:01-upgrade:firefly-x-next-distro-basic-...
Yuri Weinstein
03:40 PM Bug #10057 (In Progress): msgr: skipped message on peer reconnect
... Sage Weil
01:42 PM Bug #10057 (Can't reproduce): msgr: skipped message on peer reconnect
ubuntu@teuthology:/a/teuthology-2014-11-09_23:06:01-krbd-next-testing-basic-multi/593102... Sage Weil
03:36 PM Feature #9420: erasure-code: tools and archive to check for non regression of encoding
the backport is needed to generate the content of https://github.com/ceph/ceph-erasure-code-corpus/tree/master/v0.80.... Loïc Dachary
03:32 PM Feature #9420 (Pending Backport): erasure-code: tools and archive to check for non regression of ...
Loïc Dachary
02:57 PM Feature #9420 (Resolved): erasure-code: tools and archive to check for non regression of encoding
I don't think this needs to be backported. Samuel Just
03:06 PM Bug #10058 (Can't reproduce): next stuck in recovery, no progress
/a/sage-2014-11-09_07:49:57-rados-next-testing-basic-multi/591906
/a/sage-2014-11-09_07:49:57-rados-next-testing-bas...
Samuel Just
02:59 PM Bug #9986 (Pending Backport): objecter: map epoch skipping broken
Samuel Just
02:56 PM Feature #9262 (Resolved): Additional namespace issues
Samuel Just
02:55 PM Feature #9031 (Resolved): List RADOS namespaces and list all objects in all namespaces
Samuel Just
02:53 PM Bug #6756 (Pending Backport): journal full hang on startup
Samuel Just
02:51 PM Bug #9852 (Resolved): mon: monitor asserts on 'ceph mds add_data_pool X' if X is an ID that DNE
Samuel Just
02:49 PM Bug #9987 (Pending Backport): mon: min_last_epoch_complete tracking broken
Samuel Just
02:12 PM Bug #10053 (Resolved): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
Sage Weil
11:18 AM Bug #10053 (In Progress): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
ubuntu@teuthology:/a/sage-2014-11-09_07:49:57-rados-next-testing-basic-multi$ teuthology-ls . | grep FAIL
591648 FAI...
Sage Weil
11:14 AM Bug #10053 (Resolved): ./ceph tell osd.0 injectargs --no-osd_debug_op_order failure
ubuntu@teuthology:/a/samuelj-2014-11-07_21:48:36-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/590242
...
Samuel Just
01:40 PM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
* how to use ceph_objectstore_tool https://github.com/ceph/ceph-qa-suite/blob/giant/tasks/ceph_objectstore_tool.py
*...
Loïc Dachary
06:20 AM Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing
The tests should use the same as #9887 which requires https://github.com/ceph/ceph-qa-suite/compare/wip-dzaddscrub Loïc Dachary
01:27 PM Feature #10056 (New): Object metadata mismatch detection and handling
Possible things we may want to address:
- clone vs head snapshot metadata mismatches
- object metadata vs ondis...
Samuel Just
01:23 PM Feature #10055 (New): PG metadata corruption detection and handling
Possible problems we might want to handle:
- missing pg info
- missing pg epoch
- missing pg log
Correct ...
Samuel Just
01:21 PM Feature #10054 (New): OSD level metadata mismatch handling
Meta feature for detecting and handling OSD metadata.
Possible directions:
- full osdmap vs incremental mismatch?
Samuel Just
11:57 AM devops Feature #10046: run make check on every pull request
Removing myself and clarifying the scope. I would be happy to help with the implementation but I'm not equipped to ta... Loïc Dachary
07:48 AM devops Feature #10046 (Resolved): run make check on every pull request
And report back on the success / failure, with the logs attached for debugging. The suggested approach is to define a... Loïc Dachary
11:24 AM CephFS Bug #9997: test_client_pin case is failing
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_23:04:01-fs-next-testing-basic-multi/593068/ Greg Farnum
11:23 AM CephFS Bug #6613: samba is crashing in teuthology
Still happening: http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_23:14:01-samba-next-testing-basic-multi/59... Greg Farnum
11:13 AM Bug #10052 (Resolved): LibRadosTwoPools[EC]PP.PromoteSnap failure
ubuntu@teuthology:/a/samuelj-2014-11-07_21:48:36-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/590439
...
Samuel Just
09:53 AM rbd Bug #10026 (Duplicate): "Assertion: common/Cond.h" in rbd-master-testing-basic-multi run
#10045 Sage Weil
09:52 AM Bug #10033 (Won't Fix): ceph pg <pg> query hangs when OSD down, EC PG
In this case the OSD seems to be up (the pg state isn't 'stale'), so this is expected behavior (the osd hasn't respon... Sage Weil
09:51 AM rbd Bug #10051 (Won't Fix): kernel-mounted RBD image may block shutdown
init-rbdmap fails to unmap an RBD image when the latter is still in use.
As a consequence, system shutdown hangs dead w...
Dmitry Smirnov
09:46 AM rgw Bug #9899 (Fix Under Review): Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-du...
Per Sage - removed mon_thrash tests from the rgw/ section, https://github.com/ceph/ceph-qa-suite/pull/230 Yuri Weinstein
09:30 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
This bug was fixed in 0.80.3 or 0.80.4. I think we need to make the 'older' tests skip the mon_thrash tests. Sage Weil
09:23 AM rgw Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic...
Same issue in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-09_10:00:02-upgrade:dumpling-dumpling-distro... Yuri Weinstein
09:19 AM devops Bug #10050 (Rejected): "Segmentation fault" (radosgw-admin) in upgrade:firefly:singleton-firefly-...
Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_17:05:02-upgrade:firefly:singleton-firefly-dist... Yuri Weinstein
09:05 AM Bug #10013: "Segmentation fault" in upgrade:dumpling-x-firefly-distro-basic-vps run
Same issue in http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_19:13:01-upgrade:dumpling-x-firefly-distro-ba... Yuri Weinstein
08:43 AM Bug #9913: mon: audit log entires for forwarded requests lack info
session is with the monitor that forwarded the request. there's no auth handler for the session as it is a monitor. ... Joao Eduardo Luis
08:41 AM rbd Bug #10030 (Pending Backport): Crash when attempting to open non-existent parent image
Sage Weil
08:40 AM Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-...
Same issue in job http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-09_18:13:01-upgrade:firefly-x-giant-distro-b... Yuri Weinstein
08:24 AM Bug #9864 (Can't reproduce): osd doesn't report new stats for 3 hours when running test LibCephFS...
Not enough info to tell why the client test hung. Let's see if it happens again! Sage Weil
08:08 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Looking into the OSD logs shows that the OSDs don't report new stats for the ~3 hours because no PGs are updated in tha... Joao Eduardo Luis
07:47 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Joao Eduardo Luis
07:44 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
Not so weird after all.
Log shows that last log is created because we had some stats to report:...
Joao Eduardo Luis
07:30 AM Bug #9864: osd doesn't report new stats for 3 hours when running test LibCephFS.MulticlientSimple
this is not the monitor taking 2 hours to commit. The log snippets above refer to two different proposals: the first... Joao Eduardo Luis
06:08 AM Feature #10044 (New): ECUtil::HashInfoRef should have a NONE value
So that "ECBackend::get_hash_info":https://github.com/ceph/ceph/blob/giant/src/osd/ECBackend.cc#L1435 can return it i... Loïc Dachary
05:10 AM Bug #10040 (Rejected): install ceph packages broken for firefly
The problem here is that the machine needs to be properly cleaned up from newer Ceph packages.
It is always proble...
Alfredo Deza
04:13 AM Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size...
Hi Sam,
Any suggestion in terms of how to fix this issue?
One potential solution is to validate the digest for ea...
Guang Yang
 
