Activity

From 02/03/2012 to 03/03/2012

03/03/2012

09:33 PM Bug #2128: filestore: check() fails during sync
could it be commit:75cbed61e94a7974e40230360c6781d85f47576d ? Sage Weil
09:11 PM Bug #2133: osd: recovery_complete
Sage Weil
02:18 PM Bug #2133 (Resolved): osd: recovery_complete
pull raced with clones, clone_subset changed, it got confused.... Sage Weil
09:10 PM Bug #2135: cephtool: osdc/Objecter.cc: 375: FAILED assert(initialized)
librados shutdown race Sage Weil
07:38 PM Bug #2135 (Resolved): cephtool: osdc/Objecter.cc: 375: FAILED assert(initialized)
... Sage Weil
03:16 PM CephFS Bug #1796: mds: exit cleanly on EBLACKLISTED
people hit this and it's confusing when ceph-mds crashes...
wip-1796
Sage Weil
02:38 PM Feature #2134 (Resolved): qa: smoke suite
pick out some regression tests that run reasonably quickly and have decent coverage. Sage Weil

03/02/2012

09:59 PM Bug #2132 (Resolved): FAILED assert(!missing.is_missing(soid))
Possibly a duplicate of Issue #1191 or Issue #339 (both closed with could not reproduce).
Prior to this assert th...
Matthew Roy
09:36 PM Linux kernel client Bug #2099 (Rejected): messenger: unexpected socket state (4)
OK, this is not a bug. I caused it by inserting this WARN_ON() message
in a case statement in ceph_state_change(). ...
Alex Elder
09:29 PM Linux kernel client Cleanup #2131 (New): ceph: xattr: use the generic kernel xattr code
The Linux kernel has a generic set of routines to support
extended attributes. When I posted some recent changes
t...
Alex Elder
09:28 PM Linux kernel client Cleanup #2130: ceph: xattr: complete cleanups following review
Forgot to assign it to myself Alex Elder
09:27 PM Linux kernel client Cleanup #2130 (Rejected): ceph: xattr: complete cleanups following review
As requested by Mark... I have a number of changes to make to
fs/ceph/xattr.c based on my review of that code last ...
Alex Elder
08:12 PM Linux kernel client Bug #2129 (New): ceph: xattr: call __build_xattrs() *before* cap check
While reviewing a change to the xattr code, Sage noticed that some
calls to __build_xattrs() were being made *after*...
Alex Elder
04:27 PM Bug #2128 (Rejected): filestore: check() fails during sync
... Sage Weil
03:08 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
ok, i have a theory what's going on. can you try the new wip-2116, and run with debug ms = 20?
thanks!
Sage Weil
10:07 AM Feature #2127 (New): Save kernel core dumps on all of our test machines
The claim is that there is a netdump module that will UDP-squirt kernel coredumps to a waiting server, which is proba... Anonymous
09:53 AM Bug #2126 (Duplicate): osd: recover_primary did nothing when num_missing==1
... Sage Weil
09:46 AM Bug #2118 (Resolved): osd: flawed commit_op_seq check on startup
Sage Weil
08:43 AM Feature #2125 (Resolved): osd: put large xattrs in leveldb
either when we fear the fs can't handle them, or unconditionally, or something.
Sage Weil
07:33 AM Feature #1422: libvirt: rbd storage pool
Made some more progress on this, code seems to be stable.
Working:
* Single and multiple monitors
* Authenticati...
Wido den Hollander

03/01/2012

10:00 PM Bug #2103: osd: lockdep error on watch_lock
must reenable this in qa suite when it's fixed! Sage Weil
05:18 PM Bug #2122 (Resolved): objecter: Asserts if authorization fails
Fixed by commit:cd313885783a5a69a554139b5b41d21a666c815b Josh Durgin
08:36 AM Bug #2122: objecter: Asserts if authorization fails
Ah, I had a patch to fix this in the wip-testrados branch. I'll rebase and merge that today. The new asserts in the o... Josh Durgin
06:45 AM Bug #2122 (Resolved): objecter: Asserts if authorization fails
While working on the libvirt RBD storage driver I noticed the following crash:... Wido den Hollander
01:46 PM Tasks #2123 (Closed): Ignore this task - I'm checking out the bug report process.
Ken Franklin
09:02 AM Tasks #2123: Ignore this task - I'm checking out the bug report process.
using "Update" option in tracker Ken Franklin
09:00 AM Tasks #2123 (Closed): Ignore this task - I'm checking out the bug report process.
just using the task ticket to walk through the issue lifecycle. Ken Franklin
11:45 AM Bug #2124 (Resolved): crash when malformed auth key is provided
We should guard all calls to decode_base64:... Yehuda Sadeh
09:52 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
Saw this a couple of times on a client in a small ceph cluster. It seems to be correlated with dd runs using various... Mark Nelson
08:11 AM Bug #2115 (Rejected): OSD failed to start: Operation not permitted
Sage Weil
02:13 AM Bug #2115: OSD failed to start: Operation not permitted
Problem resolved. Thank you very much for your hint! I never thought it was caused by communication.
I created a...
soft crack
02:48 AM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
I can almost always reproduce it.
I just upgraded my cluster to:
> ceph version 0.42.2-206-gd77c579 (commit:d77c5...
Wido den Hollander

02/29/2012

09:22 PM Bug #2022: osd: misdirected request
... Sage Weil
09:16 PM Bug #2080: osd: scrub on disk size does not match object info size
hit this again, ... Sage Weil
02:57 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
i'm hoping wip-2116 fixes it... Sage Weil
02:31 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
Wido, are you able to reproduce this reliably? I have an idea what the problem is, but have never reproduced this. ... Sage Weil
02:17 PM Bug #2002: osd: racy push/pull for clones
reenabling this in my thrashing tests. if all goes well, i'll reenable in master under the assumption that sam's cle... Sage Weil
02:16 PM Bug #1977 (Can't reproduce): mon: ceph command hang
we can reopen if this ever pops up again Sage Weil
01:59 PM Feature #2111 (In Progress): msgr workloads
What we're looking for here are basic tests like connect, send message, kill connection, send another message; and ve... Greg Farnum
01:30 PM Messengers Bug #1747 (Resolved): msgr: osd connection originates from wrong port
commit:b1f264406f93af35600786f58e75908c393cf2ed Sage Weil
12:21 PM Messengers Bug #1747: msgr: osd connection originates from wrong port
wip-1747 Sage Weil
11:25 AM Messengers Bug #1747: msgr: osd connection originates from wrong port
just hit this again. osd.1:... Sage Weil
12:48 PM rgw Bug #2121 (Resolved): radosgw: reload command for init script
Sage Weil
09:48 AM rgw Bug #2121: radosgw: reload command for init script
Sage Weil
09:25 AM rgw Bug #2121 (Resolved): radosgw: reload command for init script
Sage Weil
12:48 PM Bug #1458 (Resolved): Run ceph suite with valgrind enabled
Sage Weil
11:13 AM Bug #1975: btrfs: EINVAL on snap create
see also this thread: http://marc.info/?t=132768583600004&r=1&w=2 Sage Weil
10:46 AM Bug #1975: btrfs: EINVAL on snap create
the EINVAL seems to have come from... Sage Weil
10:44 AM Bug #1975: btrfs: EINVAL on snap create
somehow we end up here in btrfs:... Sage Weil
10:39 AM Bug #1975: btrfs: EINVAL on snap create
quick brain dump:
- last time this reproduced i narrowed it down to a case where there were racing rmdirs with the...
Sage Weil
10:55 AM Bug #2115: OSD failed to start: Operation not permitted
it looks like you may be having trouble authenticating with the monitor. can you reproduce this with 'debug ms = 1'? ... Sage Weil
10:28 AM Bug #2031 (Can't reproduce): paxos: failed assert (begin->last_committed == last_committed)
Sage Weil
10:09 AM Messengers Bug #2086 (Resolved): msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
merged! Sage Weil
10:06 AM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
Sage suggested I could just add a local dispatch to the shutdown or wait functions to test this properly...I did, and... Greg Farnum
09:18 AM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
Sage Weil
09:27 AM Bug #1873: crush_rule type is inconsistent
It's __s16 or int so that a negative value can mean undefined/not specified. I'm inclined to just leave this as is... Sage Weil
09:18 AM Bug #2119 (Resolved): osd: do_query to !up osd
Sage Weil

02/28/2012

06:39 PM Bug #2115: OSD failed to start: Operation not permitted
See attachment please soft crack
09:17 AM Bug #2115: OSD failed to start: Operation not permitted
Can you attach the actual log? I want to make sure there is no subtle difference in the output. Thanks! Sage Weil
01:40 AM Bug #2115: OSD failed to start: Operation not permitted
ceph version 0.42.2 (commit:732f3ec94e39d458230b7728b2a936d431e19322) soft crack
01:38 AM Bug #2115 (Rejected): OSD failed to start: Operation not permitted
I'm setting up a new ceph cluster on ubuntu 11.10 with kernel version 3.0.0-16-server x86_64. The osd server failed t... soft crack
05:57 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
To be clear, I didn't try and generate the actual failure condition that was causing an assert before — that should b... Greg Farnum
05:55 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
wip-2086 should fix this.
Ran a simple test:...
Greg Farnum
05:27 PM Messengers Bug #2086 (In Progress): msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
Greg Farnum
04:51 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
Okay, looks like the local_pipe doesn't get its message queue cleared...I'm checking the others and looking at how it... Greg Farnum
04:55 PM rgw Bug #2120: rgw: atomic write guard doesn't scale well
Implementing #1956 would solve this issue, and would make the entire atomic scheme simpler.
Yehuda Sadeh
03:03 PM rgw Bug #2120: rgw: atomic write guard doesn't scale well
This was reported by a user through the ml. We should figure out with that user whether it's a real issue, or a red h... Yehuda Sadeh
02:51 PM rgw Bug #2120: rgw: atomic write guard doesn't scale well
Do we care? You can't do partial updates to objects IIRC, so many writers pretty much has to be wrong somehow or other. Greg Farnum
02:35 PM rgw Bug #2120 (Resolved): rgw: atomic write guard doesn't scale well
when there is a large number of writers to the same object. Yehuda Sadeh
04:48 PM rgw Bug #2106 (Resolved): failed s3tests.functional.test_s3.test_100_continue
Machines were running wrong apache and fastcgi modules. Yehuda Sadeh
04:23 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
This may be a messenger issue, but it's not losing that initial message — notice how osd5 tries to send a ping back t... Greg Farnum
11:26 AM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
the other side of this conversation is... Sage Weil
11:20 AM Bug #2116 (In Progress): Repeated messages of "heartbeat_check: no heartbeat from"
looks like a msgr issue?... Sage Weil
07:35 AM Bug #2116 (Resolved): Repeated messages of "heartbeat_check: no heartbeat from"
As discussed on the ml I gathered some logs.
Today I upgraded my whole cluster to 0.42.2 from 0.41.
Due to the ...
Wido den Hollander
12:54 PM Bug #1789 (Resolved): mon: failed assert(paxosv == pg_map.version)
Pushed to master in commit:d10e1f46df8cc252f2f1d57cf5e577ea38eee1ae Greg Farnum
12:48 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
Okay, figured it out. Our current slurp code pulls in all the incrementals, then sends off a request for latest_stash... Greg Farnum
12:01 PM Bug #2119 (Resolved): osd: do_query to !up osd
... Sage Weil
11:09 AM Bug #2118: osd: flawed commit_op_seq check on startup
Sage Weil
10:08 AM Bug #2118 (Resolved): osd: flawed commit_op_seq check on startup
the check that current/commit_op_seq == newest snap is flawed because ceph-osd can write a new current/commit_op_seq ... Sage Weil
10:09 AM Bug #2104 (Won't Fix): teuthology: wait_for_clean doesn't wait for last_epoch_started to propagate
Sage Weil
10:09 AM Bug #2107 (Resolved): teuthology: lost_unfound fails pg state assert
Sage Weil
09:41 AM devops Feature #2117 (New): qa: gitbuilder that does ENCODE_DUMP
Sage Weil

02/27/2012

04:20 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
The guards for something like that shouldn't be too complicated to set up...actually, I thought they were at one poin... Greg Farnum
04:19 PM Bug #1789 (In Progress): mon: failed assert(paxosv == pg_map.version)
Iiiinteresting. This assert is the post-update check, after loading and running through all the incrementals. (Meanin... Greg Farnum
01:41 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
Shouldn't be related — this is a problem with a single monitor daemon and the other is a write problem that an MDS is... Greg Farnum
12:35 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
Core dump attached. Dumb thought: could this be related to http://tracker.newdream.net/issues/2110, they happened wit... Matthew Roy
10:14 AM Bug #1789: mon: failed assert(paxosv == pg_map.version)
Crash occurred on the third monitor when starting after being down for several hours shortly after cluster creation. ... Matthew Roy
02:07 PM CephFS Bug #2110 (Duplicate): osdc/Journaler.cc: 360: FAILED assert(r >= 0)
#1796 Sage Weil
01:40 PM CephFS Bug #2110: osdc/Journaler.cc: 360: FAILED assert(r >= 0)
can you attach ceph-mds too? or better yet, fire up gdb ceph-mds core and print out the value of r from that frame. ... Sage Weil
12:00 PM CephFS Bug #2110: osdc/Journaler.cc: 360: FAILED assert(r >= 0)
Sage Weil wrote:
> Do you have a core file? I'm curious what the value of 'r' is.
Attached. Probably. (datetime ...
Matthew Roy
11:43 AM CephFS Bug #2110: osdc/Journaler.cc: 360: FAILED assert(r >= 0)
Do you have a core file? I'm curious what the value of 'r' is. Sage Weil
11:40 AM CephFS Bug #2110 (Duplicate): osdc/Journaler.cc: 360: FAILED assert(r >= 0)
Assert in MDS. This cluster was running a CephFS home directory workload with one active MDS and one MDS in standby r... Matthew Roy
01:49 PM Bug #2045 (Need More Info): osd: dout_lock deadlock
Sage Weil
01:33 PM Feature #2114 (Resolved): old sepia setup on new hardware
Sage Weil
01:31 PM Feature #2113 (Resolved): objectcacher perfcounters
Sage Weil
01:18 PM Feature #2112 (Resolved): msgr fault injection
Sage Weil
01:18 PM Feature #2111 (Fix Under Review): msgr workloads
Develop the interfaces which will allow us to break messenger sockets at precisely-defined points.
Allow comparison ...
Sage Weil
11:38 AM Tasks #2109: qa/benchmark: Explore using Filebench for benchmarks / stress testing
Justification and a good intro: http://cuddletech.com/blog/pivot/entry.php?id=949 Anonymous
11:36 AM Tasks #2109 (New): qa/benchmark: Explore using Filebench for benchmarks / stress testing
http://filebench.sourceforge.net/
"Ships with more than 40 pre-defined personalities, including the one that descr...
Anonymous
11:05 AM Feature #2108 (New): track object states to inform error injection/testing
Sage Weil
11:04 AM Feature #1412 (Resolved): qa: spec out messenger testing
we now have a high-level plan on how to attack msgr testing. Sage Weil
10:03 AM Bug #1977: mon: ceph command hang
Pretty sure you pushed changes the day you filed it (note reference in previous message), although I can't find the e... Greg Farnum
09:51 AM rgw Bug #2106: failed s3tests.functional.test_s3.test_100_continue
Strange, I can see the request in the apache logs, but not in the rgw logs.... Yehuda Sadeh
09:12 AM Bug #2107 (Resolved): teuthology: lost_unfound fails pg state assert
ubuntu@teuthology:/a/nightly_coverage_2012-02-27-a/14063... Sage Weil

02/26/2012

08:56 PM Bug #1977: mon: ceph command hang
Hmm, I wonder if I somehow misdiagnosed this, or inadvertently fixed it: haven't seen this hang in weeks, and it happen... Sage Weil
05:09 PM rgw Bug #2106 (Resolved): failed s3tests.functional.test_s3.test_100_continue
... Sage Weil
05:02 PM Bug #2022: osd: misdirected request
ubuntu@teuthology:/a/nightly_coverage_2012-02-26-a/13876$ grep WRN ceph.log
2012-02-26 01:18:03.166529 osd.1 10.3.1...
Sage Weil
11:19 AM Bug #2105 (Resolved): filestore: mkfs does not create initial snap
This bug is almost the same as this bug: http://tracker.newdream.net/issues/1707
I followed the instruction:http://ceph....
Yunpeng Gao

02/25/2012

09:33 PM Bug #2104 (Won't Fix): teuthology: wait_for_clean doesn't wait for last_epoch_started to propagate
Sage Weil
09:06 PM Bug #2103 (Resolved): osd: lockdep error on watch_lock
... Sage Weil
09:04 PM Bug #2102 (Can't reproduce): osd: pg stuck in backfill
... Sage Weil

02/24/2012

03:30 PM Feature #2054 (Resolved): teuthology: run radosgw through valgrind
ok, this now works with yaml like... Sage Weil
01:52 PM Feature #2006 (Resolved): osd: report what is blocking peering completion
commit:5c6e8b3795d0cf58814619bfc15cb0841e9a4f17 Sage Weil
01:51 PM CephFS Bug #1792 (Can't reproduce): crash in ceph-mds
even if we could, we would never know, since there isn't any distinguishing info here, and the teuth archive is gone. Sage Weil
01:48 PM RADOS Bug #2096 (Resolved): crush: adjust weight broken for tree, list buckets
commit:708be0a5abef63a5da8409ad13719adb7bb744f8 Sage Weil
01:47 PM RADOS Feature #2101 (Resolved): crushtool: check for weight overflow on reweight
Sage Weil
11:56 AM Feature #2007 (Resolved): osd: enumerate unfound, lost objects, possible locations
Sage Weil
09:52 AM Feature #2007: osd: enumerate unfound, lost objects, possible locations
wip-2007 Sage Weil
11:34 AM Feature #2030 (Resolved): osd: clean up mark_unfound api
Sage Weil
10:34 AM Messengers Feature #2100 (Resolved): msgr: Prevent throttled clients from slowing down non-throttled connect...
Right now, it seems a throttled connection will still receive a TCP receive buffer's worth of data, but because the u... Anonymous
09:15 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
I don't think any of these other states are necessarily problematic, as long as the socket eventually ends up in CLOS... Sage Weil
08:49 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
This may be related to http://tracker.newdream.net/issues/1803 and http://permalink.gmane.org/gmane.comp.file-systems... Josh Durgin
08:33 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
Adding that I see more of the same WARNING() messages in the log for
the same state, as well as others for state 5, ...
Alex Elder
08:13 AM Linux kernel client Bug #2099 (Rejected): messenger: unexpected socket state (4)
Running tests defined by the YAML file below. Note that branch
wip-messenger is 107a8aaf21d01ee6cbc7a638faf1328f2bd...
Alex Elder
07:59 AM CephFS Bug #2092: BUG at fs/ceph/caps.c:999
mdsc->mutex protects the globalish mds client state (request/session lists), which is different from session->s_mutex... Sage Weil
06:57 AM CephFS Bug #2092: BUG at fs/ceph/caps.c:999
Just a quick look at this.
Here's the code:
static void __queue_cap_release(struct ceph_mds_session *session,
...
Alex Elder
06:10 AM Bug #2091 (Can't reproduce): corrupt v5 inc osdmap
logs don't go far enough back. :(
moral of the story: next time grab the full mon data dir immediately in case it...
Sage Weil
05:57 AM Linux kernel client Bug #1907 (Resolved): rbd: don't reuse device ids while they're still in use elsewhere
Committed a couple of weeks ago and has seen no bad effect during the
intervening testing. So I'm marking this one ...
Alex Elder

02/23/2012

08:07 PM Feature #2030: osd: clean up mark_unfound api
wip-2030
Sage Weil
06:52 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
it did. probably a race with another thread in connect() or accept() reregistering a new Pipe.. connect() pbly
Sage Weil
06:47 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
We sure this was run including commit:ebbfdefa120ae93b95780c67027ec9efd4b7b5cd? Greg Farnum
04:38 PM Feature #2006 (In Progress): osd: report what is blocking peering completion
wip-pg-query Sage Weil
04:07 PM Bug #2098 (Resolved): xfs/ext4 non-idempotent transaction
Forcing a sync after a non-idempotent transaction is not adequate to ensure correctness during journal replay.
Con...
Samuel Just
03:36 PM Bug #1820 (Resolved): deprecate "ceph stop"
Sage Weil
02:37 PM Bug #1820: deprecate "ceph stop"
ok, tested all this in wip-1820. 'deactivate' already moves the ceph-mds to standby (not exit), all good there.
n...
Sage Weil
11:30 AM Bug #1820: deprecate "ceph stop"
yeah. i think the simplest is to make 'leave' refuse if it's < max_mds.
and we could drop max mds from the cep...
Sage Weil
11:22 AM Bug #1820: deprecate "ceph stop"
Oh, I've talked of this before. It might be nice to have a "start ceph-mds only to process a leftover journal and han... Anonymous
11:19 AM Bug #1820: deprecate "ceph stop"
Changing docs is easy, and the branches already rip out "documented" commands. Let's just make it make sense.
I wo...
Anonymous
11:04 AM Bug #1820: deprecate "ceph stop"
It can easily go back into standby (via the respawn() -> execve() path) instead of shutting down. Then it's really "... Sage Weil
10:54 AM Bug #1820: deprecate "ceph stop"
On termination the process exits. On receipt of a stop command it exports authority over the filesystem hierarchy to ... Greg Farnum
10:52 AM Bug #1820: deprecate "ceph stop"
Tommi Virtanen wrote:
> Greg, how is "ceph mds stop 0" different from that ceph-mds receiving a local request to ter...
Anonymous
10:51 AM Bug #1820: deprecate "ceph stop"
Greg, how is "ceph mds stop 0" different from that ceph-mds receiving a local request to terminate (e.g. SIGTERM)? Anonymous
10:49 AM Bug #1820: deprecate "ceph stop"
No, the important part is the hierarchy authority export. Then it shuts down; it's not a "go standby". I guess you co... Greg Farnum
10:48 AM Bug #1820: deprecate "ceph stop"
Which makes me think, is the concept of "go standby" of any value, if there's something that'll automatically say the... Anonymous
10:44 AM Bug #1820: deprecate "ceph stop"
It sounds like that does two things: move the MDS from active to standby, and terminate it. And we're removing the "r... Anonymous
10:31 AM Bug #1820: deprecate "ceph stop"
That one is a bit different.. it's instructing ceph-mds to export all of its metadata to another node and leave the ... Sage Weil
10:11 AM Bug #1820: deprecate "ceph stop"
Yeah. I can't speak for the threading & locking changes, but the command removal is trivial.
That still leaves
...
Anonymous
09:51 AM Bug #1820: deprecate "ceph stop"
wip-stop and wip-2090 Sage Weil
03:35 PM Bug #2095 (Resolved): osd: need feature bit for v0.42 osdmap encoding change
commit:ddc99983228e761f754e0038aecbe341d7e2181f Sage Weil
09:27 AM Bug #2095: osd: need feature bit for v0.42 osdmap encoding change
we had a feature bit already, we just needed to conditionally encode the old format, and tweak MOSDMap to reencode ma... Sage Weil
03:16 PM Bug #2094 (Resolved): osd: pgs remapped to down+out osd
making remapped and clean mutually exclusive. commit:e8bc42ff435e5648b88b818775d8fa47989af5dc Sage Weil
10:43 AM Bug #2094: osd: pgs remapped to down+out osd
Reproduced again with stats flushing. This seems to happen every time with this configuration (maybe having only 2 os... Josh Durgin
03:14 PM Bug #2091: corrupt v5 inc osdmap
ok.. yeah, it looks like the monitor may have published a bad inc update or something? unclear. i'll check with the... Sage Weil
03:11 PM Bug #2091: corrupt v5 inc osdmap
OK, picking a few things out of the original corruption report.
The basic header stuff is the same as before, as e...
Alex Elder
02:48 PM Feature #2015 (Resolved): osd: dump in-flight ops via admin socket
Sage Weil
02:37 PM CephFS Feature #2097 (Rejected): mds: 'ceph mds activate <gid>'
ability to explicitly instruct a standby mds to join the active cluster. Sage Weil
12:04 PM Messengers Bug #1985 (Won't Fix): msgr: creating new Pipe for pre-existing connection leaks Pipe if they don...
at least until we demonstrate the problem (after the msg leak fix). this will probably be moot after refactoring som... Sage Weil
12:01 PM RADOS Bug #2096: crush: adjust weight broken for tree, list buckets
wip-crush-adjust Sage Weil
10:48 AM RADOS Bug #2096 (Resolved): crush: adjust weight broken for tree, list buckets
... Sage Weil
11:25 AM Bug #2090 (Resolved): mon: assertion failed on shutdown
commit:963dec82880717054c760a745cf93cc7b43112df Sage Weil
09:06 AM Bug #2080 (Resolved): osd: scrub on disk size does not match object info size
Sage Weil

02/22/2012

10:12 PM Linux kernel client Cleanup #2093: ceph-client: messenger: the "to" parameter to read_partial() needs to go
I think it's right as is... all of those read calls are non-blocking. So the first time around in_base_pos is 0 and ... Sage Weil
05:28 PM Linux kernel client Cleanup #2093 (Resolved): ceph-client: messenger: the "to" parameter to read_partial() needs to go
I have been doing some refactoring of the net/ceph/messenger.c. One of
my aims was to understand the how (and why) ...
Alex Elder
09:33 PM Bug #2091: corrupt v5 inc osdmap
the first badness in the log is below. once it missed one incremental, things probably got out of sync and the pg_te... Sage Weil
09:28 PM Bug #2091: corrupt v5 inc osdmap
Oh.. that means the pg_temp mapping was inserted by a previous inc map, probably. we need to find the first instance... Sage Weil
06:23 PM Bug #2091: corrupt v5 inc osdmap
I've manually decoded the entire ceph_osdmap dumped in the log and everything
therein looks fine. (This was overkil...
Alex Elder
01:20 PM Bug #2091: corrupt v5 inc osdmap
I'm starting to look at this in detail but haven't concluded what went wrong yet.
Does it matter whether it was th...
Alex Elder
09:33 AM Bug #2091: corrupt v5 inc osdmap
reencoded to old format (using latest ceph-dencoder) gives us... Sage Weil
09:28 AM Bug #2091 (Can't reproduce): corrupt v5 inc osdmap
... Sage Weil
09:20 PM Bug #2090: mon: assertion failed on shutdown
... Sage Weil
09:20 PM Bug #2090: mon: assertion failed on shutdown
wip-2090 Sage Weil
05:04 AM Bug #2090 (Resolved): mon: assertion failed on shutdown
I was running repeated cycles of the kernel_untar_build.sh workunit
to try to reproduce a problem in the client and ...
Alex Elder
09:17 PM Bug #2095 (Resolved): osd: need feature bit for v0.42 osdmap encoding change
Sage Weil
07:02 PM Bug #2094 (Resolved): osd: pgs remapped to down+out osd
This is why the dump_stuck test fails on master. When one osd is marked out, the pg is remapped incorrectly:... Josh Durgin
10:06 AM Feature #2005 (Resolved): mon: track timestamps on pg states
Sage Weil
10:06 AM Feature #2058 (Resolved): ceph: query pg state
Sage Weil
10:03 AM Feature #2054: teuthology: run radosgw through valgrind
wip-valgrind Sage Weil
09:45 AM CephFS Bug #2092 (Can't reproduce): BUG at fs/ceph/caps.c:999
... Sage Weil
09:36 AM Bug #2022: osd: misdirected request
hit this again:... Sage Weil

02/21/2012

04:58 PM rgw Cleanup #2089 (Resolved): rgw: less dout(0) noise?
i think that's where this is coming from:... Sage Weil
03:32 PM Feature #1932 (Resolved): mon: before accepting a new crushmap, monitor should validate and test ...
Sage Weil
03:31 PM Feature #2088 (Rejected): msgr: refactor 2 threads to one
Sage Weil
03:30 PM Feature #1412 (New): qa: spec out messenger testing
Sage Weil
03:29 PM Feature #1412: qa: spec out messenger testing
er, wrong bug! Sage Weil
12:22 PM rgw Bug #2083 (Resolved): rgw: test_object_raw_authenticated* fail (on xfs?)
Should be fixed now. Updated relevant teuthology tests to use only url safe chars. Also updated rgw-admin to disallow... Yehuda Sadeh
10:34 AM rgw Bug #2083: rgw: test_object_raw_authenticated* fail (on xfs?)
Not really related to xfs. The problem is that when generating authenticated urls, boto doesn't escape the access key... Yehuda Sadeh
10:55 AM Feature #2087 (Resolved): lightweight filestore workload generator
simple program that uses FileStore and generates something that looks vaguely like what an OSD does. e.g.,
- stre...
Sage Weil
09:13 AM Bug #2084: segfault in tcmalloc
and again (hammer b.yaml). right before the crash sched_scrub() was called...... Sage Weil

02/20/2012

04:19 PM Messengers Bug #2086 (Resolved): msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
... Sage Weil
02:54 PM Linux kernel client Cleanup #2085 (New): kclient: improve mtime update in page_mkwrite
this should be done in the various helpers we call when we successfully mark a page dirty, not in the outer function. Sage Weil
02:29 PM Bug #1765 (Resolved): osd: 'call' op can return data even if op is modifying
commit:afc1748db52911295708e4afbe7fd7884c97dbbf Sage Weil
02:27 PM Bug #1821 (Resolved): librados: rados_create_with_context is unusable
we could still add refcounting to the CephContext later. Sage Weil
02:24 PM Bug #2084 (Can't reproduce): segfault in tcmalloc
heap corruption?... Sage Weil
01:52 PM Linux kernel client Bug #2081: msgr: spinlock badness?
ubuntu@teuthology:/a/nightly_coverage_2012-02-20-b/12984 with same trace on the console. Sage Weil
01:10 PM Bug #2080: osd: scrub on disk size does not match object info size
Sage Weil
08:48 AM Bug #2080: osd: scrub on disk size does not match object info size
reproduced with log. metropolis:~sage/bug-2080 Sage Weil
06:20 AM Bug #2080: osd: scrub on disk size does not match object info size
ubuntu@teuthology:/a/master-2012-02-19_19:50:05/12884 Sage Weil
08:31 AM Cleanup #2021 (Resolved): fix signal handlers
Sage Weil
06:29 AM rgw Bug #2083 (Resolved): rgw: test_object_raw_authenticated* fail (on xfs?)
This fails sometimes, but not always. It seems to happen more often on xfs, but maybe that's my imagination.... Sage Weil

02/19/2012

03:52 PM Bug #2082 (Resolved): osd: broken queuing during replay
... Sage Weil
03:49 PM Bug #1638 (Won't Fix): Can't create object with large xattrs in a single operation (on extN)
Sage Weil
03:48 PM CephFS Bug #2018 (Resolved): mds: can't change file_max
oh, i fixed this a week or two ago. the problem was that the file isn't open read/write, but Client was still trying ... Sage Weil
03:46 PM Bug #2032 (Resolved): paxos: somehow didn't update stash alongside new states
Sage Weil
03:45 PM Bug #2044 (Resolved): osd: pg stuck in active+backfill
Sage Weil
03:45 PM Feature #1412 (Can't reproduce): qa: spec out messenger testing
this code has been refactored a bit.
the messenger tests won't directly trigger this, though we may the/an under...
Sage Weil
03:45 PM Bug #1631 (Can't reproduce): osd: failed assert(repop_queue.front() == repop)
this code has been refactored a bit.
the messenger tests won't directly trigger this, though we may the/an under...
Sage Weil
03:40 PM Feature #1932: mon: before accepting a new crushmap, monitor should validate and test some inputs
wip-crush Sage Weil
02:51 PM Bug #2080: osd: scrub on disk size does not match object info size
... Sage Weil

02/18/2012

11:13 PM Linux kernel client Bug #2081 (Can't reproduce): msgr: spinlock badness?
captured this console fragment from a crashed qa run... Sage Weil
10:57 PM Bug #2070 (Duplicate): osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
ok i didn't observe this crash and trace it back, but i'm almost certain it's the same as #2075.
commit:344c202203...
Sage Weil
01:54 PM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
ubuntu@teuthology:/a/nightly_coverage_2012-02-18-a/12494 Sage Weil
10:56 PM Bug #2075 (Resolved): osd: recover_got assert
commit:344c20220345197c03fbaf46e2c1289d81a0a14f Sage Weil
02:01 PM Bug #2075: osd: recover_got assert
ubuntu@teuthology:/a/nightly_coverage_2012-02-18-a/12489... Sage Weil
10:01 PM Feature #2074 (Rejected): teuthology: remove old kernel packages
i did this manually on sepia. new teuth will reimage regularly. Sage Weil
09:24 PM Messengers Bug #2073 (Resolved): msgr: shutdown can hang
this appears to be fixed with commit:787dd1709797876dd9fa6004c6723df859003b59, unless there is some subtle difference... Sage Weil
03:51 PM Feature #2034 (Resolved): osd: refactor push code
Sage Weil
03:50 PM Feature #2058: ceph: query pg state
wip-pg-query Sage Weil
02:15 PM Bug #2061 (Resolved): osd: scrub mismatch
pretty sure this was fixed by the recover refactor.. haven't hit it since then. Sage Weil
01:48 PM Bug #2080 (Resolved): osd: scrub on disk size does not match object info size
... Sage Weil

02/17/2012

04:26 PM Bug #1975: btrfs: EINVAL on snap create
We aren't triggering this any more, now that the filestore transaction bug is fixed. Sage Weil
03:13 PM Bug #2061: osd: scrub mismatch
oooooh, these went away and i was confused. but then i just ran the regression suite against next and hit them again... Sage Weil
01:22 PM Bug #2068 (Resolved): osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.hist...
Sage Weil
12:46 PM Bug #2079 (Duplicate): rbd: creating a snapshot with the same name doesn't return an error
... Josh Durgin
12:37 PM Cleanup #2078 (Resolved): ceph tool: only output response data to stdout
By default, "ceph osd getmap" or any other command that fetches binary data outputs it to stdout. However, other info... Josh Durgin
10:32 AM Bug #2077 (Resolved): mon: assert in Paxos::is_consistent
we don't need a stash for v == 1. make is_consistent() check match slurp() logic. commit:db41bdda7e02aedc42d14be635... Sage Weil
09:41 AM Bug #2077 (Resolved): mon: assert in Paxos::is_consistent
I tripped across a bug when adding a new monitor into an existing cluster
(see attached). I was on GIT commit
4b3bb...
Sage Weil
09:36 AM Bug #2076 (Resolved): ceph fails to build with gcc 4.7
commit:d913e5e670282c19a35c6cb420fc1d711c388cc4 Sage Weil
09:30 AM Bug #2076: ceph fails to build with gcc 4.7
That is indeed fine.
Thanks!
David Nalley
09:25 AM Bug #2076: ceph fails to build with gcc 4.7
Committing these, with both of your signed-off-by's.. I assume that's okay? Sage Weil
08:13 AM Bug #2076 (Resolved): ceph fails to build with gcc 4.7
Fedora has moved to gcc 4.7 for the upcoming Fedora 17 release[1].
Currently Ceph fails to build with gcc 4.7.
...
David Nalley

02/16/2012

08:55 PM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
ubuntu@teuthology:/a/nightly_coverage_2012-02-16-b/12294 Sage Weil
11:32 AM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
if i had to guess this is related to the pg init() refactor. not much to be found from the core, except that pg->sta... Sage Weil
09:39 AM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
also hit this on ubuntu@teuthology:/a/nightly_coverage_2012-02-15-b/12169 Sage Weil
09:36 AM Bug #2070 (Duplicate): osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
ubuntu@teuthology:/a/nightly_coverage_2012-02-15-b/12164... Sage Weil
08:44 PM Bug #2075 (Resolved): osd: recover_got assert
... Sage Weil
08:37 PM Messengers Bug #2073: msgr: shutdown can hang
here's the bt:... Sage Weil
04:15 PM Messengers Bug #2073 (Resolved): msgr: shutdown can hang
saw this... Sage Weil
04:36 PM Feature #2074 (Rejected): teuthology: remove old kernel packages
sepia disks are filling up from all the old kernel packages (/lib/modules/$version is 1.3 GB each) Sage Weil
04:10 PM rgw Bug #2072 (Resolved): rgw: owner cannot change acl if it doesn't have bucket read permission
rgw_op.cc:read_acls() tests for read permission, this is wrong. Yehuda Sadeh
03:11 PM CephFS Bug #2071: kclient: pjd mkfifo failures
ubuntu@teuthology:/a/nightly_coverage_2012-02-16-b/12255 Sage Weil
03:11 PM CephFS Bug #2071 (Can't reproduce): kclient: pjd mkfifo failures
... Sage Weil

02/15/2012

03:28 PM Linux kernel client Bug #2069 (Can't reproduce): client crash during kernel_untar_build rm -r step
this keeps happening:... Sage Weil
03:24 PM Bug #2022: osd: misdirectect request
weird, saw this twice in a few days (maybe 18 runs apart), but wasn't able to reproduce after several hundred iterations... Sage Weil
03:20 PM Bug #2033 (Closed): osd: segfault in OSD::update_heartbeat_peers()
I'm not totally sure how this happened, but the new heartbeat locking should avoid it.. Sage Weil
03:18 PM Cleanup #2049 (Resolved): osd: improve heartbeat peer locking
Sage Weil
03:18 PM Bug #2060 (Resolved): osd: lone osd is not marked degraded with replication level 2
Sage Weil
02:11 PM Bug #2056 (Resolved): osd: unfound object during backfill qa test
fixed in backfill task.. it was killing a second osd before waiting for things to peer/recover from the first failure. Sage Weil
12:01 PM Bug #2068: osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.history.same_in...
Oh, i see the problem.. the osdmap ref is taken by lock().. this pg hasn't seen the new map yet.
just need to tag...
Sage Weil
11:49 AM Bug #2068: osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.history.same_in...
looking at the core file.
- we are primary
- replica is sending us an info message, with one record. it is therefo...
Sage Weil
09:21 AM Bug #2068 (Resolved): osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.hist...
... Sage Weil

02/14/2012

10:50 PM Bug #1765 (In Progress): osd: 'call' op can return data even if op is modifying
the c++ librados api now separates these operations. osd now refuses to return any result data payload if op is mark... Sage Weil
09:36 PM Cleanup #2021 (In Progress): fix signal handlers
Sage Weil
09:30 PM CephFS Bug #1991 (Duplicate): mds: crash during clean shutdown
see #1549.. we are racing with exit(0) from the SIGTERM handler Sage Weil
09:28 PM Bug #2032: paxos: somehow didn't update stash alongside new states
Can we close this one? Sage Weil
09:27 PM Bug #2037 (Resolved): mon: a crash in the middle of slurping is unrecoverable
Sage Weil
05:06 PM Bug #2067 (Resolved): librados: we leak CephContext from rados_create()
Sage Weil
05:05 PM rgw Feature #2066 (Resolved): rgw: make list_objects efficient
Sage Weil
03:14 PM Feature #1772: rbd: define new on-disk header format
The other point that came up was, if rbd can't delete the parent volume until all children have been deleted, the gla... Anonymous
03:13 PM Feature #1772: rbd: define new on-disk header format
Being a little bit more explicit: the point of the UUIDs is to allow child images to add themselves to the parent's l... Anonymous
01:07 PM Feature #1772: rbd: define new on-disk header format
To get around the issue of a child image needing to update the parent image's header, Sage suggested only allowing ac... Josh Durgin
02:07 PM Bug #1821: librados: rados_create_with_context is unusable
see wip-1821 Sage Weil
01:06 PM Feature #988: librbd: trivial layering
To get around the issue of a child image needing to update the parent image's header, Sage suggested using only allow... Josh Durgin
12:45 PM CephFS Feature #2065: teuthology: specify mount options for kclient task
take a dict instead of list, and let you specify mount options on a per-client basis. Sage Weil
12:45 PM CephFS Feature #2065 (Closed): teuthology: specify mount options for kclient task
Sage Weil
11:55 AM Linux kernel client Bug #2064 (Resolved): ceph-client: messenger: nocrc flag not implemented correctly
The "nocrc" option is supposed to disable CRC32 calculation on messages
sent between ceph entities. The default is ...
Alex Elder
11:55 AM rgw Bug #2063 (Resolved): rgw: access key shouldn't contain chars that need to be url encoded
We see some issues in our tests where, when generating a signed url, these chars aren't being encoded. We should try to av... Yehuda Sadeh
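A minimal sketch of the idea in #2063, assuming a hypothetical key generator (this is not rgw's actual code, and the alphabet and length are illustrative choices): draw the access key only from characters that never need percent-encoding, then check that URL-encoding the key leaves it unchanged.

```python
import secrets
import string
from urllib.parse import quote

# Assumption: uppercase letters and digits only; both are URL-unreserved.
URL_SAFE_ALPHABET = string.ascii_uppercase + string.digits


def generate_access_key(length: int = 20) -> str:
    """Return a random key built only from characters that need no URL encoding."""
    return "".join(secrets.choice(URL_SAFE_ALPHABET) for _ in range(length))


def needs_url_encoding(key: str) -> bool:
    """True if percent-encoding would change the key (i.e. it has unsafe chars)."""
    return quote(key, safe="") != key
```

With keys restricted this way, a signed url comes out the same whether or not the client remembers to encode the key.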
11:50 AM Bug #2062 (Resolved): filestore: idempotent test failed
the test was broken. triggered by filestore now noticing clone could fail.
commit:7b1c144f21c3ccfe2dfd4342e3d5461...
Sage Weil
11:36 AM Bug #2062 (Resolved): filestore: idempotent test failed
... Sage Weil
09:38 AM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
in my case, this looks like #2045. Sage Weil
07:59 AM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
I just hit this in qa, ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-02-14-a/11871. Sage Weil
09:37 AM Bug #2061 (Resolved): osd: scrub mismatch
New one, "[ERR] 0.c scrub stat mismatch, got 6/6 objects, 2/5 clones, 13511948/13511948 bytes."
Workload was
<pre...
Sage Weil
09:31 AM Bug #2045: osd: dout_lock deadlock
again, although this time there is a write that looks blocked somehow... Sage Weil

02/13/2012

04:33 PM Feature #2028 (Resolved): qa: allocate disks to btrfs on new hardware
Sage Weil
01:23 PM Feature #2028 (In Progress): qa: allocate disks to btrfs on new hardware
Sage Weil
03:47 PM Bug #2060 (Resolved): osd: lone osd is not marked degraded with replication level 2
With only one osd in, 'ceph -s' and 'ceph health' should report that the cluster has degraded objects. Josh Durgin
02:54 PM Feature #1836 (Resolved): filejournal: use async directio to write to the journal
Sage Weil
02:50 PM rgw Feature #773 (Resolved): rgw: efficient list-objects filtering
That was fixed when we introduced the bucket index. Yehuda Sadeh
02:40 PM rgw Bug #2048 (Resolved): rgw: multipart upload listing return key starting with _multipart_
It seems that this has already been resolved, most likely by the fix for #2025. Yehuda Sadeh
01:14 PM Feature #2058 (Resolved): ceph: query pg state
Sage Weil
01:12 PM Feature #2005 (In Progress): mon: track timestamps on pg states
Sage Weil
01:11 PM Feature #2005 (Resolved): mon: track timestamps on pg states
Sage Weil
01:06 PM Feature #1962 (In Progress): ferro: Trigger vMedia boot via IPMI/DRAC
Sage Weil
01:06 PM Feature #1571 (In Progress): osd: non-trivial map object
Sage Weil
12:11 PM rgw Cleanup #2036 (Resolved): rgw: bucket index tree contains the same info 3 times
Ok, as of commit:9065dbd36d35b6e44c66293e74b6ba92031ca9ae it only appears twice. Removing another copy of the objec... Yehuda Sadeh
09:37 AM Bug #2056 (Resolved): osd: unfound object during backfill qa test
ubuntu@teuthology:/a/nightly_coverage_2012-02-13-a/11793
it happened a couple days earlier, too.
Sage Weil

02/12/2012

04:16 PM Bug #1759 (Resolved): mds/client: truncate size overflow, fails with EINVAL
this is a problem with weird truncate_seq/size values in requests, that the osd is now cleaning up.
commit:0ded7e4da...
Sage Weil
04:15 PM Bug #1688 (Closed): Benjamin: pg stuck in scrub
Sage Weil
02:29 PM Bug #2022: osd: misdirectect request
saw this again on rados_api_tests:... Sage Weil

02/11/2012

10:43 PM rgw Bug #2043 (Resolved): rgw: cannot use '+' in url
commit:508be8e3b3b47b71035d07d26dead49b3b91463d hopefully fixes the issue. Also reverted previous fix. Yehuda Sadeh
09:42 PM rgw Bug #2043 (In Progress): rgw: cannot use '+' in url
It's still broken. Certain clients use '+' as a space. I think that the apache rewrite rule makes things inconsistent. Yehuda Sadeh
10:32 PM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
I guess a btrfs one. Right now I'm running a couple of virtual machines without any issues, so for now we can leave t... Wido den Hollander
09:57 AM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
This looks like a btrfs or kernel issue to me. Have you seen it since? Sage Weil
03:08 PM Bug #1943 (Duplicate): osd: bad clone transaction on journal replay
Sage Weil
03:07 PM Bug #1949 (Resolved): osd: ENOTEMPTY on collection removal from snaptrimmer
fixed by commit:7c6dff487171deb37852e2fb059dcb6e3af65702 Sage Weil
03:05 PM Feature #2038 (Rejected): mon: can't currently do commands/get status when not in quorum
I think it's fine to fail (or actually, block/wait on) authentication if we are out of quorum. The client will retry... Sage Weil
02:22 PM Cleanup #2023 (Resolved): btrfs: Use btrfs device scan instead of btrfsctl -a
commit:a414fd51c7c5ae5dbe9e3af7db6f17741a58c1a7 Sage Weil
10:23 AM Bug #1758 (Can't reproduce): OSD segfault in SimpleMessenger::send_message
Haven't seen this one in ages, either. Going to assume it's been fixed. Sage Weil
10:22 AM Bug #1992 (Can't reproduce): OSD::get_or_create_pg
Hmm, we haven't been able to trigger this with our thrashing. Sage Weil
10:21 AM Bug #1493 (Resolved): cmon: nice error message on undecodable (osdmap, monmap) input
commit:7eff37be494714febed4e6724237c03722b4e8c5 Sage Weil
10:07 AM Feature #2055 (Duplicate): osd: fix up push cloning
Sage Weil

02/10/2012

09:17 PM RADOS Bug #1738 (Duplicate): bad crushmap behavior
Sage Weil
09:17 PM RADOS Bug #2047 (Duplicate): crush: with a rack->host->device hierarchy, several down devices are likel...
Sage Weil
05:30 PM Bug #1949: osd: ENOTEMPTY on collection removal from snaptrimmer
Sage Weil
01:22 PM Bug #1949: osd: ENOTEMPTY on collection removal from snaptrimmer
another log, with filestore debugging, and the contents of the fs. There was... Sage Weil
05:06 PM rgw Bug #2051 (Resolved): rgw: can't use '%' in object name
Fixed, commit:7e32a3d4bc90d84970754350414c553e7ca01299. Yehuda Sadeh
02:48 PM rgw Bug #2051 (Resolved): rgw: can't use '%' in object name
Yehuda Sadeh
04:34 PM Feature #2055 (Duplicate): osd: fix up push cloning
Sage Weil
04:32 PM Feature #2054 (Resolved): teuthology: run radosgw through valgrind
Sage Weil
04:13 PM Feature #2053 (Rejected): librados: caching
Sage Weil
04:12 PM Feature #2052 (Resolved): librbd: caching
Sage Weil
03:30 PM rgw Cleanup #2036: rgw: bucket index tree contains the same info 3 times
the reason it is kept 3 times is that we index it by the bucket name, have the bucket name as one of the fields in th... Yehuda Sadeh
03:20 PM rgw Bug #2043 (Resolved): rgw: cannot use '+' in url
Fixed, commit:a6d7629c177fbab722a7a0c7f861caf91ff92deb. Yehuda Sadeh
03:19 PM Bug #2050 (Resolved): rgw: crash at Objecter::_linger_commit()
Fixed, commit:0de1d5502b0d9ab0f0809947a0664586d7754a08. Yehuda Sadeh
02:27 PM Bug #2050: rgw: crash at Objecter::_linger_commit()
We think that what happens is this:
librados::linger()
->ack response
unregister_watcher()
->commit response
...
Yehuda Sadeh
02:26 PM Bug #2050 (Resolved): rgw: crash at Objecter::_linger_commit()
ubuntu@teuthology:/a/nightly_coverage_2012-02-09-a/11236$ cat ./remote/ubuntu@sepia72.ceph.dreamhost.com/log/rgw.stdo... Yehuda Sadeh
01:58 PM Cleanup #2049: osd: improve heartbeat peer locking
need to move heartbeat peer stuff out from under osd_lock to facilitate pushing pg peering crap into the worker threads Sage Weil
01:57 PM Cleanup #2049 (Resolved): osd: improve heartbeat peer locking
Sage Weil

02/09/2012

09:41 PM Bug #1974: osd: radosmodel crash on thrashing
commit:359dfb9966d15d997f9e0351a5ed8de1faae62fe Sage Weil
09:41 PM Bug #1974 (Resolved): osd: radosmodel crash on thrashing
Sage Weil
09:20 PM Bug #1975: btrfs: EINVAL on snap create
I'm pretty sure this was triggered by #2046. There is still a btrfs bug, but we were doing the wrong thing if rmdir ... Sage Weil
09:18 PM Bug #2013 (Resolved): osd: messages for pgs we don't store are never freed
Sage Weil
04:38 PM Bug #2046 (Resolved): filestore: do_op running during commit
commit:1009d1a016f049e19ad729a0c00a354a3956caf7 and commit:93d7ef96316f30d3d7caefe07a5a747ce883ca2d Sage Weil
04:02 PM Bug #2046: filestore: do_op running during commit
this was broken by commit:259c509a8941bf7cdad8bd4ede0ccd73ca8a83d3, way back in v0.25! Sigh. The wait condition for... Sage Weil
10:05 AM Bug #2046 (Resolved): filestore: do_op running during commit
commit_start() is supposed to quiesce writes, but I see... Sage Weil
04:24 PM Bug #2044: osd: pg stuck in active+backfill
This should be fixed by commit:f0334673ab8547807b961aae19a8e53531585e3f. Josh Durgin
10:55 AM rgw Bug #2048 (Resolved): rgw: multipart upload listing return key starting with _multipart_
reported by jdwilson over irc. Yehuda Sadeh
10:41 AM RADOS Bug #2047 (Resolved): crush: with a rack->host->device hierarchy, several down devices are likely...
See http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/5166
Sage says the cause is down devices only tr...
Josh Durgin
10:02 AM Bug #2045: osd: dout_lock deadlock
ubuntu@teuthology:/a/nightly_coverage_2012-02-09-a/11210
metropolis:~sage/bug-2045
Sage Weil
09:56 AM Bug #2045 (Can't reproduce): osd: dout_lock deadlock
a thread is blocked on dout_lock, can't tell who. Sage Weil

02/08/2012

09:30 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
Hmm.. yeah, I don't think we have anything beyond these console dumps. And we don't capture any kind of kernel core ... Sage Weil
09:17 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
Is there a core file for this problem anywhere?
It would really be nice to poke around in the message, or the
con...
Alex Elder
07:45 PM Bug #2044 (Resolved): osd: pg stuck in active+backfill
jmlowe ran into this on his cluster several times. The primary doing backfill failed to requeue the pg for recovery.
...
Josh Durgin
04:54 PM rgw Bug #2043 (Resolved): rgw: cannot use '+' in url
Either in signed urls (e.g., as part of the uid), or in object names. Reason is that url_decode removes it. Relax url... Yehuda Sadeh
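A minimal sketch of the '+' ambiguity behind #2043, using Python's standard urllib.parse as a stand-in (rgw's own url_decode is not shown in this entry, so its exact behavior is an assumption here): form-style decoding turns '+' into a space, path-style decoding leaves it alone, and a literal '+' only survives both when it is sent as %2B.

```python
from urllib.parse import quote, unquote, unquote_plus

name = "a+b c"  # hypothetical object name containing '+' and a space

# Percent-encode everything that is not unreserved, including '+'.
encoded = quote(name, safe="")

# The same wire string decoded two ways.
path_decoded = unquote("a+b%20c")       # '+' is kept as a literal plus
form_decoded = unquote_plus("a+b%20c")  # '+' is treated as a space

# Encoding '+' as %2B makes the name survive even form-style decoding.
round_trip = unquote_plus(encoded)
```

The two decoders disagree on the unencoded '+', which is the kind of inconsistency that appears when clients use '+' as a space.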
04:45 PM Bug #2042: mon: crash in LogMonitor::update_from_paxos
ubuntu@teuthology:/a/nightly_coverage_2012-02-08-b/11127 Sage Weil
04:45 PM Bug #2042: mon: crash in LogMonitor::update_from_paxos
core + binary + tarball are at metropolis:~sage/bug-2042 Sage Weil
04:43 PM Bug #2042 (Duplicate): mon: crash in LogMonitor::update_from_paxos
... Sage Weil
02:26 PM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
Alex Elder
02:23 PM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
After a few weeks of wandering around the code, figuring out how
things work and refactoring and fixing things as I ...
Alex Elder
01:17 PM Cleanup #2041 (Resolved): osd: move peering into worker threads
Sage Weil
10:52 AM Bug #1974: osd: radosmodel crash on thrashing
Just hit this:
- clean_up_local removed an object (due to a 'delete' log entry)
- a read came in and read it befo...
Sage Weil
06:41 AM rgw Feature #2040 (Resolved): rgw: disable rgw log through ceph.conf
Currently the way to do it is through the apache conf. Yehuda Sadeh

02/07/2012

10:56 PM Bug #2013 (In Progress): osd: messages for pgs we don't store are never freed
see wip-pg-waiters? Sage Weil
10:46 PM CephFS Bug #1996 (Duplicate): mds: scatter_nudge() bad pointer on shutdown?
this is the signal handler thing Sage Weil
10:45 PM Bug #1901 (Resolved): Missing files in ceph packages results in build failure of tests
Sage Weil
10:43 PM rgw Bug #1721 (Can't reproduce): rgw: spurious multipart-upload failures
Sage Weil
10:41 PM Bug #1626 (Can't reproduce): ceph-mon HA not working right; all must be up
Sage Weil
10:37 PM CephFS Bug #1902 (Won't Fix): mds: unittest_interval_tree bad memory access
Sage Weil
10:37 PM Bug #1659 (Can't reproduce): Upgrade from 0.27 -> 0.37 going wrong, OSDs miss map updates
Sage Weil
10:35 PM Bug #1564 (Won't Fix): osd: osd should not be primary before data is replicated
no more backlogs, so this problem is mostly moot. it can sort of still happen (to a vastly decreased degree), but it... Sage Weil
10:33 PM Bug #1529 (Can't reproduce): cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone sug...
Sage Weil
10:31 PM Bug #1797 (Resolved): configure doesn't link to pthread on Fedora 14 on linking librados-config
I'm going to assume that using the automake pthread macros fixes this (commit:c5144eed4eadf5cfaa0a41c0ced2a1cd3462289f)... Sage Weil
10:30 PM Cleanup #1899 (Resolved): use acx_pthread instead of hardcoding libs and cflags into build system
applied this a while back, commit:c5144eed4eadf5cfaa0a41c0ced2a1cd3462289f Sage Weil
10:29 PM rgw Feature #2039 (Rejected): rgw: keep more than one bucket marker object
We generate a unique bucket index id by leveraging the pg version returned on a write operation to a special bucket m... Yehuda Sadeh
10:28 PM CephFS Bug #1827 (Resolved): libceph: hang on creating a file
finally looked at this. the problem is just that open wasn't passed O_WRONLY or O_RDWR, and ceph_write() wasn't retu... Sage Weil
10:20 PM Feature #2038 (Rejected): mon: can't currently do commands/get status when not in quorum
For obvious reasons, the MonClient has to authenticate with a monitor before talking to it. Right now this is accompl... Greg Farnum
10:05 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
Made a new bug for that issue anyway. #2037 Greg Farnum
04:50 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
oh, that was meant for #2032! Sage Weil
04:49 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
I think that could happen, so I'll check and fix it if so, but it's not what happened here. Greg Farnum
04:44 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
oh.. maybe it was slurping, and crashed before it stashed. when it restarted it didn't go back into slurp, because t... Sage Weil
03:03 PM Bug #2031 (Can't reproduce): paxos: failed assert (begin->last_committed == last_committed)
... Greg Farnum
10:04 PM Bug #2037 (Resolved): mon: a crash in the middle of slurping is unrecoverable
If a monitor comes up and starts slurping, it will start adding incremental maps to its store and update [first|last]... Greg Farnum
09:39 PM Bug #1547 (Resolved): client log doesn't go to stderr unless 'log file' specified
fixed this a few releases back Sage Weil
09:38 PM Bug #1688: Benjamin: pg stuck in scrub
is this old/fixed? haven't seen it in a while Sage Weil
09:03 PM Feature #2024 (Resolved): make gitbuilders time out when github is sucking
Sage Weil
04:04 PM rgw Cleanup #2036 (Resolved): rgw: bucket index tree contains the same info 3 times
This is apparent by running strings on the index objects. We should be able to reduce the excessive information (whic... Yehuda Sadeh
04:01 PM rgw Bug #2035 (Resolved): rgw: bucket removal fails
bucket removal sometimes returns either 'access denied' or 'bucket not empty' Yehuda Sadeh
03:45 PM Bug #2033: osd: segfault in OSD::update_heartbeat_peers()
... Sage Weil
03:32 PM Bug #2033 (Closed): osd: segfault in OSD::update_heartbeat_peers()
just hit this twice, on two different clusters, both under testrados workloads.... Sage Weil
03:32 PM Feature #2034 (Resolved): osd: refactor push code
Sage Weil
03:07 PM Bug #2032 (Resolved): paxos: somehow didn't update stash alongside new states
lxo reported that on one monitor, after seeing #2031 and bringing the monitor back up (much later), the monitor faile... Greg Farnum
02:28 PM Bug #1909 (Resolved): Two mons crash after starting the third one
this really looks like the bug fixed in commit:bfbeae68c045de76ede86ca4f72d2a760a19c84b... the sender sent a message ... Sage Weil
02:19 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
We only saw this the once, but we believe the bug and want to keep it open. Anonymous
02:18 PM Messengers Bug #1747: msgr: osd connection originates from wrong port
We only saw this the once, but we believe the bug and want to keep it open. Anonymous
02:14 PM Bug #1631: osd: failed assert(repop_queue.front() == repop)
We haven't seen this, but hope that the messenger tests now being designed will flush it out again. Anonymous
02:11 PM CephFS Bug #1947 (Duplicate): mds: SIGBUS during _mark_dirty
#1549 Sage Weil
02:06 PM RADOS Feature #1639: osd: guard against bad objects in cls map functions
the specific instance was fixed. can we in general catch any exception in the class methods? safely? Sage Weil
02:02 PM Bug #1530 (Can't reproduce): osd crash during build_inc_scrub_map
Sage Weil
11:28 AM Feature #2030: osd: clean up mark_unfound api

ceph pg 1.2 mark_unfound_revert foo
NOT ceph tell osd.12 mark_unfound revert pgid objectname
Sage Weil
11:27 AM Feature #2030 (Resolved): osd: clean up mark_unfound api
Sage Weil
11:27 AM Feature #2007: osd: enumerate unfound, lost objects, possible locations

ceph pg 1.2 list_missing|list_unfound
- list of missing objects, locators, and known locations (if !unfound)
Sage Weil
11:19 AM Feature #2007: osd: enumerate unfound, lost objects, possible locations
PGLS_MISSING
(new pg op) using rados
Sage Weil
11:27 AM Feature #2006: osd: report what is blocking peering completion

ceph pg 1.2 status|query
- peering status
- recovery status
- another interesting status
Sage Weil
11:11 AM Feature #2006: osd: report what is blocking peering completion
ceph ...
ceph tell <who> ....
ceph pg query 1.2
map pg, query osd directly with
['pg', 'query', '1.2']
Sage Weil
11:07 AM Feature #2005: mon: track timestamps on pg states
query list of stale/unpeered/whatever pgs
ceph pg dump_stuck [--format=json|plain]
Sage Weil
10:06 AM Bug #1974: osd: radosmodel crash on thrashing
Summary: An object was deleted, but after a recovery was found to be back ... which is almost surely indicative of a ... Anonymous

02/06/2012

05:48 PM Bug #1975: btrfs: EINVAL on snap create
RATIONALE:
We seem to be able to make this happen, and believe it to be a btrfs bug.
We are not calling it u...
Anonymous
05:44 PM Feature #1932: mon: before accepting a new crushmap, monitor should validate and test some inputs
Users can create their own rules, so bad rules will happen, and we must do a better job of making the Monitors robust... Anonymous
04:22 PM rgw Bug #2025 (Resolved): rgw: objects starting with underscore are badly listed
Fixed, commit:c23d217c93bb6ed21c1b07e347710e18446a3abc. Yehuda Sadeh
04:22 PM rgw Bug #2029 (Resolved): rgw: space in object name is turned into a different character
Fixed, commit:6df25e53abe37b19b38e5657dbf3b4c37f03d8e3. Yehuda Sadeh
02:37 PM rgw Bug #2029 (Resolved): rgw: space in object name is turned into a different character
looks like we fail to use the url-decoded object name. Yehuda Sadeh
02:11 PM Feature #2028 (Resolved): qa: allocate disks to btrfs on new hardware
root isn't consistently on /dev/sda, it seems. or on a consistent /dev/disk/by-path on the plana nodes. Sage Weil
02:04 PM Feature #1970 (Resolved): osd: migrate to new encoding schemes
this is all done, but unmerged; it'll get pulled into a release with a bunch of other encoding updates. Sage Weil
10:52 AM rgw Bug #2027 (Can't reproduce): rgw -> apache miscommunication
There were some mystery failures, where we've seen rgw getting requests from apache, processing them, sending respons... Yehuda Sadeh
10:10 AM Bug #1973 (Can't reproduce): osd: segfault in ReplicatedPG::remove_object_with_snap_hardlinks
let's chalk this up to the bad object_info_t Sage Weil
10:06 AM Bug #1984 (Can't reproduce): osd: failed assert, got into finish_recovery_ops without any recover...
Sage Weil
10:00 AM Bug #1490 (Resolved): cfuse assert failure: assert(ob->last_commit_tid < tid)
Sage Weil
03:36 AM Bug #2026 (Can't reproduce): osd: ceph::HeartbeatMap::check_touch_file
After my data loss due to a btrfs bug I re-installed my whole cluster with 0.41 and kernel 3.2 (ceph-client with btrf... Wido den Hollander

02/05/2012

08:54 PM Bug #1975: btrfs: EINVAL on snap create
... Sage Weil
07:11 PM rgw Bug #2025 (Resolved): rgw: objects starting with underscore are badly listed
Yehuda Sadeh

02/04/2012

06:02 PM Feature #2024 (Resolved): make gitbuilders time out when github is sucking
Sage Weil
03:32 PM CephFS Bug #1945: blogbench hang on caps
ubuntu@teuthology:/a/nightly_coverage_2012-02-04-a/10600 Sage Weil
12:01 PM Cleanup #2023 (Resolved): btrfs: Use btrfs device scan instead of btrfsctl -a
I just upgraded my btrfs userland tools and saw:... Wido den Hollander

02/03/2012

11:00 PM Bug #2022 (Resolved): osd: misdirectect request
from rados_api_tests.yaml:
[WRN] client.4292 10.3.14.128:0/3016298 misdirected client.4292.0:4 0.0 to osd.1 not [0,1...
Sage Weil
10:48 AM Cleanup #2021 (Resolved): fix signal handlers
Sage Weil
10:45 AM Feature #2008 (Resolved): mon: include full/nearfull in health check
Sage Weil
10:31 AM Feature #2004 (Resolved): qa: make deb gitbuilder faster
Sage Weil
10:11 AM Feature #2020 (Duplicate): collectd: submit plugin upstream
Sage Weil