Activity
From 02/03/2012 to 03/03/2012
03/03/2012
- 09:33 PM Bug #2128: filestore: check() fails during sync
- could it be commit:75cbed61e94a7974e40230360c6781d85f47576d ?
- 09:11 PM Bug #2133: osd: recovery_complete
- 02:18 PM Bug #2133 (Resolved): osd: recovery_complete
- pull raced with clones, clone_subset changed, it got confused....
- 09:10 PM Bug #2135: cephtool: osdc/Objecter.cc: 375: FAILED assert(initialized)
- librados shutdown race
- 07:38 PM Bug #2135 (Resolved): cephtool: osdc/Objecter.cc: 375: FAILED assert(initialized)
- ...
- 03:16 PM CephFS Bug #1796: mds: exit cleanly on EBLACKLISTED
- people hit this and it's confusing when ceph-mds crashes...
wip-1796
- 02:38 PM Feature #2134 (Resolved): qa: smoke suite
- pick out some regression tests that run reasonably quickly and have decent coverage.
03/02/2012
- 09:59 PM Bug #2132 (Resolved): FAILED assert(!missing.is_missing(soid))
- Possibly a duplicate of Issue #1191 or Issue #339 (both closed with could not reproduce).
Prior to this assert th...
- 09:36 PM Linux kernel client Bug #2099 (Rejected): messenger: unexpected socket state (4)
- OK, this is not a bug. I caused it by inserting this WARN_ON() message
in a case statement in ceph_state_change(). ...
- 09:29 PM Linux kernel client Cleanup #2131 (New): ceph: xattr: use the generic kernel xattr code
- The Linux kernel has a generic set of routines to support
extended attributes. When I posted some recent changes
t...
- 09:28 PM Linux kernel client Cleanup #2130: ceph: xattr: complete cleanups following review
- Forgot to assign it to myself
- 09:27 PM Linux kernel client Cleanup #2130 (Rejected): ceph: xattr: complete cleanups following review
- As requested by Mark... I have a number of changes to make to
fs/ceph/xattr.c based on my review of that code last ...
- 08:12 PM Linux kernel client Bug #2129 (New): ceph: xattr: call __build_xattrs() *before* cap check
- While reviewing a change to the xattr code, Sage noticed that some
calls to __build_xattrs() were being made *after*...
- 04:27 PM Bug #2128 (Rejected): filestore: check() fails during sync
- ...
- 03:08 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
- ok, i have a theory what's going on. can you try the new wip-2116, and run with debug ms = 20?
thanks!
- 10:07 AM Feature #2127 (New): Save kernel core dumps on all of our test machines
- The claim is that there is a netdump module that will UDP-squirt kernel coredumps to a waiting server, which is proba...
- 09:53 AM Bug #2126 (Duplicate): osd: recover_primary did nothing when num_missing==1
- ...
- 09:46 AM Bug #2118 (Resolved): osd: flawed commit_op_seq check on startup
- 08:43 AM Feature #2125 (Resolved): osd: put large xattrs in leveldb
- either when we fear the fs can't handle them, or unconditionally, or something.
- 07:33 AM Feature #1422: libvirt: rbd storage pool
- Made some more progress on this, code seems to be stable.
Working:
* Single and multiple monitors
* Authenticati...
03/01/2012
- 10:00 PM Bug #2103: osd: lockdep error on watch_lock
- must reenable this in qa suite when it's fixed!
- 05:18 PM Bug #2122 (Resolved): objecter: Asserts if authorization fails
- Fixed by commit:cd313885783a5a69a554139b5b41d21a666c815b
- 08:36 AM Bug #2122: objecter: Asserts if authorization fails
- Ah, I had a patch to fix this in the wip-testrados branch. I'll rebase and merge that today. The new asserts in the o...
- 06:45 AM Bug #2122 (Resolved): objecter: Asserts if authorization fails
- While working on the libvirt RBD storage driver I noticed the following crash:...
- 01:46 PM Tasks #2123 (Closed): Ignore this task - I'm checking out the bug report process.
- 09:02 AM Tasks #2123: Ignore this task - I'm checking out the bug report process.
- using "Update" option in tracker
- 09:00 AM Tasks #2123 (Closed): Ignore this task - I'm checking out the bug report process.
- just using the task ticket to walk through the issue lifecycle.
- 11:45 AM Bug #2124 (Resolved): crash when malformed auth key is provided
- We should guard all calls to decode_base64:...
- 09:52 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
- Saw this a couple of times on a client in a small ceph cluster. It seems to be correlated with dd runs using various...
- 08:11 AM Bug #2115 (Rejected): OSD failed to start: Operation not permitted
- 02:13 AM Bug #2115: OSD failed to start: Operation not permitted
- problem resolved. Thank you very much for your hint! I never thought it was caused by communication.
I created a...
- 02:48 AM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
- I can almost always reproduce it.
I just upgraded my cluster to:
> ceph version 0.42.2-206-gd77c579 (commit:d77c5...
02/29/2012
- 09:22 PM Bug #2022: osd: misdirectect request
- ...
- 09:16 PM Bug #2080: osd: scrub on disk size does not match object info size
- hit this again, ...
- 02:57 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
- i'm hoping wip-2116 fixes it...
- 02:31 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
- Wido, are you able to reproduce this reliably? I have an idea what the problem is, but have never reproduced this. ...
- 02:17 PM Bug #2002: osd: racy push/pull for clones
- reenabling this in my thrashing tests. if all goes well, i'll reenable in master under the assumption that sam's cle...
- 02:16 PM Bug #1977 (Can't reproduce): mon: ceph command hang
- we can reopen if this ever pops up again
- 01:59 PM Feature #2111 (In Progress): msgr workloads
- What we're looking for here are basic tests like connect, send message, kill connection, send another message; and ve...
- 01:30 PM Messengers Bug #1747 (Resolved): msgr: osd connection originates from wrong port
- commit:b1f264406f93af35600786f58e75908c393cf2ed
- 12:21 PM Messengers Bug #1747: msgr: osd connection originates from wrong port
- wip-1747
- 11:25 AM Messengers Bug #1747: msgr: osd connection originates from wrong port
- just hit this again. osd.1:...
- 12:48 PM rgw Bug #2121 (Resolved): radosgw: reload command for init script
- 09:48 AM rgw Bug #2121: radosgw: reload command for init script
- 09:25 AM rgw Bug #2121 (Resolved): radosgw: reload command for init script
- 12:48 PM Bug #1458 (Resolved): Run ceph suite with valgrind enabled
- 11:13 AM Bug #1975: btrfs: EINVAL on snap create
- see also this thread: http://marc.info/?t=132768583600004&r=1&w=2
- 10:46 AM Bug #1975: btrfs: EINVAL on snap create
- the EINVAL seems to have come from...
- 10:44 AM Bug #1975: btrfs: EINVAL on snap create
- somehow we end up here in btrfs:...
- 10:39 AM Bug #1975: btrfs: EINVAL on snap create
- quick brain dump:
- last time this reproduced i narrowed it down to a case where there were racing rmdirs with the...
- 10:55 AM Bug #2115: OSD failed to start: Operation not permitted
- it looks like you may be having trouble authenticating with the monitor. can you reproduce this with 'debug ms = 1'? ...
- 10:28 AM Bug #2031 (Can't reproduce): paxos: failed assert (begin->last_committed == last_committed)
- 10:09 AM Messengers Bug #2086 (Resolved): msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- merged!
- 10:06 AM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- Sage suggested I could just add a local dispatch to the shutdown or wait functions to test this properly...I did, and...
- 09:18 AM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- 09:27 AM Bug #1873: crush_rule type is inconsistent
- It's __s16 or int so that a negative value can mean undefined/not specified. I'm inclined to just leave this as is...
- 09:18 AM Bug #2119 (Resolved): osd: do_query to !up osd
02/28/2012
- 06:39 PM Bug #2115: OSD failed to start: Operation not permitted
- See attachment please
- 09:17 AM Bug #2115: OSD failed to start: Operation not permitted
- Can you attach the actual log? I want to make sure there is no subtle difference in the output. Thanks!
- 01:40 AM Bug #2115: OSD failed to start: Operation not permitted
- ceph version 0.42.2 (commit:732f3ec94e39d458230b7728b2a936d431e19322)
- 01:38 AM Bug #2115 (Rejected): OSD failed to start: Operation not permitted
- I'm setting up a new ceph cluster on ubuntu 11.10 with kernel version 3.0.0-16-server x86_64. The osd server failed t...
- 05:57 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- To be clear, I didn't try and generate the actual failure condition that was causing an assert before — that should b...
- 05:55 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- wip-2086 should fix this.
Ran a simple test:...
- 05:27 PM Messengers Bug #2086 (In Progress): msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- 04:51 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- Okay, looks like the local_pipe doesn't get its message queue cleared...I'm checking the others and looking at how it...
- 04:55 PM rgw Bug #2120: rgw: atomic write guard doesn't scale well
- Implementing #1956 would solve this issue, and would make the entire atomic scheme simpler.
- 03:03 PM rgw Bug #2120: rgw: atomic write guard doesn't scale well
- This was reported by a user through the ml. We should figure out with that user whether it's a real issue, or a red h...
- 02:51 PM rgw Bug #2120: rgw: atomic write guard doesn't scale well
- Do we care? You can't do partial updates to objects IIRC, so many writers pretty much has to be wrong somehow or other.
- 02:35 PM rgw Bug #2120 (Resolved): rgw: atomic write guard doesn't scale well
- when there is a large number of writers to the same object.
- 04:48 PM rgw Bug #2106 (Resolved): failed s3tests.functional.test_s3.test_100_continue
- Machines were running wrong apache and fastcgi modules.
- 04:23 PM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
- This may be a messenger issue, but it's not losing that initial message — notice how osd5 tries to send a ping back t...
- 11:26 AM Bug #2116: Repeated messages of "heartbeat_check: no heartbeat from"
- the other side of this conversation is...
- 11:20 AM Bug #2116 (In Progress): Repeated messages of "heartbeat_check: no heartbeat from"
- looks like a msgr issue?...
- 07:35 AM Bug #2116 (Resolved): Repeated messages of "heartbeat_check: no heartbeat from"
- As discussed on the ml I gathered some logs.
Today I upgraded my whole cluster to 0.42.2 from 0.41.
Due to the ...
- 12:54 PM Bug #1789 (Resolved): mon: failed assert(paxosv == pg_map.version)
- Pushed to master in commit:d10e1f46df8cc252f2f1d57cf5e577ea38eee1ae
- 12:48 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
- Okay, figured it out. Our current slurp code pulls in all the incrementals, then sends off a request for latest_stash...
- 12:01 PM Bug #2119 (Resolved): osd: do_query to !up osd
- ...
- 11:09 AM Bug #2118: osd: flawed commit_op_seq check on startup
- 10:08 AM Bug #2118 (Resolved): osd: flawed commit_op_seq check on startup
- the check that current/commit_op_seq == newest snap is flawed because ceph-osd can write a new current/commit_op-seq ...
- 10:09 AM Bug #2104 (Won't Fix): teuthology: wait_for_clean doesn't wait for last_epoch_started to propagate
- 10:09 AM Bug #2107 (Resolved): teuthology: lost_unfound fails pg state assert
- 09:41 AM devops Feature #2117 (New): qa: gitbuilder that does ENCODE_DUMP
02/27/2012
- 04:20 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- The guards for something like that shouldn't be too complicated to set up...actually, I thought they were at one poin...
- 04:19 PM Bug #1789 (In Progress): mon: failed assert(paxosv == pg_map.version)
- Iiiinteresting. This assert is the post-update check, after loading and running through all the incrementals. (Meanin...
- 01:41 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
- Shouldn't be related — this is a problem with a single monitor daemon and the other is a write problem that an MDS is...
- 12:35 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
- Core dump attached. Dumb thought: could this be related to http://tracker.newdream.net/issues/2110, they happened wit...
- 10:14 AM Bug #1789: mon: failed assert(paxosv == pg_map.version)
- Crash occurred on the third monitor when starting after being down for several hours shortly after cluster creation. ...
- 02:07 PM CephFS Bug #2110 (Duplicate): osdc/Journaler.cc: 360: FAILED assert(r >= 0)
- #1796
- 01:40 PM CephFS Bug #2110: osdc/Journaler.cc: 360: FAILED assert(r >= 0)
- can you attach ceph-mds too? or better yet, fire up gdb ceph-mds core and print out the value of r from that frame. ...
- 12:00 PM CephFS Bug #2110: osdc/Journaler.cc: 360: FAILED assert(r >= 0)
- Sage Weil wrote:
> Do you have a core file? I'm curious what the value of 'r' is.
Attached. Probably. (datetime ...
- 11:43 AM CephFS Bug #2110: osdc/Journaler.cc: 360: FAILED assert(r >= 0)
- Do you have a core file? I'm curious what the value of 'r' is.
- 11:40 AM CephFS Bug #2110 (Duplicate): osdc/Journaler.cc: 360: FAILED assert(r >= 0)
- Assert in MDS. This cluster was running a CephFS home directory workload with one active MDS and one MDS in standby r...
- 01:49 PM Bug #2045 (Need More Info): osd: dout_lock deadlock
- 01:33 PM Feature #2114 (Resolved): old sepia setup on new hardware
- 01:31 PM Feature #2113 (Resolved): objectcacher perfcounters
- 01:18 PM Feature #2112 (Resolved): msgr fault injection
- 01:18 PM Feature #2111 (Fix Under Review): msgr workloads
- Develop the interfaces which will allow us to break messenger sockets at precisely-defined points.
Allow comparison ...
- 11:38 AM Tasks #2109: qa/benchmark: Explore using Filebench for benchmarks / stress testing
- Justification and a good intro: http://cuddletech.com/blog/pivot/entry.php?id=949
- 11:36 AM Tasks #2109 (New): qa/benchmark: Explore using Filebench for benchmarks / stress testing
- http://filebench.sourceforge.net/
"Ships with more than 40 pre-defined personalities, including the one that descr...
- 11:05 AM Feature #2108 (New): track object states to inform error injection/testing
- 11:04 AM Feature #1412 (Resolved): qa: spec out messenger testing
- we now have a high-level plan on how to attack msgr testing.
- 10:03 AM Bug #1977: mon: ceph command hang
- Pretty sure you pushed changes the day you filed it (note reference in previous message), although I can't find the e...
- 09:51 AM rgw Bug #2106: failed s3tests.functional.test_s3.test_100_continue
- Strange, I can see the request in the apache logs, but not in the rgw logs....
- 09:12 AM Bug #2107 (Resolved): teuthology: lost_unfound fails pg state assert
- ubuntu@teuthology:/a/nightly_coverage_2012-02-27-a/14063...
02/26/2012
- 08:56 PM Bug #1977: mon: ceph command hang
- Hmm, I wonder if I somehow misdiagnosed this, or inadvertently fixed it: haven't seen this hang in weeks, and it happen...
- 05:09 PM rgw Bug #2106 (Resolved): failed s3tests.functional.test_s3.test_100_continue
- ...
- 05:02 PM Bug #2022: osd: misdirectect request
- ubuntu@teuthology:/a/nightly_coverage_2012-02-26-a/13876$ grep WRN ceph.log
2012-02-26 01:18:03.166529 osd.1 10.3.1...
- 11:19 AM Bug #2105 (Resolved): filestore: mkfs does not create initial snap
- This bug is almost the same as this one: http://tracker.newdream.net/issues/1707
I followed the instruction:http://ceph....
02/25/2012
- 09:33 PM Bug #2104 (Won't Fix): teuthology: wait_for_clean doesn't wait for last_epoch_started to propagate
- 09:06 PM Bug #2103 (Resolved): osd: lockdep error on watch_lock
- ...
- 09:04 PM Bug #2102 (Can't reproduce): osd: pg stuck in backfill
- ...
02/24/2012
- 03:30 PM Feature #2054 (Resolved): teuthology: run radosgw through valgrind
- ok, this now works with yaml like...
- 01:52 PM Feature #2006 (Resolved): osd: report what is blocking peering completion
- commit:5c6e8b3795d0cf58814619bfc15cb0841e9a4f17
- 01:51 PM CephFS Bug #1792 (Can't reproduce): crash in ceph-mds
- even if we could, we would never know, since there isn't any distinguishing info here, and the teuth archive is gone.
- 01:48 PM RADOS Bug #2096 (Resolved): crush: adjust weight broken for tree, list buckets
- commit:708be0a5abef63a5da8409ad13719adb7bb744f8
- 01:47 PM RADOS Feature #2101 (Resolved): crushtool: check for weight overflow on reweight
- 11:56 AM Feature #2007 (Resolved): osd: enumerate unfound, lost objects, possible locations
- 09:52 AM Feature #2007: osd: enumerate unfound, lost objects, possible locations
- wip-2007
- 11:34 AM Feature #2030 (Resolved): osd: clean up mark_unfound api
- 10:34 AM Messengers Feature #2100 (Resolved): msgr: Prevent throttled clients from slowing down non-throttled connect...
- Right now, it seems a throttled connection will still receive a TCP receive buffer's worth of data, but because the u...
- 09:15 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
- I don't think any of these other states are necessarily problematic, as long as the socket eventually ends up in CLOS...
- 08:49 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
- This may be related to http://tracker.newdream.net/issues/1803 and http://permalink.gmane.org/gmane.comp.file-systems...
- 08:33 AM Linux kernel client Bug #2099: messenger: unexpected socket state (4)
- Adding that I see more of the same WARNING() messages in the log for
the same state, as well as others for state 5, ...
- 08:13 AM Linux kernel client Bug #2099 (Rejected): messenger: unexpected socket state (4)
- Running tests defined by the YAML file below. Note that branch
wip-messenger is 107a8aaf21d01ee6cbc7a638faf1328f2bd...
- 07:59 AM CephFS Bug #2092: BUG at fs/ceph/caps.c:999
- mdsc->mutex protects the globalish mds client state (request/session lists), which is different from session->s_mutex...
- 06:57 AM CephFS Bug #2092: BUG at fs/ceph/caps.c:999
- Just a quick look at this.
Here's the code:
static void __queue_cap_release(struct ceph_mds_session *session,
...
- 06:10 AM Bug #2091 (Can't reproduce): corrupt v5 inc osdmap
- logs don't go far enough back. :(
moral of the story: next time grab the full mon data dir immediately in case it...
- 05:57 AM Linux kernel client Bug #1907 (Resolved): rbd: don't reuse device ids while they're still in use elsewhere
- Committed a couple of weeks ago and has seen no bad effect during the
intervening testing. So I'm marking this one ...
02/23/2012
- 08:07 PM Feature #2030: osd: clean up mark_unfound api
- wip-2030
- 06:52 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- it did. probably a race with another thread in connect() or accept() reregistering a new Pipe... connect(), probably
- 06:47 PM Messengers Bug #2086: msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- We sure this was run including commit:ebbfdefa120ae93b95780c67027ec9efd4b7b5cd?
- 04:38 PM Feature #2006 (In Progress): osd: report what is blocking peering completion
- wip-pg-query
- 04:07 PM Bug #2098 (Resolved): xfs/ext4 non-idempotent transaction
- Forcing a sync after a non-idempotent transaction is not adequate to ensure correctness during journal replay.
Con...
- 03:36 PM Bug #1820 (Resolved): deprecate "ceph stop"
- 02:37 PM Bug #1820: deprecate "ceph stop"
- ok, tested all this in wip-1820. 'deactivate' already moves the ceph-mds to standby (not exit), all good there.
n...
- 11:30 AM Bug #1820: deprecate "ceph stop"
- yeah. i think the simplest is to make 'leave' refuse if it is < max_mds.
and we could drop max mds from the cep...
- 11:22 AM Bug #1820: deprecate "ceph stop"
- Oh, I've talked of this before. It might be nice to have a "start ceph-mds only to process a leftover journal and han...
- 11:19 AM Bug #1820: deprecate "ceph stop"
- Changing docs is easy, and the branches already rip out "documented" commands. Let's just make it make sense.
I wo...
- 11:04 AM Bug #1820: deprecate "ceph stop"
- It can easily go back into standby (via the respawn() -> execve() path) instead of shutting down. Then it's really "...
- 10:54 AM Bug #1820: deprecate "ceph stop"
- On termination the process exits. On receipt of a stop command it exports authority over the filesystem hierarchy to ...
- 10:52 AM Bug #1820: deprecate "ceph stop"
- Tommi Virtanen wrote:
> Greg, how is "ceph mds stop 0" different from that ceph-mds receiving a local request to ter...
- 10:51 AM Bug #1820: deprecate "ceph stop"
- Greg, how is "ceph mds stop 0" different from that ceph-mds receiving a local request to terminate (e.g. SIGTERM)?
- 10:49 AM Bug #1820: deprecate "ceph stop"
- No, the important part is the hierarchy authority export. Then it shuts down; it's not a "go standby". I guess you co...
- 10:48 AM Bug #1820: deprecate "ceph stop"
- Which makes me think, is the concept of "go standby" of any value, if there's something that'll automatically say the...
- 10:44 AM Bug #1820: deprecate "ceph stop"
- It sounds like that does two things: move the MDS from active to standby, and terminate it. And we're removing the "r...
- 10:31 AM Bug #1820: deprecate "ceph stop"
- That one is a bit different.. it's instructing ceph-mds to export all of its metadata to another node and leave the ...
- 10:11 AM Bug #1820: deprecate "ceph stop"
- Yeah. I can't speak for the threading & locking changes, but the command removal is trivial.
That still leaves
...
- 09:51 AM Bug #1820: deprecate "ceph stop"
- wip-stop and wip-2090
- 03:35 PM Bug #2095 (Resolved): osd: need feature bit for v0.42 osdmap encoding change
- commit:ddc99983228e761f754e0038aecbe341d7e2181f
- 09:27 AM Bug #2095: osd: need feature bit for v0.42 osdmap encoding change
- we had a feature bit already, we just needed to conditionally encode the old format, and tweak MOSDMap to reencode ma...
- 03:16 PM Bug #2094 (Resolved): osd: pgs remapped to down+out osd
- making remapped and clean mutually exclusive. commit:e8bc42ff435e5648b88b818775d8fa47989af5dc
- 10:43 AM Bug #2094: osd: pgs remapped to down+out osd
- Reproduced again with stats flushing. This seems to happen every time with this configuration (maybe having only 2 os...
- 03:14 PM Bug #2091: corrupt v5 inc osdmap
- ok.. yeah, it looks like the monitor may have published a bad inc update or something? unclear. i'll check with the...
- 03:11 PM Bug #2091: corrupt v5 inc osdmap
- OK, picking a few things out of the original corruption report.
The basic header stuff is the same as before, as e...
- 02:48 PM Feature #2015 (Resolved): osd: dump in-flight ops via admin socket
- 02:37 PM CephFS Feature #2097 (Rejected): mds: 'ceph mds activate <gid>'
- ability to explicitly instruct a standby mds to join the active cluster.
- 12:04 PM Messengers Bug #1985 (Won't Fix): msgr: creating new Pipe for pre-existing connection leaks Pipe if they don...
- at least until we demonstrate the problem (after the msg leak fix). this will probably be moot after refactoring som...
- 12:01 PM RADOS Bug #2096: crush: adjust weight broken for tree, list buckets
- wip-crush-adjust
- 10:48 AM RADOS Bug #2096 (Resolved): crush: adjust weight broken for tree, list buckets
- ...
- 11:25 AM Bug #2090 (Resolved): mon: assertion failed on shutdown
- commit:963dec82880717054c760a745cf93cc7b43112df
- 09:06 AM Bug #2080 (Resolved): osd: scrub on disk size does not match object info size
02/22/2012
- 10:12 PM Linux kernel client Cleanup #2093: ceph-client: messenger: the "to" parameter to read_partial() needs to go
- I think it's right as is... all of those read calls are non-blocking. So the first time around in_base_pos is 0 and ...
- 05:28 PM Linux kernel client Cleanup #2093 (Resolved): ceph-client: messenger: the "to" parameter to read_partial() needs to go
- I have been doing some refactoring of the net/ceph/messenger.c. One of
my aims was to understand the how (and why) ...
- 09:33 PM Bug #2091: corrupt v5 inc osdmap
- the first badness in the log is below. once it missed one incremental, things probably got out of sync and the pg_te...
- 09:28 PM Bug #2091: corrupt v5 inc osdmap
- Oh.. that means the pg_temp mapping was inserted by a previous inc map, probably. we need to find the first instance...
- 06:23 PM Bug #2091: corrupt v5 inc osdmap
- I've manually decoded the entire ceph_osdmap dumped in the log and everything
therein looks fine. (This was overkil...
- 01:20 PM Bug #2091: corrupt v5 inc osdmap
- I'm starting to look at this in detail but haven't concluded what went wrong yet.
Does it matter whether it was th...
- 09:33 AM Bug #2091: corrupt v5 inc osdmap
- reencoded to old format (using latest ceph-dencoder) gives us...
- 09:28 AM Bug #2091 (Can't reproduce): corrupt v5 inc osdmap
- ...
- 09:20 PM Bug #2090: mon: assertion failed on shutdown
- ...
- 09:20 PM Bug #2090: mon: assertion failed on shutdown
- wip-2090
- 05:04 AM Bug #2090 (Resolved): mon: assertion failed on shutdown
- I was running repeated cycles of the kernel_untar_build.sh workunit
to try to reproduce a problem in the client and ...
- 09:17 PM Bug #2095 (Resolved): osd: need feature bit for v0.42 osdmap encoding change
- 07:02 PM Bug #2094 (Resolved): osd: pgs remapped to down+out osd
- This is why the dump_stuck test fails on master. When one osd is marked out, the pg is remapped incorrectly:...
- 10:06 AM Feature #2005 (Resolved): mon: track timestamps on pg states
- 10:06 AM Feature #2058 (Resolved): ceph: query pg state
- 10:03 AM Feature #2054: teuthology: run radosgw through valgrind
- wip-valgrind
- 09:45 AM CephFS Bug #2092 (Can't reproduce): BUG at fs/ceph/caps.c:999
- ...
- 09:36 AM Bug #2022: osd: misdirectect request
- hit this again:...
02/21/2012
- 04:58 PM rgw Cleanup #2089 (Resolved): rgw: less dout(0) noise?
- i think that's where this is coming from:...
- 03:32 PM Feature #1932 (Resolved): mon: before accepting a new crushmap, monitor should validate and test ...
- 03:31 PM Feature #2088 (Rejected): msgr: refactor 2 threads to one
- 03:30 PM Feature #1412 (New): qa: spec out messenger testing
- 03:29 PM Feature #1412: qa: spec out messenger testing
- er, wrong bug!
- 12:22 PM rgw Bug #2083 (Resolved): rgw: test_object_raw_authenticated* fail (on xfs?)
- Should be fixed now. Updated relevant teuthology tests to use only url safe chars. Also updated rgw-admin to disallow...
- 10:34 AM rgw Bug #2083: rgw: test_object_raw_authenticated* fail (on xfs?)
- Not really related to xfs. The problem is that when generating authenticated urls, boto doesn't escape the access key...
- 10:55 AM Feature #2087 (Resolved): lightweight filestore workload generator
- simple program that uses FileStore and generates something that looks vaguely like what an OSD does. e.g.,
- stre...
- 09:13 AM Bug #2084: segfault in tcmalloc
- and again (hammer b.yaml). right before the crash sched_scrub() was called......
02/20/2012
- 04:19 PM Messengers Bug #2086 (Resolved): msgr: msg/SimpleMessenger.h: 203: FAILED assert(!i->second->is_on_list())
- ...
- 02:54 PM Linux kernel client Cleanup #2085 (New): kclient: improve mtime update in page_mkwrite
- this should be done in the various helpers we call when we successfully mark a page dirty, not in the outer function.
- 02:29 PM Bug #1765 (Resolved): osd: 'call' op can return data even if op is modifying
- commit:afc1748db52911295708e4afbe7fd7884c97dbbf
- 02:27 PM Bug #1821 (Resolved): librados: rados_create_with_context is unusable
- we could still add refcounting to the CephContext later.
- 02:24 PM Bug #2084 (Can't reproduce): segfault in tcmalloc
- heap corruption?...
- 01:52 PM Linux kernel client Bug #2081: msgr: spinlock badness?
- ubuntu@teuthology:/a/nightly_coverage_2012-02-20-b/12984 with same trace on the console.
- 01:10 PM Bug #2080: osd: scrub on disk size does not match object info size
- 08:48 AM Bug #2080: osd: scrub on disk size does not match object info size
- reproduced with log. metropolis:~sage/bug-2080
- 06:20 AM Bug #2080: osd: scrub on disk size does not match object info size
- ubuntu@teuthology:/a/master-2012-02-19_19:50:05/12884
- 08:31 AM Cleanup #2021 (Resolved): fix signal handlers
- 06:29 AM rgw Bug #2083 (Resolved): rgw: test_object_raw_authenticated* fail (on xfs?)
- This fails sometimes, but not always. It seems to happen more often on xfs, but maybe that's my imagination....
02/19/2012
- 03:52 PM Bug #2082 (Resolved): osd: broken queuing during replay
- ...
- 03:49 PM Bug #1638 (Won't Fix): Can't create object with large xattrs in a single operation (on extN)
- 03:48 PM CephFS Bug #2018 (Resolved): mds: can't change file_max
- oh, i fixed this a week or two ago. the problem was that the file isn't open read/write, but Client was still trying ...
- 03:46 PM Bug #2032 (Resolved): paxos: somehow didn't update stash alongside new states
- 03:45 PM Bug #2044 (Resolved): osd: pg stuck in active+backfill
- 03:45 PM Feature #1412 (Can't reproduce): qa: spec out messenger testing
- this code has been refactored a bit.
the messenger tests won't directly trigger this, though we may the/an under...
- 03:45 PM Bug #1631 (Can't reproduce): osd: failed assert(repop_queue.front() == repop)
- this code has been refactored a bit.
the messenger tests won't directly trigger this, though we may the/an under...
- 03:40 PM Feature #1932: mon: before accepting a new crushmap, monitor should validate and test some inputs
- wip-crush
- 02:51 PM Bug #2080: osd: scrub on disk size does not match object info size
- ...
02/18/2012
- 11:13 PM Linux kernel client Bug #2081 (Can't reproduce): msgr: spinlock badness?
- captured this console fragment from a crashed qa run...
- 10:57 PM Bug #2070 (Duplicate): osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
- ok i didn't observe this crash and trace it back, but i'm almost certain it's the same as #2075.
commit:344c202203...
- 01:54 PM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
- ubuntu@teuthology:/a/nightly_coverage_2012-02-18-a/12494
- 10:56 PM Bug #2075 (Resolved): osd: recover_got assert
- commit:344c20220345197c03fbaf46e2c1289d81a0a14f
- 02:01 PM Bug #2075: osd: recover_got assert
- ubuntu@teuthology:/a/nightly_coverage_2012-02-18-a/12489...
- 10:01 PM Feature #2074 (Rejected): teuthology: remove old kernel packages
- i did this manually on sepia. new teuth will reimage regularly.
- 09:24 PM Messengers Bug #2073 (Resolved): msgr: shutdown can hang
- this appears to be fixed with commit:787dd1709797876dd9fa6004c6723df859003b59, unless there is some subtle difference...
- 03:51 PM Feature #2034 (Resolved): osd: refactor push code
- 03:50 PM Feature #2058: ceph: query pg state
- wip-pg-query
- 02:15 PM Bug #2061 (Resolved): osd: scrub mismatch
- pretty sure this was fixed by the recover refactor.. haven't hit it since then.
- 01:48 PM Bug #2080 (Resolved): osd: scrub on disk size does not match object info size
- ...
02/17/2012
- 04:26 PM Bug #1975: btrfs: EINVAL on snap create
- We aren't triggering this any more, now that the filestore transaction bug is fixed.
- 03:13 PM Bug #2061: osd: scrub mismatch
- oooooh, these went away and i was confused. but then i just ran the regression suite against next and hit them again...
- 01:22 PM Bug #2068 (Resolved): osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.hist...
- 12:46 PM Bug #2079 (Duplicate): rbd: creating a snapshot with the same name doesn't return an error
- ...
- 12:37 PM Cleanup #2078 (Resolved): ceph tool: only output response data to stdout
- By default, "ceph osd getmap" or any other command that fetches binary data outputs it to stdout. However, other info...
- 10:32 AM Bug #2077 (Resolved): mon: assert in Paxos::is_consistent
- we don't need a stash for v == 1. make is_consistent() check match slurp() logic. commit:db41bdda7e02aedc42d14be635...
- 09:41 AM Bug #2077 (Resolved): mon: assert in Paxos::is_consistent
- I tripped across a bug when adding a new monitor into an existing cluster
(see attached). I was on GIT commit
4b3bb... - 09:36 AM Bug #2076 (Resolved): ceph fails to build with gcc 4.7
- commit:d913e5e670282c19a35c6cb420fc1d711c388cc4
- 09:30 AM Bug #2076: ceph fails to build with gcc 4.7
- That is indeed fine.
Thanks! - 09:25 AM Bug #2076: ceph fails to build with gcc 4.7
- Committing these, with both of your signed-off-by's.. I assume that's okay?
- 08:13 AM Bug #2076 (Resolved): ceph fails to build with gcc 4.7
- Fedora has moved to gcc 4.7 for the upcoming Fedora 17 release[1].
Currently Ceph fails to build with gcc 4.7.
...
02/16/2012
- 08:55 PM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
- ubuntu@teuthology:/a/nightly_coverage_2012-02-16-b/12294
- 11:32 AM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
- if i had to guess this is related to the pg init() refactor. not much to be found from the core, except that pg->sta...
- 09:39 AM Bug #2070: osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
- also hit this on ubuntu@teuthology:/a/nightly_coverage_2012-02-15-b/12169
- 09:36 AM Bug #2070 (Duplicate): osd/ReplicatedPG.cc: 3627: FAILED assert(is_active())
- ubuntu@teuthology:/a/nightly_coverage_2012-02-15-b/12164...
- 08:44 PM Bug #2075 (Resolved): osd: recover_got assert
- ...
- 08:37 PM Messengers Bug #2073: msgr: shutdown can hang
- here's the bt:...
- 04:15 PM Messengers Bug #2073 (Resolved): msgr: shutdown can hang
- saw this...
- 04:36 PM Feature #2074 (Rejected): teuthology: remove old kernel packages
- sepia disks are filling up from all the old kernel packages (/lib/modules/$version is 1.3 GB each)
- 04:10 PM rgw Bug #2072 (Resolved): rgw: owner cannot change acl if it doesn't have bucket read permission
- rgw_op.cc:read_acls() tests for read permission, this is wrong.
- 03:11 PM CephFS Bug #2071: kclient: pjd mkfifo failures
- ubuntu@teuthology:/a/nightly_coverage_2012-02-16-b/12255
- 03:11 PM CephFS Bug #2071 (Can't reproduce): kclient: pjd mkfifo failures
- ...
02/15/2012
- 03:28 PM Linux kernel client Bug #2069 (Can't reproduce): client crash during kernel_untar_build rm -r step
- this keeps happening:...
- 03:24 PM Bug #2022: osd: misdirectect request
- weird, saw this twice a few days (maybe 18 runs apart), but wasn't able to reproduce after several hundred iterations...
- 03:20 PM Bug #2033 (Closed): osd: segfault in OSD::update_heartbeat_peers()
- I'm not totally sure how this happened, but the new heartbeat locking should avoid it..
- 03:18 PM Cleanup #2049 (Resolved): osd: improve heartbeat peer locking
- 03:18 PM Bug #2060 (Resolved): osd: lone osd is not marked degraded with replication level 2
- 02:11 PM Bug #2056 (Resolved): osd: unfound object during backfill qa test
- fixed in backfill task.. it was killing a second osd before waiting for things to peer/recover from the first failure.
- 12:01 PM Bug #2068: osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.history.same_in...
- Oh, i see the problem.. the osdmap ref is taken by lock().. this pg hasn't seen the new map yet.
just need to tag... - 11:49 AM Bug #2068: osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.history.same_in...
- looking at the core file.
- we are primary
- replica is sending us an info message, with one record. it is therefo... - 09:21 AM Bug #2068 (Resolved): osd: FAILED assert(infoevt.info.history.last_epoch_started >= pg->info.hist...
- ...
02/14/2012
- 10:50 PM Bug #1765 (In Progress): osd: 'call' op can return data even if op is modifying
- the c++ librados api now separates these operations. osd now refuses to return any result data payload if op is mark...
- 09:36 PM Cleanup #2021 (In Progress): fix signal handlers
- 09:30 PM CephFS Bug #1991 (Duplicate): mds: crash during clean shutdown
- see #1549.. we are racing with exit(0) from the SIGTERM handler
- 09:28 PM Bug #2032: paxos: somehow didn't update stash alongside new states
- Can we close this one?
- 09:27 PM Bug #2037 (Resolved): mon: a crash in the middle of slurping is unrecoverable
- 05:06 PM Bug #2067 (Resolved): librados: we leak CephContext from rados_create()
- 05:05 PM rgw Feature #2066 (Resolved): rgw: make list_objects efficient
- 03:14 PM Feature #1772: rbd: define new on-disk header format
- The other point that came up was, if rbd can't delete the parent volume until all children have been deleted, the gla...
- 03:13 PM Feature #1772: rbd: define new on-disk header format
- Being a little bit more explicit: the point of the UUIDs is to allow child images to add themselves to the parent's l...
- 01:07 PM Feature #1772: rbd: define new on-disk header format
- To get around the issue of a child image needing to update the parent image's header, Sage suggested only allowing ac...
- 02:07 PM Bug #1821: librados: rados_create_with_context is unusable
- see wip-1821
- 01:06 PM Feature #988: librbd: trivial layering
- To get around the issue of a child image needing to update the parent image's header, Sage suggested using only allow...
- 12:45 PM CephFS Feature #2065: teuthology: specify mount options for kclient task
- take a dict instead of list, and let you specify mount options on a per-client basis.
- 12:45 PM CephFS Feature #2065 (Closed): teuthology: specify mount options for kclient task
- 11:55 AM Linux kernel client Bug #2064 (Resolved): ceph-client: messenger: nocrc flag not implemented correctly
- The "nocrc" option is supposed to disable CRC32 calculation on messages
sent between ceph entities. The default is ... - 11:55 AM rgw Bug #2063 (Resolved): rgw: access key shouldn't contain chars that need to be url encoded
- We see some issues in our tests that when generating signed url these chars aren't being encoded. We should try to av...
- 11:50 AM Bug #2062 (Resolved): filestore: idempotent test failed
- the test was broken. triggered by filestore now noticing clone could fail.
commit:7b1c144f21c3ccfe2dfd4342e3d5461... - 11:36 AM Bug #2062 (Resolved): filestore: idempotent test failed
- ...
- 09:38 AM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
- in my case, this looks like #2045.
- 07:59 AM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
- I just hit this in qa, ubuntu@teuthology:/var/lib/teuthworker/archive/nightly_coverage_2012-02-14-a/11871.
- 09:37 AM Bug #2061 (Resolved): osd: scrub mismatch
- New one, "[ERR] 0.c scrub stat mismatch, got 6/6 objects, 2/5 clones, 13511948/13511948 bytes."
Workload was
<pre... - 09:31 AM Bug #2045: osd: dout_lock deadlock
- again, although this time there is a write that looks blocked somehow...
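The scrub error quoted under Bug #2061 above lists several `a/b` pairs, and only one of them differs. A minimal sketch, assuming the message always has the form shown in that entry, of picking out the mismatching counters; the helper name `mismatched_stats` is hypothetical, and the log does not say which side of each pair is the observed value and which is the expected one:

```python
import re

# One "<number>/<number> <name>" pair, e.g. "2/5 clones".
# Which number is "found" and which is "expected" is not stated in the log.
PAIR = re.compile(r"(\d+)/(\d+) (\w+)")

def mismatched_stats(line):
    """Return {name: (left, right)} for every pair whose two numbers differ."""
    # Only look at the text after the fixed prefix of the message.
    _, _, tail = line.partition("scrub stat mismatch, got ")
    result = {}
    for left, right, name in PAIR.findall(tail):
        if left != right:
            result[name] = (int(left), int(right))
    return result

line = "[ERR] 0.c scrub stat mismatch, got 6/6 objects, 2/5 clones, 13511948/13511948 bytes."
print(mismatched_stats(line))
```

For the line from the bug report, only the clone count is flagged.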
02/13/2012
- 04:33 PM Feature #2028 (Resolved): qa: allocate disks to btrfs on new hardware
- 01:23 PM Feature #2028 (In Progress): qa: allocate disks to btrfs on new hardware
- 03:47 PM Bug #2060 (Resolved): osd: lone osd is not marked degraded with replication level 2
- With only one osd in, 'ceph -s' and 'ceph health' should report that the cluster has degraded objects.
- 02:54 PM Feature #1836 (Resolved): filejournal: use async directio to write to the journal
- 02:50 PM rgw Feature #773 (Resolved): rgw: efficient list-objects filtering
- That was fixed when we introduced the bucket index.
- 02:40 PM rgw Bug #2048 (Resolved): rgw: multipart upload listing return key starting with _multipart_
- It seems that this has already been resolved, most likely by the fix for #2025.
- 01:14 PM Feature #2058 (Resolved): ceph: query pg state
- 01:12 PM Feature #2005 (In Progress): mon: track timestamps on pg states
- 01:11 PM Feature #2005 (Resolved): mon: track timestamps on pg states
- 01:06 PM Feature #1962 (In Progress): ferro: Trigger vMedia boot via IPMI/DRAC
- 01:06 PM Feature #1571 (In Progress): osd: non-trivial map object
- 12:11 PM rgw Cleanup #2036 (Resolved): rgw: bucket index tree contains the same info 3 times
- Ok, as of commit:9065dbd36d35b6e44c66293e74b6ba92031ca9ae it only appears twice. Removing another copy of the objec...
- 09:37 AM Bug #2056 (Resolved): osd: unfound object during backfill qa test
- ubuntu@teuthology:/a/nightly_coverage_2012-02-13-a/11793
it happened a couple days earlier, too.
02/12/2012
- 04:16 PM Bug #1759 (Resolved): mds/client: truncate size overflow, fails with EINVAL
- this is a problem with weird truncate_seq/size values in requests, that the osd is now cleaning up.
commit:0ded7e4da... - 04:15 PM Bug #1688 (Closed): Benjamin: pg stuck in scrub
- 02:29 PM Bug #2022: osd: misdirectect request
- saw this again on rados_api_tests:...
02/11/2012
- 10:43 PM rgw Bug #2043 (Resolved): rgw: cannot use '+' in url
- commit:508be8e3b3b47b71035d07d26dead49b3b91463d hopefully fixes the issue. Also reverted previous fix.
- 09:42 PM rgw Bug #2043 (In Progress): rgw: cannot use '+' in url
- It's still broken. Certain clients use '+' as a space. I think that the apache rewrite rule makes things inconsistent.
- 10:32 PM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
- I guess a btrfs one. Right now I'm running a couple of virtual machines without any issues, so for now we can leave t...
- 09:57 AM Bug #2026: osd: ceph::HeartbeatMap::check_touch_file
- This looks like a btrfs or kernel issue to me. Have you seen it since?
- 03:08 PM Bug #1943 (Duplicate): osd: bad clone transaction on journal replay
- 03:07 PM Bug #1949 (Resolved): osd: ENOTEMPTY on collection removal from snaptrimmer
- fixed by commit:7c6dff487171deb37852e2fb059dcb6e3af65702
- 03:05 PM Feature #2038 (Rejected): mon: can't currently do commands/get status when not in quorum
- I think it's fine to fail (or actually, block/wait on) authentication if we are out of quorum. The client will retry...
- 02:22 PM Cleanup #2023 (Resolved): btrfs: Use btrfs device scan instead of btrfsctl -a
- commit:a414fd51c7c5ae5dbe9e3af7db6f17741a58c1a7
- 10:23 AM Bug #1758 (Can't reproduce): OSD segfault in SimpleMessenger::send_message
- Haven't seen this one in ages, either. Going to assume it's been fixed.
- 10:22 AM Bug #1992 (Can't reproduce): OSD::get_or_create_pg
- Hmm, we haven't been able to trigger this with our thrashing.
- 10:21 AM Bug #1493 (Resolved): cmon: nice error message on undecodable (osdmap, monmap) input
- commit:7eff37be494714febed4e6724237c03722b4e8c5
- 10:07 AM Feature #2055 (Duplicate): osd: fix up push cloning
02/10/2012
- 09:17 PM RADOS Bug #1738 (Duplicate): bad crushmap behavior
- 09:17 PM RADOS Bug #2047 (Duplicate): crush: with a rack->host->device hierarchy, several down devices are likel...
- 05:30 PM Bug #1949: osd: ENOTEMPTY on collection removal from snaptrimmer
- 01:22 PM Bug #1949: osd: ENOTEMPTY on collection removal from snaptrimmer
- another log, with filestore debugging, and the contents of the fs. There was...
- 05:06 PM rgw Bug #2051 (Resolved): rgw: can't use '%' in object name
- Fixed, commit:7e32a3d4bc90d84970754350414c553e7ca01299.
- 02:48 PM rgw Bug #2051 (Resolved): rgw: can't use '%' in object name
- 04:34 PM Feature #2055 (Duplicate): osd: fix up push cloning
- 04:32 PM Feature #2054 (Resolved): teuthology: run radosgw through valgrind
- 04:13 PM Feature #2053 (Rejected): librados: caching
- 04:12 PM Feature #2052 (Resolved): librbd: caching
- 03:30 PM rgw Cleanup #2036: rgw: bucket index tree contains the same info 3 times
- the reason it is kept 3 times is that we index it by the bucket name, have the bucket name as one of the fields in th...
- 03:20 PM rgw Bug #2043 (Resolved): rgw: cannot use '+' in url
- Fixed, commit:a6d7629c177fbab722a7a0c7f861caf91ff92deb.
- 03:19 PM Bug #2050 (Resolved): rgw: crash at Objecter::_linger_commit()
- Fixed, commit:0de1d5502b0d9ab0f0809947a0664586d7754a08.
- 02:27 PM Bug #2050: rgw: crash at Objecter::_linger_commit()
- We think that what happens is this:
librados::linger()
->ack response
unregister_watcher()
->commit response
... - 02:26 PM Bug #2050 (Resolved): rgw: crash at Objecter::_linger_commit()
- ubuntu@teuthology:/a/nightly_coverage_2012-02-09-a/11236$ cat ./remote/ubuntu@sepia72.ceph.dreamhost.com/log/rgw.stdo...
- 01:58 PM Cleanup #2049: osd: improve heartbeat peer locking
- need to move heartbeat peer stuff out from under osd_lock to facilitate pushing pg peering crap into the worker threads
- 01:57 PM Cleanup #2049 (Resolved): osd: improve heartbeat peer locking
02/09/2012
- 09:41 PM Bug #1974: osd: radosmodel crash on thrashing
- commit:359dfb9966d15d997f9e0351a5ed8de1faae62fe
- 09:41 PM Bug #1974 (Resolved): osd: radosmodel crash on thrashing
- 09:20 PM Bug #1975: btrfs: EINVAL on snap create
- I'm pretty sure this was triggered by #2046. There is still a btrfs bug, but we were doing the wrong thing if rmdir ...
- 09:18 PM Bug #2013 (Resolved): osd: messages for pgs we don't store are never freed
- 04:38 PM Bug #2046 (Resolved): filestore: do_op running during commit
- commit:1009d1a016f049e19ad729a0c00a354a3956caf7 and commit:93d7ef96316f30d3d7caefe07a5a747ce883ca2d
- 04:02 PM Bug #2046: filestore: do_op running during commit
- this was broken by commit:259c509a8941bf7cdad8bd4ede0ccd73ca8a83d3, way back in v0.25! Sigh. The wait condition for...
- 10:05 AM Bug #2046 (Resolved): filestore: do_op running during commit
- commit_start() is supposed to quiesce writes, but I see...
- 04:24 PM Bug #2044: osd: pg stuck in active+backfill
- This should be fixed by commit:f0334673ab8547807b961aae19a8e53531585e3f.
- 10:55 AM rgw Bug #2048 (Resolved): rgw: multipart upload listing return key starting with _multipart_
- reported by jdwilson over irc.
- 10:41 AM RADOS Bug #2047 (Resolved): crush: with a rack->host->device hierarchy, several down devices are likely...
- See http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/5166
Sage says the cause is down devices only tr... - 10:02 AM Bug #2045: osd: dout_lock deadlock
- ubuntu@teuthology:/a/nightly_coverage_2012-02-09-a/11210
metropolis:~sage/bug-2045 - 09:56 AM Bug #2045 (Can't reproduce): osd: dout_lock deadlock
- a thread is blocked on dout_lock, can't tell who.
02/08/2012
- 09:30 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- Hmm.. yeah, I don't think we have anything beyond these console dumps. And we don't capture any kind of kernel core ...
- 09:17 PM Linux kernel client Bug #1793: NULL pointer dereference at try_write+0x627/0x1060
- Is there a core file for this problem anywhere?
It would really be nice to poke around in the message, or the
con... - 07:45 PM Bug #2044 (Resolved): osd: pg stuck in active+backfill
- jmlowe ran into this on his cluster several times. The primary doing backfill failed to requeue the pg for recovery.
... - 04:54 PM rgw Bug #2043 (Resolved): rgw: cannot use '+' in url
- Either in signed urls (e.g., as part of the uid), or in object names. Reason is that url_decode removes it. Relax url...
- 04:45 PM Bug #2042: mon: crash in LogMonitor::update_from_paxos
- ubuntu@teuthology:/a/nightly_coverage_2012-02-08-b/11127
- 04:45 PM Bug #2042: mon: crash in LogMonitor::update_from_paxos
- core + binary + tarball are at metropolis:~sage/bug-2042
- 04:43 PM Bug #2042 (Duplicate): mon: crash in LogMonitor::update_from_paxos
- ...
- 02:26 PM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
- 02:23 PM Linux kernel client Bug #1907: rbd: don't reuse device ids while they're still in use elsewhere
- After a few weeks of wandering around the code, figuring out how
things work and refactoring and fixing things as I ... - 01:17 PM Cleanup #2041 (Resolved): osd: move peering into worker threads
- 10:52 AM Bug #1974: osd: radosmodel crash on thrashing
- Just hit this:
- clean_up_local removed an object (due to a 'delete' log entry)
- a read came in and read it befo... - 06:41 AM rgw Feature #2040 (Resolved): rgw: disable rgw log through ceph.conf
- Currently the way to do it is through the apache conf.
02/07/2012
- 10:56 PM Bug #2013 (In Progress): osd: messages for pgs we don't store are never freed
- see wip-pg-waiters?
- 10:46 PM CephFS Bug #1996 (Duplicate): mds: scatter_nudge() bad pointer on shutdown?
- this is the signal handler thing
- 10:45 PM Bug #1901 (Resolved): Missing files in ceph packages results in build failure of tests
- 10:43 PM rgw Bug #1721 (Can't reproduce): rgw: spurious multipart-upload failures
- 10:41 PM Bug #1626 (Can't reproduce): ceph-mon HA not working right; all must be up
- 10:37 PM CephFS Bug #1902 (Won't Fix): mds: unittest_interval_tree bad memory access
- 10:37 PM Bug #1659 (Can't reproduce): Upgrade from 0.27 -> 0.37 going wrong, OSDs miss map updates
- 10:35 PM Bug #1564 (Won't Fix): osd: osd should not be primary before data is replicated
- no more backlogs, so this problem is mostly moot. it can sort of still happen (to a vastly decreased degree), but it...
- 10:33 PM Bug #1529 (Can't reproduce): cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone sug...
- 10:31 PM Bug #1797 (Resolved): configure doesn't link to pthread on Fedora 14 on linking librados-config
- I'm going to assume that using the automake pthread macros fixes this (commit:c5144eed4eadf5cfaa0a41c0ced2a1cd3462289f)...
- 10:30 PM Cleanup #1899 (Resolved): use acx_pthread instead of hardcoding libs and cflags into build system
- applied this a while back, commit:c5144eed4eadf5cfaa0a41c0ced2a1cd3462289f
- 10:29 PM rgw Feature #2039 (Rejected): rgw: keep more than one bucket marker object
- We generate a unique bucket index id by leveraging the pg version returned on a write operation to a special bucket m...
- 10:28 PM CephFS Bug #1827 (Resolved): libceph: hang on creating a file
- finally looked at this. the problem is just that open wasn't passed O_WRONLY or O_RDWR, and ceph_write() wasn't retu...
- 10:20 PM Feature #2038 (Rejected): mon: can't currently do commands/get status when not in quorum
- For obvious reasons, the MonClient has to authenticate with a monitor before talking to it. Right now this is accompl...
- 10:05 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
- Made a new bug for that issue anyway. #2037
- 04:50 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
- oh, that was meant for #2032!
- 04:49 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
- I think that could happen, so I'll check and fix it if so, but it's not what happened here.
- 04:44 PM Bug #2031: paxos: failed assert (begin->last_committed == last_committed)
- oh.. maybe it was slurping, and crashed before it stashed. when it restarted it didn't go back into slurp, because t...
- 03:03 PM Bug #2031 (Can't reproduce): paxos: failed assert (begin->last_committed == last_committed)
- ...
- 10:04 PM Bug #2037 (Resolved): mon: a crash in the middle of slurping is unrecoverable
- If a monitor comes up and starts slurping, it will start adding incremental maps to its store and update [first|last]...
- 09:39 PM Bug #1547 (Resolved): client log doesn't go to stderr unless 'log file' specified
- fixed this a few releases back
- 09:38 PM Bug #1688: Benjamin: pg stuck in scrub
- is this old/fixed? haven't seen it in a while
- 09:03 PM Feature #2024 (Resolved): make gitbuilders time out when github is sucking
- 04:04 PM rgw Cleanup #2036 (Resolved): rgw: bucket index tree contains the same info 3 times
- This is apparent by running strings on the index objects. We should be able to reduce the excessive information (whic...
- 04:01 PM rgw Bug #2035 (Resolved): rgw: bucket removal fails
- bucket removal sometimes returns either 'access denied' or 'bucket not empty'
- 03:45 PM Bug #2033: osd: segfault in OSD::update_heartbeat_peers()
- ...
- 03:32 PM Bug #2033 (Closed): osd: segfault in OSD::update_heartbeat_peers()
- just hit this twice, on two different clusters, both under testrados workloads....
- 03:32 PM Feature #2034 (Resolved): osd: refactor push code
- 03:07 PM Bug #2032 (Resolved): paxos: somehow didn't update stash alongside new states
- lxo reported that on one monitor, after seeing #2031 and bringing the monitor back up (much later), the monitor faile...
- 02:28 PM Bug #1909 (Resolved): Two mons crash after starting the third one
- this really looks like the bug fixed in commit:bfbeae68c045de76ede86ca4f72d2a760a19c84b... the sender sent a message ...
- 02:19 PM Bug #1789: mon: failed assert(paxosv == pg_map.version)
- We only saw this the once, but we believe the bug and want to keep it open.
- 02:18 PM Messengers Bug #1747: msgr: osd connection originates from wrong port
- We only saw this the once, but we believe the bug and want to keep it open.
- 02:14 PM Bug #1631: osd: failed assert(repop_queue.front() == repop)
- We haven't seen this, but hope that the messenger tests now being designed will flush it out again.
- 02:11 PM CephFS Bug #1947 (Duplicate): mds: SIGBUS during _mark_dirty
- #1549
- 02:06 PM RADOS Feature #1639: osd: guard against bad objects in cls map functions
- the specific instance was fixed. can we in general catch any exception in the class methods? safely?
- 02:02 PM Bug #1530 (Can't reproduce): osd crash during build_inc_scrub_map
- 11:28 AM Feature #2030: osd: clean up mark_unfound api
ceph pg 1.2 mark_unfound_revert foo
NOT ceph tell osd.12 mark_unfound revert pgid objectname
- 11:27 AM Feature #2030 (Resolved): osd: clean up mark_unfound api
- 11:27 AM Feature #2007: osd: enumerate unfound, lost objects, possible locations
ceph pg 1.2 list_missing|list_unfound
- list of missing objects, locators, and known locations (if !unfound)
- 11:19 AM Feature #2007: osd: enumerate unfound, lost objects, possible locations
- PGLS_MISSING
(new pg op) using rados - 11:27 AM Feature #2006: osd: report what is blocking peering completion
ceph pg 1.2 status|query
- peering status
- recovery status
- another interesting status
- 11:11 AM Feature #2006: osd: report what is blocking peering completion
- ceph ...
ceph tell <who> ....
ceph pg query 1.2
map pg, query osd directly with
['pg', 'query', '1.2']
- 11:07 AM Feature #2005: mon: track timestamps on pg states
- query list of stale/unpeered/whatever pgs
ceph pg dump_stuck [--format=json|plain]
- 10:06 AM Bug #1974: osd: radosmodel crash on thrashing
- Summary: An object was deleted, but after a recovery was found to be back ... which is almost surely indicative of a ...
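The Feature #2006 notes above pair the command line `ceph pg query 1.2` with the word list `['pg', 'query', '1.2']` that is sent to the OSD. A minimal sketch of that mapping, assuming plain shell-style splitting; the function name `to_command_words` is hypothetical and this is not taken from the ceph tool itself:

```python
import shlex

def to_command_words(cli):
    """Split a command line into words and drop a leading 'ceph' program name."""
    words = shlex.split(cli)
    if words and words[0] == "ceph":
        words = words[1:]
    return words

print(to_command_words("ceph pg query 1.2"))
```

The same splitting applied to `ceph pg dump_stuck --format=json` keeps the option as one word, which would leave option parsing to whichever side receives the list.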
02/06/2012
- 05:48 PM Bug #1975: btrfs: EINVAL on snap create
- RATIONALE:
We seem to be able to make this happen, and believe it to be a btrfs bug.
We are not calling it u... - 05:44 PM Feature #1932: mon: before accepting a new crushmap, monitor should validate and test some inputs
- Users can create their own rules, so bad rules will happen, and we must do a better job of making the Monitors robust...
- 04:22 PM rgw Bug #2025 (Resolved): rgw: objects starting with underscore are badly listed
- Fixed, commit:c23d217c93bb6ed21c1b07e347710e18446a3abc.
- 04:22 PM rgw Bug #2029 (Resolved): rgw: space in object name is turned into a different character
- Fixed, commit:6df25e53abe37b19b38e5657dbf3b4c37f03d8e3.
- 02:37 PM rgw Bug #2029 (Resolved): rgw: space in object name is turned into a different character
- looks like we fail to use the url-decoded object name.
- 02:11 PM Feature #2028 (Resolved): qa: allocate disks to btrfs on new hardware
- root isn't consistently on /dev/sda, it seems. or on a consistent /dev/disk/by-path on the plana nodes.
- 02:04 PM Feature #1970 (Resolved): osd: migrate to new encoding schemes
- this is all done, but unmerged; it'll get pulled into a release with a bunch of other encoding updates.
- 10:52 AM rgw Bug #2027 (Can't reproduce): rgw -> apache miscommunication
- There were some mystery failures, where we've seen rgw getting requests from apache, processing them, sending respons...
- 10:10 AM Bug #1973 (Can't reproduce): osd: segfault in ReplicatedPG::remove_object_with_snap_hardlinks
- let's chalk this up to the bad object_info_t
- 10:06 AM Bug #1984 (Can't reproduce): osd: failed assert, got into finish_recovery_ops without any recover...
- 10:00 AM Bug #1490 (Resolved): cfuse assert failure: assert(ob->last_commit_tid < tid)
- 03:36 AM Bug #2026 (Can't reproduce): osd: ceph::HeartbeatMap::check_touch_file
- After my data loss due to a btrfs bug I re-installed my whole cluster with 0.41 and kernel 3.2 (ceph-client with btrf...
02/05/2012
- 08:54 PM Bug #1975: btrfs: EINVAL on snap create
- ...
- 07:11 PM rgw Bug #2025 (Resolved): rgw: objects starting with underscore are badly listed
02/04/2012
- 06:02 PM Feature #2024 (Resolved): make gitbuilders time out when github is sucking
- 03:32 PM CephFS Bug #1945: blogbench hang on caps
- ubuntu@teuthology:/a/nightly_coverage_2012-02-04-a/10600
- 12:01 PM Cleanup #2023 (Resolved): btrfs: Use btrfs device scan instead of btrfsctl -a
- I just upgraded my btrfs userland tools and saw:...
02/03/2012
- 11:00 PM Bug #2022 (Resolved): osd: misdirectect request
- from rados_api_tests.yaml:
[WRN] client.4292 10.3.14.128:0/3016298 misdirected client.4292.0:4 0.0 to osd.1 not [0,1... - 10:48 AM Cleanup #2021 (Resolved): fix signal handlers
- 10:45 AM Feature #2008 (Resolved): mon: include full/nearfull in health check
- 10:31 AM Feature #2004 (Resolved): qa: make deb gitbuilder faster
- 10:11 AM Feature #2020 (Duplicate): collectd: submit plugin upstream