Ceph: Issues
https://tracker.ceph.com/

RADOS - Bug #20256 (Resolved): "ceph osd df" is broken; asserts out on Luminous-enabled clusters
https://tracker.ceph.com/issues/20256
Reported 2017-06-12 by Greg Farnum <gfarnum@redhat.com>

I got a private email report:

Whenever "ceph osd df" is run, ceph-mon crashes. The stack trace follows:

0> 2017-06-08 04:56:51.647510 7f91b9972700 -1 *** Caught signal (Aborted) **
in thread 7f91b9972700 thread_name:ms_dispatch
ceph version 12.0.2-2454-g853ae30 (853ae30b1560fe23274c01003c9aa8161638978b) luminous (dev)
1: (()+0x7f8bf2) [0x5570b35edbf2]
2: (()+0x115c0) [0x7f91c2a2d5c0]
3: (gsignal()+0x9f) [0x7f91bfa3f91f]
4: (abort()+0x16a) [0x7f91bfa4151a]
5: (()+0x4475b9) [0x5570b323c5b9]
6: (OSDMonitor::print_utilization(std::ostream&, ceph::Formatter*, bool) const+0x1760) [0x5570b317e320]
7: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0xaa8) [0x5570b31b4138]
8: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2c0) [0x5570b31bbc10]
9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x7e8) [0x5570b31678d8]
10: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x205b) [0x5570b311ef7b]
11: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x956) [0x5570b3124bd6]
12: (Monitor::_ms_dispatch(Message*)+0x5d3) [0x5570b3125b43]
13: (Monitor::ms_dispatch(Message*)+0x23) [0x5570b314ee63]
14: (DispatchQueue::entry()+0xeca) [0x5570b3593cda]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5570b33f360d]
16: (()+0x76ca) [0x7f91c2a236ca]
17: (clone()+0x5f) [0x7f91bfb11f7f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

With gdb:

Thread 10 (Thread 0x7f26189f9700 (LWP 134531)):
#0 0x00007f26200b2c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f26200b6028 in __GI_abort () at abort.c:89
#2 0x000055f5ed4afc59 in PGStatService::get_osd_stat (this=<optimized out>, osd=<optimized out>) at /root/ceph/src/mon/PGStatService.h:45
#3 0x000055f5ed3fb9f8 in get_osd_utilization (this=<optimized out>, kb_avail=<synthetic pointer>, kb_used=<synthetic pointer>, kb=<synthetic pointer>, id=0) at /root/ceph/src/mon/OSDMonitor.cc:662
#4 average_utilization (this=0x7f26189f4c30) at /root/ceph/src/mon/OSDMonitor.cc:652
#5 OSDUtilizationDumper (tree_=<optimized out>, pgs_=<optimized out>, osdmap_=0x7f26189f4ad0, crush=<optimized out>, this=0x7f26189f4c30) at /root/ceph/src/mon/OSDMonitor.cc:585
#6 OSDUtilizationPlainDumper (tree=<optimized out>, pgs=<optimized out>, osdmap=0x7f26189f4ad0, crush=<optimized out>, this=0x7f26189f4c30) at /root/ceph/src/mon/OSDMonitor.cc:715
#7 OSDMonitor::print_utilization (this=this@entry=0x7f261f58a800, out=..., f=f@entry=0x0, tree=<optimized out>) at /root/ceph/src/mon/OSDMonitor.cc:883
#8 0x000055f5ed42f3b9 in OSDMonitor::preprocess_command (this=this@entry=0x7f261f58a800, op=...) at /root/ceph/src/mon/OSDMonitor.cc:4147
#9 0x000055f5ed4360f6 in OSDMonitor::preprocess_query (this=0x7f261f58a800, op=...) at /root/ceph/src/mon/OSDMonitor.cc:1581
#10 0x000055f5ed3ed62e in PaxosService::dispatch (this=0x7f261f58a800, op=...) at /root/ceph/src/mon/PaxosService.cc:74
#11 0x000055f5ed3abf6a in Monitor::handle_command (this=this@entry=0x7f261f589400, op=...) at /root/ceph/src/mon/Monitor.cc:2940
#12 0x000055f5ed3afcaf in Monitor::dispatch_op (this=this@entry=0x7f261f589400, op=...) at /root/ceph/src/mon/Monitor.cc:3854
#13 0x000055f5ed3b0e52 in Monitor::_ms_dispatch (this=this@entry=0x7f261f589400, m=m@entry=0x7f2613271400) at /root/ceph/src/mon/Monitor.cc:3749
#14 0x000055f5ed3d51f3 in Monitor::ms_dispatch (this=0x7f261f589400, m=0x7f2613271400) at /root/ceph/src/mon/Monitor.h:851
#15 0x000055f5ed7a624b in ms_deliver_dispatch (m=0x7f2613271400, this=0x7f261e125500) at /root/ceph/src/msg/Messenger.h:617
#16 DispatchQueue::entry (this=0x7f261e125658) at /root/ceph/src/msg/DispatchQueue.cc:197
#17 0x000055f5ed62cc9d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /root/ceph/src/msg/DispatchQueue.h:102
#18 0x00007f262189e184 in start_thread (arg=0x7f26189f9700) at pthread_create.c:312
#19 0x00007f2620179bed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

git bisect says it's from 459ec61901a3e7d58e971b96a06eb99b43e19571.
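
For context on the crash above: the abort comes from the stat lookup inside the utilization dumper, which assumes per-OSD stats always exist. A minimal sketch, using hypothetical StatService/OsdStat types rather than the actual Ceph fix, of a lookup that lets the dumper skip OSDs whose stats are unavailable instead of taking the monitor down:

    #include <cstdint>
    #include <map>
    #include <optional>
    #include <vector>

    // Hypothetical stand-in for the monitor's per-OSD stat source; mid-upgrade,
    // some OSDs may have no stats registered yet.
    struct OsdStat {
      uint64_t kb = 0, kb_used = 0;
    };

    class StatService {
      std::map<int, OsdStat> stats;
    public:
      // Return an empty optional instead of aborting when stats are missing.
      std::optional<OsdStat> get_osd_stat(int osd) const {
        auto it = stats.find(osd);
        if (it == stats.end())
          return std::nullopt;
        return it->second;
      }
    };

    // The utilization dumper can then skip unknown OSDs rather than asserting.
    double average_utilization(const StatService& svc, const std::vector<int>& osds) {
      uint64_t used = 0, total = 0;
      for (int id : osds) {
        if (auto st = svc.get_osd_stat(id)) {
          used += st->kb_used;
          total += st->kb;
        }
      }
      return total ? 100.0 * static_cast<double>(used) / static_cast<double>(total) : 0.0;
    }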

teuthology - Bug #15711 (Resolved): "Failed to schedule" without including run name
https://tracker.ceph.com/issues/15711
Reported 2016-05-03 by Greg Farnum <gfarnum@redhat.com>

We're getting a lot of emails which don't include the run name and whose body text is:

At least one job needs packages that don't exist. See above.

This contrasts with the more useful "Failed to schedule teuthology-2016-05-02_23:14:02-samba-master---basic-smithi" and:

Packages for ceph hash 'bb7d9c15576affda5a53f60acca6543fc0d267ec' not found

Ceph - Bug #14995 (Resolved): "ceph version 10.0.3 was not installed, found 10.0.4."
https://tracker.ceph.com/issues/14995
Reported 2016-03-07 by Greg Farnum <gfarnum@redhat.com>

We're seeing this in a lot of runs. It might just be a consequence of scheduling across release tags, but I'm not sure. For instance: http://pulpito.ceph.com/teuthology-2016-03-05_18:04:02-fs-master---basic-smithi/42528/

2016-03-05T19:17:09.492 INFO:teuthology.packaging:Looking for package version: http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-basic/sha1/6018ccd6c4405c6014c65dd92660898adbc29c03/version
2016-03-05T19:17:09.498 INFO:teuthology.packaging:Package found...
2016-03-05T19:17:09.498 INFO:teuthology.packaging:Found version: 10.0.3
2016-03-05T19:17:09.499 INFO:teuthology.orchestra.run.smithi001:Running: "rpm -q ceph --qf '%{VERSION}'"
2016-03-05T19:17:09.601 INFO:teuthology.packaging:The installed version of ceph is 10.0.4
2016-03-05T19:17:09.601 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
vars.append(enter())
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 645, in install
install_packages(ctx, install_info, config)
File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 390, in install_packages
verify_package_version(ctx, config, remote)
File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 326, in verify_package_version
pkg=pkg_to_check
RuntimeError: ceph version 10.0.3 was not installed, found 10.0.4.

CephFS - Bug #11481 (Resolved): "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay
https://tracker.ceph.com/issues/11481
Reported 2015-04-27 by Greg Farnum <gfarnum@redhat.com>

http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-04-22_11:17:57-fs-hammer-testing-basic-typica/2417/

2015-04-22 15:36:47.891526 7fde5b0cd700 1 -- 172.20.133.65:6811/16012 <== mon.1 172.20.133.69:6789/0 54 ==== mdsmap(e 23) v1 ==== 867+0+0 (3456897465 0 0) 0x5a88480 con 0x39ef6e0
2015-04-22 15:36:47.891561 7fde5b0cd700 5 mds.-1.0 handle_mds_map epoch 23 from mon.1
2015-04-22 15:36:47.891578 7fde5b0cd700 10 mds.-1.0 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table}
2015-04-22 15:36:47.891584 7fde5b0cd700 10 mds.-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
2015-04-22 15:36:47.891587 7fde5b0cd700 10 mds.0.5 map says i am 172.20.133.65:6811/16012 mds.0.5 state up:replay
2015-04-22 15:36:47.891591 7fde5b0cd700 1 mds.0.5 handle_mds_map i am now mds.0.5
2015-04-22 15:36:47.891593 7fde5b0cd700 1 mds.0.5 handle_mds_map state change up:standby --> up:replay
2015-04-22 15:36:47.891594 7fde5b0cd700 10 mds.0.5 set_want_state up:standby -> up:replay
2015-04-22 15:36:47.891597 7fde5b0cd700 1 mds.0.5 replay_start
2015-04-22 15:36:47.891599 7fde5b0cd700 7 mds.0.cache set_recovery_set
2015-04-22 15:36:47.891601 7fde5b0cd700 1 mds.0.5 recovery set is
2015-04-22 15:36:47.891604 7fde5b0cd700 2 mds.0.5 boot_start 0: opening inotable
2015-04-22 15:36:47.891608 7fde5b0cd700 10 mds.0.inotable: load
2015-04-22 15:36:47.894085 7fde5b0cd700 -1 mds/MDSTable.cc: In function 'void MDSTable::load(MDSInternalContextBase*)' thread 7fde5b0cd700 time 2015-04-22 15:36:47.891613
mds/MDSTable.cc: 146: FAILED assert(is_undef())
ceph version 0.94.1-6-g8a58d83 (8a58d83b0d039d2c2be353fee9c57c4e6181b662)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x966faf]
2: (MDSTable::load(MDSInternalContextBase*)+0x4a6) [0x7b7366]
3: (MDS::boot_start(MDS::BootStep, int)+0x381) [0x5bfc91]
4: (MDS::replay_start()+0x127) [0x5c0ea7]
5: (MDS::handle_mds_map(MMDSMap*)+0x2fb7) [0x5c4107]
6: (MDS::handle_core_message(Message*)+0x26b) [0x5c5bbb]
7: (MDS::_dispatch(Message*)+0x2b) [0x5c626b]
8: (MDS::ms_dispatch(Message*)+0x1e4) [0x5c7704]
9: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa5a357]
10: (DispatchQueue::entry()+0x44a) [0xa574fa]
11: (DispatchQueue::DispatchThread::entry()+0xd) [0x94facd]
12: (()+0x7e9a) [0x7fde60197e9a]
13: (clone()+0x6d) [0x7fde5eb5e2ed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is probably because of our recent changes to MDSTable handling.
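
The assert above means load() was called on a table that had already left the undef state. A minimal sketch, with hypothetical names (TableState, MDSTableSketch) rather than the real MDSTable code, of what an idempotent load would look like if a repeated boot pass turns out to be the cause:

    // Hypothetical sketch: a table whose load() tolerates being called again
    // after a state transition, instead of hitting FAILED assert(is_undef()).
    enum class TableState { Undef, Opening, Active };

    class MDSTableSketch {
      TableState state = TableState::Undef;
    public:
      bool is_undef() const { return state == TableState::Undef; }

      void load() {
        if (!is_undef()) {
          // A second boot pass (e.g. standby -> replay handled twice) finds
          // the load already in flight or done; make it a no-op.
          return;
        }
        state = TableState::Opening;
        // ... read the table object from RADOS, then on completion:
        state = TableState::Active;
      }
    };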

teuthology - Bug #11308 (Resolved): "mkfs.btrfs: invalid option -- 'f'"
https://tracker.ceph.com/issues/11308
Reported 2015-04-02 by Greg Farnum <gfarnum@redhat.com>

http://pulpito.ceph.com/gregf-2015-04-01_15:56:32-fs-greg-fs-testing---basic-multi/832166/

There are several other examples from that run.

2015-04-01T19:00:21.009 INFO:tasks.ceph:Running mkfs on osd nodes...
2015-04-01T19:00:21.010 INFO:tasks.ceph:ctx.disk_config.remote_to_roles_to_dev: {Remote(name='ubuntu@plana26.front.sepia.ceph.com'): {'1': '/dev/sdb', '0': '/dev/sdd'}, Remote(name='ubuntu@plana96.front.sepia.ceph.com'): {'3': '/dev/sdb', '2': '/dev/sdd'}}
2015-04-01T19:00:21.010 INFO:teuthology.orchestra.run.plana26:Running: 'sudo mkdir -p /var/lib/ceph/osd/ceph-0'
2015-04-01T19:00:21.023 INFO:tasks.ceph:{}
2015-04-01T19:00:21.023 INFO:tasks.ceph:0
2015-04-01T19:00:21.023 INFO:tasks.ceph:['mkfs.btrfs', '-f', '-m', 'single', '-l', '32768', '-n', '32768'] on /dev/sdd on ubuntu@plana26.front.sepia.ceph.com
2015-04-01T19:00:21.023 INFO:teuthology.orchestra.run.plana26:Running: 'yes | sudo mkfs.btrfs -f -m single -l 32768 -n 32768 /dev/sdd'
2015-04-01T19:00:21.139 INFO:teuthology.orchestra.run.plana26.stderr:mkfs.btrfs: invalid option -- 'f'
2015-04-01T19:00:21.139 INFO:teuthology.orchestra.run.plana26.stderr:usage: mkfs.btrfs [options] dev [ dev ... ]
2015-04-01T19:00:21.139 INFO:teuthology.orchestra.run.plana26.stderr:options:
2015-04-01T19:00:21.139 INFO:teuthology.orchestra.run.plana26.stderr: -A --alloc-start the offset to start the FS
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -b --byte-count total number of bytes in the FS
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -d --data data profile, raid0, raid1, raid10 or single
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -l --leafsize size of btree leaves
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -L --label set a label
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -m --metadata metadata profile, values like data profile
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -n --nodesize size of btree nodes
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr: -s --sectorsize min block allocation
2015-04-01T19:00:21.140 INFO:teuthology.orchestra.run.plana26.stderr:Btrfs Btrfs v0.19

This is an Ubuntu Precise machine, so my naive assumption is that its mkfs.btrfs is simply too old to have the "-f" flag?

Ceph - Feature #9320 (Rejected): "ceph osd dump" does not flag full OSDs
https://tracker.ceph.com/issues/9320
Reported 2014-09-02 by Greg Farnum <gfarnum@redhat.com>

Right now, when there are full OSDs in the cluster, the only way to find them is by running "ceph health detail". The data should also be marked somewhere when dumping OSD or PG data.
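
A minimal sketch of what this could look like, with a hypothetical OsdRecord type and dump_osds() function rather than the OSDMonitor dump code: annotate each OSD's full/nearfull state directly in the dump output so it is visible without "ceph health detail":

    #include <cstdio>
    #include <vector>

    // Hypothetical per-OSD record; the ratio would come from the stat service.
    struct OsdRecord {
      int id;
      double util;  // fraction of capacity used, 0.0-1.0
    };

    void dump_osds(const std::vector<OsdRecord>& osds,
                   double nearfull_ratio, double full_ratio) {
      for (const auto& o : osds) {
        const char* flag = "";
        if (o.util >= full_ratio)
          flag = " full";            // surface the condition in the dump itself
        else if (o.util >= nearfull_ratio)
          flag = " nearfull";
        std::printf("osd.%d util %.1f%%%s\n", o.id, o.util * 100.0, flag);
      }
    }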

rgw - Feature #5219 (New): "radosgw-admin user check" should handle non-existent buckets in index
https://tracker.ceph.com/issues/5219
Reported 2013-05-31 by Greg Farnum <gfarnum@redhat.com>

Right now, if "radosgw-admin user check" encounters a bucket whose object doesn't exist, it uses default values (because RGWRados::get_bucket_info() lies), so the index gets set to those defaults instead of the item being removed. If that function's behavior changes (it will change with the DR/geo work we're putting in, and I may backport it as well), then user check just skips the bucket and doesn't try to fix it at all.

Instead, we probably want it to remove the user index's omap entry.
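
A minimal sketch of that behavior, using a std::map as a stand-in for the omap-backed user bucket index and a hypothetical lookup_bucket() helper (the real fix would go through the RGW index code):

    #include <map>
    #include <optional>
    #include <string>

    struct BucketInfo { /* bucket metadata */ };

    // Hypothetical lookup: an empty result means the bucket object genuinely
    // doesn't exist (i.e. the get_bucket_info() equivalent stops lying).
    std::optional<BucketInfo> lookup_bucket(const std::string& name) {
      (void)name;            // stand-in: consult the real bucket metadata here
      return std::nullopt;
    }

    // user check: drop stale entries instead of rewriting them with defaults.
    void check_user_index(std::map<std::string, BucketInfo>& index) {
      for (auto it = index.begin(); it != index.end(); ) {
        if (!lookup_bucket(it->first))
          it = index.erase(it);  // remove the user index's stale entry
        else
          ++it;
      }
    }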

CephFS - Cleanup #2378 (Resolved): "ceph -s" MDS output is confusing
https://tracker.ceph.com/issues/2378
Reported 2012-05-02 by Greg Farnum <gfarnum@redhat.com>

If you're running an RBD/RGW cluster without an MDS daemon, having output like

mds e1: 0/0/1 up

is confusing. It seems to say that 1 MDS is up!

Some combination of the following would probably be sufficient: not displaying the MDS line until an MDS has booted, spacing out the different statuses instead of appending "up" to the end of the line, and not defaulting max_mds to one.
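
A minimal sketch of the first suggestion, with hypothetical parameters rather than the real Monitor summary code: suppress the line entirely for clusters where no MDS has ever booted, and spell the counts out:

    #include <cstdio>

    // Hypothetical summary inputs; the real values would come from the MDSMap.
    void print_mds_line(int up, int in, int max_mds, bool any_mds_ever_booted) {
      if (!any_mds_ever_booted) {
        // RBD/RGW-only cluster: print nothing rather than "0/0/1 up".
        return;
      }
      std::printf("mds: %d up, %d in, max_mds %d\n", up, in, max_mds);
    }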

CephFS - Bug #1200 (Duplicate): 4-MDS fsstress remote ino lookup cycle
https://tracker.ceph.com/issues/1200
Reported 2011-06-17 by Greg Farnum <gfarnum@redhat.com>

Notice how it's missing an expected ino in a dir that's marked complete, and then it just tries the lookup again.

2011-06-16 17:52:39.329752 7f3f8629e710 mds0.server dispatch_client_request client_request(client4115:10004 lookup #40000000022/f6f) v1
2011-06-16 17:52:39.329757 7f3f8629e710 mds0.server rdlock_path_pin_ref request(client4115:10004 cr=0x1602480) #40000000022/f6f
2011-06-16 17:52:39.329762 7f3f8629e710 mds0.cache traverse: opening base ino 40000000022 snap head
2011-06-16 17:52:39.329766 7f3f8629e710 mds0.cache traverse: path seg depth 0 'f6f' snapid head
2011-06-16 17:52:39.329771 7f3f8629e710 mds0.cache.dir(40000000022) lookup (head, 'f6f')
2011-06-16 17:52:39.329776 7f3f8629e710 mds0.cache.dir(40000000022) hit -> (f6f,head)
2011-06-16 17:52:39.329781 7f3f8629e710 mds0.cache remote link to 4000000002e, which i don't have
2011-06-16 17:52:39.329785 7f3f8629e710 mds0.cache _get_waiter retryrequest
2011-06-16 17:52:39.329789 7f3f8629e710 mds0.cache open_remote_ino on 4000000002e
2011-06-16 17:52:39.329800 7f3f8629e710 -- 10.0.1.205:6803/14232 --> mds0 10.0.1.205:6803/14232 -- mds_table_request(anchortable query 8 bytes) v1 -- ?+0 0x158ba00
2011-06-16 17:52:39.329813 7f3f8629e710 -- 10.0.1.205:6803/14232 <== mds0 10.0.1.205:6803/14232 0 ==== mds_table_request(anchortable query 8 bytes) v1 ==== 0+0+0 (0 0 0) 0x158ba00 con 0x1137280
2011-06-16 17:52:39.329820 7f3f8629e710 mds0.anchorserver handle_lookup mds_table_request(anchortable query 8 bytes) v1 ino 4000000002e
2011-06-16 17:52:39.329827 7f3f8629e710 mds0.anchorserver handle_lookup adding a(4000000002e 40000000026/2685483254 1 v99)
2011-06-16 17:52:39.329832 7f3f8629e710 mds0.anchorserver handle_lookup adding a(40000000026 40000000022/3293366682 5 v525)
2011-06-16 17:52:39.329837 7f3f8629e710 mds0.anchorserver handle_lookup adding a(40000000022 1000000001c/3920418541 8 v525)
2011-06-16 17:52:39.329842 7f3f8629e710 mds0.anchorserver handle_lookup adding a(1000000001c 1000000000d/2876722588 18 v525)
2011-06-16 17:52:39.329847 7f3f8629e710 mds0.anchorserver handle_lookup adding a(1000000000d 1/3591908529 18 v525)
2011-06-16 17:52:39.329856 7f3f8629e710 -- 10.0.1.205:6803/14232 --> 10.0.1.205:6803/14232 -- mds_table_request(anchortable query_reply tid 536 177 bytes) v1 -- ?+0 0x135ea00 con 0x1137280
2011-06-16 17:52:39.329868 7f3f8629e710 -- 10.0.1.205:6803/14232 <== mds0 10.0.1.205:6803/14232 0 ==== mds_table_request(anchortable query_reply tid 536 177 bytes) v1 ==== 0+0+0 (0 0 0) 0x135ea00 con 0x1137280
2011-06-16 17:52:39.329875 7f3f8629e710 mds0.tableclient(anchortable) handle_request mds_table_request(anchortable query_reply tid 536 177 bytes) v1
2011-06-16 17:52:39.329880 7f3f8629e710 mds0.anchorclient handle_anchor_reply mds_table_request(anchortable query_reply tid 536 177 bytes) v1
2011-06-16 17:52:39.329886 7f3f8629e710 mds0.cache open_remote_ino_2 on 4000000002e, trace depth is 5
2011-06-16 17:52:39.329891 7f3f8629e710 mds0.cache 5: a(4000000002e 40000000026/2685483254 1 v99)
2011-06-16 17:52:39.329896 7f3f8629e710 mds0.cache 4: a(40000000026 40000000022/3293366682 5 v525)
2011-06-16 17:52:39.329900 7f3f8629e710 mds0.cache 3: a(40000000022 1000000001c/3920418541 8 v525)
2011-06-16 17:52:39.329919 7f3f8629e710 mds0.cache deepest cached inode at 3 is [inode 40000000022 [...2,head] /p7/d3/d24/ auth{1=1,2=1,3=1} v552 na=2 f(v7 m2011-06-16 17:52:09.020522 11=8+3) n(v18 rc2011-06-16 17:52:14.318441 b24205204 a10 100=70+30) (inest mix) (ifile excl) (iversion lock) caps={4115=pAsLsXsFsx/-@10},l=4115 | dirfrag caps replicated dirty 0x16e43a0]
2011-06-16 17:52:39.329933 7f3f8629e710 mds0.cache expected ino 40000000026 in complete dir [dir 40000000022 /p7/d3/d24/ [2,head] auth{1=1,2=1,3=1} v=386 cv=0/0 na=2 state=1610612738|complete f(v7 m2011-06-16 17:52:09.020522 11=8+3) n(v18 rc2011-06-16 17:52:14.318441 b24205204 a10 99=70+29) hs=11+6,ss=0+0 dirty=14 | child replicated dirty 0x19a8000], requerying anchortable
2011-06-16 17:52:39.329937 7f3f8629e710 mds0.cache open_remote_ino on 4000000002e
2011-06-16 17:52:39.329950 7f3f8629e710 -- 10.0.1.205:6803/14232 --> mds0 10.0.1.205:6803/14232 -- mds_table_request(anchortable query 8 bytes) v1 -- ?+0 0x158ba00
2011-06-16 17:52:39.329963 7f3f8629e710 -- 10.0.1.205:6803/14232 <== mds0 10.0.1.205:6803/14232 0 ==== mds_table_request(anchortable query 8 bytes) v1 ==== 0+0+0 (0 0 0) 0x158ba00 con 0x1137280
2011-06-16 17:52:39.329969 7f3f8629e710 mds0.anchorserver handle_lookup mds_table_request(anchortable query 8 bytes) v1 ino 4000000002e
2011-06-16 17:52:39.329975 7f3f8629e710 mds0.anchorserver handle_lookup adding a(4000000002e 40000000026/2685483254 1 v99)
2011-06-16 17:52:39.329979 7f3f8629e710 mds0.anchorserver handle_lookup adding a(40000000026 40000000022/3293366682 5 v525)
2011-06-16 17:52:39.329984 7f3f8629e710 mds0.anchorserver handle_lookup adding a(40000000022 1000000001c/3920418541 8 v525)
2011-06-16 17:52:39.329989 7f3f8629e710 mds0.anchorserver handle_lookup adding a(1000000001c 1000000000d/2876722588 18 v525)
2011-06-16 17:52:39.329993 7f3f8629e710 mds0.anchorserver handle_lookup adding a(1000000000d 1/3591908529 18 v525)
2011-06-16 17:52:39.330002 7f3f8629e710 -- 10.0.1.205:6803/14232 --> 10.0.1.205:6803/14232 -- mds_table_request(anchortable query_reply tid 536 177 bytes) v1 -- ?+0 0x135ea00 con 0x1137280
2011-06-16 17:52:39.330012 7f3f8629e710 -- 10.0.1.205:6803/14232 <== mds0 10.0.1.205:6803/14232 0 ==== mds_table_request(anchortable query_reply tid 536 177 bytes) v1 ==== 0+0+0 (0 0 0) 0x135ea00 con 0x1137280
2011-06-16 17:52:39.330018 7f3f8629e710 mds0.tableclient(anchortable) handle_request mds_table_request(anchortable query_reply tid 536 177 bytes) v1
2011-06-16 17:52:39.330023 7f3f8629e710 mds0.anchorclient handle_anchor_reply mds_table_request(anchortable query_reply tid 536 177 bytes) v1
2011-06-16 17:52:39.330028 7f3f8629e710 mds0.cache open_remote_ino_2 on 4000000002e, trace depth is 5
2011-06-16 17:52:39.330033 7f3f8629e710 mds0.cache 5: a(4000000002e 40000000026/2685483254 1 v99)
2011-06-16 17:52:39.330037 7f3f8629e710 mds0.cache 4: a(40000000026 40000000022/3293366682 5 v525)
2011-06-16 17:52:39.330042 7f3f8629e710 mds0.cache 3: a(40000000022 1000000001c/3920418541 8 v525)
2011-06-16 17:52:39.330057 7f3f8629e710 mds0.cache deepest cached inode at 3 is [inode 40000000022 [...2,head] /p7/d3/d24/ auth{1=1,2=1,3=1} v552 na=2 f(v7 m2011-06-16 17:52:09.020522 11=8+3) n(v18 rc2011-06-16 17:52:14.318441 b24205204 a10 100=70+30) (inest mix) (ifile excl) (iversion lock) caps={4115=pAsLsXsFsx/-@10},l=4115 | dirfrag caps replicated dirty 0x16e43a0]
2011-06-16 17:52:39.330071 7f3f8629e710 mds0.cache expected ino 40000000026 in complete dir [dir 40000000022 /p7/d3/d24/ [2,head] auth{1=1,2=1,3=1} v=386 cv=0/0 na=2 state=1610612738|complete f(v7 m2011-06-16 17:52:09.020522 11=8+3) n(v18 rc2011-06-16 17:52:14.318441 b24205204 a10 99=70+29) hs=11+6,ss=0+0 dirty=14 | child replicated dirty 0x19a8000], got same anchor a(40000000026 40000000022/3293366682 5 v525) 2x in a row
2011-06-16 17:52:39.330077 7f3f8629e710 mds0.server dispatch_client_request client_request(client4115:10004 lookup #40000000022/f6f) v1
2011-06-16 17:52:39.330082 7f3f8629e710 mds0.server rdlock_path_pin_ref request(client4115:10004 cr=0x1602480) #40000000022/f6f

Logs in kai:~gregf/logs/fsstress/odd_lookup_bug
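
The log even notes it got the same anchor "2x in a row" before re-dispatching the identical request, so the cycle can spin forever. A minimal sketch, with hypothetical types, of a progress check that would turn the no-progress retry into a hard failure:

    #include <cstdint>

    // Hypothetical anchor result; the version changes when the anchor table
    // learns anything new.
    struct Anchor {
      uint64_t ino = 0;
      uint64_t version = 0;
    };

    // Returns false (give up and error out the request) when a re-query makes
    // no progress, instead of retrying the same lookup indefinitely.
    bool note_anchor_progress(const Anchor& latest, Anchor& last_seen) {
      if (latest.ino == last_seen.ino && latest.version == last_seen.version)
        return false;  // same anchor twice in a row: stop the retry loop
      last_seen = latest;
      return true;
    }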

CephFS - Bug #1199 (Resolved): 4-MDS fsstress: remote ino lookup assert
https://tracker.ceph.com/issues/1199
Reported 2011-06-17 by Greg Farnum <gfarnum@redhat.com>

mds/MDCache.cc: 6861: FAILED assert(r == 0)
ceph version 0.29.1-281-g425b644 (commit:425b644ff803e9925f370b547b68dd0c8c3c8648)
1: ./cmds() [0x5b89c5]
2: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, int)+0xc4) [0x59fbf4]
3: (MDCache::handle_discover_reply(MDiscoverReply*)+0x602) [0x591472]
4: (MDCache::dispatch(Message*)+0x115) [0x595f65]
5: (MDS::handle_deferrable_message(Message*)+0x5ef) [0x49960f]
6: (MDS::_dispatch(Message*)+0x18e4) [0x4ae864]
7: (MDS::ms_dispatch(Message*)+0x57) [0x4aee37]
8: (SimpleMessenger::dispatch_entry()+0x8e3) [0x6e4d63]
9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48ba3c]
10: (()+0x68ba) [0x7ff1c8c098ba]
11: (clone()+0x6d) [0x7ff1c789702d]

CephFS - Bug #1170 (Closed): 2-MDS fsstress: SR fails subtree assert
https://tracker.ceph.com/issues/1170
Reported 2011-06-10 by Greg Farnum <gfarnum@redhat.com>

This is on the mds_rename branch.

mds/MDCache.cc: 1188: FAILED assert(bounds == subtrees[dir])
ceph version 0.28.2-275-g6332a78 (commit:6332a78b0c6c67e18d877458f291febc6e5d4bd6)
1: (MDCache::verify_subtree_bounds(CDir*, std::set<CDir*, std::less<CDir*>, std::allocator<CDir*> > const&)+0x4ee) [0x53d6fe]
2: (MDCache::adjust_bounded_subtree_auth(CDir*, std::set<CDir*, std::less<CDir*>, std::allocator<CDir*> >&, std::pair<int, int>)+0xa2f) [0x56f12f]
3: (EExport::replay(MDS*)+0x54b) [0x4c9a4b]
4: (MDLog::_replay_thread()+0xcd6) [0x68eb36]
5: (MDLog::ReplayThread::entry()+0xd) [0x4b1a6d]
6: (()+0x68ba) [0x7feba28708ba]
7: (clone()+0x6d) [0x7feba14fe02d]

Logs in kai:~gregf/logs/fsstress/adjust_subtree_auth_assert. This is mds.bs.

CephFS - Bug #1169 (Closed): 2-MDS fsstress: Active fails adjust_subtree_auth
https://tracker.ceph.com/issues/1169
Reported 2011-06-10 by Greg Farnum <gfarnum@redhat.com>

This is on the mds_rename branch.

mds/MDCache.cc: In function 'void MDCache::adjust_subtree_auth(CDir*, std::pair<int, int>, bool)', in thread '0x7f8a2d695710'
mds/MDCache.cc: 707: FAILED assert(subtrees.count(root))
ceph version 0.28.2-275-g6332a78 (commit:6332a78b0c6c67e18d877458f291febc6e5d4bd6)
1: (MDCache::adjust_subtree_auth(CDir*, std::pair<int, int>, bool)+0x289) [0x53b129]
2: (MDCache::adjust_bounded_subtree_auth(CDir*, std::set<CDir*, std::less<CDir*>, std::allocator<CDir*> >&, std::pair<int, int>)+0x9fa) [0x56f0fa]
3: (MDCache::adjust_bounded_subtree_auth(CDir*, std::vector<dirfrag_t, std::allocator<dirfrag_t> >&, std::pair<int, int>)+0x15e) [0x56f33e]
4: (EImportStart::replay(MDS*)+0x36b) [0x4c869b]
5: (MDLog::_replay_thread()+0xcd6) [0x68eb36]
6: (MDLog::ReplayThread::entry()+0xd) [0x4b1a6d]
7: (()+0x68ba) [0x7f8a9b8cb8ba]
8: (clone()+0x6d) [0x7f8a9a55902d]

Logs in kai:~gregf/logs/fsstress/adjust_subtree_auth_assert. This is mds.a.

Ceph - Bug #372 (Rejected): 2-Monitor election fight
https://tracker.ceph.com/issues/372
Reported 2010-08-23 by Greg Farnum <gfarnum@redhat.com>

I was running tests on a local branch with a two-monitor setup and eventually got to the point where no cluster work was completing because the losing monitor kept calling a new election. Sage thinks there might be an issue with how the monitors check that the winner has a lower ID than they do, since we changed monitors to have names.

Logs are in the failure dir, so find out if that's what happened and fix it.
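
For illustration only, a minimal sketch of the rank rule in question (a hypothetical should_defer_to() predicate, not the Elector code): each monitor should defer whenever the current winner outranks it, and if the name-to-rank comparison is broken, both sides believe they should win and keep restarting the election:

    // Hypothetical election predicate: ranks are small integers derived from
    // monitor names, with rank 0 the most senior.
    bool should_defer_to(int my_rank, int proposer_rank) {
      // Defer whenever the proposer outranks us; re-propose only if we outrank
      // them. If ranks are miscomputed from names, both monitors can return
      // false here and fight indefinitely, as described above.
      return proposer_rank < my_rank;
    }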

rgw - Feature #9 (Won't Fix): Access unimported data
https://tracker.ceph.com/issues/9
Reported 2010-04-09 by Greg Farnum <gfarnum@redhat.com>

Right now, rgw can only access data that has been added through rgw, even if a user's auid is set in rgw and matches the one used on the data. Due to architectural issues in rgw, fixing this may be a bit messy, but if demand arises, rgw should be able to auto-generate default ACLs and allow access to data a user is known to own (both pools and objects).

Ceph - Bug #5 (Closed): ./rados lspools sometimes hangs after listing all pools?
https://tracker.ceph.com/issues/5
Reported 2010-04-09 by Greg Farnum <gfarnum@redhat.com>

This is rare and intermittent, but it happens occasionally. The best guess so far is that the command locks the OSDMap but doesn't explicitly wake up before exiting, so if an OSDMap update comes in, that thread never wakes and the program hangs.
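
A minimal sketch of the missed-wakeup hazard described above, in portable C++ rather than the actual client internals (wait_for_epoch/handle_map_update are hypothetical names): the waiter must re-check a predicate while holding the lock, and every map update must notify, or a thread that races with the final update sleeps forever:

    #include <condition_variable>
    #include <mutex>

    std::mutex map_lock;
    std::condition_variable map_cond;
    unsigned map_epoch = 0;

    // Waiter: the predicate is re-evaluated under the lock on every wakeup,
    // so an update that lands before the wait starts is never missed.
    void wait_for_epoch(unsigned want) {
      std::unique_lock<std::mutex> l(map_lock);
      map_cond.wait(l, [&] { return map_epoch >= want; });
    }

    // Updater: publish the new epoch under the lock, then wake all waiters,
    // including one that is just about to exit.
    void handle_map_update(unsigned new_epoch) {
      {
        std::lock_guard<std::mutex> l(map_lock);
        map_epoch = new_epoch;
      }
      map_cond.notify_all();
    }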