Ceph: Issues
https://tracker.ceph.com/ (feed generated 2016-01-19T02:44:32Z)
Ceph - Bug #14405 (Resolved): ceph-mon process crashes when a wrong messenger type is set in ceph.conf
https://tracker.ceph.com/issues/14405 (2016-01-19, bo cai)
1. I set a wrong messenger type in my ceph.conf:

<pre>
cat /etc/ceph/ceph.conf
[global]
fsid = b4c0f945-f80a-4092-b059-d7b7b8fbb4ca
mon_initial_members = cas-meta3, ubuntu, ubuntu-meta2
mon_host = 172.16.53.19,172.16.53.52,172.16.53.53
ms_type = caibo
</pre>

2. Then I tried to start my ceph monitor, but the process crashed:

<pre>
root@ubuntu:/deb-test/test/DEBIAN# ceph-mon -i ubuntu
2016-01-06 12:22:08.464863 7f956b0ab8c0 -1 unrecognized ms_type 'caibo'
*** Caught signal (Segmentation fault) ***
 in thread 7f956b0ab8c0
 ceph version 0.94.5-V100R001B01D007 (6a8648811103835d57927149b419b0e037cc53c6)
 1: ceph-mon() [0x9aec6a]
 2: (()+0x10340) [0x7f956a1aa340]
 3: (main()+0x1c28) [0x577008]
 4: (__libc_start_main()+0xf5) [0x7f9568634ec5]
 5: ceph-mon() [0x5993f7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    -1> 2016-01-06 12:22:08.464863 7f956b0ab8c0 -1 unrecognized ms_type 'caibo'
     0> 2016-01-06 12:22:08.466383 7f956b0ab8c0 -1 *** Caught signal (Segmentation fault) ***
 in thread 7f956b0ab8c0
</pre>
<pre>
[24581]: (33) Numerical argument out of domain
</pre>
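The crash looks like a missing error path: ceph-mon logs "unrecognized ms_type" and then, presumably, dereferences the messenger it never created. Until the monitor fails cleanly on its own, a pre-flight check of the configured value avoids the segfault. Below is a minimal sketch, assuming ceph-conf is installed and that 'simple' and the experimental 'async' are the recognized hammer-era types:

<pre>
# Pre-flight sketch (assumes ceph-conf is available): refuse to start the
# monitor when ms_type is not a messenger this release recognizes.
ms=$(ceph-conf --name mon.ubuntu --lookup ms_type 2>/dev/null || echo simple)
case "$ms" in
    simple|async) ceph-mon -i ubuntu ;;
    *) echo "unrecognized ms_type '$ms', fix ceph.conf first" >&2; exit 1 ;;
esac
</pre>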
Ceph - Bug #14354 (Resolved): an IOError happens when using the 'ceph report | less' command
https://tracker.ceph.com/issues/14354 (2016-01-13, bo cai)

1. The error occurs when I use the following command:

ceph report | less

2. I press q to exit:
<pre>
root@compiler:/home/caibo/repo/ceph/src# ./ceph report | less
Traceback (most recent call last):
  File "./ceph", line 936, in <module>
    retval = main()
  File "./ceph", line 922, in main
    raw_stdout.write(outbuf)
IOError: [Errno 32] Broken pipe
root@compiler:/home/caibo/repo/ceph/src#
</pre>
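Pressing q makes less exit and close the read end of the pipe, so the CLI's next write to stdout fails with EPIPE, which the tool does not catch (the proper fix is handling EPIPE in the ceph tool itself). A shell-side workaround sketch that avoids the pipe entirely:

<pre>
# Workaround sketch: page a file instead of a pipe, so quitting the pager
# can never break a pipe underneath the ceph tool.
ceph report > /tmp/report.json && less /tmp/report.json
</pre>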
Ceph - Bug #14074 (Duplicate): OSD daemon cannot start after cluster deployment
https://tracker.ceph.com/issues/14074 (2015-12-14, bo cai)

I cloned the latest code, compiled and packaged it into debs, then installed the new packages (version 10.0.1) on each host.
Next I used ceph-deploy (1.5.28) to deploy the cluster; adding the OSDs reported no errors, but 'ceph osd tree' does not show any OSD information.
I used the following commands:

ceph-deploy new node1
ceph-deploy mon --overwrite-conf create-initial
ceph-deploy --overwrite-conf gatherkeys node1
Then I copied the ceph config file to node2 and ran:

ceph-deploy --overwrite-conf osd create node2:/dev/sdb
I also tried a manual deployment; the OSD deployed successfully, but the ceph-osd process cannot start.
The log information I checked is as follows:
<pre>
2015-12-14 01:49:53.783910 7f03883cd7c0  1 filestore(/var/lib/ceph/osd/ceph-0) mkfs done in /var/lib/ceph/osd/ceph-0
2015-12-14 01:49:53.797667 7f03883cd7c0  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2015-12-14 01:49:53.798726 7f03883cd7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-14 01:49:53.798759 7f03883cd7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-14 01:49:53.798807 7f03883cd7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2015-12-14 01:49:53.799981 7f03883cd7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-14 01:49:53.800105 7f03883cd7c0  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: extsize is supported and your kernel >= 3.5
2015-12-14 01:49:53.917696 7f03883cd7c0  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-14 01:49:53.917897 7f03883cd7c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-12-14 01:49:53.917899 7f03883cd7c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 15: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-12-14 01:49:53.918260 7f03883cd7c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 15: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-12-14 01:49:53.918749 7f03883cd7c0  1 filestore(/var/lib/ceph/osd/ceph-0) upgrade
2015-12-14 01:49:53.918948 7f03883cd7c0 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2015-12-14 01:49:53.970546 7f03883cd7c0  1 journal close /var/lib/ceph/osd/ceph-0/journal
2015-12-14 01:49:53.971605 7f03883cd7c0 -1 created object store /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal for osd.0 fsid 34704ace-c2c0-47f6-8540-9e8ce2d04c5a
2015-12-14 01:49:53.971696 7f03883cd7c0 -1 auth: error reading file: /var/lib/ceph/osd/ceph-0/keyring: can't open /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2015-12-14 01:49:53.971949 7f03883cd7c0 -1 created new key in keyring /var/lib/ceph/osd/ceph-0/keyring
2015-12-14 03:38:34.975412 7f5bc0a787c0  0 ceph version 10.0.0-1055-g7f627e0 (7f627e04c8c939a1ddb8f01f74b9e7043ba54e42), process ceph-osd, pid 30504
2015-12-14 03:38:35.001311 7f5bc0a787c0  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2015-12-14 03:38:35.002549 7f5bc0a787c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-14 03:38:35.002570 7f5bc0a787c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-14 03:38:35.002602 7f5bc0a787c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2015-12-14 03:38:35.003382 7f5bc0a787c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-14 03:38:35.003565 7f5bc0a787c0  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: extsize is supported and your kernel >= 3.5
2015-12-14 03:38:35.023805 7f5bc0a787c0  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-14 03:38:35.024167 7f5bc0a787c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-12-14 03:38:35.024184 7f5bc0a787c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-12-14 03:38:35.037433 7f5bc0a787c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-12-14 03:38:35.038067 7f5bc0a787c0  1 filestore(/var/lib/ceph/osd/ceph-0) upgrade
2015-12-14 03:38:35.083760 7f5bc0a787c0  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2015-12-14 03:38:35.137222 7f5bc0a787c0  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2015-12-14 03:38:35.187230 7f5bc0a787c0  0 osd.0 0 crush map has features 33816576, adjusting msgr requires for clients
2015-12-14 03:38:35.187261 7f5bc0a787c0  0 osd.0 0 crush map has features 33816576 was 8705, adjusting msgr requires for mons
2015-12-14 03:38:35.187267 7f5bc0a787c0  0 osd.0 0 crush map has features 33816576, adjusting msgr requires for osds
2015-12-14 03:38:35.187355 7f5bc0a787c0  0 osd.0 0 load_pgs
2015-12-14 03:38:35.187400 7f5bc0a787c0  0 osd.0 0 load_pgs opened 0 pgs
2015-12-14 03:38:35.190445 7f5bc0a787c0 -1 osd.0 0 log_to_monitors {default=true}
2015-12-14 03:38:35.206336 7f5bc0a787c0  0 osd.0 0 done with init, starting boot process
2015-12-14 03:38:35.553742 7f5bae6ac700  0 osd.0 5 crush map has features 1107558400, adjusting msgr requires for clients
2015-12-14 03:38:35.553769 7f5bae6ac700  0 osd.0 5 crush map has features 1107558400 was 33825281, adjusting msgr requires for mons
2015-12-14 03:38:35.553781 7f5bae6ac700  0 osd.0 5 crush map has features 1107558400, adjusting msgr requires for osds
</pre>
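One detail in the mkfs half of the log stands out: the OSD had to create a brand-new key ("created new key in keyring"), so that key may never have been registered with the monitors, which would explain why osd.0 authenticates nothing and never shows up in the tree. A hedged manual-registration sketch, with paths from this report and an illustrative CRUSH weight:

<pre>
# Hedged recovery sketch: register the freshly generated OSD key and a CRUSH
# location by hand, then start the daemon (weight 1.0 is illustrative).
ceph auth add osd.0 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-0/keyring
ceph osd crush add osd.0 1.0 host=node2
sudo start ceph-osd id=0    # upstart syntax on Ubuntu 14.04
</pre>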
Ceph-deploy - Bug #13841 (Closed): ceph-deploy failed on ubuntu14.04.1 because of apt-get don't h...
https://tracker.ceph.com/issues/13841 (2015-11-20, bo cai)

Is this a bug or something else? Our teuthology platform reported this:

<pre>
2015-11-20T11:11:37.815 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][WARNING]
2015-11-20T11:11:37.815 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][INFO ] Running command: sudo apt-key add release.asc
2015-11-20T11:11:37.882 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] OK
2015-11-20T11:11:37.883 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] add deb repo to /etc/apt/sources.list.d/
2015-11-20T11:11:37.885 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][INFO ] Running command: sudo env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q update
2015-11-20T11:11:37.920 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Ign http://gitbuilder.ceph.com trusty InRelease
2015-11-20T11:11:37.920 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Ign http://gitbuilder.ceph.com trusty Release.gpg
2015-11-20T11:11:37.921 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Get:1 http://gitbuilder.ceph.com trusty Release [2,202 B]
2015-11-20T11:11:37.927 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Get:2 http://gitbuilder.ceph.com trusty/main amd64 Packages [7,300 B]
2015-11-20T11:11:37.931 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Get:3 http://gitbuilder.ceph.com trusty/main i386 Packages [387 B]
2015-11-20T11:11:37.995 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Ign http://gitbuilder.ceph.com trusty/main Translation-en_US
2015-11-20T11:11:37.995 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Ign http://gitbuilder.ceph.com trusty/main Translation-en
2015-11-20T11:11:37.995 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Fetched 9,889 B in 0s (138 kB/s)
2015-11-20T11:11:38.011 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Reading package lists...
2015-11-20T11:11:38.020 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][INFO ] Running command: sudo env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
2015-11-20T11:11:38.029 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Reading package lists...
2015-11-20T11:11:38.037 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Building dependency tree...
2015-11-20T11:11:38.037 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Reading state information...
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Recommended packages:
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ]   ceph-fs-common
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] The following NEW packages will be installed:
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ]   ceph ceph-mds radosgw
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] 0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] Need to get 0 B/20.8 MB of archives.
2015-11-20T11:11:38.072 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] After this operation, 99.5 MB of additional disk space will be used.
2015-11-20T11:11:38.073 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ] WARNING: The following packages cannot be authenticated!
2015-11-20T11:11:38.073 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][DEBUG ]   ceph ceph-mds radosgw
2015-11-20T11:11:38.073 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][WARNING] E: There are problems and -y was used without --force-yes
2015-11-20T11:11:38.073 INFO:teuthology.orchestra.run.plana032.stderr:[plana032][ERROR ] RuntimeError: command returned non-zero exit status: 100
2015-11-20T11:11:38.073 INFO:teuthology.orchestra.run.plana032.stderr:[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
2015-11-20T11:11:38.073 INFO:teuthology.orchestra.run.plana032.stderr:
2015-11-20T11:11:38.091 INFO:tasks.ceph_deploy:Error encountered, logging exception before tearing down ceph-deploy
2015-11-20T11:11:38.101 INFO:tasks.ceph_deploy:Traceback (most recent call last):
  File "/home/teuthworker/src/ceph-qa-suite_master/tasks/ceph_deploy.py", line 227, in build_ceph_cluster
    raise RuntimeError("ceph-deploy: Failed to install ceph")
RuntimeError: ceph-deploy: Failed to install ceph
</pre>
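The underlying failure is apt refusing unauthenticated packages: on trusty, apt-get -y aborts with exit status 100 when a package cannot be authenticated unless --force-yes is given, so either the gitbuilder key was not actually trusted or the repo's Release file is unsigned. A hedged manual check on the failing node:

<pre>
# Hedged manual check: confirm the repo key really was imported, then see
# whether --force-yes lets the (unsigned) gitbuilder packages install.
apt-key list | grep -i ceph
sudo apt-get --assume-yes -q --no-install-recommends --force-yes \
    install -o Dpkg::Options::=--force-confnew ceph ceph-mds radosgw
</pre>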
teuthology - Bug #13700 (New): Command failed with status 139: 'sudo adjust-ulimits ceph-coverag...
https://tracker.ceph.com/issues/13700 (2015-11-05, bo cai)

Many of my tasks failed during execution for the following reason.
<pre>
2015-11-05T14:17:05.103 INFO:tasks.rados.rados.0.dtod003.stdout:3998: left oid 57 (ObjNum 1694 snap 377 seq_num 1694)
2015-11-05T14:17:05.103 INFO:tasks.rados.rados.0.dtod003.stdout:3998: done (0 left)
2015-11-05T14:17:05.137 INFO:tasks.rados.rados.0.dtod003.stderr:0 errors.
2015-11-05T14:17:05.137 INFO:tasks.rados.rados.0.dtod003.stderr:
2015-11-05T14:17:05.158 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-11-05T14:17:05.158 INFO:tasks.thrashosds:joining thrashosds
2015-11-05T14:17:05.163 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/ceph-qa-suite_master/tasks/thrashosds.py", line 179, in task
    thrash_proc.do_join()
  File "/home/teuthworker/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 435, in do_join
    self.thread.get()
  File "/home/teuthworker/src/teuthology_master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
CommandFailedError: Command failed on dtod003 with status 139: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
2015-11-05T14:17:05.168 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2015-11-05T14:17:05.169 INFO:teuthology.orchestra.run.dtod003:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2015-11-05T14:17:05.832 INFO:teuthology.orchestra.run.dtod003.stderr:2015-11-05 14:17:06.940572 7fe9e77d8700 -1 WARNING: the following dangerous and experimental features are enabled: keyvaluestore,ms-type-async
2015-11-05T14:17:05.916 INFO:teuthology.orchestra.run.dtod003.stderr:2015-11-05 14:17:07.018080 7fe9e77d8700 -1 WARNING: the following dangerous and experimental features are enabled: keyvaluestore,ms-type-async
2015-11-05T14:17:05.916 INFO:teuthology.orchestra.run.dtod003.stderr:2015-11-05 14:17:07.019019 7fe9e77d8700 -1 WARNING: experimental feature 'ms-type-async' is enabled
2015-11-05T14:17:05.917 INFO:teuthology.orchestra.run.dtod003.stderr:Please be aware that this feature is experimental, untested,
2015-11-05T14:17:05.917 INFO:teuthology.orchestra.run.dtod003.stderr:unsupported, and may result in data corruption, data loss,
2015-11-05T14:17:05.918 INFO:teuthology.orchestra.run.dtod003.stderr:and/or irreparable damage to your cluster. Do not use
2015-11-05T14:17:05.918 INFO:teuthology.orchestra.run.dtod003.stderr:feature with important data.
2015-11-05T14:17:05.918 INFO:teuthology.orchestra.run.dtod003.stderr:
</pre>
See the attached log for details.
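Status 139 from the shell is 128 + 11, i.e. the ceph status client itself died with SIGSEGV (note the dangerous keyvaluestore/ms-type-async experimental features enabled in the log). A hedged triage step is to re-run the exact command from the log with core dumps enabled:

<pre>
# Hedged triage sketch: 139 = 128 + SIGSEGV(11), so capture a core from the
# crashing client instead of hunting for a cluster-side error.
ulimit -c unlimited
sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage \
    ceph status --format=json-pretty
echo "exit status: $?"    # 139 again confirms the client segfaulted
</pre>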
teuthology - Bug #13492 (Closed): adjust-ulimits ceph-coverage command fails when try to kill pro...
https://tracker.ceph.com/issues/13492 (2015-10-14, bo cai)

I have tried many times; teuthology always fails when executing the command "sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 3".

In fact, my command is ". ~/.profile ; sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 3".
I just want to load the user's environment first, such as an HTTP proxy; a sketch of an alternative is shown below.
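A hedged alternative that sidesteps sourcing ~/.profile entirely: pass only the variables the command actually needs through sudo, which accepts VAR=value assignments before the command name:

<pre>
# Hedged sketch: forward just the proxy variables instead of sourcing the
# whole profile in front of the teuthology command.
sudo http_proxy="$http_proxy" https_proxy="$https_proxy" \
    adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage \
    daemon-helper kill ceph-osd -f -i 3
</pre>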
rbd - Bug #13156 (Won't Fix): import/export do not support snapshots
https://tracker.ceph.com/issues/13156 (2015-09-18, bo cai)

(1) I create an rbd image named abc:
    rbd create abc --size 10

(2) Then I create two snapshots of abc:
    rbd snap create --image abc --snap s1
    rbd snap create --image abc --snap s2

(3) Then I export abc to a local file:
    rbd export abc local_file

(4) I import local_file as a new rbd image:
    rbd import local_file new_rbd

(5) But the snapshots (s1, s2) have disappeared:
    rbd snap ls new_rbd
Is this a bug or something else?
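This is expected behavior rather than a bug (hence the Won't Fix): 'rbd export' serializes a single point-in-time view of the image, so snapshots cannot survive the round trip. If they must, an export-diff/import-diff chain rebuilds them on the destination; a sketch using the names from this report:

<pre>
# Sketch: replay the image as incremental diffs; import-diff recreates each
# end snapshot on the destination image.
rbd export-diff abc@s1 diff1                   # image start -> s1
rbd export-diff --from-snap s1 abc@s2 diff2    # s1 -> s2
rbd create new_rbd --size 10
rbd import-diff diff1 new_rbd                  # recreates snapshot s1
rbd import-diff diff2 new_rbd                  # recreates snapshot s2
</pre>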
Ceph - Bug #13078 (Resolved): ceph-monstore-tool still creates a local file when getting a map fails
https://tracker.ceph.com/issues/13078 (2015-09-14, bo cai)

When I try to get a map that does not exist, the tool prints an error but still creates a local file.

Here is the command I used:
<pre>
root@vm01:/test# clear
root@vm01:/test# ls
root@vm01:/test# ceph-monstore-tool /var/lib/ceph/mon/ceph-vm01/ get not-exist-map -- -o local-file
Error getting map: (2) No such file or directory
root@vm01:/test# ls
local-file
</pre>
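Until the tool learns to clean up after itself, a thin wrapper can drop the stray file when the get fails; a sketch (osdmap stands in for a map name that actually exists):

<pre>
# Hedged wrapper sketch: keep the output file only if the tool succeeded.
if ceph-monstore-tool /var/lib/ceph/mon/ceph-vm01/ get osdmap -- -o local-file; then
    echo "map saved to local-file"
else
    rm -f local-file    # drop the empty artifact left behind on failure
fi
</pre>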
Ceph - Bug #12979 (Closed): Ceph lost its ability to repair itself after repeated flapping
https://tracker.ceph.com/issues/12979 (2015-09-07, bo cai)

I have a Ceph performance test cluster; see the network diagram in the attachments (ceph-network-diagram.png).
Before the problem occurred, its state was as follows:
<pre>
    cluster 2f6f7e9e-9167-4ac5-889b-d18680fa4c04
     health HEALTH_OK
     monmap e1: 3 mons at {node145=172.16.38.145:6789/0,node146=172.16.38.146:6789/0,node147=172.16.38.147:6789/0}
            election epoch 16, quorum 0,1,2 node145,node146,node147
     osdmap e390: 63 osds: 63 up, 63 in
      pgmap v28044: 2048 pgs, 2 pools, 258 GB data, 74552 objects
            638 GB used, 69432 GB / 70070 GB avail
                2048 active+clean
  client io 822 MB/s wr, 2726 op/s
</pre>
Then I ran a flapping test on Ceph; there is more information about flapping on the official website.
I did this over and over again, making the cluster flap by disabling the cluster-network NIC of one host while leaving the public network up.
My ceph.conf:

<pre>
[global]
fsid = 2f6f7e9e-9167-4ac5-889b-d18680fa4c04
mon_initial_members = node145, node146, node147
mon_host = 172.16.38.145,172.16.38.146,172.16.38.147

auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

public network = 192.168.38.0/24
cluster network = 172.16.138.0/24    # I disabled this network on one host to make the cluster flap

[client]
admin socket = /var/run/ceph/rbd-$pid-$name-$cctid.asok
log file = /var/log/ceph/ceph.client.admin.log
</pre>
Finally, I found the cluster can no longer repair itself.
I restarted ceph-all on every host (node145, node146, node147):
(1) At the beginning it is OK; all OSDs are up.
(2) Three minutes later, two OSDs (A, B) are down.
(3) Another five minutes later, A may be up again while other OSDs (C, F, H) are down.
(4) The flapping continues for a long time, perhaps 10 hours.
(5) In the end most OSDs are down and only a few are up.
Like this:

<pre>
ID WEIGHT   TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 68.66977 root default
-2 22.88992     host node145
 0  1.09000         osd.0        down        0          1.00000
 1  1.09000         osd.1        down        0          1.00000
 2  1.09000         osd.2        down        0          1.00000
 3  1.09000         osd.3        down        0          1.00000
 4  1.09000         osd.4        down        0          1.00000
 5  1.09000         osd.5        down        0          1.00000
 6  1.09000         osd.6        down        0          1.00000
 7  1.09000         osd.7        down        0          1.00000
 8  1.09000         osd.8        down        0          1.00000
 9  1.09000         osd.9        down        0          1.00000
10  1.09000         osd.10       down        0          1.00000
11  1.09000         osd.11       down        0          1.00000
12  1.09000         osd.12       down        0          1.00000
13  1.09000         osd.13       down        0          1.00000
14  1.09000         osd.14       down        0          1.00000
15  1.09000         osd.15       down        0          1.00000
16  1.09000         osd.16       down        0          1.00000
17  1.09000         osd.17       down        0          1.00000
18  1.09000         osd.18       down        0          1.00000
19  1.09000         osd.19       down        0          1.00000
20  1.09000         osd.20       down        0          1.00000
-3 22.88992     host node146
21  1.09000         osd.21       down        0          1.00000
22  1.09000         osd.22       down        0          1.00000
23  1.09000         osd.23       down  1.00000          1.00000
24  1.09000         osd.24       down  1.00000          1.00000
25  1.09000         osd.25       down        0          1.00000
26  1.09000         osd.26         up  1.00000          1.00000
27  1.09000         osd.27       down        0          1.00000
28  1.09000         osd.28       down        0          1.00000
29  1.09000         osd.29         up  1.00000          1.00000
30  1.09000         osd.30         up  1.00000          1.00000
31  1.09000         osd.31       down        0          1.00000
32  1.09000         osd.32         up  1.00000          1.00000
33  1.09000         osd.33       down        0          1.00000
34  1.09000         osd.34       down  1.00000          1.00000
35  1.09000         osd.35       down        0          1.00000
36  1.09000         osd.36       down        0          1.00000
37  1.09000         osd.37       down        0          1.00000
38  1.09000         osd.38       down        0          1.00000
39  1.09000         osd.39       down        0          1.00000
40  1.09000         osd.40       down  1.00000          1.00000
41  1.09000         osd.41       down  1.00000          1.00000
-4 22.88992     host node147
42  1.09000         osd.42       down  1.00000          1.00000
43  1.09000         osd.43       down        0          1.00000
44  1.09000         osd.44         up  1.00000          1.00000
45  1.09000         osd.45       down        0          1.00000
46  1.09000         osd.46       down        0          1.00000
47  1.09000         osd.47         up  1.00000          1.00000
48  1.09000         osd.48         up  1.00000          1.00000
49  1.09000         osd.49       down        0          1.00000
50  1.09000         osd.50         up  1.00000          1.00000
51  1.09000         osd.51       down        0          1.00000
52  1.09000         osd.52         up  1.00000          1.00000
53  1.09000         osd.53       down        0          1.00000
54  1.09000         osd.54         up  1.00000          1.00000
55  1.09000         osd.55         up  1.00000          1.00000
56  1.09000         osd.56         up  1.00000          1.00000
57  1.09000         osd.57         up  1.00000          1.00000
58  1.09000         osd.58         up  1.00000          1.00000
59  1.09000         osd.59         up  1.00000          1.00000
60  1.09000         osd.60         up  1.00000          1.00000
61  1.09000         osd.61         up  1.00000          1.00000
62  1.09000         osd.62         up  1.00000          1.00000
</pre>

And "ceph -s" prints:

<pre>
    cluster 2f6f7e9e-9167-4ac5-889b-d18680fa4c04
     health HEALTH_WARN
            3 pgs degraded
            898 pgs down
            1021 pgs peering
            446 pgs stale
            3 pgs stuck degraded
            1024 pgs stuck inactive
            397 pgs stuck stale
            1024 pgs stuck unclean
            3 pgs stuck undersized
            3 pgs undersized
            5459 requests are blocked > 32 sec
            8/26 in osds are down
     monmap e1: 3 mons at {node145=172.16.38.145:6789/0,node146=172.16.38.146:6789/0,node147=172.16.38.147:6789/0}
            election epoch 570, quorum 0,1,2 node145,node146,node147
     osdmap e32575: 63 osds: 18 up, 26 in; 4 remapped pgs
      pgmap v506758: 1024 pgs, 1 pools, 18 bytes data, 1 objects
            30706 MB used, 21102 GB / 21132 GB avail
                 386 stale+down+peering
                 294 down+peering
                 218 creating+down+peering
                  56 stale+peering
                  41 creating+peering
                  22 peering
                   3 stale+remapped+peering
                   1 undersized+degraded+peered
                   1 activating+undersized+degraded
                   1 remapped+peering
                   1 stale+activating+undersized+degraded
</pre>
I am uploading the logs of the three hosts; hope that helps.
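For what it is worth, the documented mitigation for a flapping storm is to freeze OSD state transitions while the cluster network is unreliable, then unfreeze once it is healthy; a sketch:

<pre>
# Hedged stabilization sketch: stop mons from marking OSDs up/down while the
# cluster NIC is cut, then clear the flags after connectivity returns.
ceph osd set nodown
ceph osd set noup
# ...restore the cluster network and let peering settle...
ceph osd unset nodown
ceph osd unset noup
</pre>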
Ceph - Bug #12941 (Can't reproduce): mon/OSDMonitor.cc: 204: FAILED assert(err == 0) 0.94
https://tracker.ceph.com/issues/12941 (2015-09-04, bo cai)

Yesterday I found my cluster broken; it turned out that two of the three monitors were damaged. I wanted to repair one, so I ran:

ceph-mon -i vm13

But I got the following error:
<pre>
mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f7fa64248c0 time 2015-09-04 13:31:48.448126
mon/OSDMonitor.cc: 204: FAILED assert(err == 0)
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7e708b]
2: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
3: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
4: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
5: (Monitor::init_paxos()+0x85) [0x5ba7e5]
6: (Monitor::preinit()+0x7d7) [0x5bf447]
7: (main()+0x22dd) [0x5819ed]
8: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
9: ceph-mon() [0x5a3607]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) ***
in thread 7f7fa64248c0
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
1: ceph-mon() [0x9b050a]
2: (()+0x10340) [0x7f7fa5526340]
3: (gsignal()+0x39) [0x7f7fa3871cc9]
4: (abort()+0x148) [0x7f7fa38750d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f7fa417c535]
6: (()+0x5e6d6) [0x7f7fa417a6d6]
7: (()+0x5e703) [0x7f7fa417a703]
8: (()+0x5e922) [0x7f7fa417a922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x7e7278]
10: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
11: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
12: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
13: (Monitor::init_paxos()+0x85) [0x5ba7e5]
14: (Monitor::preinit()+0x7d7) [0x5bf447]
15: (main()+0x22dd) [0x5819ed]
16: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
17: ceph-mon() [0x5a3607]
[27101]: (33) Numerical argument out of domain
</pre>
Does this mean a monitor-related file is corrupted?
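The assert fires in OSDMonitor::update_from_paxos() while reading an osdmap back out of the monitor's store, which usually does indicate a damaged store rather than a code bug. A hedged first check is whether the store still yields a readable map at all (same tool usage as in #13078 above):

<pre>
# Hedged triage sketch: pull the latest committed osdmap straight from the
# mon store; a failure here points at corrupt on-disk data for this mon.
ceph-monstore-tool /var/lib/ceph/mon/ceph-vm13 get osdmap -- -o /tmp/osdmap
osdmaptool --print /tmp/osdmap
</pre>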
rbd - Bug #12743 (Duplicate): incomplete rbd delete (command: rbd rm) damaged the rbd block
https://tracker.ceph.com/issues/12743 (2015-08-21, bo cai)

When I try to delete an rbd image (size 20G), I use this command:

rbd -p rbd rm myblock

While the command was executing, I pressed Ctrl+C to cancel it.

When I then tried to remove the image again, the following error happened:

<pre>
2015-08-05 20:07:27.473964 7f93db26d840 -1 librbd: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
</pre>

Here is a transcript of my session:
<pre>
root@vm13:~# rbd -p rbd create myblock --size 20480
root@vm13:~# rbd -p rbd rm myblock
Removing image: 63% complete...^C
root@vm13:~# rbd -p rbd rm myblock
2015-08-05 20:07:27.473964 7f93db26d840 -1 librbd: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
root@vm13:~#
</pre>
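The interrupted rm leaves the killed client's watch registered on the image header; as the error text says, it expires on its own after about 30 seconds. A hedged way to confirm before retrying (the header object for a format-2 image is rbd_header.<id>, where <id> is a placeholder for the id shown by rbd info):

<pre>
# Hedged check sketch: see who still watches the image header, then retry
# the delete once the dead client's watch has timed out (~30 s).
rbd info myblock                             # prints the image id/prefix
rados -p rbd listwatchers rbd_header.<id>    # <id> is a placeholder
sleep 30 && rbd -p rbd rm myblock
</pre>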
rbd - Bug #12708 (Resolved): even when rbd export-diff fails, it still creates a local file
https://tracker.ceph.com/issues/12708 (2015-08-17, bo cai)

<pre>
root@wfnd95:/caibo# clear
# there is no file here
root@wfnd95:/caibo# ls
# I try export-diff; it must fail because the from-snap does not exist
root@wfnd95:/caibo# rbd export-diff myblock@mysnap --from-snap not_exist_snap local_file
Exporting image: 0% complete...failed.
rbd: export-diff error: (2) No such file or directory
# but it still creates a local file
root@wfnd95:/caibo# ls
local_file
root@wfnd95:/caibo#
</pre>
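Same shape as #13078 above: the destination file is created before the snapshot lookup fails. Until the fix, a pre-check keeps a typoed --from-snap from leaving a stray file; a sketch:

<pre>
# Hedged pre-check sketch: run export-diff only when the from-snap exists.
if rbd snap ls myblock | grep -qw base_snap; then    # base_snap: placeholder
    rbd export-diff myblock@mysnap --from-snap base_snap local_file
else
    echo "from-snap not found; skipping export" >&2
fi
</pre>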
rbd - Bug #12700 (Duplicate): incomplete rbd delete (command: rbd rm) damaged the rbd block
https://tracker.ceph.com/issues/12700 (2015-08-15, bo cai)

(Report body identical to #12743 above.)
rbd - Bug #12618 (Closed): incomplete rbd delete (command: rbd rm) damaged the rbd block
https://tracker.ceph.com/issues/12618 (2015-08-05, bo cai)

(Report body identical to #12743 above.)
Linux kernel client - Bug #12579 (Duplicate): rbd map leads to system crash!
https://tracker.ceph.com/issues/12579 (2015-08-03, bo cai)

<pre>
# I create an rbd image named r1
rbd -p rbd create r1 --size 1024

# then create a snapshot of r1
rbd -p rbd snap create --image r1 --snap r1-s1

# then protect this snapshot and clone a new image from it
rbd snap protect --image r1 --snap r1-s1
rbd clone --image r1 --snap r1-s1 --dest r2

# then create a snapshot of r2, protect it, and clone a new image named r3
rbd snap create --image r2 --snap r2-s1
rbd snap protect --image r2 --snap r2-s1
rbd clone --image r2 --snap r2-s1 --dest r3

# I do the same thing as above in a loop, 100 times (you could loop more)

# this yields an image named r101, the result of 100 clones;
# then I try to map it with this command
rbd -p rbd map r101

# then my system crashes; the attached picture shows some information
# about the crash
</pre>
This happens on a real machine, not only in a virtual machine; I tested on a virtual machine as well and found the same problem. A compact version of the reproduction loop is sketched below.
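For reference, the clone chain described above can be scripted as follows; krbd apparently cannot cope with a 100-deep parent chain, and flattening the tip first (rbd flatten) removes the chain before mapping:

<pre>
# Sketch of the reproduction loop described above: build a 100-deep clone
# chain, then map the tip.
rbd create r1 --size 1024
for i in $(seq 1 100); do
    rbd snap create r${i}@s1
    rbd snap protect r${i}@s1
    rbd clone r${i}@s1 r$((i+1))
done
# rbd flatten r101    # possible workaround: detach from the parent chain
rbd map r101          # this is the step that crashed the machine
</pre>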