Ceph : Issues
https://tracker.ceph.com/
RADOS - Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
https://tracker.ceph.com/issues/47951
2020-10-22T16:34:33Z
Wido den Hollander
wido@42on.com
<p>I performed a test upgrade to 14.2.12 today on a cluster using IPv6 with Round Robin DNS for mon_host</p>
<pre>
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = 0d56dd8f-7ae0-4447-b51b-f8b818749307
mon_host = mon.objects.xxxx
ms_bind_ipv6 = true
</pre>
<p>Running 'ceph -s' now fails:</p>
<pre>
root@wido-standard-benchmark:~# ceph -s
unable to parse addrs in 'mon.objects.xxx.xxxx.xxxx'
[errno 22] error connecting to the cluster
root@wido-standard-benchmark:~#
</pre>
<p>The hostname is a Round Robin DNS entry pointing to IPv6 addresses:</p>
<pre>
root@wido-standard-benchmark:~# host mon.objects.ams02.cldin.net
mon.objects.xx.xx.net has IPv6 address 2a05:yy:xx:d:84b5:85ff:zzzz:33bf
mon.objects.xx.xx.net has IPv6 address 2a05:yy:xx:d:645f:97ff:zzzz:2b2a
mon.objects.xx.xx.net has IPv6 address 2a05:yy:xx:d:3416:d5ff:zzzz:18db
root@wido-standard-benchmark:~#
</pre>
<p>I took a look with strace and I found this:</p>
<pre>
14980 socket(AF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
14980 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a05:xxx:xxx:d:84b5:85ff:fe40:33bf", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
14980 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(52258), inet_pton(AF_INET6, "2a05:xxx:xxx:0:1c00:16ff:fe00:60", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
14980 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
14980 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a05:xxx:xxx:d:645f:97ff:fe7f:2b2a", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
14980 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(52850), inet_pton(AF_INET6, "2a05:xxx:xxx:0:1c00:16ff:fe00:60", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
14980 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
14980 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a05:xxx:xxxx:d:3416:d5ff:fe92:18db", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
14980 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(35119), inet_pton(AF_INET6, "2a05:xxx:702:0:1c00:16ff:fe00:60", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
14980 close(3) = 0
14980 write(2, "unable to parse addrs in '", 26) = 26
14980 write(2, "mon.objects.xxx.xxx.net", 27) = 27
14980 write(2, "'", 1) = 1
14980 write(2, "\n", 1)
</pre>
<p>It performs the DNS lookup, but then it seems unable to do anything with the result.</p>
<p>Setting this one to Urgent as it breaks existing clusters.</p>
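<p>For illustration, a minimal Python sketch (the hostname is a hypothetical placeholder for the redacted name) of resolving a Round Robin DNS name to all of its AAAA records, which is what the client needs to do with mon_host before trying each monitor:</p>
<pre>
import socket

HOST = "mon.objects.xxxx"  # hypothetical placeholder for the redacted name

# getaddrinfo() returns one entry per AAAA record behind the round-robin name
for *_, sockaddr in socket.getaddrinfo(HOST, 6789, socket.AF_INET6,
                                       socket.SOCK_STREAM):
    print(sockaddr[0])  # one candidate monitor address per record
</pre>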
Ceph - Bug #18638 (Resolved): OSD metadata reports filestore when using bluestore
https://tracker.ceph.com/issues/18638
2017-01-23T19:29:06Z
Wido den Hollander
wido@42on.com
<p>When testing with Kraken 11.2.0 I checked the metadata of my OSDs running BlueStore, but found that they report filestore as their objectstore.</p>
<pre>root@alpha:~# ceph osd metadata 0
{
"id": 0,
"arch": "x86_64",
"back_addr": "[2001:db8::100]:6801\/3634",
"ceph_version": "ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)",
"cpu": "Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz",
"distro": "ubuntu",
"distro_description": "Ubuntu 16.04.1 LTS",
"distro_version": "16.04",
"front_addr": "[2001:db8::100]:6800\/3634",
"hb_back_addr": "[2001:db8::100]:6802\/3634",
"hb_front_addr": "[2001:db8::100]:6803\/3634",
"hostname": "alpha",
"kernel_description": "#50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016",
"kernel_version": "4.4.0-31-generic",
"mem_swap_kb": "522236",
"mem_total_kb": "500192",
"os": "Linux",
"osd_data": "\/var\/lib\/ceph\/osd\/ceph-0",
"osd_journal": "\/var\/lib\/ceph\/osd\/ceph-0\/journal",
"osd_objectstore": "filestore"
}
root@alpha:~#</pre>
<p>It seems that the OSD takes the osd_objectstore configuration directive and reports that in its metadata, rather than the objectstore it is actually running.</p>
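<p>A quick way to confirm the mismatch is to compare the reported value against the 'type' file in the OSD's data directory; a minimal Python sketch, assuming a local admin keyring and the default data path (OSD_ID is a placeholder):</p>
<pre>
import json
import subprocess

OSD_ID = 0  # hypothetical OSD id

# What the cluster reports via 'ceph osd metadata' (JSON, as shown above)
meta = json.loads(subprocess.check_output(
    ["ceph", "osd", "metadata", str(OSD_ID)]))

# What the OSD actually wrote into its data directory at mkfs time
with open("/var/lib/ceph/osd/ceph-%d/type" % OSD_ID) as f:
    on_disk = f.read().strip()

print("reported: %s, on disk: %s" % (meta["osd_objectstore"], on_disk))
</pre>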
Ceph - Bug #16672 (Resolved): OSD crash with Hammer to Jewel Upgrade: void FileStore::init_temp_collections()
https://tracker.ceph.com/issues/16672
2016-07-13T08:33:19Z
Wido den Hollander
wido@42on.com
<p>During an upgrade from 0.94.7 to 10.2.2 I'm seeing multiple OSDs crash with this backtrace:</p>
<pre>-7> 2016-07-13 10:23:42.811650 7f3f5fd0c800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-476) detect_features: splice is supported
-6> 2016-07-13 10:23:42.843643 7f3f5fd0c800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-476) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-5> 2016-07-13 10:23:42.844736 7f3f5fd0c800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-476) detect_feature: extsize is disabled by conf
-4> 2016-07-13 10:23:42.847300 7f3f5fd0c800 1 leveldb: Recovering log #876
-3> 2016-07-13 10:23:42.894086 7f3f5fd0c800 1 leveldb: Delete type=0 #876
-2> 2016-07-13 10:23:42.894171 7f3f5fd0c800 1 leveldb: Delete type=3 #875
-1> 2016-07-13 10:23:42.895373 7f3f5fd0c800 0 filestore(/var/lib/ceph/osd/ceph-476) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
0> 2016-07-13 10:23:42.905145 7f3f5fd0c800 -1 os/filestore/FileStore.cc: In function 'void FileStore::init_temp_collections()' thread 7f3f5fd0c800 time 2016-07-13 10:23:42.902566
os/filestore/FileStore.cc: 1735: FAILED assert(r == 0)
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f3f607375b5]
2: (FileStore::init_temp_collections()+0xa63) [0x7f3f603ff613]
3: (FileStore::mount()+0x3ed0) [0x7f3f604036d0]
4: (OSD::init()+0x27d) [0x7f3f600c704d]
5: (main()+0x2c55) [0x7f3f6002cbe5]
6: (__libc_start_main()+0xf5) [0x7f3f5cc1eb15]
7: (()+0x353009) [0x7f3f60077009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 0 rados
0/ 0 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 5 optracker
0/ 5 objclass
0/ 0 filestore
0/ 0 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
0/ 0 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.476.log
--- end dump of recent events ---
2016-07-13 10:23:42.909150 7f3f5fd0c800 -1 *** Caught signal (Aborted) **
in thread 7f3f5fd0c800 thread_name:ceph-osd
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x91341a) [0x7f3f6063741a]
2: (()+0xf100) [0x7f3f5e66f100]
3: (gsignal()+0x37) [0x7f3f5cc325f7]
4: (abort()+0x148) [0x7f3f5cc33ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f3f60737797]
6: (FileStore::init_temp_collections()+0xa63) [0x7f3f603ff613]
7: (FileStore::mount()+0x3ed0) [0x7f3f604036d0]
8: (OSD::init()+0x27d) [0x7f3f600c704d]
9: (main()+0x2c55) [0x7f3f6002cbe5]
10: (__libc_start_main()+0xf5) [0x7f3f5cc1eb15]
11: (()+0x353009) [0x7f3f60077009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
<p>Starting them again works, but it sometimes takes two additional tries.</p>
<pre>systemctl reset-failed ceph-osd@476.service
systemctl start ceph-osd@476.service</pre>
<p>It is a bit hard to diagnose since it doesn't happen on all OSDs and it doesn't always happen a second time on the same OSD.</p>
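<p>Since the assert does not reproduce on every attempt, a retry loop around the commands above is a workable stopgap; a minimal Python sketch (the 10-second settle time is an arbitrary assumption):</p>
<pre>
import subprocess
import time

def start_osd(osd_id, attempts=3):
    """Retry starting an OSD a few times, as in the workaround above."""
    unit = "ceph-osd@%d.service" % osd_id
    for _ in range(attempts):
        subprocess.run(["systemctl", "reset-failed", unit])
        subprocess.run(["systemctl", "start", unit])
        time.sleep(10)  # give the OSD time to mount its filestore
        ok = subprocess.run(["systemctl", "is-active", "--quiet", unit])
        if ok.returncode == 0:
            return True
    return False

print(start_osd(476))
</pre>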
Ceph - Bug #14028 (Won't Fix): PG stats are not refresh within osd_pg_stat_report_interval_max
https://tracker.ceph.com/issues/14028
2015-12-09T16:14:00Z
Wido den Hollander
wido@42on.com
<p>OSDs do not seem to refresh their PG stats at least every osd_pg_stat_report_interval_max seconds.</p>
<p>On a test cluster which is idle, no I/O at all:</p>
<pre>root@bravo:~# cat /var/log/ceph/ceph-osd.1.log|grep publish_stats_to_osd|tail -n 10 && date
2015-12-09 16:58:27.347286 7f824a9c0700 15 osd.1 pg_epoch: 77 pg[0.20( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,0,2] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77:100
2015-12-09 16:58:27.352735 7f8257b4d700 15 osd.1 pg_epoch: 77 pg[0.21( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77: no change since
2015-12-09 16:58:27.370986 7f8257b4d700 15 osd.1 pg_epoch: 77 pg[0.23( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77: no change since
2015-12-09 16:58:27.380638 7f8257b4d700 15 osd.1 pg_epoch: 77 pg[0.25( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,0,2] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77: no change since
2015-12-09 16:58:27.380816 7f8257b4d700 15 osd.1 pg_epoch: 77 pg[0.1c( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77: no change since
2015-12-09 16:58:27.382065 7f8257b4d700 15 osd.1 pg_epoch: 77 pg[0.20( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,0,2] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77: no change since
2015-12-09 16:58:30.896598 7f82469b8700 15 osd.1 pg_epoch: 77 pg[0.34( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] publish_stats_to_osd 77:101
2015-12-09 16:58:30.904211 7f82491bd700 15 osd.1 pg_epoch: 77 pg[0.34( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77:102
2015-12-09 16:58:33.907928 7f82481bb700 15 osd.1 pg_epoch: 77 pg[0.13( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] publish_stats_to_osd 77:103
2015-12-09 16:58:33.913326 7f82481bb700 15 osd.1 pg_epoch: 77 pg[0.13( empty local-les=77 n=0 ec=1 les/c/f 77/77/0 76/76/76) [1,2,0] r=0 lpr=76 crt=0'0 mlcod 0'0 active+clean] publish_stats_to_osd 77:104
Wed Dec 9 17:10:40 CET 2015
root@bravo:~#</pre>
<p>After 12 minutes there have been no PG stat updates.</p>
<p>pg query shows:</p>
<pre> "stats": {
"version": "0'0",
"reported_seq": "97",
"reported_epoch": "75",
"state": "active+undersized+degraded",
"last_fresh": "2015-12-09 16:58:15.469734",
"last_change": "2015-12-09 16:58:15.468556",
"last_active": "2015-12-09 16:58:15.469734",</pre>
<p>The Ceph version in this case is a build from master:</p>
<pre>root@bravo:~# ceph -v
ceph version 10.0.0-723-g8ce9302 (8ce9302a975d683c2e6fd21214c96e974b4a71fc)
root@bravo:~#</pre>
<p>Even after waiting 15 minutes, the PG stats were not refreshed.</p>
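<p>A small Python sketch to measure the staleness, assuming 'ceph pg query' emits JSON containing the info.stats.last_fresh field shown in the excerpt above (PGID is a placeholder taken from the log):</p>
<pre>
import json
import subprocess
from datetime import datetime

PGID = "0.20"  # one of the PGs from the log above

q = json.loads(subprocess.check_output(["ceph", "pg", PGID, "query"]))
last_fresh = datetime.strptime(q["info"]["stats"]["last_fresh"],
                               "%Y-%m-%d %H:%M:%S.%f")
age = (datetime.now() - last_fresh).total_seconds()

# osd_pg_stat_report_interval_max defaults to 500 seconds, so anything
# much older than that suggests the stats were never republished.
print("last_fresh is %.0f seconds old" % age)
</pre>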
Ceph - Feature #6507 (Resolved): librados shouldn't block indefinitely when cluster doesn't respond
https://tracker.ceph.com/issues/6507
2013-10-10T13:41:38Z
Wido den Hollander
wido@42on.com
<p>Currently, calls like rados_connect(), rados_ioctx_create() and rados_read() block forever if the Ceph cluster isn't responding.</p>
<p>A parameter which configures a timeout would be useful in a lot of situations, so applications can fail when the Ceph cluster is not responding rather than simply hanging forever.</p>
<p>My use case here is libvirt: when it calls storagePoolRefresh it tries to query for RBD images, but if the Ceph cluster isn't responding, libvirt will eventually lock up.</p>
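<p>For reference, a minimal sketch with the python-rados bindings using the timeout options librados later gained (client_mount_timeout, rados_mon_op_timeout, rados_osd_op_timeout); with these set, connect() and subsequent I/O return an error instead of blocking forever:</p>
<pre>
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.conf_set("client_mount_timeout", "15")  # bounds connect()
cluster.conf_set("rados_mon_op_timeout", "15")  # bounds monitor commands
cluster.conf_set("rados_osd_op_timeout", "15")  # bounds reads and writes

try:
    cluster.connect()
    print("connected")
    cluster.shutdown()
except rados.Error as err:
    print("cluster not responding: %s" % err)
</pre>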
Ceph - Bug #4851 (Resolved): leveldb: hang on leveldb::DBImpl::MakeRoomForWrite(bool)
https://tracker.ceph.com/issues/4851
2013-04-29T06:01:29Z
Wido den Hollander
wido@42on.com
<p>While testing with the next branch (50e58b9f49382d690f5a22af80f6981f1c12d4c3) I stumbled upon the problem that creating, for example, an auth key took a long time:</p>
<pre>ceph auth get-or-create XXXX</pre>
<p>That took 5 minutes, but I wasn't able to capture it in the logs, so later on I tried to remove the same key, which took 4 minutes:</p>
<pre>ceph auth del client.openstack</pre>
<p>In this case I have 3 monitors, and this command went to "mon2":</p>
<pre>2013-04-29 14:22:15.299854 7fd523711700 1 -- 10.23.24.9:6789/0 <== client.? 10.23.24.4:0/29190 5 ==== mon_command(auth del client.openstack v 0) v1 ==== 73+0+0 (2842674342 0 0) 0xc2af000 con 0xad8a580
2013-04-29 14:22:15.299991 7fd523711700 0 mon.mon2@1(peon) e1 handle_command mon_command(auth del client.openstack v 0) v1
2013-04-29 14:22:15.376928 7fd523711700 10 mon.mon2@1(peon).auth v764 preprocess_query mon_command(auth del client.openstack v 0) v1 from client.? 10.23.24.4:0/29190
2013-04-29 14:22:15.396140 7fd523711700 10 mon.mon2@1(peon) e1 forward_request 828 request mon_command(auth del client.openstack v 0) v1
2013-04-29 14:22:15.396161 7fd523711700 1 -- 10.23.24.9:6789/0 --> mon.0 10.23.24.8:6789/0 -- forward(mon_command(auth del client.openstack v 0) v1) to leader v1 -- ?+0 0xc2ab300
2013-04-29 14:26:09.397449 7fd523f12700 10 mon.mon2@1(leader) e1 requeue for self tid 828 mon_command(auth del client.openstack v 0) v1
2013-04-29 14:26:09.432538 7fd523f12700 0 mon.mon2@1(leader) e1 handle_command mon_command(auth del client.openstack v 0) v1
2013-04-29 14:26:12.245758 7fd523711700 10 mon.mon2@1(leader).auth v764 preprocess_query mon_command(auth del client.openstack v 0) v1 from client.? 10.23.24.4:0/29190
2013-04-29 14:26:12.261239 7fd523711700 10 mon.mon2@1(leader).auth v764 prepare_update mon_command(auth del client.openstack v 0) v1 from client.? 10.23.24.4:0/29190</pre>
<p>Here you see that "mon2" forwarded the request to mon1 (10.23.24.8).</p>
<p>Looking in mon1's logs I see:</p>
<pre>2013-04-29 14:27:19.907293 7fdf507ac700 1 -- 10.23.24.8:6789/0 <== mon.1 10.23.24.9:6789/0 5337 ==== forward(mon_command(auth del client.openstack v 0) v1) to leader v1 ==== 343+0+0 (3205198275 0 0) 0x116d9000 con 0x2e409a0
2013-04-29 14:27:19.907431 7fdf507ac700 0 mon.mon1@0(leader) e1 handle_command mon_command(auth del client.openstack v 0) v1</pre>
<p>Looking at the timestamps this is very weird. How can mon1 report that this happened at 14:27 while mon2 reports it happened between 14:22 and 14:26?</p>
<p>I verified that the clocks on all 3 monitors are synced; there is no difference between them.</p>
<p>The logs from both mon1 and mon2 are attached.</p>
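<p>The delay itself is easy to measure; a trivial Python sketch, assuming the ceph CLI and an admin keyring (client.timing-test is a hypothetical entity name), that times how long an auth mutation takes so slow leader forwarding shows up:</p>
<pre>
import subprocess
import time

start = time.time()
subprocess.run(["ceph", "auth", "get-or-create", "client.timing-test"],
               check=True)
print("auth get-or-create took %.1f seconds" % (time.time() - start))
</pre>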
Ceph - Bug #4837 (Resolved): mon: FAILED assert(!(sync_role & SYNC_ROLE_REQUESTER))
https://tracker.ceph.com/issues/4837
2013-04-26T12:50:42Z
Wido den Hollander
wido@42on.com
<p>I just upgraded 3 monitors from 0.56.4 to 0.60 (next branch) and saw a monitor crash when I ran:</p>
<pre>$ ceph osd unset noout</pre>
<p>I upgraded the three mons first to 0.60 and after that was done I set noout:</p>
<pre>$ ceph osd set noout</pre>
<p>After all OSDs were up and running again I wanted to unset the flag, which caused the monitor to crash.</p>
<p>The logs showed:</p>
<pre> -11> 2013-04-26 21:33:07.030062 7f52043f5700 2 -- 10.23.24.8:6789/0 >> 10.23.24.58:6812/30099 pipe(0x312ff280 sd=7 :6789 s=2 pgs=117 cs=1 l=1).fault 0: Success
-10> 2013-04-26 21:33:07.056466 7f52043f5700 1 -- 10.23.24.8:6789/0 >> :/0 pipe(0x2c9f8500 sd=7 :6789 s=0 pgs=0 cs=0 l=0).accept sd=7 10.23.24.57:53299/0
-9> 2013-04-26 21:33:07.077770 7f52029e2700 1 -- 10.23.24.8:6789/0 >> :/0 pipe(0x389ab000 sd=30 :6789 s=0 pgs=0 cs=0 l=0).accept sd=30 10.23.24.58:56868/0
-8> 2013-04-26 21:33:07.148088 7f5207315700 1 -- 10.23.24.8:6789/0 --> mon.1 10.23.24.9:6789/0 -- mon_sync( start ) v1 -- ?+0 0x26da580
-7> 2013-04-26 21:33:07.148337 7f5207315700 1 -- 10.23.24.8:6789/0 <== mon.2 10.23.24.10:6789/0 119568 ==== mon_probe(reply 4a5a7836-1d75-4e09-b3dc-80e5ae13d681 name mon3 quorum 0,1,2 paxos( fc 175909 lc 175933 )) v4 ==== 579+0+0 (2730281254 0 0) 0x7a95900 con 0x278f9a0
-6> 2013-04-26 21:33:07.148442 7f5207315700 1 -- 10.23.24.8:6789/0 <== mon.0 10.23.24.8:6789/0 0 ==== mon_sync( start_reply ) v1 ==== 0+0+0 (0 0 0) 0x2e7252c0 con 0x278f2c0
-5> 2013-04-26 21:33:07.148541 7f5207315700 1 -- 10.23.24.8:6789/0 --> mon.0 10.23.24.8:6789/0 -- mon_sync( heartbeat ) v1 -- ?+0 0x2ef7f580
-4> 2013-04-26 21:33:07.148586 7f5207315700 1 -- 10.23.24.8:6789/0 --> mon.0 10.23.24.8:6789/0 -- mon_sync( start_chunks ) v1 -- ?+0 0x2ef7f840
-3> 2013-04-26 21:33:07.148622 7f5207315700 1 -- 10.23.24.8:6789/0 <== mon.0 10.23.24.8:6789/0 0 ==== mon_sync( heartbeat ) v1 ==== 0+0+0 (0 0 0) 0x2ef7f580 con 0x278f2c0
-2> 2013-04-26 21:33:07.148667 7f5207315700 1 mon.mon1@0(synchronizing sync( requester state chunks )) e1 handle_sync_heartbeat ignored stray message mon_sync( heartbeat ) v1
-1> 2013-04-26 21:33:07.148698 7f5207315700 1 -- 10.23.24.8:6789/0 <== mon.0 10.23.24.8:6789/0 0 ==== mon_sync( start_chunks ) v1 ==== 0+0+0 (0 0 0) 0x2ef7f840 con 0x278f2c0
0> 2013-04-26 21:33:07.150441 7f5207315700 -1 mon/Monitor.cc: In function 'void Monitor::handle_sync_start_chunks(MMonSync*)' thread 7f5207315700 time 2013-04-26 21:33:07.148735
mon/Monitor.cc: 1136: FAILED assert(!(sync_role & SYNC_ROLE_REQUESTER))
ceph version 0.60-669-ga2a23cc (a2a23ccd959f6e7ebe1533b27e7320902624523b)
1: (Monitor::handle_sync_start_chunks(MMonSync*)+0x84d) [0x4b060d]
2: (Monitor::handle_sync(MMonSync*)+0x2cb) [0x4c1b4b]
3: (Monitor::_ms_dispatch(Message*)+0xd90) [0x4c2ae0]
4: (Monitor::ms_dispatch(Message*)+0x32) [0x4d96f2]
5: (DispatchQueue::entry()+0x3f1) [0x6aef01]
6: (DispatchQueue::DispatchThread::entry()+0xd) [0x63d55d]
7: (()+0x7e9a) [0x7f520c078e9a]
8: (clone()+0x6d) [0x7f520a628ccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.</pre>
<p>The full log is attached.</p>
<p>I set the target version to 0.61 since this was on the next branch, which will become the cuttlefish release soon.</p>
Ceph - Bug #4572 (Resolved): osd crash with: 0 == "we got a bad state machine event"
https://tracker.ceph.com/issues/4572
2013-03-28T11:06:25Z
Wido den Hollander
wido@42on.com
<p>This seems like <a class="issue tracker-1 status-3 priority-6 priority-high2 closed" title="Bug: osd crash in recovery state: FAILED assert(0 == "we got a bad state machine event") (Resolved)" href="https://tracker.ceph.com/issues/4042">#4042</a> but the backtrace seems different.</p>
<p>After resolving <a class="issue tracker-1 status-3 priority-6 priority-high2 closed" title="Bug: OSDs crash with OSD::handle_op during recovery (Resolved)" href="https://tracker.ceph.com/issues/4556">#4556</a> I tried to recover the cluster, but in the end 18 out of the 40 OSDs survived and were running.</p>
<p>The other 22 seem to have crashed with <strong>almost</strong> similar backtraces.</p>
<p>For example osd.2:</p>
<pre>(gdb) bt
#0 0x00007f2fa5229b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000078910e in reraise_fatal (signum=6) at global/signal_handler.cc:58
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3 <signal handler called>
#4 0x00007f2fa3be8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f2fa3bebb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f2fa453a69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f2fa4538846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f2fa4538873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f2fa453896e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000008343af in ceph::__ceph_assert_fail (assertion=0x9123b0 "0 == \"we got a bad state machine event\"", file=<optimized out>, line=5250,
func=0x916ca0 "PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)") at common/assert.cc:77
#11 0x000000000068866b in PG::RecoveryState::Crashed::Crashed (this=0x37b59b0, ctx=...) at osd/PG.cc:5250
#12 0x00000000006b4496 in shallow_construct (outermostContextBase=..., pContext=<optimized out>) at /usr/include/boost/statechart/state.hpp:89
#13 deep_construct (outermostContextBase=..., pContext=<optimized out>) at /usr/include/boost/statechart/state.hpp:79
#14 boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, PG::RecoveryState::Crashed, boost::mpl::l_end>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator> >::construct (outermostContextBase=..., pContext=<optimized out>)
at /usr/include/boost/statechart/detail/constructor.hpp:93
#15 0x00000000006d2643 in transit_impl<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::statechart::detail::no_transition_function> (this=0x37b59b0, transitionAction=...)
at /usr/include/boost/statechart/simple_state.hpp:798
#16 transit<PG::RecoveryState::Crashed> (this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:314
#17 react_without_action (stt=...) at /usr/include/boost/statechart/transition.hpp:38
#18 react (stt=...) at /usr/include/boost/statechart/detail/reaction_dispatcher.hpp:47
#19 react (stt=..., evt=...) at /usr/include/boost/statechart/detail/reaction_dispatcher.hpp:68
#20 react (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/detail/reaction_dispatcher.hpp:109
#21 react<PG::RecoveryState::Reset, boost::statechart::event_base, void const*> (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/transition.hpp:59
#22 local_react_impl<boost::mpl::list1<boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:816
#23 local_react<boost::mpl::list1<boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (this=0x37b59b0, evt=..., eventType=<optimized out>)
at /usr/include/boost/statechart/simple_state.hpp:851
#24 boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list2<boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:820
#25 0x00000000006d2794 in local_react<boost::mpl::list2<boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (
eventType=0xc2b5c0, evt=..., this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#26 local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (eventType=0xc2b5c0, evt=...,
stt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#27 local_react<boost::mpl::list3<boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#28 local_react_impl<boost::mpl::list4<boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., eventType=0xc2b5c0, evt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#29 local_react<boost::mpl::list4<boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#30 boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=...,
eventType=0xc2b5c0) at /usr/include/boost/statechart/simple_state.hpp:820
#31 0x00000000006d28ce in local_react<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., this=0x37b59b0)
at /usr/include/boost/statechart/simple_state.hpp:851
#32 local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., eventType=0xc2b5c0, evt=...)
at /usr/include/boost/statechart/simple_state.hpp:820
#33 local_react<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=...,
this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#34 boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (this=0x37b59b0, evt=..., eventType=0xc2b5c0)
at /usr/include/boost/statechart/simple_state.hpp:489
#35 0x00000000006bb03b in operator() (this=<synthetic pointer>) at /usr/include/boost/statechart/state_machine.hpp:87
#36 operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, const void*>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial>::exception_event_handler> (action=..., this=<optimized out>)
at /usr/include/boost/statechart/null_exception_translator.hpp:33
#37 boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (this=0x37c6768,
evt=...) at /usr/include/boost/statechart/state_machine.hpp:885
#38 0x00000000006bb311 in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event (this=0x37c6768, evt=...) at /usr/include/boost/statechart/state_machine.hpp:275
#39 0x000000000067b3f7 in handle_event (rctx=0x7f2f93c008e0, evt=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x37c6768) at osd/PG.h:1717
#40 PG::handle_peering_event (this=0x37c5400, evt=..., rctx=0x7f2f93c008e0) at osd/PG.cc:5114
#41 0x0000000000625118 in OSD::process_peering_events (this=0x2ffc000, pgs=..., handle=...) at osd/OSD.cc:6230
#42 0x000000000065b5d0 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:748
#43 0x00000000008297e6 in ThreadPool::worker (this=0x2ffc458, wt=0x9e67c20) at common/WorkQueue.cc:119
#44 0x000000000082b610 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:316
#45 0x00007f2fa5221e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#46 0x00007f2fa3ca5cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#47 0x0000000000000000 in ?? ()
(gdb)</pre>
<p>I'm running 0.56.4 and it seems the fix for <a class="issue tracker-7 status-6 priority-4 priority-default closed" title="Fix: osd: pg delete (Rejected)" href="https://tracker.ceph.com/issues/4202">#4202</a> went into the bobtail branch, so that shouldn't be the cause.</p>
<p>I'll upload the logs of osd.2 and osd.7 to the cephdrop sftp account since they are both quite big (~150MB each).</p>
<p>I tried to start the OSDs and some of them survived. By starting them host by host I'm now at 21/40 and I'll continue to start OSDs.</p>
Ceph - Bug #4556 (Resolved): OSDs crash with OSD::handle_op during recovery
https://tracker.ceph.com/issues/4556
2013-03-26T12:18:07Z
Wido den Hollander
wido@42on.com
<p>While tracking down <a class="issue tracker-1 status-3 priority-5 priority-high3 closed" title="Bug: osd/OSD.cc: 3318: FAILED assert(osd_lock.is_locked()) (Resolved)" href="https://tracker.ceph.com/issues/3816">#3816</a> I stumbled upon this one multiple times.</p>
<p>I tried the upgrade to 0.56.4 to be sure, but that didn't change anything.</p>
<pre> ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
1: /usr/bin/ceph-osd() [0x788fba]
2: (()+0xfcb0) [0x7f083e63ecb0]
3: (gsignal()+0x35) [0x7f083cffd425]
4: (abort()+0x17b) [0x7f083d000b8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f083d94f69d]
6: (()+0xb5846) [0x7f083d94d846]
7: (()+0xb5873) [0x7f083d94d873]
8: (()+0xb596e) [0x7f083d94d96e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8343af]
10: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x12d8) [0x624668]
11: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0xe9) [0x62cba9]
12: (OSD::do_waiters()+0x1a5) [0x62d105]
13: (OSD::ms_dispatch(Message*)+0x1c2) [0x636a82]
14: (DispatchQueue::entry()+0x349) [0x8c7399]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x81fbad]
16: (()+0x7e9a) [0x7f083e636e9a]
17: (clone()+0x6d) [0x7f083d0bacbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.</pre>
<p>This cluster has already sustained a lot of issues and some OSDs have been down and out for quite some time now.</p>
<p>I added the logs of two OSDs:<br />- osd.2<br />- osd.38</p>
<p>It goes wrong during the peering process. All 40 OSDs are active and trying to recover, but one by one they keep going down until I eventually end up with 11 OSDs and a cluster in a very bad state.</p>
<pre>osdmap e21977: 40 osds: 11 up, 11 in</pre>
<p>In the end just 11 OSDs survive. I added the output of "ceph osd tree"; it is always 11 OSDs that survive, but not always the same ones.</p>
<p>The attached logs were produced with debug osd = 20</p>
<p>From what I can make out of the logs, it goes wrong when OSDs transition to the Primary state for a PG; that is where they crash.</p>
Ceph - Bug #4042 (Resolved): osd crash in recovery state: FAILED assert(0 == "we got a bad state machine event")
https://tracker.ceph.com/issues/4042
2013-02-07T08:40:25Z
Wido den Hollander
wido@42on.com
<p>I just rebooted a couple of my 0.56.2 nodes and out of 12 OSDs one went down with:</p>
<pre>2013-02-07 16:56:43.187856 7fb172131700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread
7fb172131700 time 2013-02-07 16:56:43.164424
osd/PG.cc: 5198: FAILED assert(0 == "we got a bad state machine event")</pre>
<p>I've attached the logs of the OSD, but the debug is on default levels.</p>
<p>I tried to restart the OSD with higher debugging, but it then recovered just fine.</p>
<p>What I noticed is this:</p>
<pre>filestore(/var/lib/ceph/osd/ceph-11) waiting 51 > 50 ops || 67697 > 104857600</pre>
<p>The only occurrence of that line is in osd.11's log; it doesn't show up in any other log.</p>
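<p>That message comes from the filestore op queue throttle (filestore_queue_max_ops, which defaults to 50). A minimal Python sketch, assuming the default log locations, to check whether any other OSD hit the same throttle:</p>
<pre>
import glob

# Count throttle waits per OSD log; only osd.11 showed the message here.
for path in sorted(glob.glob("/var/log/ceph/ceph-osd.*.log")):
    with open(path, errors="replace") as f:
        hits = sum("waiting 51 > 50 ops" in line for line in f)
    if hits:
        print("%s: %d throttle waits" % (path, hits))
</pre>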
Ceph - Bug #666 (Resolved): OSD: ReplicatedPG::eval_repop
https://tracker.ceph.com/issues/666
2010-12-21T02:33:39Z
Wido den Hollander
wido@42on.com
<p>On my 'noisy' machine I wanted to see how fast writing a 10GB file would go, but this caused some crashes.</p>
<p>This machine hosts 4 OSDs, each with a 200MB journal on a tmpfs.</p>
<pre>root@noisy:~# dd if=/dev/zero bs=1024k count=10240|pv > /mnt/ceph/10GB.bin
9.8GB 0:06:47 [31.5MB/s] [ <=> </pre>
<p>The write stalled there and didn't go any further.</p>
<p>Around 3~4GB the OSDs started to crash; eventually 3 out of 4 crashed with the same backtrace:</p>
<pre>2010-12-21 10:38:51.771608 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0) async snap create 'snap_5545' transid 79 got 0 Success
2010-12-21 10:38:51.771673 7fe3e3eb2700 filestore(/var/lib/ceph/osd.0) waiting for transid 79 to complete
2010-12-21 10:38:51.771699 7fe3de3a4700 filestore(/var/lib/ceph/osd.0) queue_op 0x44986c0 seq 5546 1306 bytes (queue has 1 ops and 1306 bytes)
2010-12-21 10:38:51.771731 7fe3de3a4700 osd0 17 dequeue_op 0x413eb80 finish
2010-12-21 10:38:51.771747 7fe3de3a4700 osd0 17 dequeue_op osd_sub_op_reply(client4104.1:890 0.51 10000000000.00000373/head [] ondisk = 0) v1 pg pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean], 28 more pending
2010-12-21 10:38:51.771795 7fe3de3a4700 osd0 17 take_peer_stat peer osd1 stat(2010-12-21 10:38:51.014427 oprate=2.67588 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-21 10:38:51.771835 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] repop_ack rep_tid 478 op osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]) result 0 ack_type 4 from osd1
2010-12-21 10:38:51.771863 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] eval_repop repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) wants=d
2010-12-21 10:38:51.771888 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'1 active+clean] sending commit on repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[])) 0x43838c0
2010-12-21 10:38:51.771934 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.771959 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
2010-12-21 10:38:51.771983 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] removing repgather(0x444ce10 applied 17'5 rep_tid=478 wfack= wfdisk= op=osd_op(client4104.1:890 10000000000.00000373 [write 0~4194304 [1@-1]] 0.ba51 snapc 1=[]))
2010-12-21 10:38:51.772004 7fe3de3a4700 osd0 17 pg[0.51( v 17'6 (17'1,17'6] n=6 ec=2 les=14 12/12/2) [0,2,1] r=0 mlcod 17'4 active+clean] q front is repgather(0x4819f00 applied 17'3 rep_tid=461 wfack= wfdisk=1 op=osd_op(client4104.1:833 10000000000.0000033b [write 0~4194304 [1@-1]] 0.2451 snapc 1=[]))
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)':
osd/ReplicatedPG.cc:2147: FAILED assert(repop_queue.front() == repop)
ceph version 0.24 (commit:180a4176035521940390f4ce24ee3eb7aa290632)
1: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x852) [0x484cb2]
2: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x168) [0x4865b8]
3: (ReplicatedPG::sub_op_modify_reply(MOSDSubOpReply*)+0x13c) [0x4868dc]
4: (OSD::dequeue_op(PG*)+0x122) [0x4e4172]
5: (ThreadPool::worker()+0x28f) [0x5dd40f]
6: (ThreadPool::WorkThread::entry()+0xd) [0x508e1d]
7: (Thread::_entry_func(void*)+0xa) [0x479baa]
8: (()+0x69ca) [0x7fe3e9d2e9ca]
9: (clone()+0x6d) [0x7fe3e89ba70d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.</pre>
<p>It's easily reproducible: just set up a similar cluster and start writing the 10GB file.</p>
<p>What I did notice is that the write speeds dropped pretty hard as soon as my journal filled up, but this is the same as <a class="issue tracker-1 status-3 priority-7 priority-highest closed" title="Bug: Journaling Causes System Hang (Resolved)" href="https://tracker.ceph.com/issues/531">#531</a> and <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: osd: improve op+journal throttling (Resolved)" href="https://tracker.ceph.com/issues/615">#615</a> I assume?</p>
<p>I ran cdebugpack and uploaded the data to <strong>logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_eval_repop</strong>. During the test, debug osd and filestore was at 20.</p>
<p>After restarting the crashed OSDs the cluster fully recovered, but the FS stayed unresponsive; a 'sync' stalled:</p>
<pre>libceph: tid 2530 timed out on osd1, will reset osd</pre>
<p>But the logs didn't show that the OSD was stalled; its logs showed it was running and doing work.</p>
<p>I then restarted <strong>osd1</strong> and after a few minutes the 'sync' finished.</p>
Ceph - Bug #631 (Won't Fix): OSD: FileJournal::committed_thru
https://tracker.ceph.com/issues/631
2010-12-05T01:55:50Z
Wido den Hollander
wido@42on.com
<p>On a small cluster (1 MDS, 1 MON, 3 OSDs) I just saw 2 OSDs crash with the same backtrace:</p>
<pre>2010-12-04 22:01:01.823801 7f7bfd2f1710 osd0 96 update_osd_stat osd_stat(3258 MB used, 99141 MB avail, 102400 MB total, peers []/[])
2010-12-04 22:01:01.823864 7f7bfd2f1710 osd0 96 heartbeat: stat(2010-12-04 22:01:01.823671 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2010-12-04 22:01:01.823898 7f7bfd2f1710 osd0 96 heartbeat: osd_stat(3258 MB used, 99141 MB avail, 102400 MB total, peers []/[])
os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)':
os/FileJournal.cc:854: FAILED assert(seq >= last_committed_seq)
ceph version 0.24~rc (commit:23f370436eeb20f02c0d20699f53071a1424cebf)
1: (FileJournal::committed_thru(unsigned long)+0xb1) [0x58a931]
2: (JournalingObjectStore::commit_finish()+0x8c) [0x58110c]
3: (FileStore::sync_entry()+0xd03) [0x573853]
4: (FileStore::SyncThread::entry()+0xd) [0x507bed]
5: (Thread::_entry_func(void*)+0xa) [0x4797fa]
6: (()+0x69ca) [0x7f7c0aa7d9ca]
7: (clone()+0x6d) [0x7f7c0970970d]</pre>
<p>This is on the <strong>RC</strong> branch.</p>
<p>I've run cdebugpack to collect data; it's stored at <strong>logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_committed_thru</strong></p>
<p>Note that the logs were already rotated before I found out the OSDs crashed, so you should use the *.1.gz files.</p>
CephFS - Bug #623 (Resolved): MDS: MDSTable::load_2
https://tracker.ceph.com/issues/623
2010-12-02T05:12:37Z
Wido den Hollander
wido@42on.com
<p>On a small test machine I have a Ceph RC cluster running (which was running an old unstable build before); after my upgrade I saw an MDS crash.</p>
<p>I saw:</p>
<pre>2010-12-02 13:57:08.952173 7f9d92cbe710 mds0.8 MDS::ms_get_authorizer type=osd
2010-12-02 13:57:08.952242 7f9d94fc5710 mds0.8 ms_handle_connect on [2a00:f10:113:1:230:48ff:fe8d:a21f]:6804/2045
2010-12-02 13:57:08.952494 7f9d94fc5710 mds0.8 ms_handle_connect on [2a00:f10:113:1:230:48ff:fe8d:a21f]:6807/2128
2010-12-02 13:57:08.953187 7f9d94fc5710 -- [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/2831 <== osd2 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6807/2128 1 ==== osd_op_reply(5 200.00000000 [read 0~0] = -23 (Too many open files in system)) v1 ==== 98+0+0 (2432835435 0 0) 0x153b1c0
2010-12-02 13:57:09.325890 7f9d94fc5710 -- [2a00:f10:113:1:230:48ff:fe8d:a21f]:6800/2831 <== osd0 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6801/1975 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] = -23 (Too many open files in system)) v1 ==== 99+0+0 (1732481521 0 0) 0x153bc40
2010-12-02 13:57:09.325971 7f9d94fc5710 mds0.inotable: load_2 found no table
mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)':
mds/MDSTable.cc:148: FAILED assert(0)
ceph version 0.24~rc (commit:78a14622438addcd5c337c4924cce1f67d053ee9)
1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x5be) [0x61582e]
2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x674) [0x665bc4]
3: (MDS::_dispatch(Message*)+0x20b4) [0x4ab924]
4: (MDS::ms_dispatch(Message*)+0x6d) [0x4abefd]
5: (SimpleMessenger::dispatch_entry()+0x759) [0x4812c9]
6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4790bc]
7: (Thread::_entry_func(void*)+0xa) [0x48d96a]
8: (()+0x69ca) [0x7f9d977299ca]
9: (clone()+0x6d) [0x7f9d966e170d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
<p><strong>-23 (Too many open files in system)</strong> caught my attention, but raising the limit to 64000 didn't help.</p>
<pre>root@noisy:~# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 64000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
root@noisy:~# </pre>
<p>The cluster isn't busy at all, and there is not much data or many objects on it:</p>
<pre>root@noisy:~# ceph -s
2010-12-02 14:19:04.984055 pg v1008: 792 pgs: 792 active+clean; 5672 MB data, 10782 MB used, 283 GB / 300 GB avail
2010-12-02 14:19:04.986335 mds e29: 1/1/1 up {0=up:replay(laggy or crashed)}
2010-12-02 14:19:04.986376 osd e48: 3 osds: 3 up, 3 in
2010-12-02 14:19:04.986444 log 2010-12-02 14:17:39.411756 osd1 [2a00:f10:113:1:230:48ff:fe8d:a21f]:6804/2045 50 : [INF] 3.1p1 scrub ok
2010-12-02 14:19:04.986555 class rbd (v1.3 [x86-64])
2010-12-02 14:19:04.986578 mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21f]:6789/0}
root@noisy:~# </pre>
<p>Is this due to the number of open files?</p>
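<p>Probably not the limit raised above: errno 23 is ENFILE ("Too many open files in system"), which is governed by the system-wide fs.file-max sysctl rather than the per-process limit that 'ulimit -n' raises. A minimal Python sketch to check the system-wide usage:</p>
<pre>
# /proc/sys/fs/file-nr lists allocated handles, unused handles, and the
# fs.file-max ceiling for the whole system.
with open("/proc/sys/fs/file-nr") as f:
    allocated, unused, maximum = map(int, f.read().split())

print("file handles: %d allocated of %d (fs.file-max)" % (allocated, maximum))
</pre>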
Ceph - Bug #563 (Closed): osd: btrfs, warning at inode.c ( btrfs_orphan_commit_root )
https://tracker.ceph.com/issues/563
2010-11-08T11:05:11Z
Wido den Hollander
wido@42on.com
<p>I'm running the unstable branch and I'm seeing in my dmesg:</p>
<pre>[ 9786.475700] ------------[ cut here ]------------
[ 9786.475718] WARNING: at /build/buildd/linux-lts-backport-natty-2.6.37/fs/btrfs/inode.c:2143 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[ 9786.475720] Hardware name: PDSMU
[ 9786.475721] Modules linked in: bonding btrfs zlib_deflate crc32c libcrc32c radeon ttm led_class serio_raw drm_kms_helper drm lp parport shpchp i2c_algo_bit i3000_edac edac_core arcmsr e1000e floppy
[ 9786.475739] Pid: 10, comm: kworker/0:1 Not tainted 2.6.37-2-server #9~lucid2-Ubuntu
[ 9786.475741] Call Trace:
[ 9786.475748] [<ffffffff8106377f>] warn_slowpath_common+0x7f/0xc0
[ 9786.475752] [<ffffffff810637da>] warn_slowpath_null+0x1a/0x20
[ 9786.475761] [<ffffffffa0249bd0>] btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]
[ 9786.475771] [<ffffffffa0245391>] commit_fs_roots+0xa1/0x140 [btrfs]
[ 9786.475781] [<ffffffffa02467c0>] btrfs_commit_transaction+0x350/0x730 [btrfs]
[ 9786.475785] [<ffffffff810851f0>] ? autoremove_wake_function+0x0/0x40
[ 9786.475789] [<ffffffff815c109e>] ? schedule+0x44e/0xa70
[ 9786.475799] [<ffffffffa0246e40>] ? do_async_commit+0x0/0x30 [btrfs]
[ 9786.475808] [<ffffffffa0246e5f>] do_async_commit+0x1f/0x30 [btrfs]
[ 9786.475812] [<ffffffff8107e025>] process_one_work+0x125/0x440
[ 9786.475815] [<ffffffff810807e0>] worker_thread+0x170/0x410
[ 9786.475817] [<ffffffff81080670>] ? worker_thread+0x0/0x410
[ 9786.475820] [<ffffffff81084c96>] kthread+0x96/0xa0
[ 9786.475824] [<ffffffff8100cf64>] kernel_thread_helper+0x4/0x10
[ 9786.475827] [<ffffffff81084c00>] ? kthread+0x0/0xa0
[ 9786.475829] [<ffffffff8100cf60>] ? kernel_thread_helper+0x0/0x10
[ 9786.475831] ---[ end trace 29ffdec9a7921eec ]---</pre>
<p>I joined the #btrfs channel and they said:</p>
<pre>20:00 < widodh> I've just upgraded to 2.6.37-rc1 to test Ceph, but I'm seeing this right now in my dmesg:
http://pastebin.com/3HiGiNgy ( inode.c / btrfs_orphan_commit_root )
20:01 < josef> hrm looks like sage screwed something up with async commit
20:02 < widodh> aha, file it over there then?
20:03 < josef> well tell sage about it, he's the one who wrote the code so he'll probably be able to fix it quicker
20:03 < widodh> yes, sure. I thought it was a btrfs bug, that's why I came along here
20:03 < widodh> thanks!
20:05 < josef> well it is a bug in btrfs, but he will be able to fix it quicker than i will</pre>
<p>I've seen this while I had one VM running (qemu-rbd) with bonnie++ in it.</p>
CephFS - Bug #385 (Resolved): Failed assertion in Locker::scatter_nudge
https://tracker.ceph.com/issues/385
2010-08-27T11:37:43Z
Wido den Hollander
wido@42on.com
<p>I updated issue <a class="issue tracker-1 status-9 priority-4 priority-default closed" title="Bug: MDS crash: LogSegment::try_to_expire(MDS*) (Can't reproduce)" href="https://tracker.ceph.com/issues/312">#312</a> but Gregory told me that it was another issue.</p>
<pre>19:47 < gregaf> wido: your recent MDS crash is actually a different issue from #312, involving the distributed lock manager
19:48 < gregaf> are your MDSes just refusing to come up now, or is your cluster working again?
19:50 < gregaf> and what version of the code were you running when it crashed the first time?</pre>
<p>The last log lines:</p>
<pre>10.08.27_08:33:54.023625 7f33ea334710 mds0.journal try_to_expire waiting for nest flush on [inode 10000058e7b [...2,head] /static/kernel/linux/kernel/people/lenb/acpi/ auth v358 f(v5 m10.08.06_21:49:46.000183 3=0+3) n(v47 rc10.08.09_13:39:19.000312 b75920353 3260=3160+100) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7631e40]
10.08.27_08:33:54.023664 7f33ea334710 mds0.locker scatter_nudge auth, scatter/unscattering (inest sync dirty) on [inode 10000058e7b [...2,head] /static/kernel/linux/kernel/people/lenb/acpi/ auth v358 f(v5 m10.08.06_21:49:46.000183 3=0+3) n(v47 rc10.08.09_13:39:19.000312 b75920353 3260=3160+100) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7631e40]
10.08.27_08:33:54.023690 7f33ea334710 mds0.locker simple_lock on (inest sync dirty) on [inode 10000058e7b [...2,head] /static/kernel/linux/kernel/people/lenb/acpi/ auth v358 f(v5 m10.08.06_21:49:46.000183 3=0+3) n(v47 rc10.08.09_13:39:19.000312 b75920353 3260=3160+100) (inest sync dirty) (ifile sync dirty) (iversion lock) | dirtyscattered dirfrag dirty 0x7631e40]
10.08.27_08:33:54.023716 7f33ea334710 mds0.locker scatter_nudge oh, stable again already.
mds/Locker.cc: In function 'void Locker::scatter_nudge(ScatterLock*, Context*, bool)':
mds/Locker.cc:3290: FAILED assert(!c)
1: (LogSegment::try_to_expire(MDS*)+0x10f0) [0x636770]
2: (MDLog::try_expire(LogSegment*)+0x1d) [0x62ec2d]
3: (MDLog::trim(int)+0x628) [0x62f598]
4: (MDS::tick()+0x552) [0x498372]
5: (SafeTimer::EventWrapper::finish(int)+0x269) [0x6b27d9]
6: (Timer::timer_entry()+0x7bc) [0x6b4bac]
7: (Timer::TimerThread::entry()+0xd) [0x4777cd]
8: (Thread::_entry_func(void*)+0xa) [0x48a73a]
9: (()+0x69ca) [0x7f33edc9c9ca]
10: (clone()+0x6d) [0x7f33ecc546fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.</pre>
<p>The cores, binaries and logfiles are uploaded to <strong>logger.ceph.widodh.nl:/srv/ceph/issues/mds_crash_locker_scatter_nudge</strong></p>
<p>The timestamps of all the files were preserved.</p>