Project

General

Profile

Actions

Bug #9768

closed

ceph-osd mkfs hangs

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Workaround for Firefly <= 0.80.7

If it shows with

# dpkg -l | grep aio
ii  libaio1:amd64                            0.3.109-4                 amd64                     Linux kernel AIO access library - shared library
$ ceph --version
ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)
$ uname -a
Linux g3 3.14-2-amd64 #1 SMP Debian 3.14.15-2 (2014-08-09) x86_64 GNU/Linux

It can be worked around with the following in ceph.conf
journal_aio = false

It can be removed once the ceph-disk activate has run. Restarting the OSD will have it use AIO.

Description

$ ceph-disk prepare --mkfs ext4 /dev/sdc
$ ps fauwwwx | grep mkfs
root       366  0.0  0.0  40672  1548 ?        Ss   Oct10   0:00 udevd --daemon
root     21963  0.0  0.0  40644  1360 ?        S    13:57   0:00  \_ udevd --daemon
root     21988  0.0  0.0   4308   636 ?        S    13:57   0:00  |   \_ /bin/sh /usr/sbin/ceph-disk-activate /dev/sdc1
root     21991  0.2  0.0  48736  9968 ?        S    13:57   0:00  |       \_ /usr/bin/python /usr/sbin/ceph-disk activate /dev/sdc1
root     22082  0.0  0.0 192040  8820 ?        Sl   13:57   0:00  |           \_ /usr/bin/ceph-osd --cluster ceph --mkfs --mkkey -i 6 --monmap /var/lib/ceph/tmp/mnt.5mDa2j/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.5mDa2j --osd-journal /var/lib/ceph/tmp/mnt.5mDa2j/journal --osd-uuid 0b27732f-bee9-4212-9395-bad5eaeffc37 --keyring /var/lib/ceph/tmp/mnt.5mDa2j/keyring

The log with debug_filestore=20
2014-10-14 14:23:05.600868 7fd3f68c47c0  0 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae), process ceph-osd, pid 1609
2014-10-14 14:23:05.602648 7fd3f68c47c0  1 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) mkfs in /var/lib/ceph/tmp/mnt.9lLCnp
2014-10-14 14:23:05.602666 7fd3f68c47c0  1 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) mkfs fsid is already set to cd654a0a-08d7-4ff8-a79d-372c73ae1f3a
2014-10-14 14:23:05.647609 7fd3f68c47c0  1 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) leveldb db exists/created
2014-10-14 14:23:05.649577 7fd3f68c47c0  1 journal _open /var/lib/ceph/tmp/mnt.9lLCnp/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:23:05.649934 7fd3f68c47c0 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected cd654a0a-08d7-4ff8-a79d-372c73ae1f3a, invalid (someone else's?) journal
2014-10-14 14:23:05.651841 7fd3f68c47c0  1 journal _open /var/lib/ceph/tmp/mnt.9lLCnp/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:23:05.655205 7fd3f68c47c0  0 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) mkjournal created journal on /var/lib/ceph/tmp/mnt.9lLCnp/journal
2014-10-14 14:23:05.655236 7fd3f68c47c0  1 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) mkfs done in /var/lib/ceph/tmp/mnt.9lLCnp
2014-10-14 14:23:05.662726 7fd3f68c47c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.9lLCnp) detect_features: FIEMAP ioctl is supported and appears to work
2014-10-14 14:23:05.662746 7fd3f68c47c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.9lLCnp) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-10-14 14:23:05.667236 7fd3f68c47c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.9lLCnp) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-10-14 14:23:05.691436 7fd3f68c47c0  0 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) limited size xattrs
2014-10-14 14:23:05.704333 7fd3f68c47c0  0 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-10-14 14:23:05.706694 7fd3f68c47c0  1 journal _open /var/lib/ceph/tmp/mnt.9lLCnp/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:23:05.709404 7fd3f68c47c0  1 journal _open /var/lib/ceph/tmp/mnt.9lLCnp/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:23:05.709841 7fd3f68c47c0 -1 filestore(/var/lib/ceph/tmp/mnt.9lLCnp) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-10-14 14:25:14.161524 7f95042a87c0  0 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae), process ceph-osd, pid 3052
2014-10-14 14:25:14.163866 7f95042a87c0  1 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) mkfs in /var/lib/ceph/tmp/mnt.8GTDCM
2014-10-14 14:25:14.163884 7f95042a87c0  1 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) mkfs fsid is already set to 5a9ad874-ee9e-4b92-8607-c32c23eb7e83
2014-10-14 14:25:14.199386 7f95042a87c0  1 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) leveldb db exists/created
2014-10-14 14:25:14.201252 7f95042a87c0  1 journal _open /var/lib/ceph/tmp/mnt.8GTDCM/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:25:14.201574 7f95042a87c0 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 5a9ad874-ee9e-4b92-8607-c32c23eb7e83, invalid (someone else's?) journal
2014-10-14 14:25:14.203518 7f95042a87c0  1 journal _open /var/lib/ceph/tmp/mnt.8GTDCM/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:25:14.206768 7f95042a87c0  0 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) mkjournal created journal on /var/lib/ceph/tmp/mnt.8GTDCM/journal
2014-10-14 14:25:14.206801 7f95042a87c0  1 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) mkfs done in /var/lib/ceph/tmp/mnt.8GTDCM
2014-10-14 14:25:14.213720 7f95042a87c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.8GTDCM) detect_features: FIEMAP ioctl is supported and appears to work
2014-10-14 14:25:14.213748 7f95042a87c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.8GTDCM) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-10-14 14:25:14.225341 7f95042a87c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.8GTDCM) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-10-14 14:25:14.245278 7f95042a87c0  0 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) limited size xattrs
2014-10-14 14:25:14.258212 7f95042a87c0  0 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-10-14 14:25:14.260042 7f95042a87c0  1 journal _open /var/lib/ceph/tmp/mnt.8GTDCM/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:25:14.261695 7f95042a87c0  1 journal _open /var/lib/ceph/tmp/mnt.8GTDCM/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 14:25:14.261955 7f95042a87c0 -1 filestore(/var/lib/ceph/tmp/mnt.8GTDCM) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-10-14 15:22:59.330720 7f5f022c17c0  0 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae), process ceph-osd, pid 28482
2014-10-14 15:22:59.332796 7f5f022c17c0  1 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) mkfs in /var/lib/ceph/tmp/mnt.GbLkrx
2014-10-14 15:22:59.332819 7f5f022c17c0  1 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) mkfs fsid is already set to 9b224aad-1365-4d35-95a6-3eb59041c4ab
2014-10-14 15:22:59.363696 7f5f022c17c0  1 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) leveldb db exists/created
2014-10-14 15:22:59.365460 7f5f022c17c0  1 journal _open /var/lib/ceph/tmp/mnt.GbLkrx/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:22:59.365789 7f5f022c17c0 -1 journal check: ondisk fsid 5a9ad874-ee9e-4b92-8607-c32c23eb7e83 doesn't match expected 9b224aad-1365-4d35-95a6-3eb59041c4ab, invalid (someone else's?) journal
2014-10-14 15:22:59.367804 7f5f022c17c0  1 journal _open /var/lib/ceph/tmp/mnt.GbLkrx/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:22:59.371068 7f5f022c17c0  0 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) mkjournal created journal on /var/lib/ceph/tmp/mnt.GbLkrx/journal
2014-10-14 15:22:59.371095 7f5f022c17c0  1 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) mkfs done in /var/lib/ceph/tmp/mnt.GbLkrx
2014-10-14 15:22:59.385444 7f5f022c17c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.GbLkrx) detect_features: FIEMAP ioctl is supported and appears to work
2014-10-14 15:22:59.385466 7f5f022c17c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.GbLkrx) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-10-14 15:22:59.398287 7f5f022c17c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.GbLkrx) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-10-14 15:22:59.415874 7f5f022c17c0  0 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) limited size xattrs
2014-10-14 15:22:59.427902 7f5f022c17c0  0 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-10-14 15:22:59.429929 7f5f022c17c0  1 journal _open /var/lib/ceph/tmp/mnt.GbLkrx/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:22:59.432652 7f5f022c17c0  1 journal _open /var/lib/ceph/tmp/mnt.GbLkrx/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:22:59.432908 7f5f022c17c0 -1 filestore(/var/lib/ceph/tmp/mnt.GbLkrx) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-10-14 15:27:04.082152 7f5efdff2700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f5ef9fea700' had timed out after 4
2014-10-14 15:27:04.082187 7f5efdff2700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f5efa7eb700' had timed out after 4
2014-10-14 15:27:04.084983 7f5ef9fea700  1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f5ef9fea700' had timed out after 4
2014-10-14 15:27:04.085017 7f5efa7eb700  1 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f5efa7eb700' had timed out after 4
2014-10-14 15:35:39.632678 7f32b27be7c0  0 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae), process ceph-osd, pid 530
2014-10-14 15:35:39.634200 7f32b27be7c0 10 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) dump_stop
2014-10-14 15:35:39.634648 7f32b27be7c0  1 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mkfs in /var/lib/ceph/tmp/mnt.8uRLPI
2014-10-14 15:35:39.634676 7f32b27be7c0  1 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mkfs fsid is already set to 3043b7af-d7a0-45d7-bcd1-5fa8e01def54
2014-10-14 15:35:39.668382 7f32b27be7c0  1 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) leveldb db exists/created
2014-10-14 15:35:39.668400 7f32b27be7c0 10 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) open_journal at /var/lib/ceph/tmp/mnt.8uRLPI/journal
2014-10-14 15:35:39.668450 7f32b27be7c0 10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 5367661056)
2014-10-14 15:35:39.670276 7f32b27be7c0 20 journal _check_disk_write_cache: disk write cache is on, but your kernel is new enough to handle it correctly. (fn:/var/lib/ceph/tmp/mnt.8uRLPI/journal)
2014-10-14 15:35:39.670454 7f32b27be7c0  1 journal _open /var/lib/ceph/tmp/mnt.8uRLPI/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:35:39.670471 7f32b27be7c0 10 journal read_header
2014-10-14 15:35:39.670779 7f32b27be7c0 10 journal header: block_size 4096 alignment 4096 max_size 5367660544
2014-10-14 15:35:39.670795 7f32b27be7c0 10 journal header: start 4096
2014-10-14 15:35:39.670798 7f32b27be7c0 10 journal  write_pos 0
2014-10-14 15:35:39.670802 7f32b27be7c0 -1 journal check: ondisk fsid 9b224aad-1365-4d35-95a6-3eb59041c4ab doesn't match expected 3043b7af-d7a0-45d7-bcd1-5fa8e01def54, invalid (someone else's?) journal
2014-10-14 15:35:39.670896 7f32b27be7c0  2 journal create /var/lib/ceph/tmp/mnt.8uRLPI/journal fsid 3043b7af-d7a0-45d7-bcd1-5fa8e01def54
2014-10-14 15:35:39.670917 7f32b27be7c0 10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 5367661056)
2014-10-14 15:35:39.672612 7f32b27be7c0 20 journal _check_disk_write_cache: disk write cache is on, but your kernel is new enough to handle it correctly. (fn:/var/lib/ceph/tmp/mnt.8uRLPI/journal)
2014-10-14 15:35:39.672805 7f32b27be7c0  1 journal _open /var/lib/ceph/tmp/mnt.8uRLPI/journal fd 11: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:35:39.672815 7f32b27be7c0 10 journal header: block_size 4096 alignment 4096 max_size 5367660544
2014-10-14 15:35:39.672818 7f32b27be7c0 10 journal header: start 4096
2014-10-14 15:35:39.672824 7f32b27be7c0 10 journal  write_pos 0
2014-10-14 15:35:39.675994 7f32b27be7c0  2 journal create done
2014-10-14 15:35:39.676052 7f32b27be7c0  0 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mkjournal created journal on /var/lib/ceph/tmp/mnt.8uRLPI/journal
2014-10-14 15:35:39.676094 7f32b27be7c0  1 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mkfs done in /var/lib/ceph/tmp/mnt.8uRLPI
2014-10-14 15:35:39.676105 7f32b27be7c0  5 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) basedir /var/lib/ceph/tmp/mnt.8uRLPI journal /var/lib/ceph/tmp/mnt.8uRLPI/journal
2014-10-14 15:35:39.676121 7f32b27be7c0 10 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mount fsid is 3043b7af-d7a0-45d7-bcd1-5fa8e01def54
2014-10-14 15:35:39.683189 7f32b27be7c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.8uRLPI) detect_features: FIEMAP ioctl is supported and appears to work
2014-10-14 15:35:39.683211 7f32b27be7c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.8uRLPI) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-10-14 15:35:39.688255 7f32b27be7c0  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.8uRLPI) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-10-14 15:35:39.705087 7f32b27be7c0  0 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) limited size xattrs
2014-10-14 15:35:39.705166 7f32b27be7c0  5 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mount op_seq is 1
2014-10-14 15:35:39.723568 7f32b27be7c0 20 filestore (init)dbobjectmap: seq is 1
2014-10-14 15:35:39.723580 7f32b27be7c0 10 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) open_journal at /var/lib/ceph/tmp/mnt.8uRLPI/journal
2014-10-14 15:35:39.723602 7f32b27be7c0  0 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-10-14 15:35:39.723606 7f32b27be7c0 10 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) list_collections
2014-10-14 15:35:39.723656 7f32b27be7c0 10 journal journal_replay fs op_seq 1
2014-10-14 15:35:39.723660 7f32b27be7c0  2 journal open /var/lib/ceph/tmp/mnt.8uRLPI/journal fsid 3043b7af-d7a0-45d7-bcd1-5fa8e01def54 fs_op_seq 1
2014-10-14 15:35:39.723676 7f32b27be7c0 10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 5367661056)
2014-10-14 15:35:39.723688 7f32ad4ed700 20 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) sync_entry waiting for max_interval 5.000000
2014-10-14 15:35:39.725499 7f32b27be7c0 20 journal _check_disk_write_cache: disk write cache is on, but your kernel is new enough to handle it correctly. (fn:/var/lib/ceph/tmp/mnt.8uRLPI/journal)
2014-10-14 15:35:39.725688 7f32b27be7c0  1 journal _open /var/lib/ceph/tmp/mnt.8uRLPI/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:35:39.725704 7f32b27be7c0 10 journal read_header
2014-10-14 15:35:39.725952 7f32b27be7c0 10 journal header: block_size 4096 alignment 4096 max_size 5367660544
2014-10-14 15:35:39.725971 7f32b27be7c0 10 journal header: start 4096
2014-10-14 15:35:39.725974 7f32b27be7c0 10 journal  write_pos 4096
2014-10-14 15:35:39.725977 7f32b27be7c0 10 journal open header.fsid = 3043b7af-d7a0-45d7-bcd1-5fa8e01def54
2014-10-14 15:35:39.726066 7f32b27be7c0  2 journal No further valid entries found, journal is most likely valid
2014-10-14 15:35:39.726068 7f32b27be7c0 10 journal open reached end of journal.
2014-10-14 15:35:39.726076 7f32b27be7c0  2 journal No further valid entries found, journal is most likely valid
2014-10-14 15:35:39.726079 7f32b27be7c0  3 journal journal_replay: end of journal, done.
2014-10-14 15:35:39.726080 7f32b27be7c0 10 journal make_writeable
2014-10-14 15:35:39.726377 7f32b27be7c0 10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 5367661056)
2014-10-14 15:35:39.728080 7f32b27be7c0 20 journal _check_disk_write_cache: disk write cache is on, but your kernel is new enough to handle it correctly. (fn:/var/lib/ceph/tmp/mnt.8uRLPI/journal)
2014-10-14 15:35:39.728281 7f32b27be7c0  1 journal _open /var/lib/ceph/tmp/mnt.8uRLPI/journal fd 17: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-14 15:35:39.728354 7f32b27be7c0 10 journal journal_start
2014-10-14 15:35:39.728376 7f32ac4eb700 10 journal write_thread_entry start
2014-10-14 15:35:39.728407 7f32ac4eb700 20 journal prepare_multi_write queue_pos now 4096
2014-10-14 15:35:39.728407 7f32abcea700 10 journal write_finish_thread_entry enter
2014-10-14 15:35:39.728445 7f32abcea700 20 journal write_finish_thread_entry sleeping
2014-10-14 15:35:39.728481 7f32ac4eb700 15 journal do_aio_write writing 4096~0 + header
2014-10-14 15:35:39.728495 7f32ac4eb700 20 journal write_aio_bl 0~4096 seq 0
2014-10-14 15:35:39.728510 7f32ac4eb700 20 journal write_aio_bl .. 0~4096 in 1
2014-10-14 15:35:39.728509 7f32b27be7c0 15 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) read meta/23c2fcde/osd_superblock/0//-1 0~0
2014-10-14 15:35:39.728549 7f32ac4eb700 20 journal write_aio_bl 4096~0 seq 0
2014-10-14 15:35:39.728553 7f32ac4eb700  5 journal put_throttle finished 0 ops and 0 bytes, now 0 ops and 0 bytes
2014-10-14 15:35:39.728556 7f32ac4eb700 20 journal write_thread_entry going to sleep
2014-10-14 15:35:39.728581 7f32abcea700 20 journal write_finish_thread_entry waiting for aio(s)
2014-10-14 15:35:39.728574 7f32b27be7c0 -1 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-10-14 15:35:39.728602 7f32b27be7c0 10 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error: (2) No such file or directory
2014-10-14 15:35:39.728643 7f32b27be7c0  5 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) queue_transactions new osr(default 0x7f32b4c38b98)/0x7f32b4c38b98
2014-10-14 15:35:39.728655 7f32b27be7c0 10 journal op_submit_start 2
2014-10-14 15:35:39.728656 7f32b27be7c0  5 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) queue_transactions (writeahead) 2 0x7fff6ba941f0
2014-10-14 15:35:39.728659 7f32b27be7c0 10 journal op_journal_transactions 2 0x7fff6ba941f0
2014-10-14 15:35:39.728674 7f32b27be7c0  5 journal submit_entry seq 2 len 505 (0x7f32b4bf58d0)
2014-10-14 15:35:39.728685 7f32b27be7c0 10 journal op_submit_finish 2
2014-10-14 15:35:39.728716 7f32ac4eb700 20 journal write_thread_entry woke up
2014-10-14 15:35:39.728724 7f32ac4eb700 20 journal write_thread_entry aio throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending 0
2014-10-14 15:35:39.728730 7f32ac4eb700 20 journal write_thread_entry deferring until more aios complete: 1 aios with 4096 bytes needs 4 bytes to start a new aio (currently 0 pending)
2014-10-14 15:35:39.730226 7f32abcea700 10 journal write_finish_thread_entry aio 0~4096 done
2014-10-14 15:35:39.730240 7f32abcea700 20 journal check_aio_completion
2014-10-14 15:35:39.730242 7f32abcea700 20 journal check_aio_completion completed seq 0 0~4096
2014-10-14 15:35:39.730247 7f32abcea700 20 journal write_finish_thread_entry sleeping
2014-10-14 15:35:44.724070 7f32ad4ed700 20 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) sync_entry woke after 5.000381
2014-10-14 15:35:44.724098 7f32ad4ed700 10 journal commit_start max_applied_seq 1, open_ops 0
2014-10-14 15:35:44.724102 7f32ad4ed700 10 journal commit_start blocked, all open_ops have completed
2014-10-14 15:35:44.724105 7f32ad4ed700 10 journal commit_start nothing to do
2014-10-14 15:35:44.724108 7f32ad4ed700 10 journal commit_start
2014-10-14 15:35:44.724115 7f32ad4ed700 20 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) sync_entry waiting for max_interval 5.000000
2014-10-14 15:35:49.724191 7f32ad4ed700 20 filestore(/var/lib/ceph/tmp/mnt.8uRLPI) sync_entry woke after 5.000076
2014-10-14 15:35:49.724221 7f32ad4ed700 10 journal commit_start max_applied_seq 1, open_ops 0
2014-10-14 15:35:49.724225 7f32ad4ed700 10 journal commit_start blocked, all open_ops have completed
2014-10-14 15:35:49.724227 7f32ad4ed700 10 journal commit_start nothing to do
2014-10-14 15:35:49.724236 7f32ad4ed700 10 journal commit_start
...


stack trace
(gdb) thread apply all bt

Thread 14 (Thread 0x7f3177cd8700 (LWP 23461)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b67413b in ceph::log::Log::entry() ()
#2  0x00007f317a4dc0a4 in start_thread (arg=0x7f3177cd8700) at pthread_create.c:309
#3  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 13 (Thread 0x7f3176d4e700 (LWP 23462)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f317b74021b in CephContextServiceThread::entry() ()
#2  0x00007f317a4dc0a4 in start_thread (arg=0x7f3176d4e700) at pthread_create.c:309
#3  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 12 (Thread 0x7f317654d700 (LWP 23463)):
#0  0x00007f3178c370ed in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f317b726433 in AdminSocket::entry() ()
#2  0x00007f317a4dc0a4 in start_thread (arg=0x7f317654d700) at pthread_create.c:309
#3  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 11 (Thread 0x7f317554b700 (LWP 23470)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b595dbf in WBThrottle::get_next_should_flush(boost::tuples::tuple<ghobject_t, std::tr1::shared_ptr<FDCache::FD>, WBThrottle::PendingWB, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>*) ()
#2  0x00007f317b596c83 in WBThrottle::entry() ()
#3  0x00007f317a4dc0a4 in start_thread (arg=0x7f317554b700) at pthread_create.c:309
#4  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 10 (Thread 0x7f3175d4c700 (LWP 23471)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f317b51fc0d in Cond::WaitInterval(CephContext*, Mutex&, utime_t) ()
#2  0x00007f317b4ea490 in FileStore::sync_entry() ()
#3  0x00007f317b52076d in FileStore::SyncThread::entry() ()
#4  0x00007f317a4dc0a4 in start_thread (arg=0x7f3175d4c700) at pthread_create.c:309
#5  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 9 (Thread 0x7f3174d4a700 (LWP 23476)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b613286 in FileJournal::write_thread_entry() ()
#2  0x00007f317b51ee8d in FileJournal::Writer::entry() ()
#3  0x00007f317a4dc0a4 in start_thread (arg=0x7f3174d4a700) at pthread_create.c:309
#4  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 8 (Thread 0x7f3174549700 (LWP 23477)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b2bee06 in Cond::Wait(Mutex&) ()
#2  0x00007f317b610eea in FileJournal::write_finish_thread_entry() ()
#3  0x00007f317b51efad in FileJournal::WriteFinisher::entry() ()
#4  0x00007f317a4dc0a4 in start_thread (arg=0x7f3174549700) at pthread_create.c:309
#5  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7f3173d48700 (LWP 23478)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b6625f8 in Finisher::finisher_thread_entry() ()
#2  0x00007f317a4dc0a4 in start_thread (arg=0x7f3173d48700) at pthread_create.c:309
#3  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7f3173547700 (LWP 23479)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f317b71cb32 in ThreadPool::worker(ThreadPool::WorkThread*) ()
#2  0x00007f317b71dd90 in ThreadPool::WorkThread::entry() ()
#3  0x00007f317a4dc0a4 in start_thread (arg=0x7f3173547700) at pthread_create.c:309
#4  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7f3172d46700 (LWP 23480)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f317b71cb32 in ThreadPool::worker(ThreadPool::WorkThread*) ()
#2  0x00007f317b71dd90 in ThreadPool::WorkThread::entry() ()
#3  0x00007f317a4dc0a4 in start_thread (arg=0x7f3172d46700) at pthread_create.c:309
#4  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7f3172545700 (LWP 23481)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b6625f8 in Finisher::finisher_thread_entry() ()
#2  0x00007f317a4dc0a4 in start_thread (arg=0x7f3172545700) at pthread_create.c:309
#3  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7f3171d44700 (LWP 23482)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b6625f8 in Finisher::finisher_thread_entry() ()
#2  0x00007f317a4dc0a4 in start_thread (arg=0x7f3171d44700) at pthread_create.c:309
#3  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f3171543700 (LWP 23483)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b717d84 in SafeTimer::timer_thread() ()
#2  0x00007f317b71892d in SafeTimerThread::entry() ()
#3  0x00007f317a4dc0a4 in start_thread (arg=0x7f3171543700) at pthread_create.c:309
#4  0x00007f3178c3fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f317b01d7c0 (LWP 23460)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f317b35bbf4 in ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, Context*) ()
#2  0x00007f317b2c0d08 in ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
#3  0x00007f317b27374a in OSD::mkfs(CephContext*, ObjectStore*, std::string const&, uuid_d, int) ()
#4  0x00007f317b24a872 in main ()

Files

works.txt (8.8 KB) works.txt log of ceph-disk-activate Loïc Dachary, 10/14/2014 06:00 PM

Related issues 3 (0 open3 closed)

Related to Ceph - Bug #9545: filestore stuck in journal->should_commit_now() loop on shutdownResolvedSage Weil09/19/2014

Actions
Related to Ceph - Bug #9851: crash on journal/filestore shutdown on fireflyResolvedLoïc Dachary10/21/2014

Actions
Is duplicate of Ceph - Bug #9073: OSD with device/partition journals down after fresh deploy or upgrade to 0.83 ResolvedLoïc Dachary08/11/2014

Actions
Actions #1

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #2

Updated by Loïc Dachary over 9 years ago

  • Subject changed from ceph-osd mkfs loops forever on sync_entry (firefly 0.80.6) to ceph-osd mkfs loops forever on sync_entry
  • Description updated (diff)
Actions #3

Updated by Loïc Dachary over 9 years ago

Although https://github.com/ceph/ceph/commit/2f11631f3144f2cc0e04d718e40e716540c8af19 seems related, the log shows FileStore::umount() is not called.

Actions #4

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #5

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #6

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #7

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #8

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #9

Updated by Loïc Dachary over 9 years ago

$ git log --patch tags/v0.80.5..tags/v0.80.6
...
commit f7498124325fe2269c33e45fae4c8181c8f7be14
Author: Samuel Just <sam.just@inktank.com>
Date:   Mon Jun 9 11:00:32 2014 -0700

    os/: add async flush_commit() method to Sequencer and implementations

    Signed-off-by: Samuel Just <sam.just@inktank.com>
...
commit 3bbe29eb5cca7cc42266a6c5561ce68feb4072de
Author: Samuel Just <sam.just@inktank.com>
Date:   Tue May 20 16:05:01 2014 -0700

    common/Finisher: add queue(list<Context*>&)

    Signed-off-by: Samuel Just <sam.just@inktank.com>

Actions #10

Updated by Loïc Dachary over 9 years ago

I browsed the patches to aio in 3.12.7 until now and saw nothing that could related to this problem https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.12.7 . Significant problems were fixed before 3.12 but not after.

Actions #11

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to 12
  • Assignee set to Loïc Dachary
Actions #12

Updated by Loïc Dachary over 9 years ago

  • File ceph-osd-ok.txt added

On ubuntu-14.04 the logs of a ceph-osd mkfs on 0.80.5 that completes successfully.

Actions #13

Updated by Loïc Dachary over 9 years ago

  • Subject changed from ceph-osd mkfs loops forever on sync_entry to ceph-osd mkfs hangs
Actions #14

Updated by Loïc Dachary over 9 years ago

  • File deleted (ceph-osd-ok.txt)
Actions #15

Updated by Loïc Dachary over 9 years ago

Created with ceph-disk prepare --fs-type=ext4 and ceph-disk activate /dev/loop3p1

Actions #16

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to Duplicate
Actions #17

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #18

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions

Also available in: Atom PDF