Bug #479
closed
ceph/mount crash badly when writing
Description
ceph version 0.23~rc (a7ed2ee05dc7453942018d7876401c28d3918214)
kclient master-backport
Linux ss1 2.6.36-020636rc7-generic #201010070908 SMP Thu Oct 7 09:10:00 UTC 2010 x86_64 GNU/Linux
ceph (osd0/mounts) crashes as soon as files are written to the disk.
The setup is minimal: one high-end machine with 1x MDS, 1x MON and 1x OSD, using ext4.
btrfs shows a similar symptom; in short, any moderate I/O crashes it.
It seems to crash as soon as parts of the journal have filled up.
After the crash, ceph can be stopped, but the filesystem cannot be mounted again without rebooting the PC.
CTRL-C is unresponsive.
- Steps: run mkbtrfs, start ceph, mount, then run the dd command.
The attached tar.gz contains the full log.
[global]
pid file = /var/run/ceph/$name.pid
logger dir = /cephlog
log dir = /cephlog
[mon]
mon data = /data/mon$id
debug ms = 1
debug mon = 20
debug paxos = 20
[mon0]
host = ss1
mon addr = 192.168.1.2:6789
[mds]
debug ms = 1 ; message traffic
debug mds = 20 ; mds
debug mds balancer = 20 ; load balancing
debug mds log = 20 ; mds journaling
debug mds_migrator = 20 ; metadata migration
debug monc = 20 ; monitor interaction, startup
[mds0]
host = ss1
[osd]
osd journal = /data/osd$id/journal
osd journal size = 100
osd data = /data/osd$id
debug ms = 1 ; message traffic
debug osd = 20
debug filestore = 20 ; local object storage
debug journal = 20 ; local journaling
debug monc = 20 ; monitor interaction, startup
[osd0]
host = ss1
Files
Updated by DongJin Lee over 13 years ago
- File syslog.tar.gz added
- File debug1.tar.gz added
Update.
I re-ran the test, this time capturing /sys/kernel/debug/ceph/*/
Briefly:
10:55:00 - start ceph (sudo mkcephfs -c /etc/ceph/ceph.conf --allhosts --clobber_old_data -v ; sudo /etc/init.d/ceph start)
10:55:20 - mount (sudo mount -t ceph 192.168.1.2:/ /media/ceph)
10:56:00 - start the dd (sudo dd if=/dev/zero of=/media/ceph/afile bs=1M count=3000)
... (no response)
10:57:00 - CTRL-C hangs (no response)
10:57:30 - (other terminal) sudo /etc/init.d/ceph stop (all ceph daemons stopped/unmounted, but the dd command still hung and can't be killed with kill -9 either)
Another issue is that once the system has rebooted, the osd0 disk needs to be formatted again (ext4), or else the ceph mount hangs.
I think the previous run must have corrupted the disk.
Thanks
Updated by Sage Weil over 13 years ago
From the osdc.txt, it looks as if none of the IOs are actually flushing to disk. Can you do a simple test like
echo asdf > foo.txt
sync
and verify that that completes? Similarly, can you try
rados -p casdata put test_object_1 /bin/ls
and verify that that successfully writes an object?
Updated by DongJin Lee over 13 years ago
Thanks.
I ran the above commands; no crash, but there was nothing in the osdc.
I'm unsure what output to expect from the rados command, but no file or output was shown.
- I also copied a few small tar files to it, and I could read/copy them back
- It seems that small files (~10MB) are okay (or maybe it's the sync)
After that, I drag-dropped a 1GB blank file onto the ceph mount; it copied okay, neither too fast nor too slow.
I then went to the ceph mount in a terminal and ran the command dd if=/dev/zero of=ddFile bs=1M count=1000
This time it crashed with no output. After CTRL-C, the mount is inaccessible as it was last time, i.e., all other mounts are frozen and a reboot is needed.
Updated by DongJin Lee over 13 years ago
- File cephlog.tar.gz added
- File debug1.tar.gz added
Update: a more concise setup :)
I created four simple files: file1 (1MiB), file10 (10MiB), file100 (100MiB), file1000 (1000MiB)
After mounting ceph, I did the following.
root@ss1:/media# cp file1 ceph
root@ss1:/media# sync
root@ss1:/media# cp file10 ceph
root@ss1:/media# sync
root@ss1:/media# cp file100 ceph
root@ss1:/media# sync
^C^C^C^C
The terminal returns immediately after each 'cp' command; the 'sync' took a few seconds for file1 and longer for file10, but froze for file100.
At this point, I stopped ceph and collected the debug output (both kernel/debug and ceph).
osdc now contains one line:
29 osd0 0.49d2 10000000002.00000018 write
Thanks
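For reference, the object name in that osdc line can be decoded with a small sketch. It assumes the usual CephFS object naming of `<inode in hex>.<object index in hex>` and a 4 MiB object size (the default layout). Neither is stated in this report, and the field names (tid, osd, pgid, object, op) are only one reading of the line.

```python
# Sketch: decode an osdc line such as the one reported above.
# Assumptions (not confirmed in this report): the object name is
# "<inode hex>.<object index hex>" and objects are 4 MiB each.
OBJECT_SIZE = 4 * 1024 * 1024  # assumed default object size in bytes


def decode_osdc_line(line):
    # Field names are illustrative; the report only shows the raw line.
    tid, osd, pgid, obj, op = line.split()
    ino_hex, index_hex = obj.split(".")
    index = int(index_hex, 16)
    return {
        "tid": int(tid),
        "osd": osd,
        "pgid": pgid,
        "inode": int(ino_hex, 16),
        "object_index": index,
        "file_offset": index * OBJECT_SIZE,
        "op": op,
    }


info = decode_osdc_line("29 osd0 0.49d2 10000000002.00000018 write")
print(info["object_index"], info["file_offset"] // (1024 * 1024))  # 24 96
```

Under those assumptions the outstanding request is a write to object index 24, the 4 MiB chunk starting at 96 MiB, which would be the last chunk of the 100 MiB file100.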
Updated by Yehuda Sadeh over 13 years ago
Looks like some issue with the journal:
2010-10-19 11:42:43.918144 7ffae0acf710 journal room 3928063 max_size 104857600 pos 10276864 header.start 14204928 top 4096
2010-10-19 11:42:43.918166 7ffae0acf710 journal check_for_full at 10276864 : JOURNAL FULL 10276864 >= 3928063 (max_size 104857600 start 14204928)
2010-10-19 11:42:43.918179 7ffae0acf710 journal prepare_multi_write full on first entry, need to wait
2010-10-19 11:42:43.918189 7ffae0acf710 journal write_thread_entry full, going to sleep (waiting for commit)
It doesn't seem to ever wake up.
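The numbers in those journal lines are at least self-consistent, which a short sketch can show. It assumes the journal behaves as a ring buffer in which, while the write position is behind header.start, the free room is `start - pos - 1`. That formula is inferred from the logged values only, not taken from the Ceph source.

```python
# Sketch: reproduce the values in the journal log lines above.
# Assumption: ring-buffer journal, writer position behind header.start,
# free room = start - pos - 1 (inferred from the log, not from Ceph code).
MAX_SIZE = 104857600  # max_size from the log
POS = 10276864        # pos from the log
START = 14204928      # header.start from the log


def room_when_writer_behind(pos, start):
    # Only covers the case seen in the log (pos < start).
    if pos >= start:
        raise ValueError("this sketch only handles pos < start")
    return start - pos - 1


room = room_when_writer_behind(POS, START)
print(room)                           # 3928063, matching "room 3928063"
print(MAX_SIZE == 100 * 1024 * 1024)  # True: "osd journal size = 100" read as MiB
```

If that reading is right, only about 3.7 MiB was free at that moment, so the writer has to wait for a commit to advance header.start before it can continue.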
Also note the following in the log; not sure whether it is related or whether it is normal:
2010-10-19 11:42:43.148742 7ffada9c2710 osd0 3 pg[0.23( empty n=0 ec=2 les=3 2/2/2) [0] r=0 mlcod 0'0 active+clean+degraded] truncate_seq 1 > current 0, truncating to 18446744073709551615
...
2010-10-19 11:42:43.918269 7ffadf2cc710 filestore(/data/osd0) truncate /data/osd0/current/0.23_head/10000000002.00000016_head size 18446744073709551615
2010-10-19 11:42:43.918281 7ffadf2cc710 filestore(/data/osd0) truncate /data/osd0/current/0.23_head/10000000002.00000016_head size 18446744073709551615 = -22
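As a quick check of the two odd values in those lines, the sketch below shows that the truncate size is the largest unsigned 64-bit value (what -1 looks like in an unsigned 64-bit field) and that 22 is the errno for EINVAL. Treating -22 as a negated errno is an assumption about how filestore reports errors.

```python
import errno

TRUNCATE_SIZE = 18446744073709551615  # size from the filestore truncate lines
RESULT = -22                          # return value from the log

# 2**64 - 1 is the bit pattern of -1 stored in an unsigned 64-bit integer.
print(TRUNCATE_SIZE == 2**64 - 1)  # True

# Assumption: the return value is a negated errno.
print(errno.errorcode[-RESULT])    # EINVAL
```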
Updated by DongJin Lee over 13 years ago
Continuing with ext4.
I set no-journal mode by commenting out these two lines in ceph.conf:
; osd journal = /data/osd$id/journal
; osd journal size = 100
Ceph starts with a warning (about no journaling).
The mount fails: it hangs (but I can CTRL-C), and the 'wait4' later returns "mount error 5 = Input/output error"
getcwd("/home/aaa/run", 4095) = 14
readlink("/home/aaa/run/192.168.1.2:", 0x7fffa74e5f20, 4096) = -1 ENOENT (No such file or directory)
stat("/sbin/mount.ceph", {st_mode=S_IFREG|0755, st_size=45894, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fda24d5eab0) = 4203
wait4(-1, mount error 5 = Input/output error
[{WIFEXITED(s) && WEXITSTATUS(s) == 5}], 0, NULL) = 4203
--- SIGCHLD (Child exited) @ 0 (0) ---
exit_group(5) = ?
So I tried btrfs with no journal. This works. It can actually write at a reasonable speed.
On a 7.2K rpm 2TB HDD, dd runs at about 80MiB/s and does not hang.
With journaling on, the mount often hangs, and even when it mounts successfully (after a reboot), it crashes as with ext4.
Also, I'm unsure of the proper way to fully stop/restart the ceph system.
At this stage I use a script, which basically does the following to stop and re-initialize:
umount /media/ceph -l
/etc/init.d/ceph stop
rm -rf /data
mkdir -p /data/osd0/
mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs --clobber_old_data -v
/etc/init.d/ceph start
mount -t ceph 192.168.1.2:/ /media/ceph
The problem is that I often cannot mount again after the previous shutdown.
I get a similar error: mount error 5 = Input/output error
Thanks
Updated by Yehuda Sadeh over 13 years ago
What's the exact client version and kernel that you're running on?
Please do the following under ceph-client-standalone/:
git rev-parse HEAD
Also, having a backtrace of the running osd when it's hung (e.g., after the journal was filled like in the previous cases) could be helpful. Something along the lines of:
gdb cosd `pgrep cosd`
...
(gdb) thread apply all bt
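When run interactively as above, the backtrace is printed in the gdb session itself. To save it to a file for attaching, a non-interactive run is one option. The sketch below uses gdb's standard `-p`, `-batch` and `-ex` options; the helper names and the output file name are hypothetical, chosen only for illustration.

```python
import shlex
import subprocess


def backtrace_command(pid):
    # -p attaches to a running process, -batch exits once the commands are
    # done, and -ex runs a single gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, out_path):
    # out_path is a hypothetical file name chosen by the caller.
    with open(out_path, "w") as out:
        subprocess.run(backtrace_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)


# Show the command that would be run for an example PID (1234 is made up).
print(shlex.join(backtrace_command(1234)))
```

Calling `capture_backtrace(pid, "cosd-backtrace.txt")` with the PID reported by `pgrep cosd` would then leave the backtrace of all threads in that file, assuming gdb is allowed to attach to the process.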
Updated by Sage Weil over 13 years ago
Hi DongJin,
Any luck on this issue? Has the problem gone away, or do you have time to help us track it down?
Thanks-
Updated by Sage Weil over 13 years ago
- Priority changed from Immediate to Normal
Updated by Sage Weil over 13 years ago
- Target version changed from v0.23 to v0.23.1
Updated by DongJin Lee over 13 years ago
Sorry Sage and Yehuda for the late update.
I was spending time experimenting, and just using the default btrfs with no journal, as it just worked.
It was only ext4 or journaling that failed.
The versions:
2.6.36-020636rc7-generic
ceph version 0.23~rc (a7ed2ee05dc7453942018d7876401c28d3918214)
git rev-parse HEAD
0184d86d147911efe080dcbbfba40f0b3617659f
Updated by Sage Weil over 13 years ago
- Target version changed from v0.23.1 to v0.24
Updated by Sage Weil over 13 years ago
- Status changed from New to Can't reproduce
Updated by DongJin Lee over 13 years ago
- File journal_sys.tar.gz added
- File journal_cephlog.tar.gz added
- File no_journal_cephlog.tar.gz added
Hi all:
OK, so I pulled from git again (original/unstable):
- Linux ss1 2.6.36-02063601-generic #201011231330 SMP Tue Nov 23 13:35:03 UTC 2010 x86_64 GNU/Linux
- Ubuntu 10.04 x64
- ceph version shows 0.23 (5d1d8d0c4602be9819cc9f7aea562fccbb005a56)
- system: 8-core server, single mds, mon, osd, and all default setup, except that replication is set to x1.
- osd is a single HW RAID0 of 6x SSD; dd reports 500MB/s (4k) to 670MB/s (1M) real.
So I again tried journal mode and no-journal mode, just by commenting out the two lines in ceph.conf. All on btrfs.
The result is the same. Journal mode crashes badly as soon as I try to copy, and I can't restart unless I press the hard-reset button.
Attached are the sys log and the cephlog.
ceph.conf is as below
[global]
pid file = /var/run/ceph/$name.pid
logger dir = /cephlog
log dir = /cephlog
[mon]
mon data = /data/mon$id
debug ms = 1
debug mon = 20
debug paxos = 20
[mon0]
host = ss1
mon addr = 192.168.1.2:6789
[mds]
debug ms = 1 ; message traffic
debug mds = 20 ; mds
debug mds balancer = 20 ; load balancing
debug mds log = 20 ; mds journaling
debug mds_migrator = 20 ; metadata migration
debug monc = 20 ; monitor interaction, startup
[mds0]
host = ss1
[osd]
; osd journal = /data/osd$id/journal
; osd journal size = 1000
; filestore journal writeahead = true
osd data = /data/osd$id
debug ms = 1 ; message traffic
debug osd = 20
debug filestore = 20 ; local object storage
debug journal = 20 ; local journaling
debug monc = 20 ; monitor interaction, startup
[osd0]
btrfs devs = /dev/sdb
host = ss1
A further question about performance: in no-journal mode, dd shows 150MB/s - 300MB/s (about 25% - 50% of the original performance).
This is using single replication. While dd is copying, iostat shows disk activity mostly at 0%, with occasional peaks to 100% on and off. Is this expected? Normally dd direct to the disk shows 100% all the time.
Also, I tried Yehuda's method: gdb cosd `pgrep cosd`, then ran run -i 0 -c /etc/ceph/ceph.conf manually (to start the cosd), and entered thread apply all bt. But where is the output/dump? Sorry, I have little knowledge in these areas.
Many thanks in advance.