Bug #479

ceph/mount crash badly when writing

Added by DongJin Lee about 10 years ago. Updated almost 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: qa
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

ceph version 0.23~rc (a7ed2ee05dc7453942018d7876401c28d3918214)
kclient master-backport
Linux ss1 2.6.36-020636rc7-generic #201010070908 SMP Thu Oct 7 09:10:00 UTC 2010 x86_64 GNU/Linux

ceph (osd0/mounts) crashes as soon as files are written to the disk.
The setup is minimal: one high-end machine with 1xMDS, 1xMON and 1xOSD, using ext4.
btrfs shows a similar symptom; in short, any moderate I/O crashes it.
It seems to crash as soon as the journal has filled up.
After the crash, ceph can be stopped, but the mount cannot be redone without rebooting the PC.
CTRL-C is unresponsive.

Steps: run mkbtrfs, start ceph, mount, and run the dd command.

The attached tar.gz contains the full log.

[global]
pid file = /var/run/ceph/$name.pid
logger dir = /cephlog
log dir = /cephlog
[mon]
mon data = /data/mon$id
debug ms = 1
debug mon = 20
debug paxos = 20
[mon0]
host = ss1
mon addr = 192.168.1.2:6789
[mds]
debug ms = 1 ; message traffic
debug mds = 20 ; mds
debug mds balancer = 20 ; load balancing
debug mds log = 20 ; mds journaling
debug mds_migrator = 20 ; metadata migration
debug monc = 20 ; monitor interaction, startup
[mds0]
host = ss1
[osd]
osd journal = /data/osd$id/journal
osd journal size = 100
osd data = /data/osd$id
debug ms = 1 ; message traffic
debug osd = 20
debug filestore = 20 ; local object storage
debug journal = 20 ; local journaling
debug monc = 20 ; monitor interaction, startup
[osd0]
host = ss1

debug.tar.gz (548 KB) DongJin Lee, 10/13/2010 03:21 AM

syslog.tar.gz (6.21 KB) DongJin Lee, 10/13/2010 03:34 PM

debug1.tar.gz (502 KB) DongJin Lee, 10/13/2010 03:34 PM

cephlog.tar.gz (441 KB) DongJin Lee, 10/18/2010 03:58 PM

debug1.tar.gz (696 Bytes) DongJin Lee, 10/18/2010 03:58 PM

journal_sys.tar.gz (6.25 KB) DongJin Lee, 12/02/2010 05:14 PM

journal_cephlog.tar.gz (523 KB) DongJin Lee, 12/02/2010 05:14 PM

no_journal_cephlog.tar.gz (4.87 MB) DongJin Lee, 12/02/2010 05:14 PM


Related issues

Related to Ceph - Bug #598: osd: journal reset in parallel mode acts weird Resolved 11/22/2010

History

#1 Updated by DongJin Lee about 10 years ago

Update.

Re-ran, this time capturing /sys/kernel/debug/ceph/*/

Briefly:
10:55:00 - start ceph (sudo mkcephfs -c /etc/ceph/ceph.conf --allhosts --clobber_old_data -v ; sudo /etc/init.d/ceph start)
10:55:20 - mount (sudo mount -t ceph 192.168.1.2:/ /media/ceph)
10:56:00 - start dd (sudo dd if=/dev/zero of=/media/ceph/afile bs=1M count=3000)
... (no response)
10:57:00 - CTRL-C hangs (no response)
10:57:30 - (other terminal) sudo /etc/init.d/ceph stop (all ceph daemons stopped/unmounted, but the dd command is still hung and can't be killed with kill -9 either)
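
A dd that ignores both CTRL-C and kill -9 is consistent with a process blocked in uninterruptible sleep (state D on Linux), where signals are not acted on until the blocking kernel operation returns. Whether that is what happened here is an assumption; the sketch below only shows one way to check it. The function names proc_state and is_uninterruptible are made up for this illustration.

```shell
# Print the one-letter scheduler state of a process (R, S, D, Z, ...),
# read from the "State:" line of /proc/<pid>/status (Linux only).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Succeed if the process is in uninterruptible sleep (state D).
is_uninterruptible() {
    [ "$(proc_state "$1")" = "D" ]
}

# Example (not run here): check a hung dd
#   is_uninterruptible "$(pgrep -x dd)" && echo "dd is stuck in D state"
```

If the hung dd reports D, stopping the daemons will not release it; the kernel client is still waiting on I/O that never completes.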

Another issue is that once the system is rebooted, the osd0 disk needs to be formatted again (ext4), or else the ceph mount hangs.
I think the previous run must have corrupted the disk.

Thanks

#2 Updated by Sage Weil about 10 years ago

From the osdc.txt, it looks as if none of the IOs are actually flushing to disk. Can you do a simple test like

echo asdf > foo.txt
sync

and verify that that completes? Similarly, can you try
rados -p casdata put test_object_1 /bin/ls

and verify that that successfully writes an object?

#3 Updated by DongJin Lee about 10 years ago

Thanks.
I ran the above lines; no crash, but there was nothing in the osdc.
I'm unsure what output to expect from the rados command, but no file or output was shown.
- I also copied some small tar files to it, and I could read/copy them back.
- It seems that small files (~10MB) are okay (or maybe it is the sync).
After that, I drag-and-dropped a 1GB blank file onto the ceph mount; it copied okay, neither too fast nor too slow.
I then went to the ceph mount in the terminal and this time ran the command dd if=/dev/zero of=ddFile bs=1M count=1000
This time it crashed with no output. After CTRL-C, the mount is inaccessible as before, i.e., all other mounts are frozen and a reboot is needed.

#4 Updated by DongJin Lee about 10 years ago

Update: more concise setup :)
I created four simple files: file1 (1MiB), file10 (10MiB), file100 (100MiB), file1000 (1000MiB).
After mounting ceph, I did the following.

root@ss1:/media# cp file1 ceph
root@ss1:/media# sync
root@ss1:/media# cp file10 ceph
root@ss1:/media# sync
root@ss1:/media# cp file100 ceph
root@ss1:/media# sync
^C^C^C^C

The terminal returns immediately after each 'cp' command. The 'sync' took a few seconds for file1 and longer for file10, but froze for file100.
At this point, I stopped ceph and collected the debug output (both kernel/debug and ceph).
osdc now contains one line: 29 osd0 0.49d2 10000000002.00000018 write
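
To watch for stuck requests while a test like this runs, the osdc files under the kernel client's debugfs directory can be polled. This is a rough sketch: it assumes debugfs is mounted at /sys/kernel/debug, that it is run as root, and that each non-empty line in osdc is one in-flight OSD request (as the single "write" line above suggests). The function name count_inflight is hypothetical.

```shell
# Count non-empty lines across all osdc files under the ceph debugfs root.
# The optional first argument overrides the root directory.
count_inflight() {
    debug_root=${1:-/sys/kernel/debug/ceph}
    cat "$debug_root"/*/osdc 2>/dev/null | grep -c .
}

# Example (not run here): print the count once per second during a copy
#   while sleep 1; do count_inflight; done
```

A count that stays above zero after the copy has stopped making progress would point at requests the OSD never acknowledged.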

Thanks

#5 Updated by Sage Weil about 10 years ago

  • Assignee set to Yehuda Sadeh

#6 Updated by Yehuda Sadeh about 10 years ago

Looks like some issue with the journal:

2010-10-19 11:42:43.918144 7ffae0acf710 journal room 3928063 max_size 104857600 pos 10276864 header.start 14204928 top 4096
2010-10-19 11:42:43.918166 7ffae0acf710 journal check_for_full at 10276864 : JOURNAL FULL 10276864 >= 3928063 (max_size 104857600 start 14204928)
2010-10-19 11:42:43.918179 7ffae0acf710 journal prepare_multi_write full on first entry, need to wait
2010-10-19 11:42:43.918189 7ffae0acf710 journal write_thread_entry full, going to sleep (waiting for commit)

It doesn't seem to ever wake up.
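
The "room" value in the log can be reproduced from the other logged numbers: header.start - pos - 1 = 14204928 - 10276864 - 1 = 3928063. The sketch below encodes that as ring-buffer free-space arithmetic. Only the non-wrapped branch is confirmed by the log lines above; the wrapped branch (pos at or past start) is an assumption about how such a journal would account for space, and journal_room is a name made up for this example.

```shell
# Free space in a circular journal, given start, write position,
# total size and the offset of the first usable byte ("top").
journal_room() {
    start=$1 pos=$2 max_size=$3 top=$4
    if [ "$pos" -ge "$start" ]; then
        # Assumed: space to the end of the file plus space between top and start.
        echo $(( (max_size - pos) + (start - top) - 1 ))
    else
        # Matches the log: the gap between the write position and start.
        echo $(( start - pos - 1 ))
    fi
}

# Example with the values from the log:
#   journal_room 14204928 10276864 104857600 4096
```

With under 4 MB of room in a 100 MB journal, the writer can only proceed once a commit advances header.start, which is the wake-up that appears to be missing.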

Also note the following in the log; I'm not sure whether it is related or normal:
2010-10-19 11:42:43.148742 7ffada9c2710 osd0 3 pg[0.23( empty n=0 ec=2 les=3 2/2/2) [0] r=0 mlcod 0'0 active+clean+degraded] truncate_seq 1 > current 0, truncating to 18446744073709551615
...
2010-10-19 11:42:43.918269 7ffadf2cc710 filestore(/data/osd0) truncate /data/osd0/current/0.23_head/10000000002.00000016_head size 18446744073709551615
2010-10-19 11:42:43.918281 7ffadf2cc710 filestore(/data/osd0) truncate /data/osd0/current/0.23_head/10000000002.00000016_head size 18446744073709551615 = -22
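
For reference, 18446744073709551615 is 2^64 - 1, i.e. an all-ones 64-bit value, and -22 corresponds to -EINVAL on Linux. A quick way to see how often this happens in an OSD log is to count the failing truncate lines. This is only a sketch; the function name count_bad_truncates and the log path in the example are placeholders.

```shell
# Count filestore truncate calls to 2^64 - 1 that returned -22 (-EINVAL).
count_bad_truncates() {
    grep -c 'truncate .* size 18446744073709551615 = -22' "$1"
}

# Example (not run here; path is a placeholder):
#   count_bad_truncates /cephlog/osd.0.log
```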

#7 Updated by DongJin Lee about 10 years ago

Continuing with ext4.
I have set no-journal mode by commenting out these two lines in ceph.conf:
; osd journal = /data/osd$id/journal
; osd journal size = 100
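
Switching between the two modes can also be scripted rather than edited by hand. The sketch below prefixes every line that begins with "osd journal" with "; " and prints the result to stdout; it assumes the journal settings start with exactly that text, and disable_journal is a name invented for this example.

```shell
# Print a copy of a ceph.conf with the "osd journal ..." lines commented out.
# Lines that are already commented (starting with ";") are left alone.
disable_journal() {
    sed 's/^\([[:space:]]*\)\(osd journal\)/\1; \2/' "$1"
}

# Example (not run here):
#   disable_journal /etc/ceph/ceph.conf > /tmp/ceph-nojournal.conf
```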

Ceph starts with a warning (about running without a journal).
The mount fails: it hangs at 'wait4' (though I can CTRL-C it) and later returns "mount error 5 = Input/output error".

getcwd("/home/aaa/run", 4095)           = 14
readlink("/home/aaa/run/192.168.1.2:", 0x7fffa74e5f20, 4096) = -1 ENOENT (No such file or directory)
stat("/sbin/mount.ceph", {st_mode=S_IFREG|0755, st_size=45894, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fda24d5eab0) = 4203
wait4(-1, mount error 5 = Input/output error
[{WIFEXITED(s) && WEXITSTATUS(s) == 5}], 0, NULL) = 4203
--- SIGCHLD (Child exited) @ 0 (0) ---
exit_group(5)                           = ?

So I tried btrfs with no journal. This works, and it can actually write at a reasonable speed.
On a 7.2K rpm 2TB HDD, dd runs at about 80MiB/s and does not hang.
With journaling on, the mount often hangs, and even when it mounts successfully (after a reboot), it crashes as with ext4.

Also, I'm unsure of the proper way to fully stop/restart the ceph system.
At this stage I use a script that basically does the following. The commands below stop ceph and then re-initialize it.

umount /media/ceph -l
/etc/init.d/ceph stop
rm -rf /data
mkdir -p /data/osd0/

mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs  --clobber_old_data  -v
/etc/init.d/ceph start
mount -t ceph 192.168.1.2:/ /media/ceph

The problem is that I often cannot mount again after the previous shutdown.
I get a similar "mount error 5 = Input/output error".
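
Because the reset sequence includes rm -rf /data, it can help to wrap it so that it only prints what it would do unless explicitly told otherwise. The following is a sketch of that idea using the same commands as above; the run helper, the DRY_RUN variable and the two function names are inventions for this example, and it does not address the mount error itself.

```shell
# Echo the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unmount, stop the daemons and wipe the data directory.
ceph_stop_and_wipe() {
    run umount -l /media/ceph
    run /etc/init.d/ceph stop
    run rm -rf /data
    run mkdir -p /data/osd0/
}

# Recreate the filesystem, start the daemons and mount.
ceph_init_and_mount() {
    run mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs --clobber_old_data -v
    run /etc/init.d/ceph start
    run mount -t ceph 192.168.1.2:/ /media/ceph
}
```

Calling ceph_stop_and_wipe with no environment set only lists the four commands; setting DRY_RUN=0 executes them for real.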

Thanks

#8 Updated by Yehuda Sadeh about 10 years ago

What's the exact client version and kernel that you're running on?
Please run the following under ceph-client-standalone/:

git rev-parse HEAD

Also, having a backtrace of the running osd when it's hung (e.g., after the journal was filled like in the previous cases) could be helpful. Something along the lines of:

gdb cosd `pgrep cosd`
...
(gdb) thread apply all bt
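
The same backtrace can be captured to a file without an interactive session by attaching gdb in batch mode. This is a sketch under a few assumptions: gdb is installed, the caller may attach to the process (typically root), and the default output file name is arbitrary. The GDB override variable and the function name dump_backtraces exist only for this example.

```shell
# Attach to a running process, dump backtraces of all threads to a file,
# then detach. $1 is the pid, $2 the output file (optional).
dump_backtraces() {
    pid=$1
    out=${2:-cosd-backtrace.txt}
    ${GDB:-gdb} -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Example (not run here): attach to the hung osd
#   dump_backtraces "$(pgrep cosd)" /tmp/cosd-bt.txt
```

The resulting text file can then be attached to the ticket like the other logs.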

#9 Updated by Sage Weil almost 10 years ago

Hi DongJin,

Any luck on this issue? Has the problem gone away, or do you have time to help us track it down?

Thanks-

#10 Updated by Sage Weil almost 10 years ago

  • Priority changed from High to Immediate

#11 Updated by Sage Weil almost 10 years ago

  • Priority changed from Immediate to Normal

#12 Updated by Sage Weil almost 10 years ago

  • Target version changed from v0.23 to v0.23.1

#13 Updated by DongJin Lee almost 10 years ago

Sorry, Sage and Yehuda, for the late update.
I was spending time experimenting, and just using the default btrfs with no journal as it just worked.
Only ext4 or journaling failed.

The version
2.6.36-020636rc7-generic
ceph version 0.23~rc (a7ed2ee05dc7453942018d7876401c28d3918214)

git rev-parse HEAD
0184d86d147911efe080dcbbfba40f0b3617659f

#14 Updated by Sage Weil almost 10 years ago

  • Target version changed from v0.23.1 to v0.24

#15 Updated by Sage Weil almost 10 years ago

  • Status changed from New to Can't reproduce

#16 Updated by DongJin Lee almost 10 years ago

Hi all:
OK, so I pulled from git again (origin/unstable):

- Linux ss1 2.6.36-02063601-generic #201011231330 SMP Tue Nov 23 13:35:03 UTC 2010 x86_64 GNU/Linux
- Ubuntu 10.04 x64
- ceph version shows 0.23 (5d1d8d0c4602be9819cc9f7aea562fccbb005a56)
- system: 8-core server, single mds, mon and osd, all default setup, except that replication is set to 1x.
- osd is a single HW RAID0 of 6 SSDs; dd reports 500MB/s (4k) to 670MB/s (1M) real.

So I again tried journal mode and no-journal mode, just by commenting out the two lines in ceph.conf. All on btrfs.
The result is the same: journal mode crashes badly as soon as I try to copy, and I can't restart without pressing the hard-reset button.
Attached are the syslog and the cephlog.

ceph.conf is as below

[global]
    pid file = /var/run/ceph/$name.pid
    logger dir = /cephlog
    log dir = /cephlog
[mon]
    mon data = /data/mon$id
    debug ms = 1
    debug mon = 20
    debug paxos = 20
[mon0]
    host = ss1
    mon addr = 192.168.1.2:6789
[mds]
    debug ms = 1            ; message traffic
    debug mds = 20          ; mds
    debug mds balancer = 20 ; load balancing
    debug mds log = 20      ; mds journaling
    debug mds_migrator = 20 ; metadata migration
    debug monc = 20         ; monitor interaction, startup
[mds0]
    host = ss1
[osd]
;    osd journal = /data/osd$id/journal
;    osd journal size = 1000
;    filestore journal writeahead = true
    osd data = /data/osd$id
    debug ms = 1         ; message traffic
    debug osd = 20
    debug filestore = 20 ; local object storage
    debug journal = 20   ; local journaling
    debug monc = 20      ; monitor interaction, startup
[osd0]
    btrfs devs = /dev/sdb
    host = ss1


A further question about performance: in no-journal mode, dd shows 150MB/s - 300MB/s (about 25% - 50% of the original performance).
This is with single replication. While dd is copying, iostat shows mostly 0% disk activity, with occasional peaks to 100%. Is this expected? Normally dd directly to the disk shows 100% all the time.

Also, I tried Yehuda's method: gdb cosd `pgrep cosd`, then run -i 0 -c /etc/ceph/ceph.conf manually (to start cosd), then thread apply all bt. But where is the output/dump? Sorry, I have little knowledge in these areas.

Many thanks in advance.
