Bug #531

closed

Journaling Causes System Hang

Added by Bryan Tong over 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Immediate
Category: OSD
% Done: 0%

Description

Hello,

It seems that during a large write, once the journal fills up, the system locks up and has trouble deciding how to proceed.

Steps to reproduce

1) Create a cluster using separate drives for the journals; in our case, four 32 GB SSDs.
2) From the client, write a file large enough to fill the journals. A 60 GB file will do, since replication adds another 60 GB to the write.

Expected results

1) The write continues until it completes, at the same speed as a smaller write; in our case, 300 MB/s or more.

Actual results

1) The write stalls when the journals reach their limit and hangs for up to 15 minutes while the journals slowly commit to disk. Write speed drops below 40 MB/s.

Workaround

1) Running without journals causes the problem to go away.

Caveats

1) When running without journals, writes with small block sizes perform very poorly. For example, creating a filesystem on a 40 GB image file takes hours.

Theoretical Solution

The journal needs to start committing to disk before it fills up, and it should throttle incoming data before the system locks up completely.
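The proposed behaviour can be sketched as a toy model. This is a hypothetical illustration, not Ceph's actual journal code: the journal begins committing to the backing store once it is half full and, above 80% full, admits client data only as fast as the backing store drains it. The names `ThrottledJournal`, `admit`, `flush` and `simulate`, the 50%/80% watermarks, and the tick-based MB units are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThrottledJournal:
    """Toy model of a write-ahead journal in front of a slower backing store.

    Hypothetical illustration only: sizes are in MB, rates in MB per tick,
    and the watermark defaults are assumptions, not Ceph settings.
    """
    capacity: float              # total journal size
    flush_rate: float            # data the backing store can absorb per tick
    commit_start: float = 0.5    # begin committing once this fraction is used
    throttle_start: float = 0.8  # limit client writes above this fraction
    used: float = 0.0            # data currently held in the journal

    def __post_init__(self) -> None:
        # Committing must begin no later than throttling, or a throttled
        # journal would never drain.
        if not 0.0 < self.commit_start <= self.throttle_start <= 1.0:
            raise ValueError("need 0 < commit_start <= throttle_start <= 1")

    def admit(self, requested: float) -> float:
        """Accept as much of a client write as the journal allows this tick."""
        free = self.capacity - self.used
        if self.used >= self.throttle_start * self.capacity:
            # Throttled: accept no more than the backing store drains per tick,
            # so the fill level stops growing once throttling kicks in.
            allowed = min(requested, self.flush_rate, free)
        else:
            allowed = min(requested, free)
        self.used += allowed
        return allowed

    def flush(self) -> float:
        """Commit data to the backing store early, before the journal is full."""
        if self.used < self.commit_start * self.capacity:
            return 0.0
        drained = min(self.used, self.flush_rate)
        self.used -= drained
        return drained


def simulate(journal: ThrottledJournal, client_rate: float, total: float,
             max_ticks: int = 10_000) -> tuple[list[float], float]:
    """Push one large client write through the journal.

    Returns the amount accepted on each tick and the peak fill level.
    """
    accepted_per_tick: list[float] = []
    written = 0.0
    peak = 0.0
    while written < total and len(accepted_per_tick) < max_ticks:
        accepted = journal.admit(min(client_rate, total - written))
        written += accepted
        peak = max(peak, journal.used)  # measured before this tick's commit
        journal.flush()
        accepted_per_tick.append(accepted)
    return accepted_per_tick, peak


if __name__ == "__main__":
    journal = ThrottledJournal(capacity=1000.0, flush_rate=40.0)
    per_tick, peak = simulate(journal, client_rate=300.0, total=5000.0)
    print(f"ticks={len(per_tick)} peak={peak} first={per_tick[:5]}")
```

With these numbers (a 1000 MB journal, a backing store that drains 40 MB per tick, and a client offering 300 MB per tick), the client writes at full speed for three ticks and then settles at the 40 MB drain rate. The journal peaks at 860 MB and no tick accepts zero data, so the client sees steady backpressure rather than a hard stall.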

Thanks
