Feature #16562
rados put: use the FULL_TRY flag to report errors when the cluster is full
Status: New
Priority: Normal
Assignee: -
Category: ceph cli
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:
Description
Hello,
I'm brand new to Ceph. I have three OSDs (slow Pine64 devices using USB thumb drives; just some testing I'm doing to learn Ceph). I created a single pool with a 1 GB quota (ceph osd pool set-quota ubuntublockdev max_bytes 1073741824), then attempted to copy a 500 MB data file into the pool three times, curious what the response would be when the quota was reached. I expected some sort of error message, similar to what mv/cp/rsync give when a disk is out of space. What I got instead was a hanging process.
root@pine1:~/my-cluster# time rados put 500mbdata1 /root/500mb.data --pool=ubuntublockdev

real    4m37.586s
user    0m0.480s
sys     0m1.740s

root@pine1:~/my-cluster# time rados put 500mbdata1 /root/500mb.data --pool=ubuntublockdev

real    4m34.522s
user    0m0.620s
sys     0m1.600s

root@pine1:~/my-cluster# time rados put 500mbdata2 /root/500mb.data --pool=ubuntublockdev
<has not returned in over 14 hours>

root@pine2:/var/log/ceph# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.01176 root default
-2 0.00269     host pine4
 0 0.00269         osd.0       up  1.00000          1.00000
-3 0.00639     host pine3
 1 0.00639         osd.1       up  1.00000          1.00000
-4 0.00269     host pine2
 2 0.00269         osd.2       up  1.00000          1.00000

root@pine2:/var/log/ceph# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 0 0.00269  1.00000  2814M 34696k  2513M  1.20 0.07  67
 1 0.00639  1.00000  6676M  1035M  5375M 15.51 0.92 113
 2 0.00269  1.00000  2814M  1003M  1545M 35.63 2.12  76
              TOTAL 12306M  2072M  9434M 16.84
MIN/MAX VAR: 0.07/2.12  STDDEV: 14.13

root@pine2:/var/log/ceph# ceph -s
    cluster 34c9464a-c307-4fa5-bbe8-0f654813982e
     health HEALTH_WARN
            pool 'ubuntublockdev' is full
     monmap e4: 4 mons at {pine1=10.10.10.21:6789/0,pine2=10.10.10.14:6789/0,pine3=10.10.10.13:6789/0,pine4=10.10.10.23:6789/0}
            election epoch 10, quorum 0,1,2,3 pine3,pine2,pine1,pine4
     osdmap e19: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v436: 128 pgs, 2 pools, 1032 MB data, 3 objects
            2072 MB used, 9434 MB / 12306 MB avail
                 128 active+clean
Logs:
2016-06-30 05:23:32.599870 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v418: 128 pgs: 128 active+clean; 1000 MB data, 2108 MB used, 9398 MB / 12306 MB avail
2016-06-30 05:23:33.667407 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v419: 128 pgs: 128 active+clean; 1004 MB data, 2108 MB used, 9398 MB / 12306 MB avail; 82068 B/s wr, 0 op/s
2016-06-30 05:23:37.598989 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v420: 128 pgs: 128 active+clean; 1004 MB data, 2108 MB used, 9398 MB / 12306 MB avail; 819 kB/s wr, 0 op/s
2016-06-30 05:23:38.674872 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v421: 128 pgs: 128 active+clean; 1016 MB data, 2108 MB used, 9398 MB / 12306 MB avail; 2458 kB/s wr, 0 op/s
2016-06-30 05:23:42.616913 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v422: 128 pgs: 128 active+clean; 1016 MB data, 2116 MB used, 9390 MB / 12306 MB avail; 2457 kB/s wr, 0 op/s
2016-06-30 05:23:43.682803 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v423: 128 pgs: 128 active+clean; 1024 MB data, 2128 MB used, 9378 MB / 12306 MB avail; 1632 kB/s wr, 0 op/s
2016-06-30 05:23:45.023717 7fa84073e0  0 log_channel(cluster) log [WRN] : pool 'ubuntublockdev' is full (reached quota's max_bytes: 1024M)
2016-06-30 05:23:45.092333 7fa94073e0  1 mon.pine3@0(leader).osd e19 e19: 3 osds: 3 up, 3 in
2016-06-30 05:23:45.118943 7fa94073e0  0 log_channel(cluster) log [INF] : osdmap e19: 3 osds: 3 up, 3 in
2016-06-30 05:23:45.180069 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v424: 128 pgs: 128 active+clean; 1024 MB data, 2128 MB used, 9378 MB / 12306 MB avail; 3206 kB/s wr, 0 op/s
2016-06-30 05:23:47.593415 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v425: 128 pgs: 128 active+clean; 1024 MB data, 2140 MB used, 9366 MB / 12306 MB avail
2016-06-30 05:23:48.650748 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v426: 128 pgs: 128 active+clean; 1032 MB data, 2040 MB used, 9466 MB / 12306 MB avail; 2354 kB/s wr, 0 op/s
2016-06-30 05:23:52.545141 7fa94073e0  0 log_channel(cluster) log [INF] : pgmap v427: 128 pgs: 128 active+clean; 1032 MB data, 2040 MB used, 9466 MB / 12306 MB avail; 1638 kB/s wr, 0 op/s
2016-06-30 05:23:53.529835 7fa84073e0  0 log_channel(cluster) log [INF] : HEALTH_WARN; pool 'ubuntublockdev' is full
The HEALTH_WARN line repeats every hour, on the hour. Again, the original rados put command has still neither returned nor errored.
root@pine2:/var/log/ceph# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
root@pine2:/var/log/ceph# uname -a
Linux pine2 3.10.102-0-pine64-longsleep #7 SMP PREEMPT Fri Jun 17 21:30:48 CEST 2016 aarch64 aarch64 aarch64 GNU/Linux
root@pine2:/var/log/ceph# cat /etc/debian_version
stretch/sid