Bug #22978 (closed)

Assert in Throttle.cc on primary OSD with BlueStore and an erasure-coded pool

Added by Subhachandra Chandra about 6 years ago. Updated about 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Regression: No
Severity: 3 - minor

Description

An assert is being hit in Throttle.cc with BlueStore when a large object is written on the primary OSD in an erasure-coded pool.

Ceph version: Luminous 12.2.2
Object size: the primary OSD crashes at 152 MB; writes of 144 MB, 136 MB, and 128 MB succeed
Client: a single client using go-ceph; only one client writes into the cluster, synchronously
Pool: erasure-coded with Reed-Solomon (RS)
API: the following snippet causes the crash:
if err := ioctx.WriteFull(name, data); err != nil {
    vlog.Infof("Write returning error: %v", err)
    return 0, err
}
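
For completeness, here is a minimal standalone reproducer along the same lines, using the go-ceph rados bindings. The pool name "ecpool" and object name "bigobject" are placeholders, and error handling is kept minimal; this is a sketch, not the reporter's exact code.

package main

import (
    "log"

    "github.com/ceph/go-ceph/rados"
)

func main() {
    // Connect using the default ceph.conf search path and client credentials.
    conn, err := rados.NewConn()
    if err != nil {
        log.Fatal(err)
    }
    if err := conn.ReadDefaultConfigFile(); err != nil {
        log.Fatal(err)
    }
    if err := conn.Connect(); err != nil {
        log.Fatal(err)
    }
    defer conn.Shutdown()

    // "ecpool" stands in for the erasure-coded pool.
    ioctx, err := conn.OpenIOContext("ecpool")
    if err != nil {
        log.Fatal(err)
    }
    defer ioctx.Destroy()

    // A single 152 MB WriteFull reproduces the crash in this setup;
    // 144 MB and smaller writes succeed.
    data := make([]byte, 152*1024*1024)
    if err := ioctx.WriteFull("bigobject", data); err != nil {
        log.Fatalf("WriteFull: %v", err)
    }
}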

As a workaround, appending the data in smaller chunks works all the way up to a 256 MB object, which is the limit set on the cluster:
datalen := len(data)
chunkSize := 48 * 1024 * 1024 // 48 MB chunks
for i := 0; i < datalen; i += chunkSize {
    // Clamp the final chunk to the end of the buffer.
    end := i + chunkSize
    if end > datalen {
        end = datalen
    }
    vlog.Infof("Appending bytes: %v-%v", i, end)
    if err := ioctx.Append(name, data[i:end]); err != nil {
        vlog.Infof("Appending bytes error: %v", err)
        return 0, err
    }
}
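
One caveat with the append workaround (an observation, not part of the original report): WriteFull replaces the object's contents, while Append only extends them, so if the object may already exist it should be deleted (or truncated to zero) before the chunked append. A hypothetical guard:

// Hypothetical guard before the chunked append: drop any previous
// version of the object so stale bytes cannot survive past the new data.
if err := ioctx.Delete(name); err != nil && err != rados.ErrNotFound {
    return 0, err
}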

Backtrace, "ceph -s" and config follow. A full config dump from an OSD is attached.

Feb 09 23:15:58 ceph-osd-run.sh[436441]: /build/ceph-12.2.2/src/common/Throttle.cc: In function 'int64_t Throttle::put(int64_t)' thread 7f8e3aa4d700 time 2018-02-09 23:15:58.874618
Feb 09 23:15:58 ceph-osd-run.sh[436441]: /build/ceph-12.2.2/src/common/Throttle.cc: 232: FAILED assert(static_cast<int64_t>(count) >= c)
Feb 09 23:15:58 ceph-osd-run.sh[436441]: ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558c6fdc5892]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 2: (Throttle::put(long)+0x38e) [0x558c6fdbc72e]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 3: (BlueStore::_kv_sync_thread()+0x1992) [0x558c6fc59f12]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 4: (BlueStore::KVSyncThread::entry()+0xd) [0x558c6fc9df8d]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 5: (()+0x76ba) [0x7f8e4c5c46ba]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 6: (clone()+0x6d) [0x7f8e4b63b3dd]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 2018-02-09 23:15:58.876394 7f8e3aa4d700 -1 /build/ceph-12.2.2/src/common/Throttle.cc: In function 'int64_t Throttle::put(int64_t)' thread 7f8e3aa4d700 time 2018-02-09 23:15:58.874618
Feb 09 23:15:58 ceph-osd-run.sh[436441]: /build/ceph-12.2.2/src/common/Throttle.cc: 232: FAILED assert(static_cast<int64_t>(count) >= c)
Feb 09 23:15:58 ceph-osd-run.sh[436441]: ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558c6fdc5892]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 2: (Throttle::put(long)+0x38e) [0x558c6fdbc72e]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 3: (BlueStore::_kv_sync_thread()+0x1992) [0x558c6fc59f12]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 4: (BlueStore::KVSyncThread::entry()+0xd) [0x558c6fc9df8d]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 5: (()+0x76ba) [0x7f8e4c5c46ba]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 6: (clone()+0x6d) [0x7f8e4b63b3dd]
Feb 09 23:15:58 ceph-osd-run.sh[436441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Feb 09 23:15:58 ceph-osd-run.sh[436441]: 0> 2018-02-09 23:15:58.876394 7f8e3aa4d700 -1 /build/ceph-12.2.2/src/common/Throttle.cc: In function 'int64_t Throttle::put(int64_t)' thread 7f8e3aa4d700 time 2018-02-09 23:15:58.874618
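
(For reference: the failing condition, assert(static_cast<int64_t>(count) >= c) at Throttle.cc:232, fires when Throttle::put() is asked to release more than the throttle currently holds; the backtrace shows it being reached from BlueStore's kv sync thread.)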

root@ctrl1:/# ceph -s
  cluster:
    id:     efedf32b-603a-44aa-a148-49f1c3e34701
    health: HEALTH_WARN
            noout,noscrub,nodeep-scrub flag(s) set
            121 nearfull osd(s)
            2 pool(s) nearfull

  services:
    mon: 3 daemons, quorum ctrl1,ctrl2,ctrl3
    mgr: ctrl1(active), standbys: ctrl3, ctrl2
    osd: 540 osds: 540 up, 540 in
         flags noout,noscrub,nodeep-scrub

  data:
    pools:   2 pools, 66560 pgs
    objects: 9292k objects, 2179 TB
    usage:   3288 TB used, 641 TB / 3929 TB avail
    pgs:     66560 active+clean

Config
-------
[DEFAULT]
mon = None

[global]
cluster network = 192.168.13.0/24
fsid = efedf32b-603a-44aa-a148-49f1c3e34701
mon host = 172.16.13.101,172.16.13.102,172.16.13.103
mon initial members = ctrl1,ctrl2,ctrl3
mon_max_pg_per_osd = 1500
mon_osd_backfillfull_ratio = 0.95
mon_osd_down_out_interval = 900
mon_osd_full_ratio = 0.95
mon_osd_nearfull_ratio = 0.85
osd_crush_chooseleaf_type = 3
osd_max_pg_per_osd_hard_ratio = 1.0
public network = 172.16.13.0/24

[osd]
osd_deep_scrub_interval = 2419200
osd_deep_scrub_stride = 4194304
osd_max_backfills = 20
osd_max_object_size = 268435456
osd_max_write_size = 256
osd_pool_erasure_code_stripe_unit = 4194304
osd_recovery_max_active = 15
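
(For reference: osd_max_object_size = 268435456 bytes is the 256 MB object-size limit mentioned above, and osd_pool_erasure_code_stripe_unit = 4194304 bytes is a 4 MB EC stripe unit.)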


Files

config-show.txt (55.1 KB): config show output from an OSD. Subhachandra Chandra, 02/10/2018 12:00 AM
#1 - Updated by Igor Fedotov about 6 years ago

  • Status changed from New to Duplicate

Looks like a duplicate of http://tracker.ceph.com/issues/22539. Should be fixed in v12.2.3.
