Bug #37932

Throttle.cc: 194: FAILED assert(c >= 0) due to invalid ceph_osd_op union

Added by Simon Ruggier 5 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

An assertion failure that started occurring on a specific RBD image in a Ceph
cluster we use in production led me to an obscure bug that has been present in
Ceph from v0.22 through the current tip of the master branch.

As with issue 9592, the assertion failure is caused by a signed integer
overflow. In this case, however, the overflow stems from a type-safety problem
with the union in ceph_osd_op.

The notify op is created in this code path, which sets op.watch.ver to the
object's last version. For the problematic volume that triggered this issue for
us, that version is 3197560675. The value is stored in the union in
ceph_osd_op, where it happens to occupy the same position as the extent length.

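When execution reaches calc_op_budget, the definition of CEPH_OSD_OP_NOTIFY has
the CEPH_OSD_OP_MODE_READ and CEPH_OSD_OP_TYPE_DATA bits set, so the op ends up
in the code path that looks at the extent length (here). Because 3197560675 is
larger than INT_MAX (2147483647), adding it to the signed budget wraps it to a
negative value, which then trips assert(c >= 0) in Throttle.cc.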

The fix appears to be a stricter check in calc_op_budget, so that only operation
types that populate the extent member of the union when the operation is
created reach the code path that checks the extent length, similar to commit
58212b1. I'll submit a PR with that change shortly.

There's also a small amount of discussion about this on the mailing list.


Related issues

Copied to rbd - Backport #37986: mimic: Throttle.cc: 194: FAILED assert(c >= 0) due to invalid ceph_osd_op union Resolved
Copied to rbd - Backport #37987: luminous: Throttle.cc: 194: FAILED assert(c >= 0) due to invalid ceph_osd_op union Resolved

History

#1 Updated by Patrick Donnelly 5 months ago

  • Project changed from Ceph to rbd
  • Category deleted (Objecter)
  • Status changed from New to Need Review
  • Assignee set to Simon Ruggier
  • Target version set to v14.0.0
  • Backport changed from Mimic, Luminous, Jewel to mimic,luminous
  • Pull request ID set to 25976
  • Affected Versions deleted (v0.22, v0.22.1, v0.22.2, v0.22.3, v0.23, v0.23.1, v0.23.2, v0.24, v0.24.1, v0.24.2, v0.24.3, v0.25, v0.25.1, v0.25.2, v0.25.3, v0.26, v0.26.1, v0.27, v0.27.1, v0.28, v0.29, v0.30, v0.31, v0.32, v0.33, v0.34, v0.35, v0.36, v0.37, v0.38, v0.39, v0.40, v0.41, v0.42, v0.43, v0.44, v0.45, v0.46, v0.47, v0.48, v0.49, v0.50, v0.51, v0.52a, v0.53a, v0.53b, v0.53c, v0.54a, v0.54b, v0.55a, v0.55b, v0.55c, v0.55d, v0.56, v0.57a, v0.57b, v0.57c, v0.58, v0.59, v0.60, v0.61 - Cuttlefish, v0.62a, v0.62b, v0.63, v0.64, v0.65, v0.66, v0.67 - Dumpling, v0.67rc, v0.67rc - continued, v0.68, v0.68 - continued, v0.69, v0.70, v0.71, v0.72 Emperor, v0.73, v0.74, v0.75, v0.76a, v0.76b, v0.77, 0.78, 0.79, 0.80rc, 0.80, v0.81, 0.82, 0.83, 0.83 cont., 0.84, 0.84 cont., 0.85, 0.85 cont., 0.86, 0.88, 0.89, 0.90, v.91, v.actually90, v.actually91, v0.92, v0.93 - Last Hammer Sprint, v0.94, v0.95, v9.0.2, v9.0.3, v9.0.4, v9.0.5, v9.0.6, v9.0.7, v9.0.8, v10.0.4, v0.80.10, v0.80.11, v0.80.12, v0.94.10, v0.94.11, v0.94.2, v0.94.3, v0.94.4, v0.94.5, v0.94.6, v0.94.7, v0.94.8, v0.94.9, v10.0.0, v10.1.1, v10.2.0, v10.2.1, v10.2.10, v10.2.11, v10.2.12, v10.2.2, v10.2.3, v10.2.4, v10.2.5, v10.2.6, v10.2.7, v10.2.8, v10.2.9, v11.1.0, v11.2.0, v11.2.1, v11.2.2, v12.0.0, v12.1.0, v12.2.0, v12.2.1, v12.2.10, v12.2.11, v12.2.2, v12.2.3, v12.2.4, v12.2.5, v12.2.6, v12.2.7, v12.2.8, v12.2.9, v13.0.0, v13.2.0, v13.2.1, v13.2.2, v13.2.3, v13.2.4, v13.2.5, v14.0.0, v15.0.0, v9.1.1, v9.2.1, v9.2.2)

#2 Updated by Patrick Donnelly 5 months ago

  • Status changed from Need Review to Pending Backport

#3 Updated by Nathan Cutler 5 months ago

  • Copied to Backport #37986: mimic: Throttle.cc: 194: FAILED assert(c >= 0) due to invalid ceph_osd_op union added

#4 Updated by Nathan Cutler 5 months ago

  • Copied to Backport #37987: luminous: Throttle.cc: 194: FAILED assert(c >= 0) due to invalid ceph_osd_op union added

#5 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved
