Bug #54653

closed

crash: uint64_t CephFuse::Handle::fino_snap(uint64_t): assert(stag_snap_map.count(stag))

Added by Telemetry Bot about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 100%
Source: Telemetry
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):

2bd08dd44879d92c51e098fd4fca0b83aafc8f7f2555b0fe94fc1dfca1447588
318c7cac3d403656adeeeb6000e1e23f9b396317c45d94cf74ec251449864145
665f9843ed3bc75751102736ae69b71f96c754789eeef57932785e4d85eadf98
e8ac3d63428f21e50eb53cba40ea83f9a9b6d984ee1c77e94e45609b855893b4
fb350f773e244ee08a9ad5cf26df51c1db9d3aca1e561b18f7ec98db8b8945b7
0ad8046071bf41e3eceea3c26d76fea7c8ee40d9e9c3b16295df3260902e7ae9
2da7b6246fe74b15dd4b61caa1f523a15a326f363fa1de0026805345a677bbf9
fa97bbed6578acd5bb9c7f2c77cd2d8d02900feedb05e7c19635ebc3adf320b6
fb3d5e9ab32e10fcc6fa951aa6abc3e5308652ea5fb69821fe3666ed7ba23767
c186283b5deddb185f049c9225a8a6bdab9da52a42eba381ba4e434b495f44b5
fc5bbd537e21f8734a10f60e8ba55576562a299a729c02bcc32ea9fe5d76e119
44b295f403ed0dd160aaf0fb363798d7fb7b099f3abc0530baa5cf95f474b5b5
3cc66cf8b39f301ffac5cec100c47110fe1e760665355af5367efa3aaf641764


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=89d76bc0664d80c1ac55706e8afabbabad4759f4b94323275cbf682195b3b173

Assert condition: stag_snap_map.count(stag)
Assert function: uint64_t CephFuse::Handle::fino_snap(uint64_t)

Sanitized backtrace:

    CephFuse::Handle::fino_snap(unsigned long)
    CephFuse::Handle::iget(unsigned long)

Crash dump sample:
{
    "assert_condition": "stag_snap_map.count(stag)",
    "assert_file": "client/fuse_ll.cc",
    "assert_func": "uint64_t CephFuse::Handle::fino_snap(uint64_t)",
    "assert_line": 1412,
    "assert_msg": "client/fuse_ll.cc: In function 'uint64_t CephFuse::Handle::fino_snap(uint64_t)' thread ffff46ffb880 time 2022-03-07T00:47:36.958893+0100\nclient/fuse_ll.cc: 1412: FAILED ceph_assert(stag_snap_map.count(stag))",
    "assert_thread_name": "ceph-fuse",
    "backtrace": [
        "__kernel_rt_sigreturn()",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x194) [0xffff9f047958]",
        "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xffff9f047ac8]",
        "(CephFuse::Handle::fino_snap(unsigned long)+0x150) [0xaaaada1e99d8]",
        "(CephFuse::Handle::iget(unsigned long)+0x34) [0xaaaada1e9a2c]",
        "ceph-fuse(+0x98500) [0xaaaada1eb500]",
        "/lib/aarch64-linux-gnu/libfuse.so.2(+0x15064) [0xffffa7a55064]",
        "/lib/aarch64-linux-gnu/libfuse.so.2(+0x12158) [0xffffa7a52158]",
        "/lib/aarch64-linux-gnu/libpthread.so.0(+0x751c) [0xffff9edc051c]",
        "/lib/aarch64-linux-gnu/libc.so.6(+0xd122c) [0xffff9eafa22c]" 
    ],
    "ceph_version": "16.2.7",
    "crash_id": "2022-03-06T23:47:37.017833Z_7b408f27-3b91-446d-b5a5-b90245f424a9",
    "entity_name": "client.779720a56c617f4713d46a0389a5f0b5c78d2903",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "20.04.4 LTS (Focal Fossa)",
    "os_version_id": "20.04",
    "process_name": "ceph-fuse",
    "stack_sig": "e8ac3d63428f21e50eb53cba40ea83f9a9b6d984ee1c77e94e45609b855893b4",
    "timestamp": "2022-03-06T23:47:37.017833Z",
    "utsname_machine": "aarch64",
    "utsname_release": "5.4.0-1015-raspi",
    "utsname_sysname": "Linux",
    "utsname_version": "#15-Ubuntu SMP Fri Jul 10 05:34:24 UTC 2020" 
}


Subtasks 1 (0 open, 1 closed)

Bug #56774: crash: Client::_get_vino(Inode*) (Duplicate)

Related issues 4 (0 open, 4 closed)

Has duplicate CephFS - Bug #56263: crash: Client::_get_vino(Inode*) (Duplicate)
Has duplicate CephFS - Bug #56380: crash: Client::_get_vino(Inode*) (Duplicate)
Copied to CephFS - Backport #56055: quincy: crash: uint64_t CephFuse::Handle::fino_snap(uint64_t): assert(stag_snap_map.count(stag)) (Resolved, Xiubo Li)
Copied to CephFS - Backport #56056: pacific: crash: uint64_t CephFuse::Handle::fino_snap(uint64_t): assert(stag_snap_map.count(stag)) (Resolved, Xiubo Li)
Actions #1

Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.1, v16.2.6, v16.2.7 added
Actions #2

Updated by Venky Shankar about 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Xiubo Li
  • Crash signature (v1) updated (diff)
Actions #3

Updated by Xiubo Li about 2 years ago

There is an issue in the ceph-fuse code: on lookup/create/mkdir/link/readdir, etc., it generates fake fuse inode numbers and returns them to libfuse:

1. Map vino.snapid to a stag number from the stag pool, whose range is [1, 0xffff]. Stag number 0 is reserved for CEPH_NOSNAP, and one more should be reserved for CEPH_SNAPDIR.
2. Build the fake 64-bit fuse inode number as ((stag << 48) | (vino.ino)). NOTE: the ceph vino.ino never uses more than 48 bits.

That means that if there are more than (2^16 - 1 - 1) snapshots (two stag values being reserved for CEPH_NOSNAP and CEPH_SNAPDIR), at least two snapshots must share the same stag number. And if those two snapshots are from the same directory, they will have exactly the same fuse inode number.
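
The packing and the resulting stag budget can be sketched as follows. This is only an illustration of the arithmetic described in this comment, not the ceph-fuse code: the names pack_fake_ino, fake_ino_stag, fake_ino_ino and the k* constants are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants; the names are assumptions, not ceph-fuse identifiers.
constexpr unsigned kStagShift = 48;
constexpr uint64_t kInoMask = (uint64_t{1} << kStagShift) - 1;  // low 48 bits
constexpr uint64_t kStagMax = 0xffff;                            // 16-bit stag

// Pack a 16-bit stag and a 48-bit ino into one 64-bit fake inode number.
inline uint64_t pack_fake_ino(uint64_t stag, uint64_t ino) {
    assert(stag <= kStagMax);
    assert((ino & ~kInoMask) == 0);  // the ino must fit in 48 bits
    return (stag << kStagShift) | ino;
}

// Split a fake inode number back into its two parts.
inline uint64_t fake_ino_stag(uint64_t fino) { return fino >> kStagShift; }
inline uint64_t fake_ino_ino(uint64_t fino) { return fino & kInoMask; }

// 2^16 stag values, minus one for CEPH_NOSNAP and one for CEPH_SNAPDIR.
constexpr uint64_t kUsableStags = (uint64_t{1} << 16) - 1 - 1;
static_assert(kUsableStags == 65534, "stags left for ordinary snapshots");
```

Two snapshots that end up with the same stag produce the same packed value for the same ino, which is the collision described above.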

Normally, having more than (2^16 - 1 - 1) snapshots at the same time seems impossible. Also, stags in the stag pool can be reused by different snapshots as long as users do not open or hold references to them at the same time. But this can still be a problem, because when readdir fills in the dentries, libcephfs does not (and has no need to) take snap references, so the related stags can be reused by other snapshots later.
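
A toy model of that reuse, for illustration only. StagPool, pin, get_stag and snap_of are hypothetical names and the allocation policy is an assumption, so this is not how ceph-fuse implements its stag maps. It only shows how a stag handed out for an unreferenced snapshot can later name a different snapshot; it does not model how a stag ends up missing from the map entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>

// Hypothetical, simplified stag pool; stags are 1..capacity.
class StagPool {
public:
    explicit StagPool(uint64_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Model a held snap reference: a pinned snapshot keeps its stag.
    void pin(uint64_t snapid) { pinned_.insert(snapid); }

    // Return the stag for snapid, allocating a free one or recycling the
    // stag of an unpinned snapshot. nullopt if every stag is pinned.
    std::optional<uint64_t> get_stag(uint64_t snapid) {
        if (auto it = snap_to_stag_.find(snapid); it != snap_to_stag_.end())
            return it->second;
        for (uint64_t stag = 1; stag <= capacity_; ++stag)
            if (!stag_to_snap_.count(stag))
                return bind(stag, snapid);
        for (uint64_t stag = 1; stag <= capacity_; ++stag) {
            uint64_t old = stag_to_snap_.at(stag);
            if (!pinned_.count(old)) {
                snap_to_stag_.erase(old);  // old snapshot loses its stag
                return bind(stag, snapid);
            }
        }
        return std::nullopt;
    }

    // Reverse lookup: which snapshot does this stag name right now?
    std::optional<uint64_t> snap_of(uint64_t stag) const {
        auto it = stag_to_snap_.find(stag);
        if (it == stag_to_snap_.end()) return std::nullopt;
        return it->second;
    }

private:
    uint64_t bind(uint64_t stag, uint64_t snapid) {
        stag_to_snap_[stag] = snapid;
        snap_to_stag_[snapid] = stag;
        return stag;
    }

    uint64_t capacity_;
    std::map<uint64_t, uint64_t> stag_to_snap_;
    std::map<uint64_t, uint64_t> snap_to_stag_;
    std::set<uint64_t> pinned_;
};
```

With a pool of two stags, handing out stags for snapshots 100 and 200, pinning only 200, and then asking for snapshot 300 recycles the stag of snapshot 100. An inode number built earlier from that stag would now resolve to snapshot 300.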

We had better limit the total number of snapshots in the ceph-fuse mounter to (2^16 - 1 - 1). IMO we can check this from the global snaprealm->my_snap vector.

Or should we limit this on the MDS side?

Actions #4

Updated by Xiubo Li about 2 years ago

  • Pull request ID set to 45614
Actions #5

Updated by Xiubo Li about 2 years ago

  • Status changed from Triaged to Fix Under Review
Actions #6

Updated by Venky Shankar almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to quincy, pacific
Actions #7

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56055: quincy: crash: uint64_t CephFuse::Handle::fino_snap(uint64_t): assert(stag_snap_map.count(stag)) added
Actions #8

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56056: pacific: crash: uint64_t CephFuse::Handle::fino_snap(uint64_t): assert(stag_snap_map.count(stag)) added
Actions #9

Updated by Telemetry Bot almost 2 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v17.2.0 added
Actions #10

Updated by Luis Henriques almost 2 years ago

  • Crash signature (v1) updated (diff)

It looks like the fix for this bug has broken the build for the latest versions of fuse3 (fyi the fuse version I've on my box is 3.11.0):

[104/954] Building CXX object src/CMakeFiles/ceph-fuse.dir/client/fuse_ll.cc.o
FAILED: src/CMakeFiles/ceph-fuse.dir/client/fuse_ll.cc.o 
/usr/bin/ccache /usr/bin/c++ -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/cephdev/dev/ceph/ceph/build/src/include -I/home/cephdev/dev/ceph/ceph/src -isystem /home/cephdev/dev/ceph/ceph/build/boost/include -isystem /home/cephdev/dev/ceph/ceph/build/include -isystem /home/cephdev/dev/ceph/ceph/src/xxHash -isystem /home/cephdev/dev/ceph/ceph/src/rapidjson/include -isystem /home/cephdev/dev/ceph/ceph/src/fmt/include -isystem /usr/include/fuse3 -Og -g -fPIE -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fno-new-ttp-matching -DCEPH_DEBUG_MUTEX -fstack-protector-strong -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++17 -MD -MT src/CMakeFiles/ceph-fuse.dir/client/fuse_ll.cc.o -MF src/CMakeFiles/ceph-fuse.dir/client/fuse_ll.cc.o.d -o src/CMakeFiles/ceph-fuse.dir/client/fuse_ll.cc.o -c /home/cephdev/dev/ceph/ceph/src/client/fuse_ll.cc
/home/cephdev/dev/ceph/ceph/src/client/fuse_ll.cc: In constructor ‘CephFuse::Handle::Handle(Client*, int)’:
/home/cephdev/dev/ceph/ceph/src/client/fuse_ll.cc:1379:1: error: expected identifier before ‘{’ token
 1379 | {
      | ^
[111/954] Building CXX object src/auth/CMakeFiles/common-auth-objs.dir/cephx/CephxClientHandler.cc.o
ninja: build stopped: subcommand failed.

I guess it's better to hold off on merging the backports (I haven't tested those).

Actions #11

Updated by Xiubo Li almost 2 years ago

Luis Henriques wrote:

It looks like the fix for this bug has broken the build for the latest versions of fuse3 (fyi the fuse version I've on my box is 3.11.0):

[...]

I guess it's better to hold off on merging the backports (I haven't tested those).

Fixed it in https://github.com/ceph/ceph/pull/47026. Could you help test it? I have already marked the backport PRs as Draft.

Actions #12

Updated by Yaarit Hatuka almost 2 years ago

  • Has duplicate Bug #56263: crash: Client::_get_vino(Inode*) added
Actions #13

Updated by Yaarit Hatuka almost 2 years ago

  • Has duplicate Bug #56380: crash: Client::_get_vino(Inode*) added
Actions #14

Updated by Telemetry Bot almost 2 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v17.2.1 added
Actions #15

Updated by Telemetry Bot almost 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
Actions #16

Updated by Telemetry Bot almost 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
Actions #17

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #18

Updated by Xiubo Li over 1 year ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)

I found one case that could cause this: xfstests-dev's open_by_handle.c uses name_to_handle_at() to store the struct fid, which contains the ino#, but name_to_handle_at() does not open the file.

struct fid {
        union {
                struct {
                        u32 ino;
                        u32 gen;
                        u32 parent_ino;
                        u32 parent_gen;
                } i32;
                struct {
                        u32 block;
                        u16 partref;
                        u16 parent_partref;
                        u32 generation;
                        u32 parent_block;
                        u32 parent_generation;
                } udf;
                __u32 raw[0];
        };
};

The test case can then use the above struct fid to open the file later with open_by_handle_at(). By that time the file may already have been deleted and the ino# reused.

Actions #19

Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
  • Tags deleted (backport_processed)