Bug #65281

open

Ceph with SPDK driver can not write data to NVMe-oF(TCP) device.

Added by Alice Wang about 1 month ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1. On the host, we use Ceph (v18.2.2) with the SPDK driver to connect to an NVMe-oF over TCP device, and then start a cluster with the command below:
vstart.sh --new -X --localhost --bluestore-spdk "trtype:tcp traddr:192.168.100.10 adrfam:IPv4 subnqn:nqn.2016-06.io.spdk:cnode1 trsvcid:4420" --bluestore

The NVMe-oF target is created with SPDK and is backed by an NVMe SSD (a sketch of a typical target setup follows below).
The cluster started successfully.
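
For reference, here is a minimal sketch of how such a TCP target is typically set up with SPDK's rpc.py; the bdev name (Nvme0), the PCIe address, and the serial number are illustrative placeholders, not values from our actual setup:

# start the SPDK NVMe-oF target application (run from the SPDK source tree)
./build/bin/nvmf_tgt &

# enable the TCP transport
./scripts/rpc.py nvmf_create_transport -t TCP

# attach the local NVMe SSD as a bdev (PCIe address is a placeholder)
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:01:00.0

# create the subsystem referenced by the vstart.sh command above
./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001

# expose the namespace and listen on 192.168.100.10:4420 over TCP
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Nvme0n1
./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 192.168.100.10 -s 4420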

2. Then we create a storage pool named "testbench" with the command below:
ceph osd pool create testbench 100 100

The storage pool is created successfully (a quick health-check sketch follows below).
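
Before running the benchmark, the new pool's PGs can be confirmed healthy; a quick check (the exact output format varies by release):

# overall cluster health and PG states
ceph -s

# per-pool details (pg_num, size, flags)
ceph osd pool ls detail

# all PGs should report active+clean
ceph pg stat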

3. But when we try to write data to "testbench", we find that no data is actually written:
command: rados bench -p testbench 10 write --no-cleanup
result: Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_BlueField_1745819
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 16 0 0 0 - 0
2 16 16 0 0 0 - 0
3 16 16 0 0 0 - 0
4 16 16 0 0 0 - 0
5 16 16 0 0 0 - 0
6 16 16 0 0 0 - 0
7 16 16 0 0 0 - 0
8 16 16 0 0 0 - 0
9 16 16 0 0 0 - 0
10 16 16 0 0 0 - 0
11 16 16 0 0 0 - 0
12 16 16 0 0 0 - 0
13 16 16 0 0 0 - 0
14 16 16 0 0 0 - 0
15 16 16 0 0 0 - 0
16 16 16 0 0 0 - 0
17 16 16 0 0 0 - 0
18 16 16 0 0 0 - 0
19 16 16 0 0 0 - 0

At the same time, we found a FAILED assertion in the OSD log:
2024-04-03T10:30:22.890+0800 ffff8ed0bb80 -1 /home/lxb/ceph/src/blk/spdk/NVMEDevice.cc: In function 'SharedDriverQueueData::SharedDriverQueueData(NVMEDevice*, SharedDriverData*)' thread ffff90d4bb80 time 2024-04-03T10:30:22.890858+0800
/home/lxb/ceph/src/blk/spdk/NVMEDevice.cc: 245: FAILED ceph_assert(qpair != __null)

ceph version 19.0.0-1218-g57856522a6a (57856522a6a4dd4c69d9b3b305d29c5559a3da18) squid (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x134) [0xaaaac5632fdc]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaac5633154]
3: (SharedDriverQueueData::SharedDriverQueueData(NVMEDevice*, SharedDriverData*)+0x8a4) [0xaaaac60f0084]
4: (NVMEDevice::aio_submit(IOContext*)+0x138) [0xaaaac60ed4c8]
5: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x4dc) [0xaaaac5bcd690]
6: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x78) [0xaaaac58587dc]
7: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x53c) [0xaaaac5a8b14c]
8: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x540) [0xaaaac57e8b40]
9: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xa24) [0xaaaac5834438]
10: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2178) [0xaaaac5836ecc]
11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1ac) [0xaaaac56c08cc]
12: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x80) [0xaaaac599a460]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x524) [0xaaaac56d7e34]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308) [0xaaaac5ce2a98]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaac5ce5468]
16: /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8) [0xffffb680d5c8]
17: /lib/aarch64-linux-gnu/libc.so.6(+0xe5edc) [0xffffb6875edc]

It looks like SPDK failed to create the I/O qpair.
We also tried Ceph + SPDK with a local (PCIe) NVMe device, and in that case everything works.
Is this a bug in the Ceph + SPDK + NVMe-oF path? Can anyone help us fix it? (A sketch for exercising the target outside of Ceph follows below.)
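
As a first step in narrowing this down, the target can be exercised outside of Ceph. A sketch, assuming nvme-cli and the SPDK perf example are available (queue depth, I/O size, and paths are illustrative):

# confirm the subsystem is discoverable over TCP (nvme-cli)
nvme discover -t tcp -a 192.168.100.10 -s 4420

# drive I/O against the target with SPDK's perf example (run from the SPDK tree)
./build/examples/perf -q 16 -o 4096 -w randwrite -t 10 \
    -r 'trtype:TCP adrfam:IPv4 traddr:192.168.100.10 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1'

If both checks succeed, the failure is more likely in the Ceph NVMEDevice.cc qpair allocation path than in the target itself.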

