Clustered SCSI target using RBD » History » Version 4

Mike Christie, 01/28/2016 10:11 PM

h1. Clustered SCSI target using RBD

h3. Summary

The goal of this project is to modify the Linux target layer, LIO, so that it can support active/active access to a device across multiple nodes running LIO. The changes to LIO are being made in a generic way so that other cluster-aware devices can be used, but our focus is on RBD.

h3. Owners

* Mike Christie (Red Hat)

h3. Interested Parties

* Name (Affiliation)

h3. Current Status

There are many methods for configuring LIO's iSCSI target for High Availability (HA). None of them support active/active, and most open source implementations do not support distributed SCSI Persistent Group Reservations.

h3. Detailed Description

Some operating systems can only access RBD through a SCSI target gateway. RBD support already exists in target layers like LIO and TGT, but the HA implementations lack features, are difficult to use, or support only one transport such as iSCSI. To resolve these issues, we are modifying LIO so that it can run on multiple nodes and provide SCSI active-optimized access through all ports on all nodes at the same time.

There are several areas where active/active support in LIO requires distributed metadata and/or locking: SCSI task management/Unit Attention/PREEMPT AND ABORT handling, COMPARE_AND_WRITE support, Persistent Group Reservations, and INQUIRY/ALUA/discovery/setup related commands and state.

- SCSI task management (TMF) / Unit Attention (UA) / PREEMPT AND ABORT handling:

When an initiator cannot determine the state of a device or of the commands running on it, it will send TMFs like LOGICAL UNIT RESET. Depending on the SCSI settings used (TAS, QERR, TST), requests like this may require actions to be taken on all LIO nodes. For example, running commands might need to be aborted, and notifications like Unit Attentions must be sent.

Other non-TMF requests like PERSISTENT RESERVE OUT - PREEMPT AND ABORT may also require commands to be aborted on remote nodes.

To synchronize TMF execution across nodes, the Ceph watch/notify feature will be used. The initial patches for this were posted on ceph-devel here:

http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/24553

The current version, with fixes and additions by Douglas Fuller, can be found here:

https://github.com/fullerdj/ceph-client/commits/wip-djf-watch-notify2-new

Status:

We are currently modifying the block layer to support task management requests, so that LIO's iblock backend module can call into Low Level Drivers (LLDs) like krbd to perform driver-specific actions.

- COMPARE_AND_WRITE (CAW) support:

CAW is a SCSI command used by ESX to perform fine-grained locking. To execute CAW, the handler must atomically read N blocks of data, compare them to a buffer passed in with the command, and, if they match, write N blocks of data. To guarantee that this operation is atomic, LIO uses a standard Linux kernel mutex. A mutex only serializes CAW on a single node, so for multi-node active/active support we have proposed passing this request to the backing storage. This will allow the backing storage to use its own locking and serialization support, and LIO will not need a clustered lock like DLM.

Patches for passing COMPARE_AND_WRITE directly to the backing store have been sent upstream for review:

http://www.spinics.net/lists/target-devel/msg07823.html

The current patches, along with ceph/rbd support, are here:

https://github.com/mikechristie/linux-kernel/commits/ceph

Status:

The request/bio operation patches are waiting to be merged. The ceph/rbd patches will be posted for review once that is done.

- Persistent Group Reservations (PGR):

PGRs allow an initiator to control access to a device. This access information needs to be distributed across all nodes and can be dynamically updated while other commands are being processed.

David Disseldorp has implemented ceph/rbd PGR support here:

https://git.samba.org/?p=ddiss/linux.git;a=shortlog;h=refs/heads/target_rbd_pr_sq_20160126

Status:

This code is now being ported to the upstream Linux kernel reservation API added in this commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/block/ioctl.c?id=bbd3e064362e5057cc4799ba2e4d68c7593e490b

When this is completed, LIO will call into the iblock backend, which will then call rbd's pr_ops.

- Device state and configuration:

SUSE's lrbd package will manage configuration:

https://github.com/SUSE/lrbd

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3