API
The spindle-token API is broken up into 4 namespaces:
- The top-level `spindle_token` module, containing the most commonly used public functions.
- An OPPRL module containing implementations of every OPPRL protocol version, including PII normalization, token specifications, and cryptographic transformations.
- An OPPRL metadata module exposing Spark-free token metadata for lightweight schema inspection.
- A module of `core` abstractions that can be extended in advanced use cases to add additional functionality.
spindle_token
A module containing the main API of spindle-token.
Most users will only need to use the 3 main functions in this top-level module along with the provided configuration objects corresponding to OPPRL tokenization.
The 3 main functions provide tokenization and transcoding capabilities for data senders and recipients respectively.
PiiAttribute
Bases: ABC
An attribute (aka column) of personally identifiable information (PII) to use when constructing tokens.
This abstract base class is intended to be extended by users to add support for building tokens from a custom PII attribute.
Attributes:
| Name | Type | Description |
|---|---|---|
| `attr_id` | | An identifier for the PiiAttribute. Should be unique across all logically different PiiAttributes. |
__init__(attr_id)
Initializes the PiiAttribute with the given globally unique attribute ID.
transform(column, dtype)
abstractmethod
Transforms the raw PII column into a normalized representation.
A normalized value has eliminated all representation or encoding differences, so all instances of the same logical value have identical physical values. For example, text attributes are often normalized by filtering to alphanumeric characters and whitespace, standardizing all whitespace to the space character, and converting all alphabetic characters to uppercase, so that all ways of representing the same phrase normalize to the exact same string.
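The normalization rule above can be sketched in plain Python for a single string. This is an illustration only: the real `transform` operates on Spark `Column` expressions, and the helper name `normalize_text`, the ASCII-only character class, and the collapsing and trimming of whitespace runs are assumptions of this sketch rather than documented spindle-token behavior.

```python
import re


def normalize_text(value: str) -> str:
    """Hypothetical helper: plain-Python analogue of text PII normalization.

    Assumptions: only ASCII alphanumerics are kept, and runs of whitespace are
    collapsed and trimmed; spindle-token's Spark implementation may differ.
    """
    # Drop everything that is not an (ASCII) alphanumeric character or whitespace.
    filtered = re.sub(r"[^0-9A-Za-z\s]", "", value)
    # Standardize whitespace: every run of tabs, newlines, etc. becomes one space.
    spaced = re.sub(r"\s+", " ", filtered)
    # Uppercase alphabetic characters and trim the ends.
    return spaced.upper().strip()


# Different spellings of the same name normalize to the same string.
print(normalize_text("  O'Brien,\tMary-Kate "))  # OBRIEN MARYKATE
print(normalize_text("o'brien  mary-kate"))      # OBRIEN MARYKATE
```

Filtering before collapsing whitespace matters: removing punctuation such as `" - "` can leave two adjacent spaces, which the second step then merges.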
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `Column` | The Spark `Column` of raw PII values. | required |
| `dtype` | `DataType` | The Spark `DataType` of `column`. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark Column expression of normalized PII values. |
derivatives()
A collection of PII attributes that can be derived from this PII attribute, including this PiiAttribute.
Returns:
| Type | Description |
|---|---|
| `dict[str, PiiAttribute]` | A `dict` of PiiAttributes that produce normalized values for each derivative attribute from the normalized values of this PiiAttribute. |
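The relationship between an attribute and its derivatives can be illustrated with a plain-Python stand-in. `PiiAttributeSketch`, `FirstName`, and `FirstInitial` are hypothetical classes that use strings instead of Spark Columns and drop the `dtype` argument; keying the returned `dict` by attribute ID is also an assumption of this sketch.

```python
from abc import ABC, abstractmethod


class PiiAttributeSketch(ABC):
    """Hypothetical stand-in for PiiAttribute that works on plain strings."""

    def __init__(self, attr_id: str):
        self.attr_id = attr_id

    @abstractmethod
    def transform(self, value: str) -> str:
        """Normalize one raw value (simplified: no dtype argument)."""

    def derivatives(self) -> dict[str, "PiiAttributeSketch"]:
        # Assumption: keyed by attribute ID; the attribute includes itself.
        return {self.attr_id: self}


class FirstInitial(PiiAttributeSketch):
    def transform(self, value: str) -> str:
        # Input is the already-normalized first name.
        return value[:1]


class FirstName(PiiAttributeSketch):
    def transform(self, value: str) -> str:
        return " ".join(value.split()).upper()

    def derivatives(self) -> dict[str, PiiAttributeSketch]:
        # This attribute plus one attribute derived from its normalized values.
        return {**super().derivatives(), "first_initial": FirstInitial("first_initial")}


first_name = FirstName("first_name")
normalized = first_name.transform("  alice ")
derived = {
    attr_id: attr.transform(normalized)
    for attr_id, attr in first_name.derivatives().items()
}
print(derived)  # {'first_name': 'ALICE', 'first_initial': 'A'}
```

Each derivative receives the parent's normalized value, so `FirstInitial` never has to repeat the first-name cleanup.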
Token
dataclass
A specification of a token.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | An identifier-safe name for the token. Used as the column name on dataframes, so it must be unique across other tokens. |
| `protocol` | `TokenProtocolFactory` | An instance of `TokenProtocolFactory`. |
| `attribute_ids` | `Iterable[str]` | A collection of attribute IDs used to look up instances of `PiiAttribute`. |
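The `Token` fields above can be mirrored by a small frozen dataclass to show how a token specification ties a column name to a protocol and a set of attribute IDs. `TokenSpec`, `check_token_names`, and the attribute IDs used below are hypothetical; the check only enforces the documented constraints that names are identifier-safe and unique.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class TokenSpec:
    """Hypothetical stand-in mirroring the documented Token fields."""

    name: str
    protocol: object  # stands in for a TokenProtocolFactory instance
    attribute_ids: Iterable[str]


def check_token_names(tokens: Sequence[TokenSpec]) -> list[str]:
    """Validate that token names are identifier-safe and unique."""
    seen: set[str] = set()
    for token in tokens:
        if not token.name.isidentifier():
            raise ValueError(f"not identifier-safe: {token.name!r}")
        if token.name in seen:
            raise ValueError(f"duplicate token name: {token.name!r}")
        seen.add(token.name)
    return [token.name for token in tokens]


specs = [
    TokenSpec("token_a", protocol=None, attribute_ids=("first_name", "last_name")),
    TokenSpec("token_b", protocol=None, attribute_ids=("last_name", "birth_date")),
]
print(check_token_names(specs))  # ['token_a', 'token_b']
```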
TokenProtocol
Bases: ABC
An abstract base class for a specific version of the OPPRL tokenization protocol.
This abstract base class is intended to be extended by users who want to implement custom tokenization protocols.
It is assumed that instances of the TokenProtocol will provide any configuration or other inputs (such as encryption keys) required to produce tokens. See TokenProtocolFactory for more information.
tokenize(attribute_ids)
abstractmethod
Creates a Column expression for a single token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `attribute_ids` | `list[str]` | A collection of attribute IDs for the PII attributes used to build the token. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark `Column` expression for the token. |
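To make the idea of "one token from several attributes" concrete, the sketch below builds a single token from normalized values using HMAC-SHA256. This is a swapped-in technique chosen only because it is in the Python standard library; it is not the OPPRL cryptographic construction, and `tokenize_row` is a hypothetical helper operating on one dict row rather than a Spark `Column`.

```python
import base64
import hashlib
import hmac


def tokenize_row(row: dict[str, str], attribute_ids: list[str], key: bytes) -> str:
    """Hypothetical helper: derive one token from normalized PII values.

    HMAC-SHA256 is an illustrative stand-in, not OPPRL's cryptography.
    """
    # Concatenate values in attribute_ids order; the unit separator keeps
    # ("AB", "C") and ("A", "BC") from producing the same message.
    message = "\x1f".join(row[attr_id] for attr_id in attribute_ids).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


row = {"first_name": "ALICE", "last_name": "SMITH", "birth_date": "1990-01-31"}
token = tokenize_row(row, ["last_name", "birth_date"], key=b"demo-key")
print(len(token))  # 44 base64 characters for a 32-byte digest
```

The same normalized values and key always yield the same token, which is what lets two datasets be joined on token columns without exchanging raw PII.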
transcode_out(token)
abstractmethod
Transcodes the given token into an ephemeral token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `token` | `Column` | A pyspark `Column` of tokens. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark `Column` of ephemeral tokens. |
transcode_in(ephemeral_token)
abstractmethod
Transcodes the given ephemeral token into a normal token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ephemeral_token` | `Column` | A pyspark `Column` of ephemeral tokens. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark `Column` of normal tokens. |
generate_pem_keys(key_size=2048)
Generates a fresh RSA key pair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `key_size` | `int` | The size (in bits) of the key. | `2048` |
Returns:
| Type | Description |
|---|---|
| `tuple[bytes, bytes]` | A tuple containing the private key and public key bytes, both PEM-encoded. |
spindle_token.opprl
Standard configurations for different versions of the OPPRL protocol.
This package is Spark-backed. Importing the package itself is safe, but accessing OPPRL version objects requires the optional `spark` extra.
spindle_token.opprl.metadata
spindle_token.core
The core abstractions of spindle-token, including abstract base classes for extending functionality.
The spindle-token library provides base interfaces that can be extended by users to define custom token specifications to encrypt with existing versions of the OPPRL cryptographic protocols, or to define entirely new tokenization protocols.
Warning
Extending the base classes in this module to customize tokenization behavior comes with no security or privacy guarantees. Like all OSS, these abstractions are "use at your own risk", and users should only use these advanced features if they understand them.
PiiAttribute
Bases: ABC
An attribute (aka column) of personally identifiable information (PII) to use when constructing tokens.
This abstract base class is intended to be extended by users to add support for building tokens from a custom PII attribute.
Attributes:
| Name | Type | Description |
|---|---|---|
| `attr_id` | | An identifier for the PiiAttribute. Should be unique across all logically different PiiAttributes. |
__init__(attr_id)
Initializes the PiiAttribute with the given globally unique attribute ID.
transform(column, dtype)
abstractmethod
Transforms the raw PII column into a normalized representation.
A normalized value has eliminated all representation or encoding differences, so all instances of the same logical value have identical physical values. For example, text attributes are often normalized by filtering to alphanumeric characters and whitespace, standardizing all whitespace to the space character, and converting all alphabetic characters to uppercase, so that all ways of representing the same phrase normalize to the exact same string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `Column` | The Spark `Column` of raw PII values. | required |
| `dtype` | `DataType` | The Spark `DataType` of `column`. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark Column expression of normalized PII values. |
derivatives()
A collection of PII attributes that can be derived from this PII attribute, including this PiiAttribute.
Returns:
| Type | Description |
|---|---|
| `dict[str, PiiAttribute]` | A `dict` of PiiAttributes that produce normalized values for each derivative attribute from the normalized values of this PiiAttribute. |
TokenProtocol
Bases: ABC
An abstract base class for a specific version of the OPPRL tokenization protocol.
This abstract base class is intended to be extended by users who want to implement custom tokenization protocols.
It is assumed that instances of the TokenProtocol will provide any configuration or other inputs (such as encryption keys) required to produce tokens. See TokenProtocolFactory for more information.
tokenize(attribute_ids)
abstractmethod
Creates a Column expression for a single token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `attribute_ids` | `list[str]` | A collection of attribute IDs for the PII attributes used to build the token. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark `Column` expression for the token. |
transcode_out(token)
abstractmethod
Transcodes the given token into an ephemeral token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `token` | `Column` | A pyspark `Column` of tokens. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark `Column` of ephemeral tokens. |
transcode_in(ephemeral_token)
abstractmethod
Transcodes the given ephemeral token into a normal token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ephemeral_token` | `Column` | A pyspark `Column` of ephemeral tokens. | required |
Returns:
| Type | Description |
|---|---|
| `Column` | A pyspark `Column` of normal tokens. |
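Because `tokenize`, `transcode_out`, and `transcode_in` are abstract, a custom protocol must implement all three before it can be instantiated. The plain-Python skeleton below shows that contract using strings instead of Spark Columns. `TokenProtocolSketch`, `IncompleteProtocol`, `ToyProtocol`, and their simplified signatures are hypothetical, and the base64 "transcoding" is a reversible placeholder with no security value; it does not model how OPPRL actually transcodes tokens.

```python
import base64
from abc import ABC, abstractmethod


class TokenProtocolSketch(ABC):
    """Hypothetical stand-in for TokenProtocol using plain strings."""

    @abstractmethod
    def tokenize(self, values: list[str]) -> str:
        """Build one token from normalized attribute values."""

    @abstractmethod
    def transcode_out(self, token: str) -> str:
        """Turn a token into an ephemeral token for a recipient."""

    @abstractmethod
    def transcode_in(self, ephemeral_token: str) -> str:
        """Turn a received ephemeral token back into a normal token."""


class IncompleteProtocol(TokenProtocolSketch):
    def tokenize(self, values: list[str]) -> str:
        return "|".join(values)


class ToyProtocol(IncompleteProtocol):
    """NOT SECURE: reversible placeholders that only show the method shapes."""

    def transcode_out(self, token: str) -> str:
        return base64.urlsafe_b64encode(token.encode("utf-8")).decode("ascii")

    def transcode_in(self, ephemeral_token: str) -> str:
        return base64.urlsafe_b64decode(ephemeral_token.encode("ascii")).decode("utf-8")


try:
    IncompleteProtocol()  # transcode_out and transcode_in are still abstract
except TypeError as err:
    print("cannot instantiate:", err)

toy = ToyProtocol()
ephemeral = toy.transcode_out(toy.tokenize(["SMITH", "1990-01-31"]))
print(toy.transcode_in(ephemeral))  # SMITH|1990-01-31
```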
TokenProtocolFactory
Bases: ABC, Generic[P]
An abstract base class for factories that instantiate TokenProtocol implementations with user-provided encryption keys.
This abstract base class is intended to be extended by users who want to implement custom tokenization protocols.
Attributes:
| Name | Type | Description |
|---|---|---|
| `factory_id` | | An identifier for the `TokenProtocolFactory`. |
__init__(factory_id)
Initializes the TokenProtocolFactory with the given globally unique factory ID.
bind(private_key, recipient_public_key)
abstractmethod
Creates an instance of the TokenProtocol with the user-provided encryption keys.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `private_key` | `bytes` | The private RSA key to use when tokenizing PII and transcoding tokens. | required |
| `recipient_public_key` | `bytes \| None` | The public RSA key of the intended data recipient, used when transcoding tokens into ephemeral tokens. Can be `None`. | required |
Returns:
| Type | Description |
|---|---|
| `P` | An instance of a `TokenProtocol` (the factory's type parameter `P`). |
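The factory pattern separates a protocol's identity (`factory_id`) from the keys supplied at `bind` time. The sketch below mirrors the documented `ABC, Generic[P]` shape in plain Python. `ProtocolFactorySketch`, `KeyedProtocol`, and `KeyedProtocolFactory` are hypothetical, and the rule that producing ephemeral tokens requires a recipient public key is an assumption inferred from the `recipient_public_key` description, not confirmed library behavior.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar


@dataclass(frozen=True)
class KeyedProtocol:
    """Hypothetical protocol instance that remembers the keys it was bound with."""

    private_key: bytes
    recipient_public_key: Optional[bytes]

    def transcode_out(self, token: str) -> str:
        # Assumption: producing ephemeral tokens needs the recipient's public key.
        if self.recipient_public_key is None:
            raise ValueError("transcode_out requires a recipient public key")
        return token  # placeholder: a real protocol would encrypt here


P = TypeVar("P")


class ProtocolFactorySketch(ABC, Generic[P]):
    def __init__(self, factory_id: str):
        self.factory_id = factory_id

    @abstractmethod
    def bind(self, private_key: bytes, recipient_public_key: Optional[bytes]) -> P:
        """Return a protocol instance configured with the given keys."""


class KeyedProtocolFactory(ProtocolFactorySketch[KeyedProtocol]):
    def bind(self, private_key: bytes, recipient_public_key: Optional[bytes]) -> KeyedProtocol:
        return KeyedProtocol(private_key, recipient_public_key)


factory = KeyedProtocolFactory("toy-protocol-v0")
sender = factory.bind(b"sender-private-pem", recipient_public_key=b"recipient-public-pem")
tokenize_only = factory.bind(b"sender-private-pem", recipient_public_key=None)
print(sender.transcode_out("token"))  # token
```

Keeping keys out of the factory means one factory definition can be reused by many parties, each binding it to their own key material.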
Token
dataclass
A specification of a token.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | An identifier-safe name for the token. Used as the column name on dataframes, so it must be unique across other tokens. |
| `protocol` | `TokenProtocolFactory` | An instance of `TokenProtocolFactory`. |
| `attribute_ids` | `Iterable[str]` | A collection of attribute IDs used to look up instances of `PiiAttribute`. |