feast.staging package¶
Submodules¶
feast.staging.entities module¶
-
feast.staging.entities.
create_bq_view_of_joined_features_and_entities
(source: feast.data_source.BigQuerySource, entity_source: feast.data_source.BigQuerySource, entity_names: List[str]) → feast.data_source.BigQuerySource[source]¶ Creates BQ view that joins tables from source and entity_source with join key derived from entity_names. Returns BigQuerySource with reference to created view.
-
feast.staging.entities.
stage_entities_to_bq
(entity_source: pandas.core.frame.DataFrame, project: str, dataset: str) → feast.data_source.BigQuerySource[source]¶ Stores given (entity) dataframe as new table in BQ. Name of the table generated based on current time. Table will expire in 1 day. Returns BigQuerySource with reference to created table.
-
feast.staging.entities.
stage_entities_to_fs
(entity_source: pandas.core.frame.DataFrame, staging_location: str, config: feast.config.Config) → feast.data_source.FileSource[source]¶ Dumps given (entities) dataframe as parquet file and stage it to remote file storage (subdirectory of staging_location)
- Returns
FileSource with remote destination path
feast.staging.storage_client module¶
-
class
feast.staging.storage_client.
AbstractStagingClient
[source]¶ Bases:
abc.ABC
Client used to stage files in order to upload or download datasets into a historical store.
-
abstract
download_file
(uri: urllib.parse.ParseResult) → IO[bytes][source]¶ Downloads a file from an object store and returns a TemporaryFile object
-
abstract
list_files
(uri: urllib.parse.ParseResult) → List[str][source]¶ Lists all the files under a directory in an object store.
-
abstract
upload_fileobj
(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]¶ Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
-
abstract
-
class
feast.staging.storage_client.
AzureBlobClient
(account_name: str, account_access_key: str)[source]¶ Bases:
feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for Azure Blob storage
-
download_file
(uri: urllib.parse.ParseResult) → IO[bytes][source]¶ Downloads a file from Azure blob storage and returns a TemporaryFile object
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“wasbs://bucket@account_name.blob.core.windows.net/file.avro”)
- Returns
TemporaryFile object
-
list_files
(uri: urllib.parse.ParseResult) → List[str][source]¶ Lists all the files under a directory in azure blob storage if path has wildcard(*) character.
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of this location
- Returns
- A list containing the full path to the file(s) in the
remote staging location.
- Return type
List[str]
-
upload_fileobj
(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]¶ Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
-
-
class
feast.staging.storage_client.
GCSClient
[source]¶ Bases:
feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for google cloud storage
-
download_file
(uri: urllib.parse.ParseResult) → IO[bytes][source]¶ Downloads a file from google cloud storage and returns a TemporaryFile object
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“gs://bucket/file.avro”)
- Returns
TemporaryFile object
-
list_files
(uri: urllib.parse.ParseResult) → List[str][source]¶ Lists all the files under a directory in google cloud storage if path has wildcard(*) character.
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of this location
- Returns
- A list containing the full path to the file(s) in the
remote staging location.
- Return type
List[str]
-
upload_fileobj
(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]¶ Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
-
-
class
feast.staging.storage_client.
LocalFSClient
[source]¶ Bases:
feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for local file Note: The is used for E2E tests.
-
download_file
(uri: urllib.parse.ParseResult) → IO[bytes][source]¶ Reads a local file from the disk
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“file:///folder/file.avro”)
- Returns
TemporaryFile object
-
list_files
(uri: urllib.parse.ParseResult) → List[str][source]¶ Lists all the files under a directory in an object store.
-
upload_fileobj
(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]¶ Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
-
-
class
feast.staging.storage_client.
S3Client
(endpoint_url: str = None, url_scheme='s3')[source]¶ Bases:
feast.staging.storage_client.AbstractStagingClient
Implementation of AbstractStagingClient for Aws S3 storage
-
download_file
(uri: urllib.parse.ParseResult) → IO[bytes][source]¶ Downloads a file from AWS s3 storage and returns a TemporaryFile object
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“s3://bucket/file.avro”)
- Returns
TemporaryFile object
-
list_files
(uri: urllib.parse.ParseResult) → List[str][source]¶ Lists all the files under a directory in s3 if path has wildcard(*) character.
- Parameters
uri (urllib.parse.ParseResult) – Parsed uri of this location
- Returns
- A list containing the full path to the file(s) in the
remote staging location.
- Return type
List[str]
-
upload_fileobj
(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]¶ Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix
- Parameters
fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.
local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.
remote_uri (ParseResult or None) – destination object URI to upload to
remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode
remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode
- Returns
the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.
- Return type
ParseResult
-