feast.staging package

Submodules

feast.staging.entities module

feast.staging.entities.create_bq_view_of_joined_features_and_entities(source: feast.data_source.BigQuerySource, entity_source: feast.data_source.BigQuerySource, entity_names: List[str]) → feast.data_source.BigQuerySource[source]

Creates BQ view that joins tables from source and entity_source with join key derived from entity_names. Returns BigQuerySource with reference to created view.

feast.staging.entities.stage_entities_to_bq(entity_source: pandas.core.frame.DataFrame, project: str, dataset: str) → feast.data_source.BigQuerySource[source]

Stores given (entity) dataframe as new table in BQ. Name of the table generated based on current time. Table will expire in 1 day. Returns BigQuerySource with reference to created table.

feast.staging.entities.stage_entities_to_fs(entity_source: pandas.core.frame.DataFrame, staging_location: str, config: feast.config.Config) → feast.data_source.FileSource[source]

Dumps given (entities) dataframe as parquet file and stage it to remote file storage (subdirectory of staging_location)

Returns

FileSource with remote destination path

feast.staging.entities.table_reference_from_string(table_ref: str)[source]

Parses reference string with format “{project}:{dataset}.{table}” into bigquery.TableReference

feast.staging.storage_client module

class feast.staging.storage_client.AbstractStagingClient[source]

Bases: abc.ABC

Client used to stage files in order to upload or download datasets into a historical store.

abstract download_file(uri: urllib.parse.ParseResult) → IO[bytes][source]

Downloads a file from an object store and returns a TemporaryFile object

abstract list_files(uri: urllib.parse.ParseResult) → List[str][source]

Lists all the files under a directory in an object store.

abstract upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]

Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix

Parameters
  • fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.

  • local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.

  • remote_uri (ParseResult or None) – destination object URI to upload to

  • remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode

  • remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode

Returns

the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.

Return type

ParseResult

class feast.staging.storage_client.AzureBlobClient(account_name: str, account_access_key: str)[source]

Bases: feast.staging.storage_client.AbstractStagingClient

Implementation of AbstractStagingClient for Azure Blob storage

download_file(uri: urllib.parse.ParseResult) → IO[bytes][source]

Downloads a file from Azure blob storage and returns a TemporaryFile object

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“wasbs://bucket@account_name.blob.core.windows.net/file.avro”)

Returns

TemporaryFile object

list_files(uri: urllib.parse.ParseResult) → List[str][source]

Lists all the files under a directory in azure blob storage if path has wildcard(*) character.

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of this location

Returns

A list containing the full path to the file(s) in the

remote staging location.

Return type

List[str]

upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]

Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix

Parameters
  • fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.

  • local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.

  • remote_uri (ParseResult or None) – destination object URI to upload to

  • remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode

  • remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode

Returns

the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.

Return type

ParseResult

class feast.staging.storage_client.GCSClient[source]

Bases: feast.staging.storage_client.AbstractStagingClient

Implementation of AbstractStagingClient for google cloud storage

download_file(uri: urllib.parse.ParseResult) → IO[bytes][source]

Downloads a file from google cloud storage and returns a TemporaryFile object

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“gs://bucket/file.avro”)

Returns

TemporaryFile object

list_files(uri: urllib.parse.ParseResult) → List[str][source]

Lists all the files under a directory in google cloud storage if path has wildcard(*) character.

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of this location

Returns

A list containing the full path to the file(s) in the

remote staging location.

Return type

List[str]

upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]

Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix

Parameters
  • fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.

  • local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.

  • remote_uri (ParseResult or None) – destination object URI to upload to

  • remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode

  • remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode

Returns

the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.

Return type

ParseResult

class feast.staging.storage_client.LocalFSClient[source]

Bases: feast.staging.storage_client.AbstractStagingClient

Implementation of AbstractStagingClient for local file Note: The is used for E2E tests.

download_file(uri: urllib.parse.ParseResult) → IO[bytes][source]

Reads a local file from the disk

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“file:///folder/file.avro”)

Returns

TemporaryFile object

list_files(uri: urllib.parse.ParseResult) → List[str][source]

Lists all the files under a directory in an object store.

upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]

Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix

Parameters
  • fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.

  • local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.

  • remote_uri (ParseResult or None) – destination object URI to upload to

  • remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode

  • remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode

Returns

the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.

Return type

ParseResult

class feast.staging.storage_client.S3Client(endpoint_url: str = None, url_scheme='s3')[source]

Bases: feast.staging.storage_client.AbstractStagingClient

Implementation of AbstractStagingClient for Aws S3 storage

download_file(uri: urllib.parse.ParseResult) → IO[bytes][source]

Downloads a file from AWS s3 storage and returns a TemporaryFile object

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of the file ex: urlparse(“s3://bucket/file.avro”)

Returns

TemporaryFile object

list_files(uri: urllib.parse.ParseResult) → List[str][source]

Lists all the files under a directory in s3 if path has wildcard(*) character.

Parameters

uri (urllib.parse.ParseResult) – Parsed uri of this location

Returns

A list containing the full path to the file(s) in the

remote staging location.

Return type

List[str]

upload_fileobj(fileobj: IO[bytes], local_path: str, *, remote_uri: Optional[urllib.parse.ParseResult] = None, remote_path_prefix: Optional[str] = None, remote_path_suffix: Optional[str] = None) → urllib.parse.ParseResult[source]

Uploads a file to an object store. You can either specify the destination object URI, or destination suffix+prefix. In the latter case, this interface will work as a content-addressable storage and the remote path will be computed using sha256 of the uploaded content as $remote_path_prefix/$sha256$remote_path_suffix

Parameters
  • fileobj (IO[bytes]) – file-like object containing the data to be uploaded. It needs to supports seek() operation in addition to read/write.

  • local_path (str) – a file name associated with fileobj. This param is only used for diagnostic messages. If fileobj is a local file, pass its filename here.

  • remote_uri (ParseResult or None) – destination object URI to upload to

  • remote_path_prefix (str or None) – destination path prefix to upload to when using content-addressable storage mode

  • remote_path_suffix (str or None) – destination path suffix to upload to when using content-addressable storage mode

Returns

the URI to the uploaded file. It would be the same as remote_uri if remote_uri was passed in. Otherwise it will be the path computed from remote_path_prefix and remote_path_suffix.

Return type

ParseResult

feast.staging.storage_client.get_staging_client(scheme, config: feast.config.Config = None) → feast.staging.storage_client.AbstractStagingClient[source]

Initialization of a specific client object(GCSClient, S3Client etc.)

Parameters
  • scheme (str) – uri scheme: s3, gs or file

  • config (Config) – additional configuration

Returns

An object of concrete implementation of AbstractStagingClient

Module contents