Preprocess

Date published: 21/09/23

src.preprocess.main(config: DictConfig) -> None

The main entry point for the preprocess pipeline.

Args:
config (DictConfig):

The pipeline configuration.

src.preprocess.write_species_csv(species_code: str, index_df: DataFrame, data_dir: str, cols: str, outfile: str, lowercase_list=[], drop_na=False) -> None

Creates a species dataframe and writes it to a CSV file.

Args:
species_code (str):

The shark species code.

index_df (pd.DataFrame):

The index dataframe containing metadata.

data_dir (str):

The data directory.

cols (str):

The dataframe column list.

outfile (str):

The output file name.

lowercase_list (list, optional):

Columns whose values are converted to lowercase. Defaults to an empty list.

drop_na (bool, optional):

Whether to drop missing values. Defaults to False.
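
Example (illustrative sketch only):

The implementation of write_species_csv is not shown on this page, so the snippet below is a minimal sketch of how the cols, lowercase_list, and drop_na options might behave, not the project's actual code. The helper name clean_species_frame, the sample column names, and the sample values are all hypothetical. It also assumes that cols is passed as a list of column names and that drop_na removes whole rows containing a missing value.

```python
import os
import tempfile

import pandas as pd


def clean_species_frame(df, cols, lowercase_list=None, drop_na=False):
    """Hypothetical helper: select columns, lowercase some, optionally drop NaNs."""
    # Use None as the default to avoid sharing one mutable list across calls.
    lowercase_list = lowercase_list or []
    # Copy the selection so the caller's dataframe is not modified.
    out = df[cols].copy()
    for col in lowercase_list:
        # .str.lower() leaves missing values as missing.
        out[col] = out[col].str.lower()
    if drop_na:
        # Assumption: drop every row that has at least one missing value.
        out = out.dropna()
    return out


# Made-up sample data; these column names are not taken from the project.
sample = pd.DataFrame(
    {
        "species": ["Mako", "BLUE", None],
        "length_cm": [210.0, 180.0, 95.0],
        "notes": ["a", "b", "c"],
    }
)

cleaned = clean_species_frame(
    sample,
    cols=["species", "length_cm"],
    lowercase_list=["species"],
    drop_na=True,
)

# Write the result to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as data_dir:
    outfile = os.path.join(data_dir, "species.csv")
    cleaned.to_csv(outfile, index=False)
    round_trip = pd.read_csv(outfile)
```

The sketch uses None rather than [] as the default for lowercase_list, so that repeated calls never share one list object. The documented signature uses [], which is only safe as long as the function never mutates that list.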