Preprocess
Date published: 21/09/23
- src.preprocess.main(config: DictConfig) -> None
The main entry point for the preprocess pipeline.
- Args:
- config (DictConfig):
The pipeline configuration.
- src.preprocess.write_species_csv(species_code: str, index_df: DataFrame, data_dir: str, cols: str, outfile: str, lowercase_list=[], drop_na=False) -> None
Creates a species dataframe and writes it to a CSV file.
- Args:
- species_code (str):
The shark species code.
- index_df (pd.DataFrame):
The index dataframe containing metadata.
- data_dir (str):
The data directory.
- cols (str):
The dataframe column list.
- outfile (str):
The output file name.
- lowercase_list (list, optional):
Columns whose values are converted to lowercase. Defaults to [].
- drop_na (bool, optional):
Whether to drop missing values. Defaults to False.
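The following is a minimal sketch of how a function with this signature might behave. It is not the actual implementation of src.preprocess.write_species_csv. Several details are assumptions made for illustration only: the name write_species_csv_sketch, the "species_code" column used to filter the index dataframe, treating cols as a list of column names (the signature annotates it as str), joining data_dir and outfile to form the output path, and the made-up sample rows.

```python
import os
import tempfile

import pandas as pd


def write_species_csv_sketch(species_code, index_df, data_dir, cols, outfile,
                             lowercase_list=None, drop_na=False):
    """Illustrative stand-in for write_species_csv (hypothetical, not the real code)."""
    # Assumption: the index dataframe has a "species_code" column to filter on.
    mask = index_df["species_code"] == species_code
    species_df = index_df.loc[mask, list(cols)].copy()

    # Convert every value in the requested columns to lowercase.
    for col in lowercase_list or []:
        species_df[col] = species_df[col].str.lower()

    # Optionally drop rows that contain missing values.
    if drop_na:
        species_df = species_df.dropna()

    # Assumption: outfile is a file name inside data_dir.
    out_path = os.path.join(data_dir, outfile)
    species_df.to_csv(out_path, index=False)
    return out_path


# Made-up sample data, for illustration only.
index_df = pd.DataFrame({
    "species_code": ["GWS", "GWS", "TIG"],
    "common_name": ["White Shark", None, "Tiger Shark"],
    "length_cm": [450, 480, 390],
})

out_dir = tempfile.mkdtemp()
path = write_species_csv_sketch(
    "GWS", index_df, out_dir,
    cols=["common_name", "length_cm"],
    outfile="gws.csv",
    lowercase_list=["common_name"],
    drop_na=True,
)
result = pd.read_csv(path)
```

The sketch uses None rather than [] as the default for lowercase_list so that no mutable default is shared between calls, and it returns the output path only to make the example easy to inspect. The documented function returns None.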