Module pipelines.rj_smtr.registros_ocr_rir.tasks

Tasks for registros_ocr_rir

Functions

def download_and_save_local(file_info: list)

Download files from the FTP server and save them locally.

Args

file_info : list
containing dicts representing each file found.

Returns

dict
updated file info with the local path for the downloaded file.
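
The download step described above can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the helper names `save_local` and `fake_fetch`, the injected `fetch` callable, and the dict keys `filename`, `ftp_path` and `local_path` are all assumptions made for illustration, and the sketch returns a list of updated dicts rather than the documented dict.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def save_local(
    file_info: list,
    fetch: Callable[[str, BinaryIO], None],
    local_dir: str,
) -> list:
    """Hypothetical helper: download each file and record where it was saved.

    `fetch` receives the remote path and an open binary file handle and is
    expected to write the remote contents into that handle.
    """
    os.makedirs(local_dir, exist_ok=True)
    updated = []
    for info in file_info:
        # The "filename" and "ftp_path" keys are assumed, not documented.
        local_path = os.path.join(local_dir, info["filename"])
        with open(local_path, "wb") as fh:
            fetch(info["ftp_path"], fh)
        # Copy the dict so the caller's input is left unchanged.
        updated.append({**info, "local_path": local_path})
    return updated


def fake_fetch(remote_path: str, fh: BinaryIO) -> None:
    """Stand-in for a real FTP download, used only to exercise the sketch."""
    fh.write(f"contents of {remote_path}".encode())


if __name__ == "__main__":
    target_dir = tempfile.mkdtemp()
    files = [{"filename": "a.txt", "ftp_path": "/x/a.txt"}]
    print(save_local(files, fake_fetch, target_dir))
```

With the standard library's `ftplib`, the `fetch` argument could plausibly wrap `FTP.retrbinary`, for example `lambda path, fh: ftp.retrbinary(f"RETR {path}", fh.write)`, where `ftp` is an already connected `ftplib.FTP` instance.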

def get_files_from_ftp(dump: bool = False, execution_time: str = None, wait=None)

Search FTP for files created in the same minute as the capture time.

Args

dump : bool, optional
if True, dump all files found on the FTP server.
Defaults to False.
execution_time : str, optional
search for a file created at the given minute.
Defaults to None.
wait : optional
used to create an upstream dependency with a previous task.

Returns

dict
'capture' is a flag for skipping tasks if no files were found;
'file_info' is the info for processing the captured file.
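
The selection logic described above (match files by creation minute, or take everything when `dump` is True) can be sketched as follows. This is an illustration under stated assumptions, not the task's real code: `select_files` is a hypothetical name, each file is assumed to be a dict with a `created_at` datetime, and `execution_time` is taken as a `datetime` here because the string format used by the real task is not documented.

```python
from datetime import datetime


def truncate_to_minute(moment: datetime) -> datetime:
    """Drop seconds and microseconds so two timestamps compare by minute."""
    return moment.replace(second=0, microsecond=0)


def select_files(files: list, execution_time: datetime, dump: bool = False) -> dict:
    """Hypothetical helper mirroring the documented return shape."""
    if dump:
        # Dump mode: keep every file, regardless of its creation time.
        selected = list(files)
    else:
        target = truncate_to_minute(execution_time)
        selected = [
            f for f in files if truncate_to_minute(f["created_at"]) == target
        ]
    # 'capture' is assumed to be True when at least one file was found.
    return {"capture": bool(selected), "file_info": selected}


if __name__ == "__main__":
    listing = [
        {"filename": "a.csv", "created_at": datetime(2022, 1, 1, 10, 5, 30)},
        {"filename": "b.csv", "created_at": datetime(2022, 1, 1, 10, 6, 0)},
    ]
    print(select_files(listing, datetime(2022, 1, 1, 10, 5)))
```

Downstream tasks could then check the `capture` flag and skip processing when it is False.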

def pre_treatment_ocr(file_info: list)

Standardize columns

Args

file_info : list
containing dicts representing each file found.

Returns

str
path to the table folder containing partitioned files
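
As a rough illustration of what standardizing columns and writing partitioned files could look like, the sketch below uses only the standard library. Everything in it is an assumption: the helper names `standardize_name` and `write_partitioned`, the snake_case naming rule, the `<key>=<value>` folder layout, the `data.csv` file name, and the sample column names. The real task may behave differently.

```python
import csv
import os
import re
import tempfile


def standardize_name(name: str) -> str:
    """Lowercase a header and replace runs of other characters with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def write_partitioned(rows: list, table_dir: str, partition_key: str) -> str:
    """Write rows to <table_dir>/<key>=<value>/data.csv and return table_dir.

    `partition_key` refers to the column name after standardization.
    """
    groups = {}
    for row in rows:
        clean = {standardize_name(key): value for key, value in row.items()}
        groups.setdefault(clean[partition_key], []).append(clean)
    for value, group in groups.items():
        part_dir = os.path.join(table_dir, f"{partition_key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "data.csv"), "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
    return table_dir


if __name__ == "__main__":
    sample = [
        {"Placa": "ABC1234", "Data": "2022-01-01"},
        {"Placa": "XYZ9876", "Data": "2022-01-02"},
    ]
    print(write_partitioned(sample, tempfile.mkdtemp(), "data"))
```

Returning the table folder rather than individual file paths matches the documented return value, which lets a later upload step treat the whole partitioned folder as one unit.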