Module pipelines.rj_smtr.registros_ocr_rir.tasks
Tasks for registros_ocr_rir
Functions
def download_and_save_local(file_info: list)
Download files from the FTP server and save them locally.
Args
file_info : list
- List of dicts, each representing one file found.
Returns
dict
- Updated file info, including the local path of the downloaded file.
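A minimal sketch of the download step. It assumes that each dict in `file_info` stores the remote path under a hypothetical `filename` key, and that the caller passes in an already logged-in `ftplib.FTP` connection. The helper name `download_and_save_local_sketch` and the `local_path` key are illustrative, not the task's actual code. The documented return type is `dict`; this sketch returns a list with one updated dict per file.

```python
import os
from typing import Dict, List


def download_and_save_local_sketch(ftp, file_info: List[Dict], local_dir: str) -> List[Dict]:
    """Download each listed file and record where it was saved.

    Assumes `ftp` is a logged-in ftplib.FTP (or any object with a
    compatible `retrbinary` method) and that each dict holds the remote
    path under the hypothetical key "filename".
    """
    os.makedirs(local_dir, exist_ok=True)
    updated = []
    for info in file_info:
        remote_path = info["filename"]
        local_path = os.path.join(local_dir, os.path.basename(remote_path))
        # RETR streams the file in binary chunks; each chunk is written as it arrives.
        with open(local_path, "wb") as fh:
            ftp.retrbinary(f"RETR {remote_path}", fh.write)
        # Copy the original dict and add the hypothetical "local_path" key.
        updated.append({**info, "local_path": local_path})
    return updated
```

With a real server, you could open the connection as `with FTP(host) as ftp: ftp.login(user, passwd)` and then call the helper. Here `host`, `user` and `passwd` are placeholders. Passing the connection in as an argument keeps the function testable with a stand-in object that implements `retrbinary`.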
def get_files_from_ftp(dump: bool = False, execution_time: str = None, wait=None)
Search the FTP server for files created in the same minute as the capture time.
Args
dump : bool, optional
- If True, dump all files found on the FTP server. Defaults to False.
execution_time : str, optional
- If given, search for files created at that minute instead. Defaults to None.
wait : optional
- Used to create an upstream dependency on a previous task.
Returns
dict
- 'capture' is a flag for skipping downstream tasks if no files were found; 'file_info' is the info needed to process the captured file.
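Below is one possible way to implement the same-minute search, sketched under several assumptions:

- The listing comes from `ftplib.FTP.mlsd()`. Its `modify` fact is a `YYYYMMDDHHMMSS` timestamp, which RFC 3659 specifies in UTC.
- `execution_time` is a hypothetical `%Y-%m-%d %H:%M:%S` string in the same timezone as the listing.
- The function name and the `filename`/`modified` keys are illustrative only.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def select_files_sketch(
    listing: Iterable[Tuple[str, Dict[str, str]]],
    execution_time: Optional[str] = None,
    dump: bool = False,
) -> Dict:
    """Return {'capture': bool, 'file_info': [...]} for same-minute files.

    Assumes `listing` yields (name, facts) pairs like ftplib.FTP.mlsd()
    and that `execution_time` uses the hypothetical format
    "%Y-%m-%d %H:%M:%S" in the same timezone as the "modify" facts.
    """
    if execution_time is None:
        target = datetime.now()
    else:
        target = datetime.strptime(execution_time, "%Y-%m-%d %H:%M:%S")
    # Compare at minute resolution only.
    target = target.replace(second=0, microsecond=0)

    found = []
    for name, facts in listing:
        # Skip directory entries such as "." (type "cdir") and subfolders.
        if facts.get("type") != "file":
            continue
        # Drop optional fractional seconds ("20240101120530.123").
        modified = datetime.strptime(facts["modify"][:14], "%Y%m%d%H%M%S")
        if dump or modified.replace(second=0) == target:
            found.append({"filename": name, "modified": modified.isoformat()})
    return {"capture": bool(found), "file_info": found}
```

Against a live server that supports MLSD, the listing would be `ftp.mlsd(facts=["type", "modify"])`. Both timestamps are truncated to the minute, so a file modified at 12:05:30 matches a capture time of 12:05:00. With `dump=True`, every regular file is returned regardless of time.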
def pre_treatment_ocr(file_info: list)
Standardize the columns of the captured files.
Args
file_info : list
- List of dicts, each representing one file found.
Returns
str
- Path to the table folder containing the partitioned files.
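Here is a sketch of column standardization followed by partitioned output. It rests on these assumptions:

- The files are `;`-delimited UTF-8 text with a header row.
- Each dict carries hypothetical `local_path`, `date` and `hour` keys.
- The table folder uses Hive-style `data=<date>/hora=<hour>` partitions.
- Column names are normalized by stripping accents, lowercasing, and replacing other characters with `_`. This is an assumed convention, not necessarily the one the task uses.

```python
import csv
import os
import re
import unicodedata
from typing import Dict, List


def standardize_column(name: str) -> str:
    """Normalize a header name (assumed convention), e.g. "Câmera ID" -> "camera_id"."""
    # Decompose accented characters and drop the non-ASCII combining marks.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse runs of non-alphanumeric characters into "_".
    return re.sub(r"[^0-9a-z]+", "_", ascii_name.strip().lower()).strip("_")


def pre_treatment_sketch(file_info: List[Dict], table_dir: str, delimiter: str = ";") -> str:
    """Rewrite each file with standardized headers into a partition folder.

    The keys "local_path", "date" and "hour" and the Hive-style
    "data=<date>/hora=<hour>" layout are hypothetical.
    """
    for info in file_info:
        partition = os.path.join(table_dir, f"data={info['date']}", f"hora={info['hour']}")
        os.makedirs(partition, exist_ok=True)
        out_path = os.path.join(partition, os.path.basename(info["local_path"]))
        with open(info["local_path"], newline="", encoding="utf-8") as src, \
                open(out_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src, delimiter=delimiter)
            writer = csv.writer(dst)  # output is comma-delimited
            header = next(reader)
            writer.writerow([standardize_column(col) for col in header])
            # Data rows are copied unchanged.
            writer.writerows(reader)
    return table_dir
```

Returning the table folder rather than individual file paths matches the documented return value. A downstream upload step could then treat the whole folder as one partitioned table.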