Module pipelines.rj_smtr.veiculo.tasks

Tasks for vehicle (veiculo) data.

Functions

def get_raw_ftp(ftp_path: str, filetype: str, csv_args: dict, timestamp: datetime.datetime)

Retrieves raw data from an FTP server.

Args

ftp_path : str
The path to the file on the FTP server.
filetype : str
The file extension of the raw data file.
csv_args : dict
Additional arguments to be passed to the pd.read_csv function.
timestamp : datetime
The timestamp used to construct the file name.

Returns

dict
A dictionary containing the retrieved data and any error messages. The 'data' key holds the retrieved data as a list of dictionaries. The 'error' key holds any error message encountered during the retrieval process.
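The return shape described above can be illustrated with a minimal sketch. This is not the module's implementation: the helper name `read_raw_csv`, the sample column names, and the use of the standard-library `csv` module as a stand-in for `pd.read_csv` are all assumptions made for illustration. Only the `sep` and `encoding` entries of `csv_args` are honoured here.

```python
import csv
import io


def read_raw_csv(raw_bytes: bytes, csv_args: dict) -> dict:
    """Hypothetical helper: parse CSV bytes into a {'data', 'error'} dict."""
    data, error = None, None
    try:
        # 'encoding' and 'sep' mirror pd.read_csv keyword names (assumed usage).
        text = raw_bytes.decode(csv_args.get("encoding", "utf-8"))
        reader = csv.DictReader(io.StringIO(text), delimiter=csv_args.get("sep", ","))
        data = [dict(row) for row in reader]  # list of dictionaries, one per row
    except Exception as exc:
        # Record the failure in the 'error' key instead of raising.
        error = str(exc)
    return {"data": data, "error": error}


# Sample input with made-up column names.
ok = read_raw_csv(b"placa;status\nABC1234;ok\n", {"sep": ";"})
# Bytes that cannot be decoded as ASCII trigger the error path.
bad = read_raw_csv(b"\xff\xfe", {"encoding": "ascii"})
```

In this sketch a failed read leaves 'data' as None and puts the exception message under 'error', so a downstream task can inspect the dictionary rather than handle an exception.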
def get_veiculo_raw_storage(dataset_id: str, table_id: str, timestamp: datetime.datetime, csv_args: dict) -> dict

Gets data from the files that are manually extracted each day and received by email.

Args

dataset_id : str
dataset_id on BigQuery
table_id : str
table_id on BigQuery
timestamp : datetime
file extraction date
csv_args : dict
Arguments for read_csv
def pre_treatment_sppo_infracao(status: dict, timestamp: datetime.datetime)

Basic treatment of violation (infracao) data. Applies filtering to the raw data.

Args

status : dict
dict containing the status of the request made.
Must contain the keys 'data', 'timestamp' and 'error'.
timestamp : datetime
timestamp of the data capture

Returns

dict
dict containing the treated data and the current error status.
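The key requirement on the input dictionary can be sketched as a small guard that runs before any treatment. This is an illustrative assumption rather than the task's actual code: the helper name `check_status`, the `REQUIRED_KEYS` constant, and the 'data'/'error' keys of the returned dictionary are hypothetical.

```python
import datetime

# Keys the documentation says the input dictionary must contain.
REQUIRED_KEYS = {"data", "timestamp", "error"}


def check_status(status: dict) -> dict:
    """Hypothetical guard: validate the input dict before treating its data."""
    missing = REQUIRED_KEYS - status.keys()
    if missing:
        return {"data": None, "error": f"missing keys: {sorted(missing)}"}
    if status["error"] is not None:
        # Propagate an upstream failure without touching the data.
        return {"data": None, "error": status["error"]}
    return {"data": status["data"], "error": None}


good = check_status(
    {"data": [1], "timestamp": datetime.datetime(2023, 1, 1), "error": None}
)
incomplete = check_status({"data": None, "error": "x"})
```

Returning the error in the dictionary, instead of raising, keeps the same shape flowing through every step, so a later step can decide how to handle a failed capture.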
def pre_treatment_sppo_licenciamento(status: dict, timestamp: datetime.datetime)

Basic treatment of vehicle (licenciamento) data. Applies filtering to the raw data.

Args

status : dict
dict containing the status of the request made.
Must contain the keys 'data', 'timestamp' and 'error'.
timestamp : datetime
timestamp of the data capture

Returns

dict
dict containing the treated data and the current error status.