Module pipelines.rj_smtr.veiculo.tasks
Tasks for vehicles (veiculos)
Functions
def get_raw_ftp(ftp_path: str, filetype: str, csv_args: dict, timestamp: datetime.datetime)
-
Retrieves raw data from an FTP server.
Args
ftp_path
:str
- The path to the file on the FTP server.
filetype
:str
- The file extension of the raw data file.
csv_args
:dict
- Additional arguments to be passed to the pd.read_csv function.
timestamp
:datetime
- The timestamp used to construct the file name.
Returns
dict
- A dictionary containing the retrieved data and any error messages. The 'data' key holds the retrieved data as a list of dictionaries. The 'error' key holds any error message encountered during the retrieval process.
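The return contract described above can be illustrated with a minimal sketch. It makes no FTP connection and is not the task's actual implementation: `read_csv_with_status` is a hypothetical helper, the standard-library `csv.DictReader` stands in for `pd.read_csv`, and the column names are invented for illustration.

```python
import csv
import io
import traceback


def read_csv_with_status(raw_text: str, csv_args: dict) -> dict:
    """Hypothetical helper: parse CSV text and report the outcome in a status dict."""
    data, error = [], None
    try:
        # csv.DictReader stands in for pd.read_csv; csv_args is forwarded as keyword arguments.
        reader = csv.DictReader(io.StringIO(raw_text), **csv_args)
        data = [dict(row) for row in reader]
    except Exception:
        # Record the failure instead of raising, so later tasks can inspect it.
        error = traceback.format_exc()
    return {"data": data, "error": error}


# A well-formed input yields a list of dictionaries and no error.
ok = read_csv_with_status("placa;ano\nABC1234;2020\n", {"delimiter": ";"})

# An invalid reader option yields empty data and a traceback string.
bad = read_csv_with_status("placa;ano\n", {"no_such_option": True})
```

Returning the error rather than raising it lets a downstream task decide whether to skip treatment or log the failure.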
def get_veiculo_raw_storage(dataset_id: str, table_id: str, timestamp: datetime.datetime, csv_args: dict) -> dict
-
Gets data from the manually extracted daily files received by email.
Args
dataset_id
:str
- dataset_id on BigQuery
table_id
:str
- table_id on BigQuery
timestamp
:datetime
- file extraction date
csv_args
:dict
- Arguments for read_csv
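The source does not say how the extraction date maps to a stored file. Purely as an illustration, the sketch below assumes a naming scheme of `<table_id>_<YYYYMMDD>.<filetype>`; both the scheme and the `build_raw_file_name` helper are hypothetical and may differ from what the task actually does.

```python
from datetime import datetime


def build_raw_file_name(table_id: str, timestamp: datetime, filetype: str = "csv") -> str:
    """Hypothetical naming scheme: <table_id>_<YYYYMMDD>.<filetype>."""
    # Only the date part of the timestamp is used; the time of day is dropped.
    return f"{table_id}_{timestamp.strftime('%Y%m%d')}.{filetype}"


# The table name here is illustrative only.
name = build_raw_file_name("sppo_licenciamento", datetime(2023, 5, 17, 8, 30))
```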
def pre_treatment_sppo_infracao(status: dict, timestamp: datetime.datetime)
-
Basic data treatment for violation data. Apply filtering to raw data.
Args
status
:dict
- dict containing the status of the request made.
- Must contain keys: data, timestamp and error
timestamp
:datetime
- timestamp of the data capture
Returns
dict
- dict containing the treated data and the current error status.
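The general shape of such a pre-treatment step can be sketched as follows. This is an assumption-laden illustration, not the task's real logic: the filtering rule (drop rows missing a required column), the `required` parameter, the `timestamp_captura` column, and the sample field names are all hypothetical.

```python
from datetime import datetime


def pre_treatment_sketch(status: dict, timestamp: datetime, required: tuple) -> dict:
    """Hypothetical pre-treatment: pass errors through, otherwise filter rows."""
    if status["error"] is not None:
        # An upstream failure is propagated unchanged; there is nothing to treat.
        return {"data": [], "error": status["error"]}
    treated = [
        # Tag each kept row with the capture timestamp (hypothetical column name).
        {**row, "timestamp_captura": timestamp.isoformat()}
        for row in status["data"]
        # Keep only rows where every required column has a non-empty value.
        if all(row.get(col) not in (None, "") for col in required)
    ]
    return {"data": treated, "error": None}


status = {
    "data": [{"placa": "ABC1234", "valor": "130.16"}, {"placa": "", "valor": "88.38"}],
    "timestamp": "2023-05-17T00:00:00",
    "error": None,
}
result = pre_treatment_sketch(status, datetime(2023, 5, 17), ("placa",))
failed = pre_treatment_sketch({"data": [], "timestamp": "", "error": "boom"}, datetime(2023, 5, 17), ("placa",))
```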
def pre_treatment_sppo_licenciamento(status: dict, timestamp: datetime.datetime)
-
Basic data treatment for vehicle data. Apply filtering to raw data.
Args
status
:dict
- dict containing the status of the request made.
- Must contain keys: data, timestamp and error
timestamp
:datetime
- timestamp of the data capture
Returns
dict
- dict containing the treated data and the current error status.
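Both pre-treatment tasks state that the status dict must contain the keys data, timestamp and error. A caller could check that requirement before invoking them, as in this small sketch; `missing_status_keys` is a hypothetical helper, not part of the module.

```python
REQUIRED_STATUS_KEYS = {"data", "timestamp", "error"}


def missing_status_keys(status: dict) -> set:
    """Hypothetical check: return the required keys absent from the status dict."""
    return REQUIRED_STATUS_KEYS - set(status)


complete = missing_status_keys({"data": [], "timestamp": "2023-05-17", "error": None})
incomplete = missing_status_keys({"data": []})
```

An empty set means the dict satisfies the documented requirement; a non-empty set names exactly what is missing.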