Module pipelines.rj_cor.comando.eventos.tasks

Tasks for comando

Functions

def download_data_atividades(first_date, last_date, wait=None) ‑> pandas.core.frame.DataFrame

Download atividades (activities) data from the API for the given date interval.

def download_data_ocorrencias(first_date, last_date, wait=None) ‑> pandas.core.frame.DataFrame

Download ocorrencias (occurrences) data from the API for the given date interval.

def get_date_interval(first_date, last_date) ‑> Tuple[dict, str]

If first_date and last_date are provided, format them as DD/MM/YYYY. Otherwise, use the last 3 days as the interval.

first_date: str, in YYYY-MM-DD format
last_date: str, in YYYY-MM-DD format
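The date handling described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name `format_date_interval`, the `today` parameter, and the dictionary keys are assumptions, and the sketch returns only a dict, whereas the documented function returns `Tuple[dict, str]`.

```python
from datetime import datetime, timedelta
from typing import Optional


def format_date_interval(
    first_date: Optional[str],
    last_date: Optional[str],
    today: Optional[datetime] = None,
) -> dict:
    """Hypothetical helper: convert YYYY-MM-DD strings to DD/MM/YYYY.

    When either date is missing, fall back to the 3 days ending at `today`
    (an assumed parameter, added here to make the fallback testable).
    """
    if first_date and last_date:
        first = datetime.strptime(first_date, "%Y-%m-%d")
        last = datetime.strptime(last_date, "%Y-%m-%d")
    else:
        last = today or datetime.now()
        first = last - timedelta(days=3)
    # Key names are assumptions; the real dict layout is not documented here.
    return {
        "first_date": first.strftime("%d/%m/%Y"),
        "last_date": last.strftime("%d/%m/%Y"),
    }
```

Calling `format_date_interval("2023-01-05", "2023-01-10")` yields the two dates as `05/01/2023` and `10/01/2023`; passing `None` for both dates triggers the 3-day fallback.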

def get_redis_df(dataset_id: str, table_id: str, name: str, mode: str = 'prod') ‑> pandas.core.frame.DataFrame

Access Redis to get the last saved DataFrame and compare it with the current DataFrame; return only the rows of the current DataFrame that are not already saved.
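The "keep only rows that are not already saved" step can be illustrated with a pandas anti-join. This sketch leaves Redis out entirely and assumes both DataFrames share the same columns and dtypes; the helper name `only_new_rows` is hypothetical and not part of the module.

```python
import pandas as pd


def only_new_rows(current: pd.DataFrame, saved: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return rows of `current` that are absent from `saved`.

    Assumes both frames have identical columns with matching dtypes.
    """
    merged = current.merge(
        saved.drop_duplicates(),      # avoid duplicating rows of `current`
        how="left",
        on=list(current.columns),     # compare on every column
        indicator=True,               # adds a `_merge` column
    )
    # `left_only` marks rows found in `current` but not in `saved`.
    new_rows = merged.loc[merged["_merge"] == "left_only", current.columns]
    return new_rows.reset_index(drop=True)
```

Comparing on every column means a row whose values changed since the last save is treated as new, which may or may not match what the real task does.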

def get_redis_max_date(dataset_id: str, table_id: str, name: str = None, mode: str = 'prod') ‑> str

Access Redis to get the last saved date, to be compared with the current DataFrame.

def not_none(something: Any) ‑> bool

Returns True if something is not None.
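A one-line re-implementation written from the description above (not copied from the module) shows how this check differs from a plain truthiness test: falsy values such as `0` or an empty string still count as "not None".

```python
from typing import Any


def not_none(something: Any) -> bool:
    """Sketch based on the docstring: True for anything except None."""
    return something is not None


# Falsy values are still "not None", unlike with bool(value).
values = [0, "", None, [], "2023-01-05"]
kept = [value for value in values if not_none(value)]
```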

def save_data(dataframe: pandas.core.frame.DataFrame) ‑> Union[str, pathlib.Path]

Save data to a CSV file to be uploaded to GCP.

def save_no_partition(dataframe: pandas.core.frame.DataFrame, append: bool = False) ‑> str

Saves a dataframe to a temporary directory and returns the path to the directory.
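A rough sketch of this pattern is shown below, using `tempfile.mkdtemp` and `DataFrame.to_csv`. The helper name `save_csv_to_dir`, the `directory` parameter, the `data.csv` file name, and the meaning given to `append` (add rows to an existing file instead of overwriting it) are all assumptions made for illustration; the module's actual behavior may differ.

```python
import tempfile
from pathlib import Path
from typing import Optional

import pandas as pd


def save_csv_to_dir(
    dataframe: pd.DataFrame,
    directory: Optional[str] = None,
    append: bool = False,
) -> str:
    """Hypothetical helper: write `dataframe` as CSV and return the directory path."""
    # Create a fresh temporary directory unless one is supplied (assumed parameter).
    target = Path(directory) if directory else Path(tempfile.mkdtemp())
    target.mkdir(parents=True, exist_ok=True)
    file_path = target / "data.csv"  # assumed file name

    # Only append when the file already exists; otherwise write with a header.
    appending = append and file_path.exists()
    dataframe.to_csv(
        file_path,
        mode="a" if appending else "w",
        header=not appending,
        index=False,
    )
    return str(target)
```

Returning the directory rather than the file path matches the description above, and suits upload steps that take a folder as input.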

def save_redis_max_date(dataset_id: str, table_id: str, name: str = None, mode: str = 'prod', redis_max_date: str = None, wait=None)

Access Redis to save the last date.

def treat_data_atividades(dfr: pandas.core.frame.DataFrame, dfr_redis: pandas.core.frame.DataFrame, columns: list) ‑> Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Normalize data to be similar to the old API.

def treat_data_ocorrencias(dfr: pandas.core.frame.DataFrame, redis_max_date: str) ‑> Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Rename columns and normalize data.