Module pipelines.rj_smtr.br_rj_riodejaneiro_stu.tasks

Tasks for br_rj_riodejaneiro_stu

Functions

def create_final_stu_dataframe(dfs: list[pandas.core.frame.DataFrame]) -> tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Concatenate all dataframes and split the result according to the document type

Args

dfs : list[pd.DataFrame]
The list of dataframes read from all STU files

Returns

tuple[pd.DataFrame, pd.DataFrame]
Dataframe for regular persons, dataframe for companies
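
A minimal sketch of how such a split might look. The column name `tipo_documento`, its values `CPF` and `CNPJ`, and the function name `split_by_document_type` are illustrative assumptions and are not taken from the pipeline's source.

```python
import pandas as pd


def split_by_document_type(dfs: list[pd.DataFrame]) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Concatenate the inputs, then split rows by a hypothetical document-type column."""
    # Stack all per-file dataframes into one, renumbering the index.
    df = pd.concat(dfs, ignore_index=True)

    # ASSUMPTION: "tipo_documento" holds "CPF" for persons and "CNPJ" for companies.
    df_pf = df[df["tipo_documento"] == "CPF"].reset_index(drop=True)
    df_pj = df[df["tipo_documento"] == "CNPJ"].reset_index(drop=True)
    return df_pf, df_pj


# Usage with two small made-up dataframes.
file_a = pd.DataFrame({"tipo_documento": ["CPF", "CNPJ"], "nome": ["Ana", "Acme"]})
file_b = pd.DataFrame({"tipo_documento": ["CPF"], "nome": ["Bruno"]})
df_pf, df_pj = split_by_document_type([file_a, file_b])
```

Resetting the index on each half keeps the two outputs independent of the row order in the concatenated frame.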

def get_stu_raw_blobs(data_versao_stu: str) -> list[google.cloud.storage.blob.Blob]

Get STU extraction files

Args

data_versao_stu : str
The STU version date in the format YYYY-MM-DD

Returns

list[Blob]
The list of blobs for the requested STU version date
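
Because `data_versao_stu` must follow the YYYY-MM-DD format, a caller may want to validate it before listing files. The sketch below shows one way to do that with the standard library; the helper name `build_stu_prefix`, the `base_folder` default, and the folder layout are hypothetical and only serve the illustration.

```python
from datetime import datetime


def build_stu_prefix(data_versao_stu: str, base_folder: str = "stu/raw") -> str:
    """Validate the version date and build a hypothetical blob-name prefix."""
    # strptime raises ValueError for malformed or impossible dates.
    parsed = datetime.strptime(data_versao_stu, "%Y-%m-%d")
    # Re-format to enforce zero padding, since strptime also accepts "2024-1-5".
    normalized = parsed.strftime("%Y-%m-%d")
    # ASSUMPTION: files for one version live under "<base_folder>/<date>/".
    return f"{base_folder}/{normalized}/"


prefix = build_stu_prefix("2024-01-05")
```

A prefix like this could then be passed to `google.cloud.storage.Client.list_blobs(bucket_name, prefix=prefix)`, which yields `Blob` objects; the bucket name is not stated in this documentation.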

def read_stu_raw_file(blob: google.cloud.storage.blob.Blob) -> pandas.core.frame.DataFrame

Read an extracted file from STU

Args

blob : Blob
The GCS blob

Returns

pd.DataFrame
The file contents as a dataframe
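
The following sketch illustrates reading a blob's bytes into a dataframe. The file format is not documented here, so semicolon-separated text is an assumption, as are the `FakeBlob` stand-in, the helper name `read_blob_as_dataframe`, and the sample columns. `FakeBlob` exists only so the example runs without access to GCS.

```python
import io

import pandas as pd


class FakeBlob:
    """Stand-in for google.cloud.storage.blob.Blob, used so the sketch runs offline."""

    def __init__(self, name: str, payload: bytes):
        self.name = name
        self._payload = payload

    def download_as_bytes(self) -> bytes:
        # Mirrors the name of Blob.download_as_bytes in google-cloud-storage.
        return self._payload


def read_blob_as_dataframe(blob) -> pd.DataFrame:
    """Download the blob and parse it, ASSUMING semicolon-separated text."""
    raw = blob.download_as_bytes()
    # dtype=str keeps identifiers such as document numbers from losing leading zeros.
    return pd.read_csv(io.BytesIO(raw), sep=";", dtype=str)


blob = FakeBlob("example.txt", b"documento;nome\n00123;Ana\n")
df = read_blob_as_dataframe(blob)
```

Reading every column as a string is a conservative choice for registry data, where numeric parsing would silently alter document numbers.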

def save_stu_dataframes(df_pf: pandas.core.frame.DataFrame, df_pj: pandas.core.frame.DataFrame)

Save STU concatenated dataframes into the upload folder

Args

df_pf : pd.DataFrame
Dataframe for regular persons
df_pj : pd.DataFrame
Dataframe for companies