Module pipelines.rj_smtr.br_rj_riodejaneiro_stu.tasks
Tasks for br_rj_riodejaneiro_stu
Functions
def create_final_stu_dataframe(dfs: list[pandas.core.frame.DataFrame]) ‑> tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
- Join all dataframes according to document type, producing one dataframe for individuals and one for companies
Args
dfs
:list[pd.DataFrame]
- The list of dataframes read from all STU files
Returns
tuple[pd.DataFrame, pd.DataFrame]
- A dataframe for individuals and a dataframe for companies, in that order
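The following is a minimal sketch of the join-and-split idea, not the task's actual implementation. It assumes each input dataframe carries a hypothetical column `tipo_documento` whose value is "CNPJ" for companies; the column name, its values, and the helper name `split_by_document_type` are illustrative assumptions that do not come from this module.

```python
import pandas as pd


def split_by_document_type(
    dfs: list[pd.DataFrame],
    type_column: str = "tipo_documento",  # hypothetical column name
    company_value: str = "CNPJ",  # hypothetical marker for companies
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Concatenate the inputs, then split the rows into (individuals, companies)."""
    combined = pd.concat(dfs, ignore_index=True)
    is_company = combined[type_column] == company_value
    df_pf = combined[~is_company].reset_index(drop=True)
    df_pj = combined[is_company].reset_index(drop=True)
    return df_pf, df_pj
```

The return order mirrors the documented tuple: individuals first, companies second.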
def get_stu_raw_blobs(data_versao_stu: str) ‑> list[google.cloud.storage.blob.Blob]
- Get the raw STU extraction files for a given version date
Args
data_versao_stu
:str
- The STU version date in the format YYYY-MM-DD
Returns
list[Blob]
- The list of blobs found for that version date
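Because data_versao_stu must follow the YYYY-MM-DD format, a caller may want to validate the string before requesting blobs. The sketch below uses only the standard library; the helper name `parse_stu_version_date` is hypothetical and is not part of this module.

```python
from datetime import date, datetime


def parse_stu_version_date(data_versao_stu: str) -> date:
    """Parse a YYYY-MM-DD string, rejecting anything not in that exact form."""
    parsed = datetime.strptime(data_versao_stu, "%Y-%m-%d").date()
    # strptime tolerates unpadded fields such as "2024-1-5", so compare the
    # canonical ISO form against the input to enforce zero padding.
    if parsed.isoformat() != data_versao_stu:
        raise ValueError(f"expected YYYY-MM-DD, got {data_versao_stu!r}")
    return parsed
```

A malformed value then fails early with a ValueError instead of producing an empty blob list.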
def read_stu_raw_file(blob: google.cloud.storage.blob.Blob) ‑> pandas.core.frame.DataFrame
- Read a raw STU extraction file into a dataframe
Args
blob
:Blob
- The GCS blob
Returns
pd.DataFrame
- The contents of the file
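The file format is not stated here, so the following is only a sketch under the assumption that an extraction file is delimited text. It parses raw bytes, such as those returned by the blob's `download_as_bytes()` method, into a dataframe. The helper name `read_delimited_bytes`, the ";" separator, and the UTF-8 encoding are all assumptions.

```python
import io

import pandas as pd


def read_delimited_bytes(
    payload: bytes,
    sep: str = ";",  # assumed separator
    encoding: str = "utf-8",  # assumed encoding
) -> pd.DataFrame:
    """Parse delimited text held in memory into a dataframe of strings."""
    # dtype=str keeps identifiers such as document numbers intact,
    # including any leading zeros.
    return pd.read_csv(io.BytesIO(payload), sep=sep, encoding=encoding, dtype=str)
```

Reading every column as a string avoids silently converting document numbers to integers.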
def save_stu_dataframes(df_pf: pandas.core.frame.DataFrame, df_pj: pandas.core.frame.DataFrame)
- Save the concatenated STU dataframes to the upload folder
Args
df_pf
:pd.DataFrame
- Dataframe for individuals
df_pj
:pd.DataFrame
- Dataframe for companies
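As an illustration of saving both dataframes, the sketch below writes each one as a CSV file under a target directory. The directory layout, the file names, the CSV format, and the helper name `save_dataframe_pair` are assumptions; the task's real upload folder structure is not described here.

```python
from pathlib import Path

import pandas as pd


def save_dataframe_pair(
    df_pf: pd.DataFrame,
    df_pj: pd.DataFrame,
    upload_dir: str,
) -> dict[str, Path]:
    """Write each dataframe to <upload_dir>/<name>.csv and return the paths."""
    target = Path(upload_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, df in (("pf", df_pf), ("pj", df_pj)):
        path = target / f"{name}.csv"  # hypothetical file naming
        df.to_csv(path, index=False)
        paths[name] = path
    return paths
```

Returning the written paths lets a later step locate the files without rebuilding their names.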