API Reference


This package has two modules, Connectors and Engines, detailed below.

Connectors

1. GDC Endpoint Connectors

2. GDC Filters

3. Google Cloud Connector

class src.Connectors.gcp_bigquery_utils.BigQueryUtils(project_id)

Utility class for interacting with Google BigQuery.

table_exists(table_ref)

Example:

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
table_ref = "my-project.my_dataset.my_table"
exists = bq_utils.table_exists(table_ref)
print(f"Table exists: {exists}")
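
The examples in this reference pass tables as fully qualified strings of the form project.dataset.table. The following is a minimal sketch, not part of BigQueryUtils, of how such a string could be checked before it is handed to table_exists. The helper name split_table_ref and the TableRef tuple are hypothetical, and the sketch assumes a plain three-part, dot-separated ID.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """Hypothetical container for the three parts of a table ID."""
    project: str
    dataset: str
    table: str


def split_table_ref(table_ref: str) -> TableRef:
    """Split 'project.dataset.table' into its three components."""
    parts = table_ref.split(".")
    # Reject IDs with the wrong number of parts or with empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'project.dataset.table', got {table_ref!r}"
        )
    return TableRef(*parts)


ref = split_table_ref("my-project.my_dataset.my_table")
print(ref.dataset)  # my_dataset
```

A check like this fails early on a dataset-only string such as "my_dataset.my_table", before any request is made.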

dataset_exists(dataset_id)

Example:

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
dataset_id = "my-project.my_dataset"
exists = bq_utils.dataset_exists(dataset_id)
print(f"Dataset exists: {exists}")

upload_df_to_bq(table_id, df)

Example:

import pandas as pd

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
table_id = "my-project.my_dataset.my_table"
job = bq_utils.upload_df_to_bq(table_id, df)
job.result()  # Wait for the job to complete

create_bigquery_table_with_schema(table_id, schema, partition_field=None, clustering_fields=None)

Example:

from google.cloud import bigquery

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
table_id = "my-project.my_dataset.my_table"
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INTEGER"),
]
table = bq_utils.create_bigquery_table_with_schema(table_id, schema)

df_to_json(df, file_path='data.json')

Example:

import pandas as pd

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
bq_utils.df_to_json(df, "output.json")
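
The source does not say which JSON layout df_to_json writes. If the file is intended for a BigQuery load job, newline-delimited JSON (one object per line) is the layout BigQuery accepts for JSON files. The sketch below shows that layout using only the standard library. records_to_ndjson is a hypothetical helper, not a method of BigQueryUtils.

```python
import json
from typing import Iterable


def records_to_ndjson(records: Iterable[dict]) -> str:
    """Serialize each record as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Illustrative rows mirroring the first two rows of the DataFrame above.
rows = [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]
ndjson_text = records_to_ndjson(rows)
print(ndjson_text, end="")
```

With pandas, df.to_json(path, orient="records", lines=True) writes the same one-object-per-line layout.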

load_json_data(json_object, schema, table_id)

Example:

from google.cloud import bigquery

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
json_object = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INTEGER"),
]
table_id = "my-project.my_dataset.my_table"
job = bq_utils.load_json_data(json_object, schema, table_id)
job.result()  # Wait for the job to complete
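
load_json_data takes both the records and a schema, so a mismatch between the two would presumably only surface once the load job runs. As a sketch, a local check of record keys against the schema's field names can catch typos earlier. find_field_mismatches is a hypothetical helper, not part of BigQueryUtils, and it compares names only, not types.

```python
from typing import Iterable, List, Mapping


def find_field_mismatches(
    records: Iterable[Mapping[str, object]],
    field_names: Iterable[str],
) -> List[str]:
    """Return one message per record whose keys differ from field_names."""
    expected = set(field_names)
    problems = []
    for index, record in enumerate(records):
        keys = set(record)
        missing = expected - keys
        extra = keys - expected
        if missing or extra:
            problems.append(
                f"record {index}: missing={sorted(missing)} extra={sorted(extra)}"
            )
    return problems


records = [{"name": "Alice", "age": 30}, {"name": "Bob", "agee": 25}]
for problem in find_field_mismatches(records, ["name", "age"]):
    print(problem)
```

Given a list of bigquery.SchemaField objects, the field names can be taken from each field's name attribute, for example [field.name for field in schema].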

run_query(query)

Example:

from src.Connectors.gcp_bigquery_utils import BigQueryUtils

bq_utils = BigQueryUtils("my-project-id")
query = "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 10"
df = bq_utils.run_query(query)
print(df.head())

Engines

1. Analysis Engine

2. BigQuery Engine

3. GDC Engine