API Reference
This package has two modules, detailed below.
Connectors
1. GDC Endpoint Connectors
2. GDC Filters
3. Google Cloud Connector
- class src.Connectors.gcp_bigquery_utils.BigQueryUtils(project_id)
Utility class for interacting with Google BigQuery.
- table_exists(table_ref)
Example:
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    table_ref = "my-project.my_dataset.my_table"
    exists = bq_utils.table_exists(table_ref)
    print(f"Table exists: {exists}")
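The table references in these examples use the fully qualified `project.dataset.table` form. As a minimal sketch, a reference could be checked for that shape before it is passed to a method such as table_exists. The `TableRef` type and `parse_table_ref` helper below are hypothetical and not part of BigQueryUtils. The check is a plain split on dots and does not cover every identifier rule BigQuery enforces.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """Hypothetical container for the three parts of a table reference."""
    project: str
    dataset: str
    table: str


def parse_table_ref(table_ref: str) -> TableRef:
    """Split 'project.dataset.table' into its parts (hypothetical helper)."""
    parts = table_ref.split(".")
    # Require exactly three non-empty, dot-separated components.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {table_ref!r}")
    return TableRef(*parts)


ref = parse_table_ref("my-project.my_dataset.my_table")
print(ref.dataset)  # my_dataset
```

Failing early on a malformed reference keeps the error message local instead of surfacing later as a failed BigQuery request.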
- dataset_exists(dataset_id)
Example:
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    dataset_id = "my-project.my_dataset"
    exists = bq_utils.dataset_exists(dataset_id)
    print(f"Dataset exists: {exists}")
- upload_df_to_bq(table_id, df)
Example:
    import pandas as pd
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
    table_id = "my-project.my_dataset.my_table"
    job = bq_utils.upload_df_to_bq(table_id, df)
    job.result()  # Wait for the job to complete
- create_bigquery_table_with_schema(table_id, schema, partition_field=None, clustering_fields=None)
Example:
    from google.cloud import bigquery
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    table_id = "my-project.my_dataset.my_table"
    schema = [
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("age", "INTEGER"),
    ]
    table = bq_utils.create_bigquery_table_with_schema(table_id, schema)
- df_to_json(df, file_path='data.json')
Example:
    import pandas as pd
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
    bq_utils.df_to_json(df, "output.json")
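This reference does not state which JSON layout df_to_json writes. If the file is meant for a BigQuery load job, BigQuery expects newline-delimited JSON, with one object per line. The sketch below shows that layout using only the standard library. `write_ndjson` is a hypothetical helper, not part of BigQueryUtils, and the records stand in for DataFrame rows.

```python
import json
import os
import tempfile


def write_ndjson(records, file_path):
    """Write one JSON object per line (hypothetical helper)."""
    with open(file_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


# Stand-in rows, mirroring the first two rows of the DataFrame example.
records = [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]
path = os.path.join(tempfile.mkdtemp(), "output.json")
write_ndjson(records, path)

with open(path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
print(len(lines))  # 2
```

With pandas, the same layout can be produced by `df.to_json(path, orient="records", lines=True)`.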
- load_json_data(json_object, schema, table_id)
Example:
    from google.cloud import bigquery
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    json_object = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
    schema = [
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("age", "INTEGER"),
    ]
    table_id = "my-project.my_dataset.my_table"
    job = bq_utils.load_json_data(json_object, schema, table_id)
    job.result()  # Wait for the job to complete
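A load job fails if a record does not match the schema, so it can help to check records locally first. The following is a rough sketch under stated assumptions: the schema is represented as plain `(name, type)` tuples rather than `bigquery.SchemaField` objects, the `PYTHON_TYPES` mapping and `find_schema_errors` helper are hypothetical, and every field is treated as required, which is stricter than BigQuery's default NULLABLE mode.

```python
# Hypothetical mapping from BigQuery type names to Python types.
PYTHON_TYPES = {"STRING": str, "INTEGER": int, "FLOAT": float, "BOOLEAN": bool}


def find_schema_errors(records, fields):
    """Return a list of problems; an empty list means the records look fine."""
    errors = []
    for i, record in enumerate(records):
        for name, bq_type in fields:
            if name not in record:
                errors.append(f"row {i}: missing field {name!r}")
                continue
            value = record[name]
            expected = PYTHON_TYPES[bq_type]
            # bool is a subclass of int, so reject it explicitly for INTEGER.
            wrong_bool = expected is int and isinstance(value, bool)
            if not isinstance(value, expected) or wrong_bool:
                errors.append(f"row {i}: {name!r} is not {bq_type}")
    return errors


fields = [("name", "STRING"), ("age", "INTEGER")]
rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": "25"}]
print(find_schema_errors(rows, fields))
# ["row 1: 'age' is not INTEGER"]
```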
- run_query(query)
Example:
    from src.Connectors.gcp_bigquery_utils import BigQueryUtils

    bq_utils = BigQueryUtils("my-project-id")
    query = "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 10"
    df = bq_utils.run_query(query)
    print(df.head())