Pipelines
Creating and managing data pipelines
Pipelines are the core concept in Fireconduit. A pipeline defines how data flows from a Firestore collection to a BigQuery table, including the source, destination, schema mapping, and execution settings.
What is a Pipeline?
A pipeline is a configured connection between:
- Source: A Firestore collection in your Firebase project
- Destination: A BigQuery table in your GCP project
- Schema: How document fields map to table columns
- Trigger: When and how backfills run
Once created, a pipeline can be used to run backfill jobs that copy your Firestore data to BigQuery.
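The four parts above can be pictured as one configuration object. The sketch below is purely illustrative — the field names and types are assumptions, not Fireconduit's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Pipeline:
    """Illustrative model of a pipeline's four parts (names are hypothetical)."""
    name: str
    source_collection: str     # Firestore collection path, e.g. "users"
    destination_table: str     # BigQuery destination as "dataset.table"
    schema: dict[str, str]     # document field -> BigQuery column type
    trigger: str               # "manual", or a schedule like "daily"

users_pipeline = Pipeline(
    name="Users to Analytics",
    source_collection="users",
    destination_table="analytics.users",
    schema={"email": "STRING", "created_at": "TIMESTAMP"},
    trigger="manual",
)
```

Everything here maps to a step of the wizard described below: name (Step 1), source and destination (Step 2), schema (Step 3), and trigger (Step 4).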
Creating a Pipeline
Navigate to Pipelines in the dashboard and click New Pipeline to open the pipeline wizard.
Step 1: Basics
- Name: A descriptive name for your pipeline (e.g., “Users to Analytics”)
- Description: Optional notes about the pipeline’s purpose
Step 2: Source & Destination
Configure where data comes from and where it goes:
Source (Firestore)
- Firebase Project: Select from your linked Firebase projects
- Database: Usually “(default)” unless you use named databases
- Collection: The Firestore collection path (e.g., users, orders/2024/items)
Destination (BigQuery)
- GCP Project: Select from your linked GCP projects
- Dataset: The BigQuery dataset name
- Table: The table name (created automatically if it doesn’t exist)
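The two sides of Step 2 resolve to standard GCP resource identifiers. As a sketch, the project, database, and collection you select combine into a fully qualified Firestore resource name, and the GCP project, dataset, and table combine into a BigQuery table reference (the helper functions are hypothetical; the formats are the standard GCP ones):

```python
def firestore_collection_ref(project: str, database: str, collection: str) -> str:
    # Standard Firestore resource-name format for a collection.
    return f"projects/{project}/databases/{database}/documents/{collection}"

def bigquery_table_ref(project: str, dataset: str, table: str) -> str:
    # Standard fully qualified BigQuery table reference.
    return f"{project}.{dataset}.{table}"

src = firestore_collection_ref("my-firebase-app", "(default)", "users")
dst = bigquery_table_ref("my-gcp-project", "analytics", "users")
# src == "projects/my-firebase-app/databases/(default)/documents/users"
# dst == "my-gcp-project.analytics.users"
```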
Step 3: Schema
Define how Firestore documents become BigQuery rows. You can:
- Start from a template (Firebase Extension or Simple)
- Add metadata columns for document ID, path, and timestamps
- Map specific document fields to typed BigQuery columns
See Schema Mapping for detailed options.
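To make Step 3 concrete, here is a hedged sketch of how a field mapping plus metadata columns could turn a Firestore document into a BigQuery row. The column names and row shape are illustrative assumptions; Schema Mapping documents the real options:

```python
from datetime import datetime, timezone

def document_to_row(doc_id: str, doc_path: str, fields: dict, mapping: dict) -> dict:
    # Metadata columns: document ID, full path, and a read timestamp.
    row = {
        "document_id": doc_id,
        "document_path": doc_path,
        "read_time": datetime.now(timezone.utc).isoformat(),
    }
    # Copy only the fields declared in the mapping; unmapped fields are dropped.
    for field, column in mapping.items():
        row[column] = fields.get(field)
    return row

row = document_to_row(
    doc_id="abc123",
    doc_path="users/abc123",
    fields={"email": "a@example.com", "plan": "pro", "internal_flag": True},
    mapping={"email": "email", "plan": "subscription_plan"},
)
# row carries the metadata columns plus email and subscription_plan only;
# internal_flag is not mapped, so it never reaches BigQuery.
```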
Step 4: Trigger
Choose when backfills run:
- Manual: Run on-demand from the dashboard
- Scheduled: Run automatically at set intervals (hourly, daily, weekly)
Step 5: Advanced
Configure Dataflow execution settings:
- Region: Where Dataflow jobs run
- Machine Type: Worker instance size
- Max Workers: Maximum parallel workers
See Configuration for details.
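As a rough picture of what Step 5 configures, the advanced settings boil down to three values. The defaults below are illustrative only (us-central1 and n1-standard-1 are real GCP values, but Fireconduit's actual defaults and limits are in Configuration):

```python
# Illustrative Dataflow execution settings; actual options are in Configuration.
dataflow_settings = {
    "region": "us-central1",          # where Dataflow jobs run
    "machine_type": "n1-standard-1",  # worker instance size
    "max_workers": 10,                # cap on parallel workers
}
```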
Managing Pipelines
Viewing Pipelines
The Pipelines page shows all pipelines in your organization with:
- Pipeline name and description
- Source collection and destination table
- Active/inactive status
- Last job status and timing
Editing Pipelines
Click on a pipeline to open its detail page, then click Edit to modify settings. You can change any configuration except the source and destination (create a new pipeline for different collections or tables).
Activating and Deactivating
Pipelines can be activated or deactivated:
- Active: The pipeline can run backfill jobs
- Inactive: The pipeline is paused; no new jobs can be started
Use deactivation when you want to temporarily stop a pipeline without deleting its configuration.
Deleting Pipelines
Delete a pipeline from its detail page. This removes the pipeline configuration but does not affect:
- Data already written to BigQuery
- The BigQuery table itself
- Job history (retained for audit purposes)
Pipeline Best Practices
Naming Conventions
Use clear, consistent names that identify:
- The source collection
- The purpose or destination
- Any relevant environment (prod, staging)
Examples: “Users - Production”, “Orders Backfill”, “Analytics Events”
One Collection Per Pipeline
Create separate pipelines for different collections rather than trying to combine data. This keeps configurations simple and makes troubleshooting easier.
Test Before Production
Create a test pipeline pointing to a staging BigQuery dataset first. Run a small backfill to verify your schema mapping works correctly before processing production data.
Document Your Pipelines
Use the description field to record:
- Why this pipeline exists
- Any special schema considerations
- Who owns or maintains it