Pipelines
Creating and managing data pipelines
Pipelines are the core concept in Fireconduit. A pipeline defines how data flows from a Firestore collection to a BigQuery table, including the source, destination, schema mapping, and execution settings.
What is a Pipeline?
A pipeline is a configured connection between:
- Source: A Firestore collection in your Firebase project
- Destination: A BigQuery table in your GCP project
- Schema: How document fields map to table columns
- Trigger: When and how backfills run
Once created, a pipeline can be used to run backfill jobs that copy your Firestore data to BigQuery.
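The four parts above can be pictured as one configuration object. The sketch below is purely illustrative — the field names and types are assumptions, not Fireconduit's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Pipeline:
    """Illustrative model of a pipeline's four parts (names are hypothetical)."""
    name: str
    source_collection: str     # Firestore collection path, e.g. "users"
    destination_table: str     # BigQuery destination as "dataset.table"
    schema: dict[str, str]     # document field -> BigQuery column type
    trigger: str               # "manual", or a schedule like "daily"

users_pipeline = Pipeline(
    name="Users to Analytics",
    source_collection="users",
    destination_table="analytics.users",
    schema={"email": "STRING", "created_at": "TIMESTAMP"},
    trigger="manual",
)
```

Everything here maps to a step of the wizard described below: name (Step 1), source and destination (Step 2), schema (Step 3), and trigger (Step 4).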
Creating a Pipeline
Navigate to Pipelines in the dashboard and click New Pipeline to open the pipeline wizard.
Step 1: Basics
- Name: A descriptive name for your pipeline (e.g., “Users to Analytics”)
- Description: Optional notes about the pipeline’s purpose
Step 2: Source & Destination
Configure where data comes from and where it goes:
Source (Firestore)
- Firebase Project: Select from your linked Firebase projects
- Database: Usually “(default)” unless you use named databases
- Collection: The Firestore collection path (e.g., users, orders/2024/items)
Destination (BigQuery)
- GCP Project: Select from your linked GCP projects
- Dataset: The BigQuery dataset name
- Table: The table name (created automatically if it doesn’t exist)
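The two sides of Step 2 resolve to standard GCP resource identifiers. As a sketch, the project, database, and collection you select combine into a fully qualified Firestore resource name, and the GCP project, dataset, and table combine into a BigQuery table reference (the helper functions are hypothetical; the formats are the standard GCP ones):

```python
def firestore_collection_ref(project: str, database: str, collection: str) -> str:
    # Standard Firestore resource-name format for a collection.
    return f"projects/{project}/databases/{database}/documents/{collection}"

def bigquery_table_ref(project: str, dataset: str, table: str) -> str:
    # Standard fully qualified BigQuery table reference.
    return f"{project}.{dataset}.{table}"

src = firestore_collection_ref("my-firebase-app", "(default)", "users")
dst = bigquery_table_ref("my-gcp-project", "analytics", "users")
# src == "projects/my-firebase-app/databases/(default)/documents/users"
# dst == "my-gcp-project.analytics.users"
```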
Step 3: Schema
Define how Firestore documents become BigQuery rows. You can:
- Start from a template (Firebase Extension or Simple)
- Add metadata columns for document ID, path, and timestamps
- Map specific document fields to typed BigQuery columns
See Schema Mapping for detailed options.
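To make Step 3 concrete, here is a hedged sketch of how a field mapping plus metadata columns could turn a Firestore document into a BigQuery row. The column names and row shape are illustrative assumptions; Schema Mapping documents the real options:

```python
from datetime import datetime, timezone

def document_to_row(doc_id: str, doc_path: str, fields: dict, mapping: dict) -> dict:
    # Metadata columns: document ID, full path, and a read timestamp.
    row = {
        "document_id": doc_id,
        "document_path": doc_path,
        "read_time": datetime.now(timezone.utc).isoformat(),
    }
    # Copy only the fields declared in the mapping; unmapped fields are dropped.
    for field, column in mapping.items():
        row[column] = fields.get(field)
    return row

row = document_to_row(
    doc_id="abc123",
    doc_path="users/abc123",
    fields={"email": "a@example.com", "plan": "pro", "internal_flag": True},
    mapping={"email": "email", "plan": "subscription_plan"},
)
# row carries the metadata columns plus email and subscription_plan only;
# internal_flag is not mapped, so it never reaches BigQuery.
```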
Step 4: Trigger
Choose when backfills run:
- Manual: Run on-demand from the dashboard
- Scheduled: Run automatically at set intervals (hourly, daily, weekly)
Step 5: Advanced
Configure Dataflow execution settings:
- Region: Where Dataflow jobs run
- Machine Type: Worker instance size
- Max Workers: Maximum parallel workers
See Configuration for details.
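As a rough picture of what Step 5 configures, the advanced settings boil down to three values. The defaults below are illustrative only (us-central1 and n1-standard-1 are real GCP values, but Fireconduit's actual defaults and limits are in Configuration):

```python
# Illustrative Dataflow execution settings; actual options are in Configuration.
dataflow_settings = {
    "region": "us-central1",          # where Dataflow jobs run
    "machine_type": "n1-standard-1",  # worker instance size
    "max_workers": 10,                # cap on parallel workers
}
```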
Managing Pipelines
Viewing Pipelines
The Pipelines page shows all pipelines in your organization with:
- Pipeline name and description
- Source collection and destination table
- Active/inactive status
- Last job status and timing
Editing Pipelines
Click on a pipeline to open its detail page, then click Edit to modify settings. You can change any configuration except the source and destination (create a new pipeline for different collections or tables).
Activating and Deactivating
Pipelines can be activated or deactivated:
- Active: The pipeline can run backfill jobs
- Inactive: The pipeline is paused; no new jobs can be started
Use deactivation when you want to temporarily stop a pipeline without deleting its configuration.
Deleting Pipelines
Delete a pipeline from its detail page. This removes the pipeline configuration but does not affect:
- Data already written to BigQuery
- The BigQuery table itself
- Job history (retained for audit purposes)
Pipeline Best Practices
Naming Conventions
Use clear, consistent names that identify:
- The source collection
- The purpose or destination
- Any relevant environment (prod, staging)
Examples: “Users - Production”, “Orders Backfill”, “Analytics Events”
One Collection Per Pipeline
Create separate pipelines for different collections rather than trying to combine data. This keeps configurations simple and makes troubleshooting easier.
Test Before Production
Create a test pipeline pointing to a staging BigQuery dataset first. Run a small backfill to verify your schema mapping works correctly before processing production data.
Document Your Pipelines
Use the description field to record:
- Why this pipeline exists
- Any special schema considerations
- Who owns or maintains it