
Pipelines

Creating and managing data pipelines

Pipelines are the core concept in Fireconduit. A pipeline defines how data flows from a Firestore collection to a BigQuery table, including the source, destination, schema mapping, and execution settings.

What is a Pipeline?

A pipeline is a configured connection between:

  • Source: A Firestore collection in your Firebase project
  • Destination: A BigQuery table in your GCP project
  • Schema: How document fields map to table columns
  • Trigger: When and how backfills run

Once created, a pipeline can be used to run backfill jobs that copy your Firestore data to BigQuery.
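The four parts listed above can be pictured as a single configuration object. The sketch below is purely illustrative: the class, field names, and values are assumptions for this page, not Fireconduit's actual API.

```python
from dataclasses import dataclass

# Illustrative only: a pipeline ties together a source, a destination,
# a schema mapping, and a trigger. Names here are hypothetical.

@dataclass
class Pipeline:
    name: str
    source_collection: str    # Firestore collection path
    destination_table: str    # BigQuery table as "project.dataset.table"
    schema: dict              # document field -> BigQuery column type
    trigger: str = "manual"   # "manual" or "scheduled"

users_pipeline = Pipeline(
    name="Users to Analytics",
    source_collection="users",
    destination_table="my-project.analytics.users",
    schema={"email": "STRING", "created_at": "TIMESTAMP"},
)
```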

Creating a Pipeline

Navigate to Pipelines in the dashboard and click New Pipeline to open the pipeline wizard.

Step 1: Basics

  • Name: A descriptive name for your pipeline (e.g., “Users to Analytics”)
  • Description: Optional notes about the pipeline’s purpose

Step 2: Source & Destination

Configure where data comes from and where it goes:

Source (Firestore)

  • Firebase Project: Select from your linked Firebase projects
  • Database: Usually “(default)” unless you use named databases
  • Collection: The Firestore collection path (e.g., users, orders/2024/items)

Destination (BigQuery)

  • GCP Project: Select from your linked GCP projects
  • Dataset: The BigQuery dataset name
  • Table: The table name (created automatically if it doesn’t exist)
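One detail worth checking when entering the source: a Firestore *collection* path alternates collection/document/collection, so it always has an odd number of slash-separated segments (e.g., `users` or `orders/2024/items`, but not `orders/2024`, which is a document). A hypothetical validation helper, not part of Fireconduit:

```python
# Illustrative check: collection paths have an odd number of segments,
# document paths an even number.

def is_collection_path(path: str) -> bool:
    segments = [s for s in path.split("/") if s]
    return len(segments) % 2 == 1

print(is_collection_path("users"))              # top-level collection -> True
print(is_collection_path("orders/2024/items"))  # subcollection -> True
print(is_collection_path("orders/2024"))        # document, not a collection -> False
```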

Step 3: Schema

Define how Firestore documents become BigQuery rows. You can:

  • Start from a template (Firebase Extension or Simple)
  • Add metadata columns for document ID, path, and timestamps
  • Map specific document fields to typed BigQuery columns

See Schema Mapping for detailed options.
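To make the mapping step concrete, here is a sketch of how a Firestore document might become a flat BigQuery row with metadata columns. The function, column names, and behavior (unmapped fields are dropped) are assumptions for illustration, not Fireconduit's defaults:

```python
from datetime import datetime, timezone

# Illustrative only: flatten a Firestore document into a BigQuery row,
# prepending metadata columns for document ID, path, and ingest time.

def document_to_row(doc_id: str, doc_path: str, data: dict, mapping: dict) -> dict:
    row = {
        "document_id": doc_id,
        "document_path": doc_path,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    # Copy only the mapped fields; anything not in the mapping is dropped.
    for field_name, column_name in mapping.items():
        row[column_name] = data.get(field_name)
    return row

row = document_to_row(
    "u_123",
    "users/u_123",
    {"email": "a@example.com", "plan": "pro", "internal_note": "skip me"},
    {"email": "email", "plan": "subscription_plan"},
)
```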

Step 4: Trigger

Choose when backfills run:

  • Manual: Run on-demand from the dashboard
  • Scheduled: Run automatically at set intervals (hourly, daily, weekly)
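The scheduled intervals translate straightforwardly into next-run times. A minimal sketch, assuming fixed intervals measured from the last run (Fireconduit's actual scheduler may behave differently):

```python
from datetime import datetime, timedelta

# Illustrative mapping of schedule names to intervals.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_run(last_run: datetime, schedule: str) -> datetime:
    return last_run + INTERVALS[schedule]

last = datetime(2024, 6, 1, 9, 0)
print(next_run(last, "daily"))  # 2024-06-02 09:00:00
```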

Step 5: Advanced

Configure Dataflow execution settings:

  • Region: Where Dataflow jobs run
  • Machine Type: Worker instance size
  • Max Workers: Maximum parallel workers

See Configuration for details.
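As a rough picture of what these settings cover, here is how they might look as a config fragment. The keys and values are illustrative, not the option names Fireconduit actually exposes; see Configuration for those:

```python
# Illustrative Dataflow execution settings. Larger machine types and
# higher worker caps speed up big backfills at higher cost; small
# collections rarely need more than modest defaults.

dataflow_settings = {
    "region": "us-central1",          # where Dataflow jobs run
    "machine_type": "n1-standard-1",  # worker instance size
    "max_workers": 4,                 # cap on parallel workers
}
```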

Managing Pipelines

Viewing Pipelines

The Pipelines page shows all pipelines in your organization with:

  • Pipeline name and description
  • Source collection and destination table
  • Active/inactive status
  • Last job status and timing

Editing Pipelines

Click on a pipeline to open its detail page, then click Edit to modify settings. You can change any configuration except the source and destination (create a new pipeline for different collections or tables).

Activating and Deactivating

Pipelines can be activated or deactivated:

  • Active: The pipeline can run backfill jobs
  • Inactive: The pipeline is paused—no new jobs can be started

Use deactivation when you want to temporarily stop a pipeline without deleting its configuration.
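The active/inactive distinction acts as a gate on job creation. A hypothetical sketch of that behavior (class and method names are made up for illustration):

```python
# Illustrative only: an inactive pipeline keeps its configuration but
# refuses to start new backfill jobs.

class PipelineState:
    def __init__(self, active: bool = True):
        self.active = active

    def start_backfill(self) -> str:
        if not self.active:
            raise RuntimeError("pipeline is deactivated; activate it before starting a job")
        return "job started"

print(PipelineState(active=True).start_backfill())  # job started
```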

Deleting Pipelines

Delete a pipeline from its detail page. This removes the pipeline configuration but does not affect:

  • Data already written to BigQuery
  • The BigQuery table itself
  • Job history (retained for audit purposes)

Pipeline Best Practices

Naming Conventions

Use clear, consistent names that identify:

  • The source collection
  • The purpose or destination
  • Any relevant environment (prod, staging)

Examples: “Users - Production”, “Orders Backfill”, “Analytics Events”

One Collection Per Pipeline

Create separate pipelines for different collections rather than trying to combine data. This keeps configurations simple and makes troubleshooting easier.

Test Before Production

Create a test pipeline pointing to a staging BigQuery dataset first. Run a small backfill to verify your schema mapping works correctly before processing production data.

Document Your Pipelines

Use the description field to record:

  • Why this pipeline exists
  • Any special schema considerations
  • Who owns or maintains it