
Configuration

Dataflow settings, triggers, and advanced options

This guide covers the configuration options available when creating or editing a pipeline.

Trigger Options

Fireconduit supports two trigger types that determine when backfill jobs run.

Manual Trigger

With manual triggering, backfill jobs only run when you explicitly start them from the dashboard. This is ideal for:

  • One-time data migrations
  • On-demand analytics refreshes
  • Testing new pipeline configurations

Scheduled Trigger

Scheduled triggers automatically run backfills at regular intervals. When configuring a schedule, you set:

  • Start Time: When the first scheduled run should occur
  • Interval Value: The number of time units between runs
  • Interval Unit: Hours, days, or weeks

For example, you might schedule a backfill to run every 24 hours starting at midnight UTC.
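To make the interaction of the three fields concrete, here is a small sketch that computes upcoming run times from a start time, interval value, and interval unit. The function name and signature are illustrative, not Fireconduit's actual API:

```python
from datetime import datetime, timedelta, timezone

def next_runs(start_time, interval_value, interval_unit, count=3):
    """Compute the first `count` scheduled run times.

    interval_unit is one of "hours", "days", or "weeks",
    matching the Interval Unit options above.
    """
    steps = {
        "hours": timedelta(hours=interval_value),
        "days": timedelta(days=interval_value),
        "weeks": timedelta(weeks=interval_value),
    }
    step = steps[interval_unit]
    return [start_time + i * step for i in range(count)]

# A backfill every 24 hours starting at midnight UTC:
runs = next_runs(datetime(2024, 1, 1, tzinfo=timezone.utc), 24, "hours")
```

With these inputs, each run lands exactly one day after the previous one.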

Scheduled backfills are useful for:

  • Keeping BigQuery data up-to-date with Firestore
  • Regular analytics data refreshes
  • Nightly batch processing workflows

Dataflow Settings

Fireconduit uses Google Cloud Dataflow to execute backfill jobs. You can configure Dataflow settings to optimize performance and cost.

Region

The GCP region where Dataflow jobs run. Choose a region close to your Firestore and BigQuery resources to minimize latency. Common choices include:

  • us-central1
  • us-east1
  • europe-west1
  • asia-northeast1

Worker Machine Type

The Compute Engine machine type for Dataflow workers. This affects processing speed and cost:

Machine Type   | vCPUs | Memory  | Use Case
n1-standard-1  | 1     | 3.75 GB | Small collections, cost-sensitive
n1-standard-2  | 2     | 7.5 GB  | Medium collections, balanced
n1-standard-4  | 4     | 15 GB   | Large collections, faster processing

The default n1-standard-1 works well for most use cases.
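One way to turn the table above into a default choice is to key the machine type off collection size. The thresholds below are illustrative assumptions, not official sizing guidance:

```python
def suggest_machine_type(document_count: int) -> str:
    """Suggest a Dataflow worker machine type from the table above.

    Thresholds are illustrative; tune them against your own backfills.
    """
    if document_count < 1_000_000:
        return "n1-standard-1"   # small collections, cost-sensitive
    if document_count < 10_000_000:
        return "n1-standard-2"   # medium collections, balanced
    return "n1-standard-4"       # large collections, faster processing
```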

Maximum Workers

The maximum number of worker instances Dataflow can scale up to. More workers mean faster processing but higher cost:

  • 1-3 workers: Good for small to medium collections (< 1M documents)
  • 5-10 workers: Better for large collections (1M+ documents)
  • 10+ workers: For very large collections requiring fast processing

Dataflow autoscales based on workload, so setting a higher max doesn’t guarantee higher costs—it just allows the system to scale up when needed.
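The worker-count guidance above can be sketched the same way. The exact cutoffs and the ceiling for very large collections are assumptions for illustration:

```python
def suggest_max_workers(document_count: int) -> int:
    """Map collection size to a max-worker cap per the ranges above.

    Dataflow autoscales up to this cap, so a generous value only
    raises the ceiling, not the guaranteed spend.
    """
    if document_count < 1_000_000:
        return 3    # small to medium collections
    if document_count < 10_000_000:
        return 10   # large collections
    return 15       # very large collections; adjust to taste
```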

Best Practices

Start Small

When setting up a new pipeline, start with conservative Dataflow settings (1 max worker, n1-standard-1). Run a test backfill and monitor performance before scaling up.

Match Regions

Keep your Firestore database, BigQuery dataset, and Dataflow region in the same geographic area to minimize data transfer costs and latency.
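A pre-flight check like the following can catch mismatched regions before a pipeline is saved. The prefix matching of multi-region locations (e.g. BigQuery "US" against "us-central1") is a simplification of GCP's actual location rules, and the function itself is a hypothetical helper, not part of Fireconduit:

```python
def colocation_warnings(firestore_loc: str, bigquery_loc: str,
                        dataflow_region: str) -> list:
    """Flag resources whose location differs from the Dataflow region.

    Multi-region locations such as "US" or "EU" are matched by prefix;
    this is a rough heuristic, not GCP's exact co-location semantics.
    """
    target = dataflow_region.lower()
    warnings = []
    for name, loc in [("Firestore", firestore_loc),
                      ("BigQuery", bigquery_loc)]:
        if not target.startswith(loc.lower()):
            warnings.append(
                f"{name} is in {loc}, but Dataflow runs in {dataflow_region}"
            )
    return warnings
```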

Consider Schedule Timing

If using scheduled triggers, consider when your data is most stable. Running backfills during off-peak hours can reduce impact on production workloads.

Monitor Costs

Dataflow charges are based on worker time. For large collections, monitor your first few backfills to understand typical costs before committing to frequent schedules.
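For budgeting, a back-of-the-envelope estimate of worker cost can help. Real Dataflow billing is itemized per vCPU, memory, and storage second; the single per-worker hourly rate below is a placeholder assumption, so check current Dataflow pricing for your region:

```python
def estimate_backfill_cost(workers: int, hours: float,
                           hourly_rate_per_worker: float) -> float:
    """Rough cost: worker-hours times a blended hourly rate.

    This lumps vCPU, memory, and storage charges into one number,
    which is good enough for comparing schedules, not for invoicing.
    """
    return workers * hours * hourly_rate_per_worker

# e.g. 3 workers for 2 hours at a hypothetical $0.07 per worker-hour:
cost = estimate_backfill_cost(3, 2.0, 0.07)
```

Comparing this estimate against the actual bill from your first few backfills gives you a correction factor before committing to a frequent schedule.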