Configuration
Dataflow settings, triggers, and advanced options
This guide covers the configuration options available when creating or editing a pipeline.
Trigger Options
Fireconduit supports two trigger types that determine when backfill jobs run.
Manual Trigger
With manual triggering, backfill jobs only run when you explicitly start them from the dashboard. This is ideal for:
- One-time data migrations
- On-demand analytics refreshes
- Testing new pipeline configurations
Scheduled Trigger
Scheduled triggers automatically run backfills at regular intervals. When configuring a schedule, you set:
- Start Time: When the first scheduled run should occur
- Interval Value: The number of time units between runs
- Interval Unit: Hours, days, or weeks
For example, you might schedule a backfill to run every 24 hours starting at midnight UTC.
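A schedule like this is just a start time plus repeated intervals. As an illustrative sketch (the `next_runs` helper is ours, not part of Fireconduit), the first few run times can be computed like so:

```python
from datetime import datetime, timedelta, timezone

def next_runs(start: datetime, interval: timedelta, count: int) -> list[datetime]:
    """Return the first `count` scheduled run times for a given start and interval."""
    return [start + i * interval for i in range(count)]

# A backfill every 24 hours starting at midnight UTC:
start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
runs = next_runs(start, timedelta(hours=24), 3)
for r in runs:
    print(r.isoformat())
```

The same logic applies to day- or week-based intervals; only the `timedelta` changes.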
Scheduled backfills are useful for:
- Keeping BigQuery data up-to-date with Firestore
- Regular analytics data refreshes
- Nightly batch processing workflows
Dataflow Settings
Fireconduit uses Google Cloud Dataflow to execute backfill jobs. You can configure Dataflow settings to optimize performance and cost.
Region
The GCP region where Dataflow jobs run. Choose a region close to your Firestore and BigQuery resources to minimize latency. Common choices include:
- us-central1
- us-east1
- europe-west1
- asia-northeast1
Worker Machine Type
The Compute Engine machine type for Dataflow workers. This affects processing speed and cost:
| Machine Type | vCPUs | Memory | Use Case |
|---|---|---|---|
| n1-standard-1 | 1 | 3.75 GB | Small collections, cost-sensitive |
| n1-standard-2 | 2 | 7.5 GB | Medium collections, balanced |
| n1-standard-4 | 4 | 15 GB | Large collections, faster processing |
The default n1-standard-1 works well for most use cases.
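The table above can be read as a simple size-based heuristic. This sketch encodes it in Python; the document-count thresholds are our own illustrative assumptions, not Fireconduit defaults:

```python
def pick_machine_type(doc_count: int) -> str:
    """Rough heuristic mirroring the machine-type table; thresholds are illustrative."""
    if doc_count < 100_000:
        return "n1-standard-1"   # small collections, cost-sensitive
    if doc_count < 1_000_000:
        return "n1-standard-2"   # medium collections, balanced
    return "n1-standard-4"       # large collections, faster processing
```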
Maximum Workers
The maximum number of worker instances Dataflow can scale up to. More workers mean faster processing but higher cost:
- 1-3 workers: Good for small to medium collections (< 1M documents)
- 5-10 workers: Better for large collections (1M+ documents)
- 10+ workers: For very large collections requiring fast processing
Dataflow autoscales based on workload, so setting a higher max doesn’t guarantee higher costs—it just allows the system to scale up when needed.
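The worker guidance above can likewise be sketched as a mapping from collection size to a maximum worker count. The function name and the cutoff for "very large" (10M documents) are assumptions for illustration:

```python
def suggest_max_workers(doc_count: int) -> int:
    """Illustrative mapping of the max-worker guidance; not an official formula."""
    if doc_count < 1_000_000:
        return 3    # small to medium collections
    if doc_count < 10_000_000:
        return 10   # large collections
    return 20       # very large collections needing fast processing
```

Because Dataflow autoscales, the returned value is a ceiling, not a fixed worker count.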
Best Practices
Start Small
When setting up a new pipeline, start with conservative Dataflow settings (1 max worker, n1-standard-1). Run a test backfill and monitor performance before scaling up.
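A conservative starting configuration might look like the sketch below. The field names are hypothetical, chosen to mirror the settings described in this guide rather than Fireconduit's actual schema:

```python
# Hypothetical settings for a first test backfill (field names illustrative)
conservative_dataflow_settings = {
    "region": "us-central1",            # match your Firestore/BigQuery region
    "worker_machine_type": "n1-standard-1",
    "max_workers": 1,
}
```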
Match Regions
Keep your Firestore database, BigQuery dataset, and Dataflow region in the same geographic area to minimize data transfer costs and latency.
Consider Schedule Timing
If using scheduled triggers, consider when your data is most stable. Running backfills during off-peak hours can reduce impact on production workloads.
Monitor Costs
Dataflow charges are based on worker time. For large collections, monitor your first few backfills to understand typical costs before committing to frequent schedules.
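Since charges are based on worker time, a back-of-the-envelope estimate is simply worker-hours multiplied by an hourly rate. The rate below is a placeholder; actual Dataflow pricing varies by machine type and region and should be taken from GCP's pricing page:

```python
def estimate_backfill_cost(workers: int, hours: float, hourly_rate: float) -> float:
    """Rough Dataflow cost estimate: worker-hours times an hourly rate.
    hourly_rate is per worker; look up the real value in GCP pricing."""
    return workers * hours * hourly_rate

# Example: 5 workers running for 2 hours at a hypothetical $0.06/worker-hour
cost = estimate_backfill_cost(5, 2.0, 0.06)
```

Multiply by runs per month to gauge the cost of a frequent schedule before committing to it.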