Fireconduit

Backfill Jobs

Running and monitoring backfill jobs

A backfill job is a single execution of a pipeline that copies Firestore documents to BigQuery. This guide covers how to run jobs, monitor progress, and troubleshoot issues.

Running a Backfill

Manual Backfill

To run a backfill manually:

  1. Navigate to your pipeline’s detail page
  2. Click Run Backfill
  3. Confirm to start the job

The job will be queued and start processing within a few moments.

Scheduled Backfills

Pipelines configured with scheduled triggers automatically run backfills at the specified intervals. Scheduled jobs appear in the Jobs list like manual jobs.

Monitoring Jobs

Job List

Navigate to Jobs in the dashboard to see all backfill jobs across your pipelines. Each job shows:

  • Pipeline name
  • Status
  • Start time
  • Duration (if completed)
  • Documents processed

Job Detail

Click on a job to see detailed information:

  • Dataflow Job ID: The Google Cloud Dataflow job identifier
  • Status: Current job state
  • Timing: When the job started and completed
  • Metrics: Documents and bytes processed
  • Errors: Any error messages if the job failed

The job detail page also links directly to the Google Cloud Console, where you can see full Dataflow job details.

Job Statuses

Jobs progress through several states:

Status      Description
Pending     Job is queued and waiting to start
Running     Job is actively processing documents
Succeeded   Job completed successfully
Failed      Job encountered an error and stopped
Cancelled   Job was manually cancelled
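The lifecycle above can be sketched as a small state machine. The status names come from the table; the transition map itself is an illustrative assumption about the lifecycle described in this guide, not Fireconduit's internal implementation.

```python
from enum import Enum

class JobStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"

# Assumed transitions: a job starts Pending, moves to Running once Dataflow
# provisions workers, then ends in exactly one terminal state.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.CANCELLED},
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.CANCELLED},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
    JobStatus.CANCELLED: set(),
}

def is_terminal(status: JobStatus) -> bool:
    """A job in a terminal state will never change status again."""
    return not TRANSITIONS[status]
```

A monitoring script can use `is_terminal` to decide when to stop polling a job.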

Pending

A job enters the pending state when it’s first created. Dataflow takes a short time to provision workers and start the job.

Running

The job is actively reading from Firestore and writing to BigQuery. During this phase, you’ll see document counts update as processing continues.

Succeeded

The job completed without errors. All documents from the source collection have been written to the destination table.

Failed

The job encountered an error and could not complete. Check the error message for details. Common causes include:

  • Permission issues with Firestore or BigQuery
  • Schema mismatches between document fields and table columns
  • Quota exceeded in GCP
  • Network or service availability issues
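If you triage many failures, a first-pass classifier over the raw error message can route them to the causes above. The regex patterns below are assumptions based on typical GCP error text, not an exhaustive or official mapping.

```python
import re

# Illustrative sketch: bucket a raw Dataflow error message into one of the
# common failure causes listed above. Patterns are assumptions, not an
# official Fireconduit or GCP error taxonomy.
CAUSES = [
    ("permission",   re.compile(r"PERMISSION_DENIED|AccessDenied|403", re.I)),
    ("schema",       re.compile(r"schema|incompatible type|no such field", re.I)),
    ("quota",        re.compile(r"quota|RESOURCE_EXHAUSTED|rateLimitExceeded", re.I)),
    ("availability", re.compile(r"UNAVAILABLE|deadline exceeded|503", re.I)),
]

def classify_error(message: str) -> str:
    for cause, pattern in CAUSES:
        if pattern.search(message):
            return cause
    return "unknown"
```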

Cancelled

The job was manually stopped before completion. Partial data may have been written to BigQuery.

Cancelling Jobs

To cancel a running job:

  1. Navigate to the job’s detail page
  2. Click Cancel Job
  3. Confirm the cancellation

Cancellation requests are sent to Dataflow and may take a few moments to take effect. The job status will update to “Cancelled” once complete.

Note that cancelling a job does not roll back data already written to BigQuery.
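Because cancellation is asynchronous, automation should poll until the job reaches a terminal state rather than assume it stopped immediately. A minimal sketch, where `get_status` is a hypothetical callable you would replace with a real lookup (the Fireconduit dashboard's job status, or Dataflow's job state):

```python
import time

def wait_for_cancellation(get_status, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll until the job reports a terminal status.

    `get_status` is a stand-in callable returning the job's status string;
    it is an assumption for illustration, not a Fireconduit API.
    """
    waited = 0
    while waited < timeout_s:
        status = get_status()
        if status in ("Cancelled", "Succeeded", "Failed"):
            return status
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError("job did not reach a terminal state in time")
```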

Troubleshooting Failed Jobs

Check the Error Message

The job detail page shows error messages from Dataflow. Common errors include:

Permission Denied

  • Verify Fireconduit has access to your Firebase and GCP projects
  • Check that the service account has the required BigQuery and Firestore permissions

Schema Mismatch

  • If writing to an existing table, ensure your pipeline schema matches the table schema
  • Check that field types are compatible (e.g., don’t map a Firestore string to a BigQuery INTEGER)
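Before running a backfill into an existing table, you can sanity-check field mappings with a compatibility table. The mapping below covers common cases only and is an assumption for illustration; your pipeline's actual schema rules are authoritative.

```python
# Hedged sketch of a Firestore -> BigQuery type compatibility check.
# Keys are Firestore field types, values are BigQuery column types that
# can plausibly receive them. Assumed for illustration, not exhaustive.
COMPATIBLE = {
    "string":    {"STRING"},
    "integer":   {"INTEGER", "FLOAT", "NUMERIC"},
    "double":    {"FLOAT", "NUMERIC"},
    "boolean":   {"BOOLEAN"},
    "timestamp": {"TIMESTAMP"},
    "map":       {"RECORD", "JSON"},
}

def check_field(firestore_type: str, bigquery_type: str) -> bool:
    """True if the Firestore type can be written to the BigQuery column type."""
    return bigquery_type in COMPATIBLE.get(firestore_type, set())
```

For example, `check_field("string", "INTEGER")` fails, matching the string-to-INTEGER mismatch warned about above.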

Quota Exceeded

  • Check your GCP quotas for Dataflow, BigQuery, and Compute Engine
  • Consider reducing max workers or running during off-peak hours

Collection Not Found

  • Verify the collection path is correct
  • Ensure the Firestore database name is correct

View Dataflow Logs

For detailed troubleshooting, click the link to view the job in Google Cloud Console. Dataflow provides:

  • Worker logs with detailed error traces
  • Resource utilization graphs
  • Step-by-step pipeline execution details

Retry Failed Jobs

To retry a failed job, start a new backfill from the pipeline detail page. Each job is independent, so previous failures don't affect new runs.

Best Practices

Start with Small Collections

When testing a new pipeline, run a backfill on a small collection first to verify everything works correctly before processing large datasets.

Monitor First Runs

Watch your first few backfill jobs closely. Check that:

  • Documents are being processed at expected rates
  • The data in BigQuery looks correct
  • Costs are within expected ranges
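One simple correctness check is comparing document and row counts. The sketch below takes counts supplied by the caller (for example, from a Firestore aggregation query and a `SELECT COUNT(*)` in BigQuery); the tolerance parameter allows for documents written while the backfill was running.

```python
def counts_match(source_doc_count: int, dest_row_count: int,
                 tolerance: float = 0.0) -> bool:
    """Compare a Firestore document count with a BigQuery row count.

    Counts are supplied by the caller; this helper is an illustrative
    sketch, not part of Fireconduit.
    """
    if source_doc_count == 0:
        return dest_row_count == 0
    drift = abs(source_doc_count - dest_row_count) / source_doc_count
    return drift <= tolerance
```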

Set Up Alerts

Consider setting up GCP monitoring alerts for:

  • Dataflow job failures
  • Unusual resource usage
  • Budget thresholds

Avoid Concurrent Jobs

Running multiple backfill jobs on the same pipeline simultaneously can cause issues. Wait for one job to complete before starting another.
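If you automate backfills, a guard like the following avoids accidental concurrency. `jobs` is a list of status strings for the pipeline's existing jobs, a stand-in for whatever job listing your setup provides:

```python
def can_start_backfill(jobs: list[str]) -> bool:
    """Return True only if no job for this pipeline is Pending or Running.

    `jobs` holds status strings from the pipeline's job list; assumed
    input shape for illustration.
    """
    return all(status not in ("Pending", "Running") for status in jobs)
```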