DQE file-based batch processing lets you validate, standardize, enrich, clean, or deduplicate large volumes of data from CSV files exchanged through a secure SFTP server.
This article explains how file-based batch processing works, from file preparation to output file retrieval.
Who should read this article
Use this article if you need to understand how DQE processes data files through a batch workflow.
This applies when your data is processed through a file-based exchange, especially for use cases related to Standalone, Salesforce, Microsoft Dynamics, or dedicated file processing projects.
Note: This article covers file-based batch processing via SFTP. It does not apply to real-time API calls.
What file-based batch processing does
File-based batch processing allows DQE to process a dataset automatically. Depending on your configuration, DQE can apply one or more data quality services to your file.
Typical processing can include:
- postal address standardization;
- deduplication;
- data enrichment;
- move detection;
- data cleaning;
- field standardization;
- specific processing configured for your project.
How the process works
A file-based batch process usually follows these steps:
- You prepare a CSV file using the expected format.
- You upload the file to the dedicated SFTP input folder.
- DQE checks the file format and processes the records according to the agreed configuration.
- DQE generates an output file containing the original data and additional DQE result columns.
- You retrieve the processed file from the SFTP output folder.
Processing flow
CSV file
↓
SFTP /IN
↓
DQE processing chain
↓
Data quality services
(RNVP, phone, email, B2B, deduplication, etc.)
↓
Processed CSV file
↓
SFTP /OUT
Input file
The input file is the CSV file you provide to DQE for processing. Its structure must match the specifications defined for your project.
In general, the file must:
- use CSV format;
- use UTF-8 encoding without signature (byte order mark, or BOM);
- use a semicolon (;) as separator;
- include a header row;
- include one line per record;
- keep the expected column order;
- include a unique identifier for each record.
The exact required columns depend on the services included in your processing configuration.
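Before uploading, you can check a file against these general rules locally. The following Python function is a minimal sketch, not a DQE tool. The identifier column name and the optional expected column list are parameters you supply from your own project specifications, and the checks cover only the general rules listed above: UTF-8 without BOM, semicolon separator, header row, one consistent field count per record, and unique, non-empty identifiers.

```python
from __future__ import annotations

import codecs
import csv
import io


def validate_input_file(path: str, id_column: str,
                        expected_columns: list[str] | None = None) -> list[str]:
    """Return the problems found in a batch input file (empty list = none found).

    Sketch only: id_column and expected_columns come from your own project
    specifications; they are not names defined by DQE.
    """
    problems: list[str] = []
    with open(path, "rb") as raw:
        data = raw.read()

    # The UTF-8 "signature" is the byte order mark EF BB BF at the start of the file.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 signature (BOM)")
        data = data[len(codecs.BOM_UTF8):]

    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return problems + [f"file is not valid UTF-8: {exc}"]

    # newline="" lets the csv module handle line endings and quoted line breaks itself.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))
    if not rows:
        return problems + ["file is empty (no header row)"]

    header = rows[0]
    if len(header) == 1 and "," in header[0]:
        problems.append("header has a single column: check that ';' is used as separator")
    if expected_columns is not None and header != expected_columns:
        problems.append("header does not match the expected column names and order")
    if id_column not in header:
        return problems + [f"identifier column {id_column!r} not found in header"]

    id_index = header.index(id_column)
    seen: set[str] = set()
    # Row 1 is the header, so data records start at row 2.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} fields, found {len(row)}")
            continue
        value = row[id_index].strip()
        if not value:
            problems.append(f"row {row_no}: empty identifier")
        elif value in seen:
            problems.append(f"row {row_no}: duplicate identifier {value!r}")
        seen.add(value)
    return problems
```

An empty list only means these general checks passed; the columns required by your specific services still need to be checked against your project specifications.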
File transfer through SFTP
Files are exchanged through a secure SFTP server. Depending on your setup, the SFTP platform can be hosted by DQE or by your organization after validation by DQE.
For a standard DQE-hosted SFTP workflow:
- input files are uploaded to the /IN folder;
- processed files are retrieved from the /OUT folder.
Connection details are provided separately during project setup.
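If you script transfers with the OpenSSH sftp client, one option is to generate a batch file for its -b option. The sketch below assumes that the remote folders are literally /IN and /OUT, that files keep their names, and that local paths are POSIX-style; build_sftp_batch is a hypothetical helper, not part of any DQE tooling.

```python
from __future__ import annotations

import posixpath


def build_sftp_batch(upload: list[str], download: list[str], local_out_dir: str,
                     remote_in: str = "/IN", remote_out: str = "/OUT") -> str:
    """Return the text of a batch file for the OpenSSH client (sftp -b <file> user@host).

    Sketch only: folder names follow the standard DQE-hosted layout; adjust
    them if your setup differs.
    """
    def quote(path: str) -> str:
        # sftp batch commands accept double-quoted arguments. To stay simple, this
        # sketch refuses paths that would need escaping inside the quotes.
        if '"' in path or "\\" in path:
            raise ValueError(f"unsupported character in path: {path!r}")
        return f'"{path}"'

    commands = []
    for local_path in upload:
        # Upload each local file into the input folder under the same file name.
        remote_path = posixpath.join(remote_in, posixpath.basename(local_path))
        commands.append(f"put {quote(local_path)} {quote(remote_path)}")
    for name in download:
        # Retrieve each named file from the output folder into local_out_dir.
        remote_path = posixpath.join(remote_out, name)
        local_path = posixpath.join(local_out_dir, name)
        commands.append(f"get {quote(remote_path)} {quote(local_path)}")
    return "\n".join(commands) + "\n"
```

Saved as, for example, batch.txt, the file would be run with sftp -b batch.txt user@sftp.example.com, where the user and host are placeholders for the connection details provided during setup. In batch mode sftp aborts when a transfer command fails, and it should be used with non-interactive authentication such as an SSH key.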
Output file
After processing, DQE returns an output CSV file. This file usually contains the original data plus additional DQE columns.
These additional columns help you understand:
- which records were processed successfully;
- which values were standardized or corrected;
- which records require review;
- which error or quality codes were returned;
- which enrichment data was added.
The exact output columns depend on the DQE services enabled for your project.
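Because the output keeps the original columns and adds DQE columns, one way to isolate the DQE results is to compare the two headers. This sketch assumes that the output file also uses ";" as separator and keeps your identifier column unchanged; it reads both files with Python's utf-8-sig codec so that a leading BOM, if present, is ignored. The function name is hypothetical.

```python
from __future__ import annotations

import csv


def load_dqe_results(input_path: str, output_path: str, id_column: str
                     ) -> tuple[list[str], dict[str, dict[str, str]]]:
    """Return (added_columns, results_by_id) for a processed batch file.

    Sketch only: assumes the output uses ';' as separator and keeps the
    input identifier column unchanged.
    """
    # "utf-8-sig" decodes plain UTF-8 and also strips a leading BOM if one is present.
    with open(input_path, newline="", encoding="utf-8-sig") as f:
        input_header = next(csv.reader(f, delimiter=";"), [])

    with open(output_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f, delimiter=";")
        output_header = reader.fieldnames or []
        # Columns present in the output but not in the input are the DQE additions.
        added_columns = [c for c in output_header if c not in input_header]
        # Keep only the added columns, keyed by the record identifier.
        results = {row[id_column]: {c: row[c] for c in added_columns} for row in reader}
    return added_columns, results
```

Keying the results by identifier is what makes the unique identifier requirement useful: each DQE result can be matched back to its source record without relying on line order.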
Manual and automated processing
File-based batch processing can be handled as a one-time process or as an automated recurring process.
With automated processing, the upload and retrieval workflow remains similar to a one-time submission:
- You deposit files in the input folder.
- The DQE processing chain retrieves and processes the files automatically.
- DQE deposits the processed files in the output folder.
- You retrieve the processed files.
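On your side, an automated setup usually includes waiting for the processed file to appear in the output folder. The polling function below is a sketch that takes the folder listing as a callable, so it does not depend on a particular SFTP client. The expected output file name is an assumption, because output naming depends on your project, and the sleep and clock parameters exist only so the loop can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def wait_for_output(list_out: Callable[[], Iterable[str]], expected_name: str,
                    timeout_s: float = 3600.0, interval_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll the output folder listing until expected_name appears or the timeout expires.

    list_out is a hypothetical callable returning the file names currently in /OUT.
    Returns True if the file was seen, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if expected_name in set(list_out()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, with the third-party paramiko library, list_out could be lambda: sftp.listdir("/OUT"), where sftp is an open SFTPClient.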
Best practices
- Always keep the original file until the processed file has been validated.
- Use a unique identifier for each record so that results can be matched with the source data.
- Do not change the expected column names or column order without validating the change with DQE.
- Check rejected files and logs when a file is not processed as expected.
- Review DQE output columns before updating production data.
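Before updating production data, a quick reconciliation on the unique identifier can catch missing or unexpected records. This sketch assumes you expect one output line per input record; confirm that assumption for your configuration, especially when deduplication is enabled. The function name and result keys are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def reconcile_ids(input_ids: Iterable[str], output_ids: Iterable[str]) -> dict[str, list[str]]:
    """Compare record identifiers between the source file and the processed file.

    Sketch only: assumes one output line per input record.
    """
    input_counts = Counter(input_ids)
    output_counts = Counter(output_ids)
    return {
        # In the source file but absent from the output.
        "missing": sorted(set(input_counts) - set(output_counts)),
        # In the output but not in the source file.
        "unexpected": sorted(set(output_counts) - set(input_counts)),
        # Appearing more than once in the output.
        "duplicated_in_output": sorted(i for i, n in output_counts.items() if n > 1),
    }
```

When all three lists are empty, every source identifier appears exactly once in the output and nothing else does.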
Next steps
- Understand DQE output columns
- Prepare your input file for file-based batch processing
- Upload and retrieve batch files via SFTP
- Understand automated batch processing folders