Deduplication
Overview
Deduplication is the process of identifying duplicate contacts within your dataset.
Datumo detects duplicates and returns them in structured groups. You are responsible for reviewing these duplicates and deciding how to handle them.
The key fields considered in this process include:
- Name
- Surname
- Email
- Fiscal Code
Deduplication Parameter
Invocations of type deduplication can optionally accept an additional argument, exclude_companies, which excludes company-related emails (e.g., info.company@company.com) from the deduplication comparison.
The default value is True.
Requesting a Deduplication
To request a deduplication, you must have a collection containing contact data on Datumo. You can create a collection and upload your data by following the instructions in the Collection section.
Once your contact data is uploaded, initiate a deduplication request by sending a POST request to the invocation endpoint with deduplication as the invocationType.
For more details, refer to Invoke Datumo.
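As a rough illustration of the request described above, the following Python sketch builds (but does not send) a deduplication invocation. Only `invocationType`, the `deduplication` value, and `exclude_companies` come from this page. The endpoint URL, the `collectionId` field name, the placement of `exclude_companies` in the JSON body, and the bearer-token header are assumptions made for the example; check Invoke Datumo for the actual request format.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real invocation URL
# from the Invoke Datumo documentation.
INVOCATION_URL = "https://datumo.example.com/invocations"


def build_deduplication_request(collection_id, exclude_companies=True, token=None):
    """Build (without sending) a POST request for a deduplication invocation."""
    payload = {
        "invocationType": "deduplication",       # named in the text above
        "collectionId": collection_id,           # assumed field name
        "exclude_companies": exclude_companies,  # optional; documented default is True
    }
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    return urllib.request.Request(
        INVOCATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_deduplication_request("my-collection", exclude_companies=False)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before any call is made.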
Interpreting Deduplication Results
The deduplication results group duplicate contacts together. The output is available in multiple formats and includes the following fields:
| Column Name | Description | Format | Nullable | Example |
|---|---|---|---|---|
| ID | Unique identifier of the contact. | String | False | 1 |
| Group | The group to which the contact belongs. | String | False | G123 |
Understanding Groups
- Contacts identified as duplicates are assigned the same Group.
- Contacts without duplicates are placed in individual groups.
Example
Input Data
A deduplication request is made for the following contacts:
ID,Name,Surname,Gender,Email,Company - Name
0,Silvia,Marri,female,silvia.marri@snrt.co.eu,SN RTek
1,,Toninal,male,toninal@nicojd.com,NicoJds
2,Marri,Silvi,female,silvia.marr@snrt.co.eu,SN RTek
Output Data
The deduplication result in CSV format:
ID,Group
0,G0
1,G1
2,G0
In this example:
- Contacts 0 and 2 are considered duplicates and assigned to G0.
- Contact 1 has no duplicates and is assigned to G1.
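Since reviewing the duplicates is left to you, a first step is usually to collect the IDs that share a group. The sketch below does this for the CSV result shown above, using only the Python standard library. The helper name `group_contacts` is hypothetical and not part of Datumo.

```python
import csv
import io
from collections import defaultdict

# The CSV result from the example above.
result_csv = """ID,Group
0,G0
1,G1
2,G0
"""


def group_contacts(csv_text):
    """Map each Group value to the list of contact IDs assigned to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Group"]].append(row["ID"])
    return dict(groups)


groups = group_contacts(result_csv)
# Groups with more than one contact contain duplicates to review.
duplicates = {group: ids for group, ids in groups.items() if len(ids) > 1}

print(groups)      # {'G0': ['0', '2'], 'G1': ['1']}
print(duplicates)  # {'G0': ['0', '2']}
```

IDs are kept as strings, matching the String format given for the ID column.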