Deduplication

Overview

Deduplication is the process of identifying duplicate contacts within your dataset.

Datumo detects duplicates and returns them in structured groups. You are responsible for reviewing these duplicates and deciding how to handle them.

The key fields considered in this process include:

  • Name
  • Surname
  • Email
  • Fiscal Code

Deduplication Parameter

Deduplication invocations accept an optional argument, exclude_companies. When it is True, company-related emails (e.g., info.company@company.com) are excluded from the deduplication comparison.

The default value is True.

Requesting a Deduplication

To request a deduplication, you must have a collection containing contact data on Datumo. You can create a collection and upload your data by following the instructions in the Collection section.

Once your contact data is uploaded, initiate a deduplication request by sending a POST request to the invocation endpoint with deduplication as the invocationType.

For more details, refer to Invoke Datumo.
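
As a rough illustration of the request described above, the following Python sketch builds (but does not send) a POST request for a deduplication invocation. Only invocationType and exclude_companies come from this page; the URL, the collectionId field, the placement of exclude_companies in the payload, and the bearer-token header are hypothetical placeholders, so check Invoke Datumo for the actual contract.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL, auth scheme, and payload layout are
# documented in "Invoke Datumo" and may differ from this sketch.
INVOCATION_URL = "https://datumo.example.com/invocations"


def build_deduplication_request(collection_id, api_token, exclude_companies=True):
    """Build (but do not send) a POST request for a deduplication invocation."""
    payload = {
        "invocationType": "deduplication",       # value named on this page
        "collectionId": collection_id,           # assumed field name
        "exclude_companies": exclude_companies,  # assumed to sit at top level
    }
    return urllib.request.Request(
        INVOCATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


# Example: keep company-related emails in the comparison.
request = build_deduplication_request("my-collection", "my-token", exclude_companies=False)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that because the endpoint shown here is a placeholder.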

Interpreting Deduplication Results

The deduplication results group duplicate contacts together. The output is available in multiple formats and includes the following fields:

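| Column Name | Description | Format | Nullable | Example |
|-------------|-------------|--------|----------|---------|
| ID | Unique identifier of the contact. | String | False | 1 |
| Group | The group to which the contact belongs. | String | False | G123 |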

Understanding Groups

  • Contacts identified as duplicates are assigned the same Group.
  • Contacts without duplicates are each placed in their own group.

Example

Input Data

A deduplication request is made for the following contacts:

ID,Name,Surname,Gender,Email,Company - Name
0,Silvia,Marri,female,silvia.marri@snrt.co.eu,SN RTek
1,,Toninal,male,toninal@nicojd.com,NicoJds
2,Marri,Silvi,female,silvia.marr@snrt.co.eu,SN RTek

Output Data

The deduplication result in CSV format:

ID,Group
0,G0
1,G1
2,G0

In this example:

  • Contacts 0 and 2 are considered duplicates and assigned to G0.
  • Contact 1 has no duplicates and is assigned to G1.
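
Since reviewing the duplicates is left to you, one possible first step is to collect contact IDs per group and keep only the groups with more than one member. The sketch below does this for the CSV output shown in the example, using only the Python standard library; the function names are illustrative and not part of Datumo.

```python
import csv
import io
from collections import defaultdict

# Deduplication output copied from the example above.
RESULT_CSV = """ID,Group
0,G0
1,G1
2,G0
"""


def group_contacts(result_csv):
    """Map each Group label to the list of contact IDs assigned to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(result_csv)):
        groups[row["Group"]].append(row["ID"])
    return dict(groups)


def duplicate_groups(groups):
    """Keep only groups with more than one contact, i.e. actual duplicates."""
    return {group: ids for group, ids in groups.items() if len(ids) > 1}


groups = group_contacts(RESULT_CSV)
duplicates = duplicate_groups(groups)
print(groups)      # {'G0': ['0', '2'], 'G1': ['1']}
print(duplicates)  # {'G0': ['0', '2']}
```

IDs are kept as strings because the result table declares the ID column as String.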