Boosting Secondary Results
A boosting invocation also produces the following secondary results:
- datasetMask report: This report is an additional result available in csv format and provides a summary of the actions taken on each field in the dataset. It indicates whether a cell was cleaned, filled, or enriched;
- columnReport report: This report is an additional result available in json format and provides detailed statistics on each column, including counts and percentages of filled, cleaned, and enriched data;
- overallReport report: This report is an additional result available in json format and provides a summary of the overall data processing results, including counts and percentages of filled, cleaned, and enriched data;
- insights: These represent the insights derived from the data after boosting. Their format matches the output of an insights invocation (see Insights), which computes insights on the collection's data before any booster is applied.
Dataset Mask Report
The datasetMask report is a csv file that provides a summary of the actions taken on each field in the dataset. It indicates whether a cell was cleaned, filled, or enriched.
It is useful to understand the quality of the data and the actions taken by the boosting on each cell.
It has the same structure as the boosting output, with the same columns, but each value indicates the action taken on that cell instead of the cell's content.
The possible values are:
- cleaned: the cell was cleaned;
- filled: the cell was filled;
- enriched: the cell was enriched;
- empty: the cell was not cleaned, filled, or enriched.
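As a rough illustration of how these values can be consumed, the sketch below tallies the actions per column in a datasetMask file. The three-column excerpt and the helper name count_actions are hypothetical, and the label "untouched" is only a local name chosen here for empty cells.

```python
import csv
import io
from collections import Counter

# Hypothetical three-column excerpt of a datasetMask report (illustrative only).
MASK_CSV = """Name,Surname,Gender
cleaned,cleaned,cleaned
filled,filled,
"""

def count_actions(mask_text):
    """Return {column: Counter of actions}; empty cells are counted as 'untouched'."""
    counts = {}
    for row in csv.DictReader(io.StringIO(mask_text)):
        for column, action in row.items():
            # An empty mask value means no action was taken on that cell.
            counts.setdefault(column, Counter())[action or "untouched"] += 1
    return counts

actions = count_actions(MASK_CSV)
print(dict(actions["Gender"]))
```

For a real report, the same function can be applied to the text read from the downloaded csv file.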
Column Report
The columnReport report is a json file that provides detailed statistics on each column, including counts and percentages of filled, cleaned, and enriched data.
It is useful to understand the quality of the data and the actions taken by the boosting on each column.
It is structured as a json object, where each key is a column name and the value is an object with the following keys:
- filled: an object with the keys count, percentage, and total, indicating the number of cells filled, the percentage of cells filled (on a 0-100 scale, relative to total), and the total number of cells in the column that were empty;
- cleaned: an object with the keys count, percentage, and total, indicating the number of cells cleaned, the percentage of cells cleaned (on a 0-100 scale, relative to total), and the total number of cells in the column that were originally filled-in;
- enriched: an object with the keys count, percentage, and total, indicating the number of cells enriched, the percentage of cells enriched (on a 0-100 scale, relative to total), and the total number of cells in the column that were added.
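A minimal sketch of how such an object might be inspected follows. The two columns are taken from the example later in this section; the helper names columns_with_action and missed_cells are hypothetical and not part of any API.

```python
import json

# Excerpt of a columnReport, using two columns from the example in this section.
COLUMN_REPORT = json.loads("""
{
  "Name": {
    "filled":   {"count": 1, "percentage": 100,  "total": 1},
    "cleaned":  {"count": 1, "percentage": 100,  "total": 1},
    "enriched": {"count": 0, "percentage": null, "total": 0}
  },
  "Birthday": {
    "filled":   {"count": 0, "percentage": null, "total": 0},
    "cleaned":  {"count": 0, "percentage": null, "total": 0},
    "enriched": {"count": 0, "percentage": 0,    "total": 2}
  }
}
""")

def columns_with_action(report, action):
    """Names of columns where at least one cell received the given action."""
    return [name for name, stats in report.items() if stats[action]["count"] > 0]

def missed_cells(report, action):
    """{column: total - count} for columns where the action was applicable."""
    return {
        name: stats[action]["total"] - stats[action]["count"]
        for name, stats in report.items()
        if stats[action]["total"] > 0
    }

print(columns_with_action(COLUMN_REPORT, "cleaned"))
print(missed_cells(COLUMN_REPORT, "enriched"))
```

Note that percentage is null in the example whenever total is 0, so code reading the report should not assume a numeric value.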
Overall Report
The overallReport report is a json file that provides a summary of the overall data processing results, including counts and percentages of filled, cleaned, and enriched data.
It is useful to understand the quality of the data and the actions taken by the boosting on the dataset.
It is structured as a json object with the following keys:
- filled: an object with the keys count, percentage, and total, indicating the number of cells filled, the percentage of cells filled (on a 0-100 scale, relative to total), and the total number of cells in the dataset that were empty;
- cleaned: an object with the keys count, percentage, and total, indicating the number of cells cleaned, the percentage of cells cleaned (on a 0-100 scale, relative to total), and the total number of cells in the dataset that were originally filled-in;
- enriched: an object with the keys count, percentage, and total, indicating the number of cells enriched, the percentage of cells enriched (on a 0-100 scale, relative to total), and the total number of cells in the dataset that were added.
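In the example later in this section, the overallReport figures equal the per-column counts and totals of the columnReport summed together, with the percentage rounded to a whole number. The documentation does not state this relationship explicitly, so the sketch below is an assumption that happens to match that example; the function name aggregate and the sample data are hypothetical.

```python
def aggregate(column_report):
    """Sum per-column counts and totals into one overall summary (assumed relationship)."""
    overall = {}
    for action in ("filled", "cleaned", "enriched"):
        count = sum(stats[action]["count"] for stats in column_report.values())
        total = sum(stats[action]["total"] for stats in column_report.values())
        # Assumption: the percentage is undefined (None) when no cell was eligible.
        percentage = round(100 * count / total) if total else None
        overall[action] = {"count": count, "percentage": percentage, "total": total}
    return overall

def stats(count, total):
    """Build one hypothetical count/percentage/total entry for the sample data."""
    return {"count": count, "percentage": None, "total": total}

# Hypothetical three-column report; per-column percentages are not needed here.
sample = {
    "Name": {"filled": stats(1, 1), "cleaned": stats(1, 1), "enriched": stats(0, 0)},
    "Preferred Language": {"filled": stats(0, 0), "cleaned": stats(1, 2), "enriched": stats(0, 0)},
    "Birthday": {"filled": stats(0, 0), "cleaned": stats(0, 0), "enriched": stats(0, 2)},
}

summary = aggregate(sample)
print(summary)
```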
Insights
The insights result is a json file that provides a global overview of the dataset, focusing on its structure, completeness, and distribution of values.
It includes counts of contacts, attributes, and cells, as well as statistics on filled cells both overall and per attribute.
In addition, it contains segmentations of categorical attributes, such as Is human, Gender, Preferred Language, and Business Language, reporting both counts and percentages for each value.
It is useful to quickly assess the overall quality of the dataset and identify potential gaps or imbalances in the data.
It is structured as a json object with the following keys:
- contacts: an integer indicating the number of contacts in the dataset;
- attributes: an integer indicating the number of attributes in the dataset;
- cells: an integer indicating the total number of cells in the dataset;
- filledCells: an object with the keys count and percentage, indicating the number and percentage of filled cells across the dataset;
- filledCellsPerAttribute: an object where each key is an attribute name and the value is an object with the keys count and percentage, showing the number and percentage of filled cells for that attribute;
- categoricalAttributesSegmentation: an object containing breakdowns of selected categorical attributes (if present). Each supported attribute (e.g., Is human, Gender, Preferred Language, Business Language) is represented as a nested object where each possible attribute value has its own count and percentage.
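To spot gaps or imbalances from this structure, something along the following lines could be used. The insights object here is a small hypothetical one that only mirrors the keys listed above, and the helper names sparse_attributes and dominant_value, as well as the 50% threshold, are illustrative choices.

```python
import json

# Hypothetical insights object that follows the key layout described above.
INSIGHTS = json.loads("""
{
  "contacts": 2,
  "attributes": 3,
  "cells": 6,
  "filledCells": {"count": 4, "percentage": 66.7},
  "filledCellsPerAttribute": {
    "Email":    {"count": 2, "percentage": 100.0},
    "Gender":   {"count": 2, "percentage": 100.0},
    "Birthday": {"count": 0, "percentage": 0.0}
  },
  "categoricalAttributesSegmentation": {
    "Gender": {"male": {"count": 2, "percentage": 100.0}}
  }
}
""")

def sparse_attributes(insights, threshold=50.0):
    """Attributes whose share of filled cells is below the threshold (in percent)."""
    per_attribute = insights["filledCellsPerAttribute"]
    return sorted(name for name, s in per_attribute.items() if s["percentage"] < threshold)

def dominant_value(insights, attribute):
    """Most frequent value of a categorical attribute, or None if it is not segmented."""
    segments = insights["categoricalAttributesSegmentation"].get(attribute, {})
    if not segments:
        return None
    return max(segments, key=lambda value: segments[value]["count"])

print(sparse_attributes(INSIGHTS))
print(dominant_value(INSIGHTS, "Gender"))
```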
Interpreting the secondary results
The results of a boosting request include reports that show which data were cleaned, filled, and enriched.
Example
You request a boosting for the following contacts:
ID,Name,Surname,Gender,Country,Preferred Language,Email,Company - Name
0,Silvia,Marri,female,it,it,silvi.marri@snrt.co.eu,SN RTek
1,,,female,it,it,toninal@nicojd.com,NicoJds
The output (primary result), in csv format, will be:
index,ID,Country,Email,Company - Name,Birthday,Age,Income by Educational Level,Income by Degree Of Urbanisation,Company - Site,Income by Household Type,Generation,Phone Number,Income by Birth Country,Income by Age and Gender,Minimum family size,Maximum family size,Company - Sectors,Name,Surname,Is human,Preferred Language,Business Language,Fiscal Code,Gender
0,0,it,silvi.marri@snrt.co.eu,SN RTek,,,,,,,,,,,,,,Marri,Silvi,True,it,,,male
1,1,it,toninal@nicojd.com,NicoJds,,,,,,,,,,,,,,Toni,Al,True,en,,,male
The datasetMask report, in csv format, will be:
index,ID,Country,Email,Company - Name,Birthday,Age,Income by Educational Level,Income by Degree Of Urbanisation,Company - Site,Income by Household Type,Generation,Phone Number,Income by Birth Country,Income by Age and Gender,Minimum family size,Maximum family size,Company - Sectors,Name,Surname,Is human,Preferred Language,Business Language,Fiscal Code,Gender
enriched,0,,,,,,,,,,,,,,,,,cleaned,cleaned,enriched,,,,cleaned
enriched,1,,,,,,,,,,,,,,,,,filled,filled,enriched,cleaned,,,cleaned
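Because the mask has the same columns as the output, the two files can be read side by side to list each changed cell with its new value. The sketch below uses a four-column excerpt of the example above and assumes that rows appear in the same order in both files; the function name changed_cells is hypothetical.

```python
import csv
import io

# Four-column excerpt of the example output and its datasetMask.
OUTPUT_CSV = "ID,Name,Surname,Gender\n0,Marri,Silvi,male\n1,Toni,Al,male\n"
MASK_CSV = "ID,Name,Surname,Gender\n0,cleaned,cleaned,cleaned\n1,filled,filled,cleaned\n"

def changed_cells(output_text, mask_text, id_column="ID"):
    """Return (id, column, action, new value) for every cell the mask marks as changed."""
    changes = []
    outputs = csv.DictReader(io.StringIO(output_text))
    masks = csv.DictReader(io.StringIO(mask_text))
    # Assumption: row N of the mask describes row N of the output.
    for out_row, mask_row in zip(outputs, masks):
        for column, action in mask_row.items():
            # The ID column carries the identifier itself, not an action.
            if column != id_column and action:
                changes.append((out_row[id_column], column, action, out_row[column]))
    return changes

changes = changed_cells(OUTPUT_CSV, MASK_CSV)
for change in changes:
    print(change)
```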
The columnReport report, in json format, will be:
{
"index": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 2,
"percentage": 100,
"total": 2
}
},
"Country": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": 0,
"total": 2
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
},
"Email": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": 0,
"total": 2
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
},
"Company - Name": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": 0,
"total": 2
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
},
"Birthday": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Age": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Income by Educational Level": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Income by Degree Of Urbanisation": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Company - Site": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Income by Household Type": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Generation": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Phone Number": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Income by Birth Country": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Income by Age and Gender": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Minimum family size": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Maximum family size": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Company - Sectors": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Name": {
"filled": {
"count": 1,
"percentage": 100,
"total": 1
},
"cleaned": {
"count": 1,
"percentage": 100,
"total": 1
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
},
"Surname": {
"filled": {
"count": 1,
"percentage": 100,
"total": 1
},
"cleaned": {
"count": 1,
"percentage": 100,
"total": 1
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
},
"Is human": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 2,
"percentage": 100,
"total": 2
}
},
"Preferred Language": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 1,
"percentage": 50,
"total": 2
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
},
"Business Language": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Fiscal Code": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 0,
"percentage": null,
"total": 0
},
"enriched": {
"count": 0,
"percentage": 0,
"total": 2
}
},
"Gender": {
"filled": {
"count": 0,
"percentage": null,
"total": 0
},
"cleaned": {
"count": 2,
"percentage": 100,
"total": 2
},
"enriched": {
"count": 0,
"percentage": null,
"total": 0
}
}
}
The overallReport report, in json format, will be:
{
"filled": {
"count": 2,
"percentage": 100,
"total": 2
},
"cleaned": {
"count": 5,
"percentage": 42,
"total": 12
},
"enriched": {
"count": 4,
"percentage": 12,
"total": 34
}
}
The insights report, in json format, will be:
{
"contacts": 2,
"attributes": 25,
"cells": 50,
"filledCells": {
"count": 20,
"percentage": 40.0
},
"filledCellsPerAttribute": {
"index": {
"count": 2,
"percentage": 100.0
},
"ID": {
"count": 2,
"percentage": 100.0
},
"Country": {
"count": 2,
"percentage": 100.0
},
"Email": {
"count": 2,
"percentage": 100.0
},
"Company - Name": {
"count": 2,
"percentage": 100.0
},
"Birthday": {
"count": 0,
"percentage": 0.0
},
"Age": {
"count": 0,
"percentage": 0.0
},
"Income by Educational Level": {
"count": 0,
"percentage": 0.0
},
"Income by Degree Of Urbanisation": {
"count": 0,
"percentage": 0.0
},
"Company - Site": {
"count": 0,
"percentage": 0.0
},
"Income by Household Type": {
"count": 0,
"percentage": 0.0
},
"Generation": {
"count": 0,
"percentage": 0.0
},
"Phone Number": {
"count": 0,
"percentage": 0.0
},
"Income by Birth Country": {
"count": 0,
"percentage": 0.0
},
"Income by Age and Gender": {
"count": 0,
"percentage": 0.0
},
"Minimum family size": {
"count": 0,
"percentage": 0.0
},
"Maximum family size": {
"count": 0,
"percentage": 0.0
},
"Company - Sectors": {
"count": 0,
"percentage": 0.0
},
"Name": {
"count": 2,
"percentage": 100.0
},
"Surname": {
"count": 2,
"percentage": 100.0
},
"Is human": {
"count": 2,
"percentage": 100.0
},
"Preferred Language": {
"count": 2,
"percentage": 100.0
},
"Business Language": {
"count": 0,
"percentage": 0.0
},
"Fiscal Code": {
"count": 0,
"percentage": 0.0
},
"Gender": {
"count": 2,
"percentage": 100.0
}
},
"categoricalAttributesSegmentation": {
"Is human": {
"True": {
"count": 2,
"percentage": 100.0
}
},
"Gender": {
"male": {
"count": 2,
"percentage": 100.0
}
},
"Preferred Language": {
"it": {
"count": 1,
"percentage": 50.0
},
"en": {
"count": 1,
"percentage": 50.0
}
},
"Business Language": {
"": {
"count": 2,
"percentage": 100.0
}
}
}
}