Data Schemas

Churn Prediction leverages historical data to forecast the likelihood of customers leaving a service. In this section, we provide the schema for the data required.

The data types required for Churn Prediction are:

subscription: data related to the subscription of the customer.
invoice: data related to the invoices issued to the customer.
subscription_item: data related to the items included in the subscription.
usage: data related to the usage of the services or products included in the subscription.
ticket: data related to the interactions between the customer and the support service.
interaction: data related to the interactions between the customer and the customer care service.

Guidelines

To ensure Churn Prediction performs optimally, please adhere to the following guidelines:

Keep Data Up-to-Date: Regularly update the data in your collection to provide the most accurate predictions. Stale data can lead to misleading results.
Include All Data Types: Ensure your collection contains data for all the required types listed above. Missing data types can hinder the effectiveness of the prediction models.
Historical Data Requirements: Each data type should include a substantial amount of historical data. The required length of this history depends on the billing frequency. As an example:
- For a 1-month billing frequency, maintain at least 2 years of historical data.
- For a 1-year billing frequency, retain at least 5-10 years of historical data.
Subscription Cancellations: Subscription data must include records of cancellations to enable the models to learn effectively from past behaviors.
Churn Prediction Requests: When requesting a Churn Prediction, all non-cancelled subscriptions in your subscription data will be returned along with their churn probabilities. Ensure you have at least one non-cancelled subscription available in your collection before making a prediction request.
Minimum Schema for Data Types: Some data types have a minimum schema; however, to achieve more accurate predictions, you are encouraged to upload additional columns.

Subscription data

In the following table, you can find the schema for the subscription data. The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns.

Column name	Description	Format details	Nullable	Example
id	Identifier of the subscription.	String	False	1
activation_datetime	Datetime of subscription first activation.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2018-02-01T00:00:00
cancellation_datetime	Datetime of subscription cancellation.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	True	2023-06-01T00:00:00
is_cancelled	Boolean indicating whether the subscription has been cancelled or not.	Bool	False	False
billing_period_unit	String indicating the type of billing period unit. Can be one of: DAY, WEEK, MONTH, YEAR.	String	False	MONTH
billing_period	Number of `billing_period_unit` forming a billing period.	Positive integer number	False	1
customer_id	Identifier of the customer.	String	False	45
customer_type	Type of the customer. Can be one of: COMPANY, PRIVATE.	String	False	PRIVATE
customer_gender	Gender of the customer. Can be one of: MALE, FEMALE.	String	True	FEMALE
customer_birth_date	Date of birth of the customer.	yyyy-mm-dd	True	1990-02-03
customer_city	City of the customer.	String	True	Milano
customer_country	Country of the customer.	String	True	Italy
customer_ateco_code	Ateco code of the customer.	String	True	31.00.00
customer_number_employees	Number of employees of the customer.	Positive integer number	True	20
customer_revenue	Revenue of the customer.	Positive number with decimal digits separated by dots	True	1000000.00

CSV file example

id,activation_datetime,cancellation_datetime,is_cancelled,billing_period_unit,billing_period,customer_id,customer_type,customer_gender,customer_birth_date,customer_city,customer_country,customer_ateco_code,customer_number_employees,customer_revenue
1,2018-02-01T00:00:00,,False,MONTH,1,45,PRIVATE,MALE,1990-02-03,Milano,Italy,,,
2,2019-05-10T14:00:00,2023-06-01T00:00:00,True,YEAR,1,56,COMPANY,,,,France,62.02.00,100,5000000.00
3,2020-11-15T08:30:00,,False,MONTH,3,78,PRIVATE,FEMALE,1985-11-20,Rome,Italy,,,
4,2021-09-01T13:45:00,,False,WEEK,4,92,COMPANY,,,Berlin,Germany,13.30.00,50,750000.00

Invoice data

In the following table, you can find the schema for the invoice data. The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns.

Column name	Description	Format details	Nullable	Example
id	Identifier of the invoice.	String	False	1
subscription_id	Identifier of the subscription.	String	False	1
emission_datetime	Datetime of invoice emission for the specified billing period.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-02-01T00:00:00
payment_datetime	Datetime of payment of the invoice for the specified billing period.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	True	2022-07-01T07:30:00
was_paid	Boolean indicating whether the invoice has been paid or not.	Bool	False	True
due_datetime	Due datetime of the invoice for the specified billing period.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	True	2022-10-01T00:00:00
amount	Amount of the invoice for the specified billing period.	Positive number with decimal digits separated by dots	False	450.00
billing_period_start_datetime	Start datetime of the billing period.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-02-01T00:00:00
billing_period_end_datetime	End datetime of the billing period.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-01-01T00:00:00

CSV file example

id,subscription_id,emission_datetime,payment_datetime,was_paid,due_datetime,amount,billing_period_start_datetime,billing_period_end_datetime
1,1,2022-02-01T00:00:00,2022-07-01T07:30:00,True,2022-10-01T00:00:00,450.00,2022-01-01T00:00:00,2022-02-01T00:00:00
2,2,2023-01-01T00:00:00,2023-02-01T00:00:00,True,2023-03-01T00:00:00,1200.00,2022-01-01T00:00:00,2023-01-01T00:00:00
3,3,2021-12-01T10:00:00,,False,2022-02-01T00:00:00,350.00,2022-01-01T00:00:00,2022-04-01T00:00:00
4,4,2022-04-01T15:00:00,2022-04-15T12:00:00,True,2022-05-01T00:00:00,150.00,2022-04-01T00:00:00,2022-04-29T00:00:00

Subscription Item data

In the following table, you can find the schema for the subscription item data. Subscription items are the products or services that are part of a subscription.

The data must be uploaded via a unique csv file, and must have all the columns listed below.

The following columns are the minimum required for the subscription item data. It is however recommended to add extra columns to characterize the item, such as its technical specifications.

Column name	Description	Format details	Nullable	Example
id	Identifier of the subscription item.	String	False	12
subscription_id	Identifier of the subscription this item belongs to.	String	False	324
amount	Price for this item in the subscription bill.	Positive number with decimal digits separated by dots	True	10.3
deactivation_cost	Cost for deactivating this item in the subscription.	Positive number with decimal digits separated by dots	True	5.0
from_datetime	Datetime from which this item is included in the subscription with this price.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-01-01T00:00:00
to_datetime	Datetime from which this item is no longer included in the subscription with this price.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	True	2022-11-01T00:00:00
is_active	Boolean indicating whether this item is currently active in the subscription.	Bool	False	True
item_id	Identifier of the item in the catalogue.	String	True	1234
item_type	Type of the item.	String	True	type_A
item_price	Price of this item in the catalogue.	Positive number with decimal digits separated by dots	True	15.0

CSV file example

id,subscription_id,amount,deactivation_cost,from_datetime,to_datetime,is_active,item_id,item_type,item_price
12,324,10.3,,2022-01-01T00:00:00,2022-11-01T00:00:00,False,1234,type_A,15.0
13,325,20.5,3.2,2021-05-15T08:30:00,2023-06-01T00:00:00,False,5678,type_B,25.0
14,326,,,2020-11-20T12:00:00,,True,7890,type_C,30.0
15,327,8.75,0.0,2019-07-10T14:45:00,2020-01-01T00:00:00,False,,,

Usage data

In the following table, you can find the schema for the usage data. Usage data represents the consumption of the services or products included in the subscription.

The data must be uploaded via a unique csv file, and must have all the columns listed below.

The following columns are the minimum required for the usage data. Since usage data characteristics can vary significantly depending on the service or product, it is recommended to add extra columns to characterize the usage data depending on your specific case.

Column name	Description	Format details	Nullable	Example
id	Identifier of the usage entry.	String	False	12
subscription_id	Identifier of the subscription this usage refers to.	String	False	324
subscription_item_id	Identifier of the subscription item this usage refers to.	String	True	10ss
datetime	Datetime the usage entry refers to.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-01-01T00:00:00
type	Type of the usage. Possible values must be defined in the collection properties.	String	False	type_A

warning

type is an ENUM field. When creating a new collection, you have to define possible values for this field. Once defined, these will be the only values accepted for the type field in the CSV file of the usage data.

CSV file example

id,subscription_id,subscription_item_id,datetime,type
12,324,10ss,2022-01-01T00:00:00,type_A
13,325,,2021-05-15T08:30:00,type_B
14,326,20aa,2020-11-20T12:00:00,type_C
15,327,30bb,2019-07-10T14:45:00,type_D

Ticket data

In the following table, you can find the schema for the ticket data. Ticket data represents the interactions between the customer and the support service.

The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns to characterize the ticket data.

Column name	Description	Format details	Nullable	Example
id	Identifier of the ticket.	String	False	12
subscription_id	Identifier of the subscription the ticket is related to.	String	False	44
subscription_item_id	Identifier of the subscription item the ticket is related to.	String	True	1
open_datetime	Datetime the ticket was opened.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-01-01T00:00:00
closure_datetime	Datetime the ticket was closed.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	True	2022-01-01T00:00:00
is_closed	Boolean indicating whether the ticket has been closed.	Bool	False	False
type	Type of the ticket. Possible values must be defined in the collection properties.	String	False	type_A
was_solved	Boolean indicating whether the ticket has been solved or not.	Bool	False	False

warning

CSV file example

id,subscription_id,subscription_item_id,open_datetime,closure_datetime,is_closed,type,was_solved
12,44,1,2022-01-01T00:00:00,2022-01-01T12:00:00,True,type_A,True
13,45,,2021-05-15T08:30:00,2021-05-16T10:00:00,True,type_B,False
14,46,2,2020-11-20T12:00:00,2020-11-22T14:30:00,True,type_C,True
15,47,3,2019-07-10T14:45:00,,False,type_D,False

Interaction data

In the following table, you can find the schema for the interaction data. Interaction data represents the interactions between the customer and the customer care service (e.g., chat, phone calls).

The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns to characterize the interaction data.

Column name	Description	Format details	Nullable	Example
id	Identifier of the interaction.	String	False	12
subscription_id	Identifier of the subscription the interaction is related to.	String	False	44
subscription_item_id	Identifier of the subscription item the interaction is related to.	String	True	1
datetime	Datetime the interaction refers to.	yyyy-mm-dd`T`hh:MM:ss+hh:MM (timezone is optional)	False	2022-01-01T00:00:00
type	Type of the interaction. Possible values must be defined in the collection properties.	String	False	type_A
commercial_agent_id	Identifier of the commercial agent involved in the interaction.	String	True	1

warning

CSV file example

id,subscription_id,subscription_item_id,datetime,type,commercial_agent_id
12,44,1,2022-01-01T00:00:00,type_A,1
13,45,,2021-05-15T08:30:00,type_B,2
14,46,2,2020-11-20T12:00:00,type_C,
15,47,3,2019-07-10T14:45:00,type_D,3

Guidelines​

Subscription data​

CSV file example​

Invoice data​

CSV file example​

Subscription Item data​

CSV file example​

Usage data​

CSV file example​

Ticket data​

CSV file example​

Interaction data​

CSV file example​

Guidelines

Subscription data

CSV file example

Invoice data

CSV file example

Subscription Item data

CSV file example

Usage data

CSV file example

Ticket data

CSV file example

Interaction data

CSV file example