Data Schemas
Churn Prediction leverages historical data to forecast the likelihood of customers leaving a service. In this section, we provide the schema for the data required.
The data types required for Churn Prediction are:
- subscription: data related to the subscription of the customer.
- invoice: data related to the invoices issued to the customer.
- subscription_item: data related to the items included in the subscription.
- usage: data related to the usage of the services or products included in the subscription.
- ticket: data related to the interactions between the customer and the support service.
- interaction: data related to the interactions between the customer and the customer care service.
Guidelines
To ensure Churn Prediction performs optimally, please adhere to the following guidelines:
-
Keep Data Up-to-Date: Regularly update the data in your collection to provide the most accurate predictions. Stale data can lead to misleading results.
-
Include All Data Types: Ensure your collection contains data for all the required types listed above. Missing data types can hinder the effectiveness of the prediction models.
-
Historical Data Requirements: Each data type should include a substantial amount of historical data. The required length of this history depends on the billing frequency. As an example:
- For a 1-month billing frequency, maintain at least 2 years of historical data.
- For a 1-year billing frequency, retain at least 5-10 years of historical data.
-
Subscription Cancellations: Subscription data must include records of cancellations to enable the models to learn effectively from past behaviors.
-
Churn Prediction Requests: When requesting a Churn Prediction, all non-cancelled subscriptions in your subscription data will be returned along with their churn probabilities. Ensure you have at least one non-cancelled subscription available in your collection before making a prediction request.
-
Minimum Schema for Data Types: Some data types have a minimum schema; however, to achieve more accurate predictions, you are encouraged to upload additional columns.
Subscription data
In the following table, you can find the schema for the subscription data. The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns.
| Column name | Description | Format details | Nullable | Example |
|---|---|---|---|---|
| id | Identifier of the subscription. | String | False | 1 |
| activation_datetime | Datetime of subscription first activation. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2018-02-01T00:00:00 |
| cancellation_datetime | Datetime of subscription cancellation. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | True | 2023-06-01T00:00:00 |
| is_cancelled | Boolean indicating whether the subscription has been cancelled or not. | Bool | False | False |
| billing_period_unit | String indicating the type of billing period unit. Can be one of: DAY, WEEK, MONTH, YEAR. | String | False | MONTH |
| billing_period | Number of billing_period_unit forming a billing period. | Positive integer number | False | 1 |
| customer_id | Identifier of the customer. | String | False | 45 |
| customer_type | Type of the customer. Can be one of: COMPANY, PRIVATE. | String | False | PRIVATE |
| customer_gender | Gender of the customer. Can be one of: MALE, FEMALE. | String | True | FEMALE |
| customer_birth_date | Date of birth of the customer. | yyyy-mm-dd | True | 1990-02-03 |
| customer_city | City of the customer. | String | True | Milano |
| customer_country | Country of the customer. | String | True | Italy |
| customer_ateco_code | Ateco code of the customer. | String | True | 31.00.00 |
| customer_number_employees | Number of employees of the customer. | Positive integer number | True | 20 |
| customer_revenue | Revenue of the customer. | Positive number with decimal digits separated by dots | True | 1000000.00 |
CSV file example
id,activation_datetime,cancellation_datetime,is_cancelled,billing_period_unit,billing_period,customer_id,customer_type,customer_gender,customer_birth_date,customer_city,customer_country,customer_ateco_code,customer_number_employees,customer_revenue
1,2018-02-01T00:00:00,,False,MONTH,1,45,PRIVATE,MALE,1990-02-03,Milano,Italy,,,
2,2019-05-10T14:00:00,2023-06-01T00:00:00,True,YEAR,1,56,COMPANY,,,,France,62.02.00,100,5000000.00
3,2020-11-15T08:30:00,,False,MONTH,3,78,PRIVATE,FEMALE,1985-11-20,Rome,Italy,,,
4,2021-09-01T13:45:00,,False,WEEK,4,92,COMPANY,,,Berlin,Germany,13.30.00,50,750000.00
Invoice data
In the following table, you can find the schema for the invoice data. The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns.
| Column name | Description | Format details | Nullable | Example |
|---|---|---|---|---|
| id | Identifier of the invoice. | String | False | 1 |
| subscription_id | Identifier of the subscription. | String | False | 1 |
| emission_datetime | Datetime of invoice emission for the specified billing period. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-02-01T00:00:00 |
| payment_datetime | Datetime of payment of the invoice for the specified billing period. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | True | 2022-07-01T07:30:00 |
| was_paid | Boolean indicating whether the invoice has been paid or not. | Bool | False | True |
| due_datetime | Due datetime of the invoice for the specified billing period. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | True | 2022-10-01T00:00:00 |
| amount | Amount of the invoice for the specified billing period. | Positive number with decimal digits separated by dots | False | 450.00 |
| billing_period_start_datetime | Start datetime of the billing period. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-02-01T00:00:00 |
| billing_period_end_datetime | End datetime of the billing period. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-01-01T00:00:00 |
CSV file example
id,subscription_id,emission_datetime,payment_datetime,was_paid,due_datetime,amount,billing_period_start_datetime,billing_period_end_datetime
1,1,2022-02-01T00:00:00,2022-07-01T07:30:00,True,2022-10-01T00:00:00,450.00,2022-01-01T00:00:00,2022-02-01T00:00:00
2,2,2023-01-01T00:00:00,2023-02-01T00:00:00,True,2023-03-01T00:00:00,1200.00,2022-01-01T00:00:00,2023-01-01T00:00:00
3,3,2021-12-01T10:00:00,,False,2022-02-01T00:00:00,350.00,2022-01-01T00:00:00,2022-04-01T00:00:00
4,4,2022-04-01T15:00:00,2022-04-15T12:00:00,True,2022-05-01T00:00:00,150.00,2022-04-01T00:00:00,2022-04-29T00:00:00
Subscription Item data
In the following table, you can find the schema for the subscription item data. Subscription items are the products or services that are part of a subscription.
The data must be uploaded via a unique csv file, and must have all the columns listed below.
The following columns are the minimum required for the subscription item data. It is however recommended to add extra columns to characterize the item, such as its technical specifications.
| Column name | Description | Format details | Nullable | Example |
|---|---|---|---|---|
| id | Identifier of the subscription item. | String | False | 12 |
| subscription_id | Identifier of the subscription this item belongs to. | String | False | 324 |
| amount | Price for this item in the subscription bill. | Positive number with decimal digits separated by dots | True | 10.3 |
| deactivation_cost | Cost for deactivating this item in the subscription. | Positive number with decimal digits separated by dots | True | 5.0 |
| from_datetime | Datetime from which this item is included in the subscription with this price. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-01-01T00:00:00 |
| to_datetime | Datetime from which this item is no longer included in the subscription with this price. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | True | 2022-11-01T00:00:00 |
| is_active | Boolean indicating whether this item is currently active in the subscription. | Bool | False | True |
| item_id | Identifier of the item in the catalogue. | String | True | 1234 |
| item_type | Type of the item. | String | True | type_A |
| item_price | Price of this item in the catalogue. | Positive number with decimal digits separated by dots | True | 15.0 |
CSV file example
id,subscription_id,amount,deactivation_cost,from_datetime,to_datetime,is_active,item_id,item_type,item_price
12,324,10.3,,2022-01-01T00:00:00,2022-11-01T00:00:00,False,1234,type_A,15.0
13,325,20.5,3.2,2021-05-15T08:30:00,2023-06-01T00:00:00,False,5678,type_B,25.0
14,326,,,2020-11-20T12:00:00,,True,7890,type_C,30.0
15,327,8.75,0.0,2019-07-10T14:45:00,2020-01-01T00:00:00,False,,,
Usage data
In the following table, you can find the schema for the usage data. Usage data represents the consumption of the services or products included in the subscription.
The data must be uploaded via a unique csv file, and must have all the columns listed below.
The following columns are the minimum required for the usage data. Since usage data characteristics can vary significantly depending on the service or product, it is recommended to add extra columns to characterize the usage data depending on your specific case.
| Column name | Description | Format details | Nullable | Example |
|---|---|---|---|---|
| id | Identifier of the usage entry. | String | False | 12 |
| subscription_id | Identifier of the subscription this usage refers to. | String | False | 324 |
| subscription_item_id | Identifier of the subscription item this usage refers to. | String | True | 10ss |
| datetime | Datetime the usage entry refers to. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-01-01T00:00:00 |
| type | Type of the usage. Possible values must be defined in the collection properties. | String | False | type_A |
type is an ENUM field. When creating a new collection, you have to define possible values for this field.
Once defined, these will be the only values accepted for the type field in the CSV file of the usage data.
CSV file example
id,subscription_id,subscription_item_id,datetime,type
12,324,10ss,2022-01-01T00:00:00,type_A
13,325,,2021-05-15T08:30:00,type_B
14,326,20aa,2020-11-20T12:00:00,type_C
15,327,30bb,2019-07-10T14:45:00,type_D
Ticket data
In the following table, you can find the schema for the ticket data. Ticket data represents the interactions between the customer and the support service.
The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns to characterize the ticket data.
| Column name | Description | Format details | Nullable | Example |
|---|---|---|---|---|
| id | Identifier of the ticket. | String | False | 12 |
| subscription_id | Identifier of the subscription the ticket is related to. | String | False | 44 |
| subscription_item_id | Identifier of the subscription item the ticket is related to. | String | True | 1 |
| open_datetime | Datetime the ticket was opened. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-01-01T00:00:00 |
| closure_datetime | Datetime the ticket was closed. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | True | 2022-01-01T00:00:00 |
| is_closed | Boolean indicating whether the ticket has been closed. | Bool | False | False |
| type | Type of the ticket. Possible values must be defined in the collection properties. | String | False | type_A |
| was_solved | Boolean indicating whether the ticket has been solved or not. | Bool | False | False |
type is an ENUM field. When creating a new collection, you have to define possible values for this field.
Once defined, these will be the only values accepted for the type field in the CSV file of the ticket data.
CSV file example
id,subscription_id,subscription_item_id,open_datetime,closure_datetime,is_closed,type,was_solved
12,44,1,2022-01-01T00:00:00,2022-01-01T12:00:00,True,type_A,True
13,45,,2021-05-15T08:30:00,2021-05-16T10:00:00,True,type_B,False
14,46,2,2020-11-20T12:00:00,2020-11-22T14:30:00,True,type_C,True
15,47,3,2019-07-10T14:45:00,,False,type_D,False
Interaction data
In the following table, you can find the schema for the interaction data. Interaction data represents the interactions between the customer and the customer care service (e.g., chat, phone calls).
The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns to characterize the interaction data.
| Column name | Description | Format details | Nullable | Example |
|---|---|---|---|---|
| id | Identifier of the interaction. | String | False | 12 |
| subscription_id | Identifier of the subscription the interaction is related to. | String | False | 44 |
| subscription_item_id | Identifier of the subscription item the interaction is related to. | String | True | 1 |
| datetime | Datetime the interaction refers to. | yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional) | False | 2022-01-01T00:00:00 |
| type | Type of the interaction. Possible values must be defined in the collection properties. | String | False | type_A |
| commercial_agent_id | Identifier of the commercial agent involved in the interaction. | String | True | 1 |
type is an ENUM field. When creating a new collection, you have to define possible values for this field.
Once defined, these will be the only values accepted for the type field in the CSV file of the interaction data.
CSV file example
id,subscription_id,subscription_item_id,datetime,type,commercial_agent_id
12,44,1,2022-01-01T00:00:00,type_A,1
13,45,,2021-05-15T08:30:00,type_B,2
14,46,2,2020-11-20T12:00:00,type_C,
15,47,3,2019-07-10T14:45:00,type_D,3