Skip to main content

Data Schemas

Churn Prediction leverages historical data to forecast the likelihood of customers leaving a service. In this section, we provide the schema for the data required.

The data types required for Churn Prediction are:

  • subscription: data related to the subscription of the customer.
  • invoice: data related to the invoices issued to the customer.
  • subscription_item: data related to the items included in the subscription.
  • usage: data related to the usage of the services or products included in the subscription.
  • ticket: data related to the interactions between the customer and the support service.
  • interaction: data related to the interactions between the customer and the customer care service.

Guidelines

To ensure Churn Prediction performs optimally, please adhere to the following guidelines:

  1. Keep Data Up-to-Date: Regularly update the data in your collection to provide the most accurate predictions. Stale data can lead to misleading results.

  2. Include All Data Types: Ensure your collection contains data for all the required types listed above. Missing data types can hinder the effectiveness of the prediction models.

  3. Historical Data Requirements: Each data type should include a substantial amount of historical data. The required length of this history depends on the billing frequency. As an example:

    • For a 1-month billing frequency, maintain at least 2 years of historical data.
    • For a 1-year billing frequency, retain at least 5-10 years of historical data.
  4. Subscription Cancellations: Subscription data must include records of cancellations to enable the models to learn effectively from past behaviors.

  5. Churn Prediction Requests: When requesting a Churn Prediction, all non-cancelled subscriptions in your subscription data will be returned along with their churn probabilities. Ensure you have at least one non-cancelled subscription available in your collection before making a prediction request.

  6. Minimum Schema for Data Types: Some data types have a minimum schema; however, to achieve more accurate predictions, you are encouraged to upload additional columns.

Subscription data

In the following table, you can find the schema for the subscription data. The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns.

Column nameDescriptionFormat detailsNullableExample
idIdentifier of the subscription.StringFalse1
activation_datetimeDatetime of subscription first activation.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2018-02-01T00:00:00
cancellation_datetimeDatetime of subscription cancellation.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)True2023-06-01T00:00:00
is_cancelledBoolean indicating whether the subscription has been cancelled or not.BoolFalseFalse
billing_period_unitString indicating the type of billing period unit. Can be one of: DAY, WEEK, MONTH, YEAR.StringFalseMONTH
billing_periodNumber of billing_period_unit forming a billing period.Positive integer numberFalse1
customer_idIdentifier of the customer.StringFalse45
customer_typeType of the customer. Can be one of: COMPANY, PRIVATE.StringFalsePRIVATE
customer_genderGender of the customer. Can be one of: MALE, FEMALE.StringTrueFEMALE
customer_birth_dateDate of birth of the customer.yyyy-mm-ddTrue1990-02-03
customer_cityCity of the customer.StringTrueMilano
customer_countryCountry of the customer.StringTrueItaly
customer_ateco_codeAteco code of the customer.StringTrue31.00.00
customer_number_employeesNumber of employees of the customer.Positive integer numberTrue20
customer_revenueRevenue of the customer.Positive number with decimal digits separated by dotsTrue1000000.00

CSV file example

id,activation_datetime,cancellation_datetime,is_cancelled,billing_period_unit,billing_period,customer_id,customer_type,customer_gender,customer_birth_date,customer_city,customer_country,customer_ateco_code,customer_number_employees,customer_revenue
1,2018-02-01T00:00:00,,False,MONTH,1,45,PRIVATE,MALE,1990-02-03,Milano,Italy,,,
2,2019-05-10T14:00:00,2023-06-01T00:00:00,True,YEAR,1,56,COMPANY,,,,France,62.02.00,100,5000000.00
3,2020-11-15T08:30:00,,False,MONTH,3,78,PRIVATE,FEMALE,1985-11-20,Rome,Italy,,,
4,2021-09-01T13:45:00,,False,WEEK,4,92,COMPANY,,,Berlin,Germany,13.30.00,50,750000.00

Invoice data

In the following table, you can find the schema for the invoice data. The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns.

Column nameDescriptionFormat detailsNullableExample
idIdentifier of the invoice.StringFalse1
subscription_idIdentifier of the subscription.StringFalse1
emission_datetimeDatetime of invoice emission for the specified billing period.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-02-01T00:00:00
payment_datetimeDatetime of payment of the invoice for the specified billing period.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)True2022-07-01T07:30:00
was_paidBoolean indicating whether the invoice has been paid or not.BoolFalseTrue
due_datetimeDue datetime of the invoice for the specified billing period.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)True2022-10-01T00:00:00
amountAmount of the invoice for the specified billing period.Positive number with decimal digits separated by dotsFalse450.00
billing_period_start_datetimeStart datetime of the billing period.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-02-01T00:00:00
billing_period_end_datetimeEnd datetime of the billing period.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-01-01T00:00:00

CSV file example

id,subscription_id,emission_datetime,payment_datetime,was_paid,due_datetime,amount,billing_period_start_datetime,billing_period_end_datetime
1,1,2022-02-01T00:00:00,2022-07-01T07:30:00,True,2022-10-01T00:00:00,450.00,2022-01-01T00:00:00,2022-02-01T00:00:00
2,2,2023-01-01T00:00:00,2023-02-01T00:00:00,True,2023-03-01T00:00:00,1200.00,2022-01-01T00:00:00,2023-01-01T00:00:00
3,3,2021-12-01T10:00:00,,False,2022-02-01T00:00:00,350.00,2022-01-01T00:00:00,2022-04-01T00:00:00
4,4,2022-04-01T15:00:00,2022-04-15T12:00:00,True,2022-05-01T00:00:00,150.00,2022-04-01T00:00:00,2022-04-29T00:00:00

Subscription Item data

In the following table, you can find the schema for the subscription item data. Subscription items are the products or services that are part of a subscription.

The data must be uploaded via a unique csv file, and must have all the columns listed below.

The following columns are the minimum required for the subscription item data. It is however recommended to add extra columns to characterize the item, such as its technical specifications.

Column nameDescriptionFormat detailsNullableExample
idIdentifier of the subscription item.StringFalse12
subscription_idIdentifier of the subscription this item belongs to.StringFalse324
amountPrice for this item in the subscription bill.Positive number with decimal digits separated by dotsTrue10.3
deactivation_costCost for deactivating this item in the subscription.Positive number with decimal digits separated by dotsTrue5.0
from_datetimeDatetime from which this item is included in the subscription with this price.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-01-01T00:00:00
to_datetimeDatetime from which this item is no longer included in the subscription with this price.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)True2022-11-01T00:00:00
is_activeBoolean indicating whether this item is currently active in the subscription.BoolFalseTrue
item_idIdentifier of the item in the catalogue.StringTrue1234
item_typeType of the item.StringTruetype_A
item_pricePrice of this item in the catalogue.Positive number with decimal digits separated by dotsTrue15.0

CSV file example

id,subscription_id,amount,deactivation_cost,from_datetime,to_datetime,is_active,item_id,item_type,item_price
12,324,10.3,,2022-01-01T00:00:00,2022-11-01T00:00:00,False,1234,type_A,15.0
13,325,20.5,3.2,2021-05-15T08:30:00,2023-06-01T00:00:00,False,5678,type_B,25.0
14,326,,,2020-11-20T12:00:00,,True,7890,type_C,30.0
15,327,8.75,0.0,2019-07-10T14:45:00,2020-01-01T00:00:00,False,,,

Usage data

In the following table, you can find the schema for the usage data. Usage data represents the consumption of the services or products included in the subscription.

The data must be uploaded via a unique csv file, and must have all the columns listed below.

The following columns are the minimum required for the usage data. Since usage data characteristics can vary significantly depending on the service or product, it is recommended to add extra columns to characterize the usage data depending on your specific case.

Column nameDescriptionFormat detailsNullableExample
idIdentifier of the usage entry.StringFalse12
subscription_idIdentifier of the subscription this usage refers to.StringFalse324
subscription_item_idIdentifier of the subscription item this usage refers to.StringTrue10ss
datetimeDatetime the usage entry refers to.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-01-01T00:00:00
typeType of the usage. Possible values must be defined in the collection properties.StringFalsetype_A
warning

type is an ENUM field. When creating a new collection, you have to define possible values for this field. Once defined, these will be the only values accepted for the type field in the CSV file of the usage data.

CSV file example

id,subscription_id,subscription_item_id,datetime,type
12,324,10ss,2022-01-01T00:00:00,type_A
13,325,,2021-05-15T08:30:00,type_B
14,326,20aa,2020-11-20T12:00:00,type_C
15,327,30bb,2019-07-10T14:45:00,type_D

Ticket data

In the following table, you can find the schema for the ticket data. Ticket data represents the interactions between the customer and the support service.

The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns to characterize the ticket data.

Column nameDescriptionFormat detailsNullableExample
idIdentifier of the ticket.StringFalse12
subscription_idIdentifier of the subscription the ticket is related to.StringFalse44
subscription_item_idIdentifier of the subscription item the ticket is related to.StringTrue1
open_datetimeDatetime the ticket was opened.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-01-01T00:00:00
closure_datetimeDatetime the ticket was closed.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)True2022-01-01T00:00:00
is_closedBoolean indicating whether the ticket has been closed.BoolFalseFalse
typeType of the ticket. Possible values must be defined in the collection properties.StringFalsetype_A
was_solvedBoolean indicating whether the ticket has been solved or not.BoolFalseFalse
warning

type is an ENUM field. When creating a new collection, you have to define possible values for this field. Once defined, these will be the only values accepted for the type field in the CSV file of the ticket data.

CSV file example

id,subscription_id,subscription_item_id,open_datetime,closure_datetime,is_closed,type,was_solved
12,44,1,2022-01-01T00:00:00,2022-01-01T12:00:00,True,type_A,True
13,45,,2021-05-15T08:30:00,2021-05-16T10:00:00,True,type_B,False
14,46,2,2020-11-20T12:00:00,2020-11-22T14:30:00,True,type_C,True
15,47,3,2019-07-10T14:45:00,,False,type_D,False

Interaction data

In the following table, you can find the schema for the interaction data. Interaction data represents the interactions between the customer and the customer care service (e.g., chat, phone calls).

The data must be uploaded via a unique csv file, and must have all the columns listed below. You can add any extra columns to characterize the interaction data.

Column nameDescriptionFormat detailsNullableExample
idIdentifier of the interaction.StringFalse12
subscription_idIdentifier of the subscription the interaction is related to.StringFalse44
subscription_item_idIdentifier of the subscription item the interaction is related to.StringTrue1
datetimeDatetime the interaction refers to.yyyy-mm-ddThh:MM:ss+hh:MM (timezone is optional)False2022-01-01T00:00:00
typeType of the interaction. Possible values must be defined in the collection properties.StringFalsetype_A
commercial_agent_idIdentifier of the commercial agent involved in the interaction.StringTrue1
warning

type is an ENUM field. When creating a new collection, you have to define possible values for this field. Once defined, these will be the only values accepted for the type field in the CSV file of the interaction data.

CSV file example

id,subscription_id,subscription_item_id,datetime,type,commercial_agent_id
12,44,1,2022-01-01T00:00:00,type_A,1
13,45,,2021-05-15T08:30:00,type_B,2
14,46,2,2020-11-20T12:00:00,type_C,
15,47,3,2019-07-10T14:45:00,type_D,3