Home โ€บ Insights โ€บ Data Analytics
๐ŸŽฏ Data Analytics

How to Calculate Customer Lifetime Value (LTV) Using SQL and Python โ€” Full Worked Example

AT
Awais Tahir
Finance Professional ยท Data Analyst
๐Ÿ“… March 28, 2025
โฑ 10 min read
๐Ÿ‘ LTV ยท SQL ยท Python
LTVSQLPythonRFMCustomer Analytics
๐ŸŽฏ
โšก Key Takeaways
  • LTV is the total revenue a customer generates over their entire relationship with your business
  • Most companies either do not calculate LTV or use a formula that ignores churn โ€” both are costly mistakes
  • The historic LTV model uses actual transaction data โ€” the most reliable method for any business with existing data
  • RFM segmentation (Recency, Frequency, Monetary) lets you act on LTV โ€” not just measure it
  • The ratio of LTV to Customer Acquisition Cost (CAC) is the most important unit economics metric for any business

Why LTV Is the Most Important Number in Your Business

Customer Lifetime Value answers one question: how much is a customer actually worth to your business? Not just their first order โ€” their entire relationship, across every purchase, over every year they remain a customer.

This number drives every critical business decision: how much you can afford to spend acquiring a customer, which customer segments to prioritise, which products to promote, and when a customer is at risk of churning. Without LTV, these decisions are guesses. With it, they are calculations.

Yet in five years of working with businesses on their data, I have found that fewer than 20% actually calculate LTV correctly โ€” and many do not calculate it at all. This article fixes that.

"If you do not know what a customer is worth, you do not know what a customer is worth spending to acquire."

Three LTV Models โ€” Which One to Use

There are three main approaches to calculating LTV, each suited to different situations:

  1. Historic LTV โ€” sums actual revenue from existing customers. Best when you have transaction history. Most accurate. We will use this one.
  2. Predictive LTV โ€” uses machine learning to forecast future purchases. Best for large datasets with clear purchase patterns. More complex.
  3. Simple Formula LTV โ€” Average Order Value ร— Purchase Frequency ร— Customer Lifespan. Quick but often inaccurate because it ignores variance and churn.
โš ๏ธ
The most common LTV mistake
Using LTV = AOV ร— Frequency ร— Lifespan without accounting for gross margin. If your average order is ยฃ100, frequency is 4x/year, and lifespan is 3 years โ€” your LTV is ยฃ1,200 in revenue. But if your gross margin is 30%, the real LTV is only ยฃ360. Decisions based on the revenue number will systematically overspend on acquisition.

Step 1 โ€” Calculate Historic LTV in SQL

Start with the database. This query calculates each customer's lifetime revenue, order count, average order value, first purchase date, most recent purchase, and how many days they have been a customer.

PostgreSQL โ€” Historic LTV Base Query
WITH customer_orders AS (
    SELECT
        o.customer_id,
        c.customer_name,
        c.segment,
        c.acquisition_channel,
        SUM(o.order_value)                    AS lifetime_revenue,
        SUM(o.order_value * o.gross_margin)   AS lifetime_gross_profit,
        COUNT(o.order_id)                     AS total_orders,
        AVG(o.order_value)                    AS avg_order_value,
        MIN(o.order_date)                     AS first_order_date,
        MAX(o.order_date)                     AS last_order_date,
        MAX(o.order_date) - MIN(o.order_date) AS customer_lifespan_days,
        NOW()::date - MAX(o.order_date)      AS days_since_last_order
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY
        o.customer_id, c.customer_name,
        c.segment, c.acquisition_channel
)
SELECT
    *,
    -- Annualised LTV (gross profit per year)
    CASE
        WHEN customer_lifespan_days > 0
        THEN ROUND(lifetime_gross_profit / (customer_lifespan_days / 365.0), 2)
        ELSE lifetime_gross_profit
    END                                      AS annual_ltv,

    -- LTV quartile ranking
    NTILE(4) OVER (
        ORDER BY lifetime_gross_profit DESC
    )                                        AS ltv_quartile

FROM customer_orders
ORDER BY lifetime_gross_profit DESC;

-- Top results example:
-- customer  | segment    | ltv_revenue | ltv_profit | orders | quartile
-- Acme Ltd  | Enterprise |   ยฃ284,420  |   ยฃ96,703  |   47   |    1
-- Beta Co   | Mid-Market |   ยฃ142,810  |   ยฃ42,843  |   31   |    1
-- Gamma Inc | Enterprise |    ยฃ98,340  |   ยฃ29,502  |   22   |    2

Step 2 โ€” RFM Segmentation in SQL

LTV tells you what a customer has been worth. RFM (Recency, Frequency, Monetary) tells you what they are likely to do next โ€” and therefore who to act on right now. Each customer gets a score of 1โ€“4 on all three dimensions, then a combined segment label.

PostgreSQL โ€” Full RFM Segmentation
WITH rfm_raw AS (
    SELECT
        customer_id,
        NOW()::date - MAX(order_date)  AS recency_days,
        COUNT(order_id)               AS frequency,
        SUM(order_value)              AS monetary
    FROM orders
    GROUP BY customer_id
),
rfm_scored AS (
    SELECT
        customer_id,
        recency_days,
        frequency,
        monetary,
        -- Recency: lower days = better = higher score
        NTILE(4) OVER (ORDER BY recency_days ASC)  AS r_score,
        -- Frequency: higher = better = higher score
        NTILE(4) OVER (ORDER BY frequency DESC)    AS f_score,
        -- Monetary: higher = better = higher score
        NTILE(4) OVER (ORDER BY monetary DESC)     AS m_score
    FROM rfm_raw
)
SELECT
    customer_id,
    recency_days,
    frequency,
    monetary,
    r_score, f_score, m_score,
    (r_score + f_score + m_score)           AS rfm_total,
    CASE
        WHEN r_score = 4 AND f_score = 4  THEN 'Champions'
        WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customers'
        WHEN r_score = 4 AND f_score <= 2  THEN 'New Customers'
        WHEN r_score <= 2 AND f_score >= 3  THEN 'At Risk'
        WHEN r_score = 1 AND f_score >= 2  THEN 'Cannot Lose Them'
        WHEN r_score = 1 AND f_score = 1  THEN 'Lost'
        ELSE 'Potential'
    END                                     AS rfm_segment
FROM rfm_scored
ORDER BY rfm_total DESC;

Step 3 โ€” Analyse and Visualise in Python

Now bring the SQL results into Python for deeper analysis and visualisation. This code loads your query result, calculates segment-level averages, and produces a summary ready for a dashboard or PowerPoint.

Python โ€” LTV Segment Analysis
import pandas as pd
import matplotlib.pyplot as plt
import sqlalchemy

# Connect and load
engine = sqlalchemy.create_engine(DATABASE_URL)
df = pd.read_sql("SELECT * FROM rfm_results", engine)

# Segment summary
summary = df.groupby('rfm_segment').agg(
    customer_count  = ('customer_id',  'count'),
    avg_ltv         = ('monetary',     'mean'),
    total_revenue   = ('monetary',     'sum'),
    avg_recency     = ('recency_days', 'mean'),
    avg_frequency   = ('frequency',    'mean')
).round(2).sort_values('total_revenue', ascending=False)

print(summary)

# Output:
# segment          | count | avg_ltv  | total_rev   | avg_recency
# Champions        |   142 | ยฃ18,420  | ยฃ2,615,640  |   12 days
# Loyal Customers  |   384 |  ยฃ6,280  | ยฃ2,411,520  |   34 days
# At Risk          |   218 |  ยฃ4,940  | ยฃ1,076,920  |  142 days
# Cannot Lose Them |    89 |  ยฃ8,120  |   ยฃ722,680  |  198 days
# New Customers    |   312 |    ยฃ840  |   ยฃ262,080  |    8 days
# Lost             |   441 |    ยฃ320  |   ยฃ141,120  |  380 days

# At-risk customers to action immediately
at_risk = df[df['rfm_segment'] == 'At Risk'].sort_values('monetary', ascending=False)
print(f"\n{len(at_risk)} at-risk customers โ€” ยฃ{at_risk['monetary'].sum():,.0f} at stake")
# 218 at-risk customers โ€” ยฃ1,076,920 at stake

# Export list for sales team
at_risk[['customer_id', 'monetary', 'recency_days', 'frequency']].to_csv(
    'at_risk_customers.csv', index=False
)

Step 4 โ€” The LTV:CAC Ratio

LTV in isolation is interesting. LTV divided by Customer Acquisition Cost (CAC) is actionable. This ratio tells you how efficient your customer acquisition is โ€” and whether you can afford to grow faster.

Python โ€” LTV:CAC by Acquisition Channel
# Merge LTV data with marketing spend data ltv_by_channel = df.groupby('acquisition_channel')['lifetime_gross_profit'].mean() cac_by_channel = pd.Series({ 'Google PPC': 280, 'Email Marketing': 42, 'SEO': 95, 'Social Media': 165, 'Referral': 18 }) ratio = (ltv_by_channel / cac_by_channel).round(1).sort_values(ascending=False) print(ratio) # LTV:CAC Results: # Referral 42.3x โ† extraordinary โ€” invest more here # Email Marketing 28.6x โ† excellent # SEO 14.2x โ† great # Social Media 6.8x โ† acceptable # Google PPC 3.1x โ† barely viable โ€” review spend # Rule of thumb: LTV:CAC > 3x = viable, > 5x = healthy, > 10x = excellent
โœ…
What to do with this
If your Referral channel has a 42x LTV:CAC ratio and Google PPC has 3x, the implication is clear โ€” build a formal referral programme before spending another pound on PPC. This single analysis often justifies a complete reallocation of the marketing budget.

What to Do With Each Segment

The entire point of RFM segmentation is to take different actions with different groups. Here is the playbook:

  • Champions โ€” reward them. Ask for referrals. Offer early access to new products. They will bring you more Champions.
  • Loyal Customers โ€” upsell. They trust you. Introduce them to higher-margin products or services they have not tried.
  • At Risk โ€” contact personally. These are high-value customers who have gone quiet. A personal email or call recovers a significant percentage.
  • Cannot Lose Them โ€” urgent outreach. High historical value, very inactive. This is your highest-priority retention list.
  • New Customers โ€” onboard properly. Their second purchase is the most important predictor of long-term retention. Make it easy.
  • Lost โ€” win-back campaign or accept the loss. Do not spend the same CAC to reacquire them โ€” they have already shown low engagement.

Conclusion โ€” LTV Is a Decision Engine

Customer Lifetime Value is not a vanity metric for investor decks. It is a decision engine that tells you where to spend, who to call, and how fast you can grow. Build the SQL queries above on your own transaction data this week. The insights will pay for the time many times over.

If you want help building a full LTV and RFM model for your business โ€” or want to connect this analysis to a Power BI dashboard your sales team can use daily โ€” get in touch.

Need a Customer Analytics Model?

I build complete LTV and RFM segmentation models using your transaction data โ€” delivered as SQL views, Python scripts, and a Power BI dashboard your team can use every week.