E-commerce analytics: From raw transactions to business insights
540K transactions, 38 countries, 12 months of sales data. A complete end-to-end analysis of a UK-based online retailer, from data cleaning to revenue trends, product performance and customer behaviour.
Why this dataset
The UCI Online Retail dataset is a classic benchmark in e-commerce analytics: 541 909 transactions from a UK-based gift retailer selling across 38 countries between December 2010 and December 2011. It contains every ingredient a real business analyst faces: missing customer IDs, returns mixed in with sales, ambiguous stock codes and a clear seasonality signal heading into Christmas.
My goal was to treat it exactly as I would a real assignment: define business questions first, clean the data rigorously, then answer each question with a chart that communicates clearly rather than just looking impressive.
12 months of retail activity
What the data shows
Monthly revenue: seasonality and Q4 surge
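Revenue is relatively stable from January to August (£510K to £740K per month), then accelerates sharply from September onward, peaking at £1.46M in November 2011. The Q4 build-up is textbook for a gift retailer: it starts earlier than most operators expect and has already peaked by the time December begins. One caveat: the dataset ends on 9 December 2011, so the final month is only partly covered and its total understates a full December.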
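A minimal sketch of how the monthly series behind this chart can be computed with pandas. The rows below are toy values, not figures from the dataset, and the column names (InvoiceDate, Revenue) are assumed to match the cleaned table described later in this write-up.

```python
import pandas as pd

# Toy line items (invented values); the real table has one row per invoice line.
sales = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-08-15 10:00",
        "2011-11-03 11:30",
        "2011-11-20 14:05",
        "2011-12-02 09:15",
    ]),
    "Revenue": [120.0, 300.0, 450.0, 80.0],
})

# Bucket each line into its calendar month and sum revenue per bucket.
monthly = sales.groupby(sales["InvoiceDate"].dt.to_period("M"))["Revenue"].sum()

# The month with the highest total revenue.
peak_month = monthly.idxmax()
print(monthly)
print("Peak month:", peak_month)
```

Grouping by a monthly period rather than by month number keeps December 2010 and December 2011 apart, which matters for a dataset that spans both.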
Top 10 products by revenue
The top product, REGENCY CAKESTAND 3 TIER at £174K, outsells the second by only £6K, suggesting a relatively flat long tail rather than a dominant hero SKU. This is useful context for pricing and inventory decisions: no single product carries disproportionate risk.
Top 10 markets outside the UK
The Netherlands and Ireland lead international sales at £284K and £276K respectively, both ahead of larger economies like Germany (£206K) and France (£185K). This likely reflects B2B wholesale relationships rather than pure consumer demand, which is typical for a gift-trade retailer.
Order activity by hour and day, heatmap
Orders concentrate heavily between 10am and 3pm, Monday to Thursday. Friday drops off in the afternoon, and the weekend is nearly silent. This is a strong B2B signal: purchasing managers place orders during working hours and do not shop on weekends, which has direct implications for email campaign scheduling and customer support staffing.
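The grid behind a heatmap like this can be sketched as a weekday-by-hour count of distinct invoices. This is an illustration on invented rows, assuming InvoiceNo and InvoiceDate columns; it is not the project's actual script.

```python
import pandas as pd

# Toy invoice lines (invented); one invoice can span several rows.
orders = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4"],
    "InvoiceDate": pd.to_datetime([
        "2011-03-07 10:15",  # Monday
        "2011-03-07 10:15",  # same invoice, second line item
        "2011-03-07 10:40",  # Monday
        "2011-03-09 14:05",  # Wednesday
        "2011-03-11 09:30",  # Friday
    ]),
})

# Count each invoice once, not once per line item.
invoices = orders.drop_duplicates("InvoiceNo").copy()
invoices["weekday"] = invoices["InvoiceDate"].dt.day_name()
invoices["hour"] = invoices["InvoiceDate"].dt.hour

# Rows are weekdays, columns are hours, cells are order counts.
heatmap = pd.crosstab(invoices["weekday"], invoices["hour"])

# Put days in calendar order and keep silent days as rows of zeros.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
heatmap = heatmap.reindex(day_order, fill_value=0)
print(heatmap)
```

Reindexing against the full list of days keeps empty weekend rows visible, so a quiet Saturday shows up as zeros instead of disappearing from the chart.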
Basket value distribution
The distribution is right-skewed: the median basket sits at £303, well below the mean of £521. This gap reveals a small number of very large orders pulling the average up, consistent with the wholesale profile of the customer base. The modal basket is between £50 and £150: that is the single most common range, even though half of all baskets are worth more than £303.
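To make the mean/median gap concrete, here is a small sketch on invented numbers: line items are summed into one basket per invoice, then both statistics are computed. The column names are assumptions, and the values are chosen only to show how one large order moves the mean.

```python
import pandas as pd

# Toy invoice lines (invented values). A5 plays the large wholesale order.
lines = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4", "A4", "A5"],
    "Revenue": [60.0, 40.0, 150.0, 300.0, 200.0, 250.0, 2000.0],
})

# Basket value = total revenue per invoice.
baskets = lines.groupby("InvoiceNo")["Revenue"].sum()

mean_basket = baskets.mean()      # pulled upward by the single large order
median_basket = baskets.median()  # the middle basket, robust to outliers
print(f"mean={mean_basket:.0f}, median={median_basket:.0f}")
```

In this toy case the five baskets are 100, 150, 300, 450 and 2000, so the median is 300 while the mean is 600: the same pattern as in the real data, at a smaller scale.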
Geographic spread of revenue
The UK accounts for the overwhelming majority of revenue. Beyond it, European markets dominate, with isolated pockets in Australia and Japan. The absence of North America and the rest of Asia is notable for a retailer already serving 38 countries, and points to a potential expansion opportunity.
Interactive Tableau Public dashboard
The same dataset in a fully interactive dashboard with KPI cards, monthly revenue trend, top products, choropleth map and an order heatmap by hour and day of week. Filterable by quarter and country.
E-commerce analytics Tableau Public
Revenue, top products, geographic breakdown and return rates, filterable by period and country.
How it was built
The raw CSV (541 909 rows) was loaded with latin-1 encoding, then cleaned in a single reproducible Python script: dates parsed to datetime, returns isolated via the C invoice prefix and negative quantities, special stock codes (postage, bank charges) excluded, and a Revenue column derived as Quantity × UnitPrice.
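The cleaning steps above can be sketched roughly as follows. This is not the original script: the inline CSV is a tiny stand-in for the real file, the column names are assumed to follow the public UCI schema, and SPECIAL_CODES is a hypothetical list, since the exact codes excluded in the project are not shown here.

```python
import io
import pandas as pd

# Tiny stand-in for the raw CSV (invented rows, ISO dates for simplicity).
RAW_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
536366,POST,POSTAGE,1,2010-12-01 08:28:00,18.00,12583,France
C536379,22423,REGENCY CAKESTAND 3 TIER,-1,2010-12-01 09:41:00,12.75,14527,United Kingdom
536380,22423,REGENCY CAKESTAND 3 TIER,2,2010-12-01 09:45:00,12.75,,United Kingdom
"""

# Hypothetical set of non-product stock codes (postage, bank charges, ...).
SPECIAL_CODES = {"POST", "BANK CHARGES"}


def clean(source):
    """Return (sales, returns) DataFrames from a raw CSV path or byte buffer."""
    df = pd.read_csv(
        source,
        encoding="latin-1",
        # Keep identifiers as text so the "C" prefix and missing IDs survive.
        dtype={"InvoiceNo": str, "StockCode": str, "CustomerID": str},
    )
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Drop rows that are fees rather than products.
    df = df[~df["StockCode"].isin(SPECIAL_CODES)].copy()
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    # A return is a cancelled invoice ("C" prefix) or a negative quantity.
    is_return = df["InvoiceNo"].str.startswith("C") | (df["Quantity"] < 0)
    return df[~is_return], df[is_return]


sales, returns = clean(io.BytesIO(RAW_CSV.encode("latin-1")))
print(len(sales), "sales lines,", len(returns), "return lines")
```

Returning sales and returns as two separate tables, rather than deleting the returns, keeps them available for a return-rate calculation later on.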
Each chart is a self-contained Plotly HTML file embedded here via an iframe, the same approach as the RLC project, keeping every visualisation interactive without any JavaScript framework dependency on the portfolio side.
Key takeaways
The 25% missing CustomerID rate is not random noise. It likely represents guest or one-off buyers without accounts. Treating these rows as "anonymous" rather than dropping them lets you compute accurate revenue totals while still keeping them out of cohort analyses that require customer identity.
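One way to implement this split, sketched on invented rows: keep every row for revenue totals, label missing IDs explicitly, and restrict customer-level analysis to identified rows. The CustomerKey column name is hypothetical.

```python
import pandas as pd

# Toy sales lines (invented); None marks a missing CustomerID.
sales = pd.DataFrame({
    "CustomerID": ["17850", None, "12583", None],
    "Revenue": [100.0, 40.0, 250.0, 10.0],
})

# Share of rows without a customer identity.
missing_share = sales["CustomerID"].isna().mean()

# Label anonymous rows instead of dropping them (hypothetical column name).
sales["CustomerKey"] = sales["CustomerID"].fillna("anonymous")

# Revenue totals use every row, anonymous or not.
total_revenue = sales["Revenue"].sum()

# Customer-level analysis uses identified rows only.
identified = sales[sales["CustomerID"].notna()]
revenue_per_customer = identified.groupby("CustomerID")["Revenue"].sum()

print(f"missing share: {missing_share:.0%}, total revenue: {total_revenue}")
```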
The heatmap was the most operationally useful chart. Revenue by month tells you what happened; orders by hour and day tell you how to staff, when to send emails, and when to run promotions. For a B2B retailer, targeting weekday working hours is not optional: it is where the business lives.
The mean/median basket gap (£521 vs £303) matters more than either number alone. Reporting only the mean would overstate what a typical customer spends. Reporting only the median would understate the revenue contribution of large wholesale accounts. Both are needed to give an honest picture.
A 7% return rate on revenue is lower than e-commerce benchmarks typically cited for B2C fashion (20–30%), which further supports the B2B wholesale interpretation of this customer base. Context changes what a number means.
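For reference, a return rate on revenue can be defined in more than one way. The sketch below assumes one plausible definition, the absolute value of returned revenue divided by gross sales revenue; the exact definition behind the 7% figure is not stated here. All rows are invented.

```python
import pandas as pd

# Toy invoice lines (invented); the "C" prefix marks a cancelled invoice.
lines = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "C536379", "536380"],
    "Quantity": [10, 5, -2, 4],
    "UnitPrice": [5.0, 10.0, 5.0, 10.0],
})
lines["Revenue"] = lines["Quantity"] * lines["UnitPrice"]

# Same return rule as in the cleaning step: "C" prefix or negative quantity.
is_return = lines["InvoiceNo"].str.startswith("C") | (lines["Quantity"] < 0)

gross_sales = lines.loc[~is_return, "Revenue"].sum()
returned = -lines.loc[is_return, "Revenue"].sum()  # flip sign to a positive amount

# Assumed definition: returned revenue as a share of gross sales revenue.
return_rate = returned / gross_sales
print(f"return rate on revenue: {return_rate:.1%}")
```

Dividing by net revenue (sales minus returns) instead would give a slightly higher figure, so it is worth stating which denominator is used whenever the rate is reported.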