You are going to build an AI data analysis agency. Not a blog about data. Not a course about analytics. A service that takes a business’s raw data, extracts insights, and delivers recommendations they can act on — for money. This guide covers every step. Follow it in order. Do not skip steps.
Prerequisites
- A laptop with at least 8GB RAM
- Python 3.11+ installed — verify with
python --version - A ChatGPT Plus account ($20/mo) — for Code Interpreter
- A Google Sheets account (free)
- A Notion account (free)
- A Canva account (free)
- A Make account (free tier)
- A Stripe account (free)
- 4-6 hours of uninterrupted time
Total upfront cost: $20 for ChatGPT Plus.
Step 1: Configure Your Python Analysis Environment
Open your terminal. Run:
mkdir ai-data-analysis && cd ai-data-analysis
mkdir templates clients output
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install jupyterlab pandas plotly openpyxl scipy scikit-learn kaleido nbconvert
Verify:
python -c "import pandas; import plotly; import scipy; import sklearn; print('All packages installed')"
You should see All packages installed. Start Jupyter
:
jupyter lab
Create a notebook called analysis-template.ipynb in the templates/ folder. Add this boilerplate:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats
import numpy as np
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)
print("Analysis environment ready.")
CHECK-IN: Step 1 Complete
- Virtual environment active (
(venv)in prompt) - All packages import without errors
- Jupyter Lab opens and notebook runs boilerplate
Step 2: Master the ChatGPT Code Interpreter Pipeline
Upload and Describe
Open ChatGPT. Upload a CSV file. Type:
“Analyze this dataset. Provide: (1) Total rows and columns, (2) Data types, (3) Missing values per column, (4) Duplicate rows, (5) Basic statistics for numeric columns, (6) Unique values for categorical columns, (7) Top 3 data quality issues.”
Identify Patterns
“Identify the 10 most interesting patterns in this data. For each: (1) Describe in plain English, (2) Statistical evidence, (3) Business impact (HIGH/MEDIUM/LOW), (4) One specific action.”
Generate Visualizations
“Create 5 Plotly visualizations. Each must have a clear title, axis labels, and a 2-sentence annotation. Save as PNG at 1200x800.”
Develop Recommendations
“Create 7 prioritized recommendations. For each: (1) One-sentence recommendation, (2) Data evidence, (3) Expected impact, (4) Implementation difficulty, (5) Timeline.”
CHECK-IN: Step 2 Complete
- Can upload CSV and get structured summary
- Can generate prioritized patterns with evidence
- Can produce 5 annotated visualizations
- Can generate actionable recommendations
Step 3: Build Reusable Analysis Templates
Template 1: Sales Analysis
Create templates/sales-analysis.ipynb. This template analyzes sales data for revenue trends, top products, seasonal patterns, and growth opportunities. Expected columns: date, product, category, quantity, revenue, customer_id, region.
Key sections: Data cleaning, revenue trend analysis with month-over-month comparison, product Pareto analysis (top 20% products by revenue), seasonal pattern detection by day of week.
Template 2: Customer Segmentation
Create templates/customer-segmentation.ipynb. This segments customers by RFM (Recency, Frequency, Monetary) analysis and K-means clustering.
Key sections: RFM scoring (1-5 scale per dimension), rule-based segment assignment (Champions, Loyal, New, At Risk, Lost), K-means clustering with elbow method for optimal k, segment visualization and revenue contribution pie chart.
Template 3: Financial Reporting
Create templates/financial-reporting.ipynb. This turns raw financial data into P&L summaries, expense breakdowns, and cash flow analysis.
Key sections: Monthly P&L with revenue/expenses/profit bars, expense breakdown treemap, cumulative cash flow line chart, margin trend analysis.
CHECK-IN: Step 3 Complete
- Sales template runs end-to-end on sample CSV
- Customer segmentation produces RFM scores and clusters
- Financial reporting generates P&L and cash flow charts
- All templates export PNGs to output/ folder
Step 4: Build the Client Onboarding Process
Create the Onboarding Form
In Notion , create a form collecting: company name, industry, data sources, file format, approximate row count, data quality self-assessment (1-5), top 3 questions they want answered, what decision the analysis will inform.
Create the Data Handling Agreement
A simple document covering: scope of data processing, data security measures, data ownership, confidentiality, and liability limits.
Set Up Client Folder Structure
clients/
CLIENT_NAME/
raw/ # Original data (never modify)
cleaned/ # Cleaned datasets
analysis/ # Jupyter notebooks
output/ # Charts and reports
communication/ # Meeting notes
contract/ # Signed agreements
CHECK-IN: Step 4 Complete
- Onboarding form collects all required information
- Data handling agreement exists
- Folder structure template ready
Step 5: Execute the Data Cleaning Workflow
When you receive client data, save the original to clients/CLIENT_NAME/raw/. Create a data quality notebook:
import pandas as pd
import numpy as np
df = pd.read_csv('../raw/data.csv')
# 1. Remove duplicates
df = df.drop_duplicates()
# 2. Standardize column names
df.columns = df.columns.str.lower().str.replace(' ', '_')
# 3. Parse dates
# 4. Handle missing values (>50% missing → drop column, numeric → median, text → 'Unknown')
# 5. Remove negative revenue/price values
# 6. Save cleaned data
df.to_csv('../cleaned/data_cleaned.csv', index=False)
CHECK-IN: Step 5 Complete
- Data quality report generated
- Cleaning notebook documents every transformation
- Cleaned data saved separately from raw
- Validation checks all pass
Step 6: Produce Client Deliverables
The Analysis Report Structure
- Executive Summary — 3 key findings, each with one data point
- Data Quality Assessment — What was cleaned, what was missing, what was excluded
- Key Patterns and Anomalies — Top 10 with statistical evidence
- Visualizations — 5-7 charts with annotations
- Recommendations — 7 prioritized, each tied to a specific finding
- Appendix — Full statistical output and methodology
Package for the Client
Use Canva to create a professional cover page. Export the analysis as a PDF. Create a Looker Studio dashboard for interactive exploration.
CHECK-IN: Step 6 Complete
- Analysis report follows the structure above
- Every recommendation ties to a specific data finding
- Professional PDF and interactive dashboard delivered
Step 7: Scale with Automation and Templates
Automate Report Delivery with Make.com
Build a Make.com scenario that:
- Triggers when a new Google Sheets row is added (weekly data update)
- Runs the analysis template via API
- Generates visualizations
- Emails the weekly summary to the client
Build Your Template Library
After 10 clients, you will have 15-20 templates covering 80% of requests. Each template takes 4-8 hours to build initially but reduces delivery from days to hours.
Hire a Junior Analyst
At 8-10 clients, hire a junior analyst ($15-25/hour on Upwork). Give them your templates and SOPs. They handle delivery while you focus on sales.
Cost Breakdown
| Item | Cost | When |
|---|---|---|
| ChatGPT Plus | $20/mo | Immediately |
| Python/Jupyter | $0 | Always |
| Google Sheets | $0 | Always |
| Looker Studio | $0 | Always |
| Canva Pro | $13/mo | At 3+ clients |
| Make.com Teams | $16/mo | At 5+ clients |
| Tableau | $70/mo | At 8+ clients |
Total monthly cost at launch: $20 Total monthly cost at 5 clients: $49
Production Checklist
- Python environment configured and tested
- Three analysis templates created and tested with sample data
- ChatGPT Code Interpreter pipeline mastered
- Onboarding form and data handling agreement ready
- Client folder structure template prepared
- Report structure template created
- At least one complete client analysis delivered



