📊 Data Driven Accountant
💻 Excel, Tableau, SQL, Python, R
🌄 Reading, Running, Hiking
📍 Minneapolis, MN, USA
Tableau
Power BI
Looker
Project Management
SQL
Excel
Python
R
XRP: From Banking Revolution to Courtroom Drama — And What Comes Next
Python | Exploratory Data Analysis & Storytelling
◾ Real-world cryptocurrency data (XRP, BTC, ETH) from Yahoo Finance
◾ Used Python (Pandas, Matplotlib, yFinance) for analysis and data visualization
◾ Utilized moving averages, rolling volatility, drawdowns, cumulative returns, and z-scores
◾ Provided a comprehensive storytelling write-up tying XRP’s price history to Ripple’s milestones, SEC lawsuit, and future outlook (ETFs & regulation)
👋 Hi, I'm Franco & welcome to my portfolio! Over the past few years, I’ve become passionate about data and the way it can transform decision-making. Coming from a background in accounting and finance, I’ve seen firsthand how turning numbers into insights can create real business impact.
I’m proficient in analyzing data with: Excel | Tableau | SQL | Python | R
What excites me most is the intersection of accounting and data. I believe numbers aren’t just records on a ledger or rows in a database — they tell stories about how an organization operates, where it’s thriving, and where it can improve. By combining my accounting expertise with data-driven analysis, I aim to uncover insights that go beyond reporting: finding efficiencies in financial processes, identifying opportunities for growth, and helping teams make smarter, more confident decisions.
This portfolio is a reflection of that vision — a space where finance meets analytics to create meaningful solutions. If you’d like to connect or discuss ideas, you can reach me at [email protected]
💼 Staff Accountant & Financial Aid Manager
Snow Data Science · Internship · Oct 2023 – Jan 2024
DeLaSalle High School · Nov 2024 – Present
◾ Manage the financial aid process, ensuring accurate application reviews and award distribution
◾ Maintain general ledger entries, reconciliations, and financial reporting support
◾ Provide families with clear communication regarding tuition and aid options
📊 Accounts Receivable Specialist
Gurstel Law Firm · Mar 2024 – Oct 2024
◾ Reduced outstanding receivables through proactive client communication
◾ Reconciled accounts and produced detailed cash flow reports
◾ Resolved billing discrepancies and maintained accurate records
📈 Bookkeeper / Accounting Specialist
U.S. Bank · Nov 2023 – Mar 2024 | Gabylo Accounting · Oct 2021 – Nov 2023
◾ Supported daily operations including reconciliations, accounts payable, and month-end close
◾ Processed invoices, payments, and bank transactions while supporting monthly reporting
Why This Project?
XRP is one of the most controversial and fascinating cryptocurrencies ever created. Between its ambition to revolutionize global payments and its ongoing legal battles, XRP’s journey tells us as much about the future of finance as it does about blockchain itself. That’s why this project dives into XRP’s price action, volatility, and performance — to cut through the noise and show the data behind the story.
Here’s Why You Should Keep Reading
This isn’t just another crypto price analysis. By the end of this article, you’ll understand how XRP rose to fame, how lawsuits nearly crushed it, and how the numbers reveal both scars and resilience. We’ll use data to connect the dots between XRP’s narrative and its performance — and we’ll finish with a look at what could happen next, with ETFs and a potential bull run on the horizon.
What & Where of Dataset
Before diving into XRP’s story and market behavior, it was crucial to gather and clean reliable data. For this project, I pulled daily XRP price data from Yahoo Finance, spanning from 2013 to 2025. This gave me over a decade of price history, including open, high, low, close, and trading volume.
The data was then cleaned and structured in two convenient formats:
- xrp → with “Date” as a column, making it easier for scatterplots, annotations, and table views.
- xrpi → with “Date” as the index, perfect for resampling, rolling averages, and time-based analysis.
This dual setup made it possible to run flexible transformations and ensured every graph in this article is both accurate and insightful.
Setup & Data Cleaning Code
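A minimal sketch of the setup described above (the "XRP-USD" ticker, the exact date window, and the handling of yfinance's column index are assumptions; this is not necessarily the exact code behind the article's charts):

```python
import pandas as pd
import yfinance as yf

# Pull daily XRP prices from Yahoo Finance (assumed ticker and date window).
xrpi = yf.download("XRP-USD", start="2013-01-01", end="2025-12-31", auto_adjust=False)

# Some yfinance versions return a (field, ticker) column MultiIndex even for a
# single ticker; flatten it so the columns are plain Open/High/Low/Close/Volume.
if isinstance(xrpi.columns, pd.MultiIndex):
    xrpi.columns = xrpi.columns.get_level_values(0)

xrpi = xrpi.dropna()        # "Date" stays as the index: resampling, rolling windows
xrp = xrpi.reset_index()    # "Date" becomes a column: scatterplots, tables, annotations
```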
Ripple’s Full Journey
Ripple Labs was founded in 2012 to bring efficiency to global payments. XRP functioned as the native token for liquidity on Ripple’s network, aiming to let financial institutions transfer value instantly and cheaply between currencies.
The 2017 Boom & 2018 Crash
As global crypto mania took hold, XRP surged—from mere cents to peaks above $3.30, capturing public imagination and institutional interest. Yet, after the peak, like many altcoins, XRP plummeted, losing over 90% of value during the 2018 crash and spending several years in relative obscurity.
The SEC Lawsuit & Its Core Arguments
In December 2020, the SEC filed suit against Ripple Labs, alleging that the company raised hundreds of millions by selling XRP as an unregistered security. The case hinged on whether XRP sales were investment contracts under the Howey Test.
Ripple’s defense emphasized that programmatic sales (i.e. automatic or algorithmic distribution of XRP — sales conducted on exchanges) differ from direct institutional sales to investors. The argument: when XRP is exchanged on open markets (programmatically), it behaves like a commodity, not a security.
The 2023 Court Ruling: Not a Security, but with Conditions
In 2023, a federal judge ruled that XRP was not a security when sold on public exchanges (programmatic sales). However, XRP sold directly to institutional investors by Ripple was deemed potentially a security in some instances. In short:
- Programmatic exchange sales → not securities
- Institutional deals / direct placements → might require registration
This nuance was a win for XRP’s tradability: it restored confidence in secondary markets, pushed relistings on U.S. exchanges, and made broader participation viable again without full SEC registration.
After the Verdict: Reaction & Market Response
Following the 2023 ruling, XRP’s price rebounded, influenced by resumed exchange listings and renewed investor interest. Still, the ruling left ambiguity around institutional issuances and compliance. Ripple continued settlement expansions, partnership announcements, and evolving legal strategies to clarify its regulatory footing.
What This Means Today
- XRP now trades freely on many U.S. platforms, giving it renewed liquidity.
- The “not a security for programmatic sales” ruling offers a regulatory pathway without requiring complete capitulation.
- Future risks remain in how Ripple handles direct sales or institutional issuance.
- Many see this ruling as foundational for crypto ETF expansion: if XRP is recognized as tradable in its primary market of exchanges, ETFs or index funds may be able to include it.
All Analysis: Telling the XRP Story Through Data
Ripple’s journey isn’t just told in headlines—it’s etched into its price chart. Each rise and fall mirrors moments of hype, adoption, legal battles, and renewed optimism. By visualizing XRP through different lenses, we can better understand how narratives shaped its trajectory.
XRP’s Price History — From Boom to Lawsuit Fallout
The raw closing price chart shows XRP’s evolution: explosive growth in 2017, the devastating crash of 2018, and the long sideways grind that followed. Notice the sharp resurgence in 2021, coinciding with renewed adoption news—and the chilling effect of the SEC lawsuit in late 2020.
Key Insight: XRP peaked near $3.80 in early 2018 before collapsing >90%. The SEC lawsuit (late-2020) capped rallies until 2021; momentum only returned with partial legal clarity and exchange relistings leading to the late 2024 rally.
Trends Behind the Noise — The 30-Day Moving Average
A 30-day MA smooths noise so you can see trend regimes—sustained uptrends in 2017 and early 2021; long consolidations in 2019–2020 and during the lawsuit window.
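A minimal pandas sketch of that calculation (assuming the xrpi frame from the setup sketch; not necessarily the exact code behind the published chart):

```python
import matplotlib.pyplot as plt

# 30-day simple moving average of the daily close.
ma30 = xrpi["Close"].rolling(window=30).mean()

ax = xrpi["Close"].plot(figsize=(12, 5), alpha=0.5, label="XRP close")
ma30.plot(ax=ax, label="30-day MA")
ax.set_ylabel("Price (USD)")
ax.legend()
plt.show()
```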
Key Insight: The 30-day MA reclaimed $1 in Apr-2021 as positive legal news and exchange access boosted confidence. When price sits above the MA, trends persist more often.
How Volatile Is XRP Really? — 30-Day Rolling Volatility
Volatility is crypto’s double-edged sword. Annualized 30-day realized volatility spikes around euphoric runs and legal headlines.
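Roughly how such a series can be computed (a sketch assuming daily returns from xrpi and a 365-day annualization, since crypto trades every day):

```python
import numpy as np
import matplotlib.pyplot as plt

# Annualized 30-day realized volatility from daily percentage returns.
daily_returns = xrpi["Close"].pct_change()
vol30 = daily_returns.rolling(window=30).std() * np.sqrt(365)

(vol30 * 100).plot(figsize=(12, 5),
                   title="XRP 30-day rolling volatility (annualized, %)")
plt.show()
```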
Key Insight: Volatility hit >250% in the 2017 blow-off and ~200% in 2021. Lawsuit headlines raised risk even when price stalled—uncertainty alone moved the risk needle.
Drawdowns: Measuring the Pain
Drawdowns show how far price sits below its prior peak—an honest view of risk lived by holders.
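One way to compute drawdowns in pandas (a sketch, again assuming the xrpi frame):

```python
import matplotlib.pyplot as plt

# Drawdown: how far the close sits below its running all-time high.
running_peak = xrpi["Close"].cummax()
drawdown = xrpi["Close"] / running_peak - 1.0

(drawdown * 100).plot(figsize=(12, 5), title="XRP drawdown from prior peak (%)")
plt.show()

print(f"Worst drawdown: {drawdown.min():.1%}")
```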
Key Insight: Post-2018 drawdown reached ~−93%. During the SEC fight, XRP suffered another ~−70% trough before relief rallies—clear evidence that regulatory fog amplifies risk.
Since-2020 Cumulative Returns — XRP vs BTC & ETH
Context matters. Benchmarking XRP against BTC/ETH shows how the lawsuit period weighed on performance.
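A sketch of the benchmark comparison as growth of $1 since Jan-2020 (Yahoo tickers assumed):

```python
import yfinance as yf
import matplotlib.pyplot as plt

# Daily closes for all three assets since Jan-2020 (assumed Yahoo tickers).
closes = yf.download(["XRP-USD", "BTC-USD", "ETH-USD"],
                     start="2020-01-01", auto_adjust=False)["Close"]

# Cumulative growth of $1 from daily returns.
growth = (1 + closes.pct_change().fillna(0)).cumprod()

growth.plot(figsize=(12, 5), title="Growth of $1 since Jan-2020: XRP vs BTC vs ETH")
plt.show()
```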
Key Insight: Since Jan-2020, BTC/ETH delivered ~8–10× at peak cycles, while XRP lagged around ~1.5–2× on average—lawsuit overhang mattered. When clarity improved, XRP’s snap-back outpaced peers over short windows.
Volume vs Price: 60-Day Z-Scores (Anomaly Detector)
Standardizing price and volume highlights unusual activity—often the spark before sentiment shifts.
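A sketch of the 60-day standardization (assuming the xrpi frame; the exact window and styling of the published chart may differ):

```python
import matplotlib.pyplot as plt

def rolling_z(series, window=60):
    """Z-score of each value against its trailing mean and standard deviation."""
    return (series - series.rolling(window).mean()) / series.rolling(window).std()

z_price = rolling_z(xrpi["Close"])
z_volume = rolling_z(xrpi["Volume"])

ax = z_price.plot(figsize=(12, 5), label="Price z-score")
z_volume.plot(ax=ax, alpha=0.7, label="Volume z-score")
ax.axhline(0, color="grey", linewidth=0.8)
ax.legend()
plt.show()
```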
Key Insight: In late-2020, volume > +5σ while price lagged—exchanges paused, but traders piled in. Similar volume-first spikes preceded multiple mini-rallies; this is a useful leading sentiment indicator.
Seasonality: Monthly Returns Heatmap
Crypto isn’t Wall Street, but XRP does show rhythm—some months have historically delivered outsized returns.
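The underlying Year × Month table is straightforward to build (a sketch using a month-end resample of xrpi; seaborn is assumed for the heatmap itself):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Month-over-month returns, arranged as a Year x Month grid for the heatmap.
monthly = xrpi["Close"].resample("M").last().pct_change()
grid = (monthly.groupby([monthly.index.year, monthly.index.month])
               .mean()
               .unstack())
grid.index.name, grid.columns.name = "Year", "Month"

sns.heatmap(grid * 100, annot=True, fmt=".0f", center=0, cmap="RdYlGn")
plt.title("XRP monthly returns (%)")
plt.show()
```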
Key Insight: April and November stand out with frequent double-digit average gains, while June–July skew weaker—timing can add edge when paired with news flow.
Re-emphasize Main Takeaways
XRP’s journey is not just about price action. It’s a case study in resilience, regulation, and adoption. Despite lawsuits, market crashes, and years of uncertainty, XRP remains one of the top traded digital assets with billions in volume. The recent ruling confirming that programmatic sales of XRP are not securities has set a precedent for crypto regulation in the U.S., giving both investors and institutions more clarity.
The data confirms it:
- Cycles of volatility align with lawsuits, rulings, and macroeconomic shocks.
- Drawdowns show the pain of being early in crypto, but also the strength of recovery.
- Cumulative returns prove XRP has remained competitive with BTC and ETH since 2020.
- Monthly heatmaps highlight patterns of seasonality and investor behavior.
The key message: XRP is a survivor — and its future will likely depend not just on speculation, but on whether banks, institutions, and ETFs accelerate its adoption.
Call To Action
This project is more than charts — it’s about understanding how data tells the story of a technology fighting for legitimacy.
👉 What do you think:
- Will ETFs push XRP into the mainstream the same way Bitcoin and Ethereum did?
- Do you see XRP as a long-term institutional asset, or just another speculative play?
Why This Project?
I’m working through Avery Smith’s Data Career Jumpstart bootcamp, and this project came from one of my favorite lessons. The challenge was:
Pick one day of data, pare it down to just the important variables, and tell a clear story.
This wasn’t random — the dataset comes from a flotation plant in mining, and the lesson had us:
1. Check the full date range so we know what we’re working with.
2. Filter down to one specific day (June 1, 2017, flagged as “something weird happened”).
3. Focus only on key process variables (detailed in the columns section below).
I loved this assignment because it forced me to scope aggressively, keep the analysis tight, and practice telling the story of a single day in data — exactly what process engineers or ops leaders need when they’re reviewing production issues.
Here’s Why You Should Keep Reading
Most dashboards just show you a line chart and call it a day. This post is different. You’ll see how to:
- Filter raw process data to a single day and keep only the important variables.
- Turn 4,320 rows into a clear hourly snapshot that anyone can scan.
- Plot a smooth but honest trend with min/max callouts and a day-mean reference.
- Check relationships visually with a pairplot and a correlation heatmap.
And you’ll get copy-paste-ready Python code for every step — so you can replicate this workflow on your own data tomorrow.
What & Where of Dataset + Basic Exploration
Before writing a single line of code, it’s crucial to know what we’re working with.
📊 Dataset Info: This dataset is real-world mining process data taken from March to September 2017, published on Kaggle. It contains 737,454 rows and 24 columns, with readings taken as frequently as every 20 seconds for some variables and once per hour for others.
Why it matters: This is the kind of messy, detailed data engineers deal with every day — perfect for practicing time-series analysis and feature selection.
Loading and Preparing the Dataset
We first import the necessary Python libraries (pandas, seaborn, matplotlib) and then load the CSV file. Converting the date column to a datetime object is critical because it allows us to filter, resample, and visualize by time later.
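A minimal sketch of that loading step (the Kaggle file name and the decimal-comma handling are assumptions; adjust them to your local copy):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# The Kaggle flotation-plant export uses commas as decimal separators
# (assumed file name; match it to the copy you downloaded).
df = pd.read_csv("MiningProcess_Flotation_Plant_Database.csv", decimal=",")

# Parse the timestamp column so we can filter, resample, and plot by time.
df["date"] = pd.to_datetime(df["date"])
```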
Exploring the Data Range
Before filtering for June 1, we check the earliest and latest timestamps to make sure we understand the full range of available data.
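A sketch of that check, assuming the df frame loaded above:

```python
# Earliest and latest timestamps in the raw data.
print("From:", df["date"].min())
print("To:  ", df["date"].max())
```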
✅ Key Takeaway: We confirmed that the dataset spans March–September 2017, making it safe to zoom in on June 1, 2017 to investigate what happened that day.
Columns We Care About
From the original 24 columns, we focus on just the most relevant process variables — the ones our engineer peers told us have the biggest impact on product quality:
- date – Timestamp of the reading, needed for filtering, resampling, and trend analysis
- % Iron Concentrate – Final output quality metric, our primary target variable (higher = better)
- % Silica Concentrate – Impurity metric; lower is better and directly affects product quality
- Ore Pulp pH – Process condition; pH levels can make or break flotation efficiency
- Flotation Column 05 Level – Key equipment parameter indicating process stability
✅ Key Takeaway: Narrowing down to these columns keeps the analysis focused and interpretable, and avoids unnecessary noise.
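A sketch of the filtering step, assuming the df frame from earlier (verify the exact column spellings against the Kaggle file):

```python
# Keep only June 1, 2017 and the columns listed above.
cols = ["date", "% Iron Concentrate", "% Silica Concentrate",
        "Ore Pulp pH", "Flotation Column 05 Level"]

june1 = df.loc[df["date"].dt.date == pd.Timestamp("2017-06-01").date(), cols]
print(june1.shape)   # about 4,320 rows at a 20-second cadence
```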
Telling the Story of June 1
Instead of leaving these as a static table, we created a single timeline visualization that overlays all four critical variables. This lets us see, at a glance, how fluctuations in pH and column level moved with iron and silica — making the day’s story far easier to read than scanning rows.
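Something along these lines, assuming the june1 frame from the previous step (the min-max scaling is a simplification so all four variables share one axis; the published chart may be styled differently):

```python
import matplotlib.pyplot as plt

# Hourly means keep the timeline readable; min-max scaling puts all four
# variables on a shared 0-1 axis.
hourly = june1.set_index("date").resample("H").mean()
scaled = (hourly - hourly.min()) / (hourly.max() - hourly.min())

ax = scaled.plot(figsize=(12, 5),
                 title="June 1, 2017 - key process variables (normalized)")
ax.set_ylabel("Normalized value (0-1)")
plt.show()
```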
✅ Key Takeaway: This plot shows that pH dips and column level drops correlate with late-day iron losses, supporting the hypothesis of process instability between 17:00–23:00.
What the Data Reveals: Key Insights from June 1
Instead of just dumping code, this section answers business questions, uses visuals with subtitles, and ends each insight with a clear takeaway for the plant team.
Insight 1 — A Clear Picture of the Day
Business Question: What exactly happened on June 1 across the critical process variables? We filtered the dataset to just the 24 hours of June 1 and focused on the four most important process variables our engineer peers identified.
What We See:
- pH dips below 9.6 at 17:00 and rises above 10.1 twice.
- Flotation Column 05 Level falls below 480 at 18:00 — a potential equipment/feed issue.
Why It Matters: This table is a “shift report” for the process. Operators can see when instability occurred, saving troubleshooting time.
✅ Key Takeaway: Out-of-range pH and low column level late in the day point to process instability that likely caused lower iron recovery near midnight.
Insight 2 — Seeing Relationships at a Glance
Business Question: How do our key variables interact?
What We See:
- Negative correlation between Iron and Silica — higher iron means lower impurities.
- Slight positive trend between Iron and both pH and column level.
✅ Key Takeaway: Focus on pH and column level control — even small drifts affect iron recovery.
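A sketch of the pairplot and correlation heatmap for those four variables (assuming the june1 frame from earlier):

```python
import seaborn as sns
import matplotlib.pyplot as plt

numeric = june1.drop(columns=["date"])

# Pairwise scatter plots of the key variables.
sns.pairplot(numeric)
plt.show()

# Correlation heatmap of the same variables.
sns.heatmap(numeric.corr(), annot=True, cmap="coolwarm", center=0)
plt.title("Correlations - June 1, 2017")
plt.show()
```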
Insight 3 — The Afternoon Spike & Midnight Crash
Business Question: Did iron concentrate behave unusually on this day?
What We See:
- Peak at 66.39% (15:00), well above average.
- Lowest at 63.00% (23:00) — a sharp drop late in the day.
✅ Key Takeaway: The midnight crash is real — process logs from 21:00–23:00 should be reviewed to find the root cause.
Insight 4 — Big-Picture Perspective
Business Question: Was June 1 just a one-off, or part of a bigger trend?
Rather than just looking at a histogram of all the data, we used monthly box plots to compare iron concentrate performance over time. Box plots make it easy to see the median, spread, and outliers for each month, giving more context to June 1’s unusual behavior.
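A sketch of that month-by-month comparison, assuming the full df frame and the column name used above:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# One box per calendar month of % Iron Concentrate across the full dataset.
df["month"] = df["date"].dt.to_period("M").astype(str)

plt.figure(figsize=(12, 5))
sns.boxplot(data=df, x="month", y="% Iron Concentrate")
plt.xticks(rotation=45)
plt.title("% Iron Concentrate by month (Mar-Sep 2017)")
plt.tight_layout()
plt.show()
```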
What We See:
- June’s median (≈65%) is slightly above May’s and July’s, showing it was generally a strong month.
- June has the widest spread, meaning more volatility in quality compared to other months.
- June 1’s midnight crash sits near the lowest whisker, confirming it was an outlier event rather than typical behavior.
✅ Key Takeaway: June wasn’t a systemic failure — but its wider spread is a warning sign that process stability was slipping. Plant teams should watch pH and column levels closely in June to avoid repeat late-day crashes.
Insight 5 — Confirming with Correlations
Business Question: Which variables are most tied to quality?
What We See:
- Iron vs. Silica: −0.27 (negative, as expected).
- Iron vs. pH: +0.30 (strongest positive link).
✅ Key Takeaway: Keep pH steady — its correlation with iron means any drift will hurt recovery and increase impurity.
Main Takeaways
What We Learned in 24 Hours of Flotation Data:
- Iron concentrate crashed late in the day — pointing to a process stability problem between 21:00–23:00 that must be reviewed.
- pH control is critical — it shows the strongest positive correlation to iron recovery. Deviations below 9.6 or above 10.1 are early warning signals.
- Flotation Column Level matters — a drop below 480 at 18:00 is a key clue to operational issues.
- Silica moves opposite to iron — confirming that reducing silica content directly improves quality.
- Big picture: June 1 is not catastrophic, but it’s a leading indicator that process control should be tightened to avoid future quality losses.
✅ Key Business Value: Acting on these insights means the plant can:
- Catch issues before production losses occur
- Optimize reagent usage and energy consumption
- Consistently hit iron quality targets (avoiding reprocessing costs)
Conclusion — My Experience With This Project
This is, without a doubt, the best project I’ve done so far. For this challenge, I really pushed myself:
- Went beyond the scope of the bootcamp exercise, figuring out extra code on my own
- Combined what I’ve learned in Data Career Jumpstart with lessons from my IBM Data Science Career Certificate
- Focused on writing clear, commented code so that even readers new to Python can follow along
And honestly? I’m finishing this project a little past 2 AM on a Friday, but I can’t express how rewarding it is to step back and see everything come together — clean analysis, polished visuals, and actionable insights that could actually help a real plant improve.
🚀 The Ball’s in Your Court
Your Turn: This project uses real Kaggle data and reproducible Python code — so try it yourself!
- Run the Code: Fork the dataset and replicate the analysis to see if you get similar results.
- Test Another Day: Pick a different date (e.g., July 15, 2017) and compare its stability to June 1.
- Go Deeper: Add more variables (like amina or starch flow) and measure their impact on iron recovery.
Your Insights Matter: If you were the process engineer, what would you investigate next? Would you focus on pH control, flotation levels, or another parameter first?
📢 Let’s Connect: Find me on LinkedIn — I’d love to chat about data science, mining, and turning raw data into insights that move the needle.
Why THIS Project?
Employee turnover is one of the most expensive problems HR teams face. Losing talent means recruiting, onboarding, and training replacements — costing companies up to 150% of an employee’s annual salary per departure.
This project started with a simple question: What if we could predict which employees are most likely to leave? If we could do that, IBM could proactively address risk factors, retain top performers, and save millions in replacement costs.
Why You Should Keep Reading
This analysis doesn’t just crunch numbers — it uncovers real patterns about which employees are most at risk of leaving and how salary progression ties into tenure. The insights here could help any HR department shift from being reactive to strategic partners in retention.
What & Where of Dataset
For this analysis, I used the IBM HR Analytics Attrition Dataset, a widely used HR dataset with 1,470 rows and 35 columns. Each row represents an employee and includes features such as:
- Demographics: Age, Gender, Marital Status
- Employment Data: Department, Job Role, Job Level, Total Working Years, Years at Company
- Compensation: Monthly Income, Daily/Hourly Rate, Stock Option Level
- Performance & Engagement: Training Times Last Year, Job Satisfaction, Work-Life Balance
- Target Variable: Attrition (Yes/No) — the key outcome we aim to predict and understand
This dataset is perfect for modeling turnover risk and exploring compensation fairness because it captures both personal and job-related factors.
Loading and Preparing the Data
Data Cleaning & Preparation Steps:
1. Converted Categorical Variables: Turned Attrition into a factor with levels No/Yes to allow grouped summaries.
2. Created Age Groups: Bucketed employees into age ranges (18–25, 26–35, etc.) for easier cohort analysis.
3. Removed Missing Values: Dropped rows with missing YearsAtCompany or MonthlyIncome to avoid bias in attrition and pay calculations.
4. Checked for Duplicates: Confirmed there were no duplicate employee records.
5. Standardized Data Types: Converted all numeric columns to numeric and ensured consistent formatting (e.g., no stray characters in text fields).
These steps guarantee clean, reliable input data — making every insight credible and reproducible.
Understanding Key Relationships in HR Data
We began by exploring how different numerical variables relate to each other, to see which factors might influence pay and attrition the most. Calculating the pairwise correlations helped uncover which metrics move together and where HR leaders should focus to drive retention and engagement.
Correlation Matrix – Code
We started by calculating pairwise correlations among all key numerical variables in the dataset, excluding categorical columns such as Education. This step allows us to quantify how tightly variables such as Age, Monthly Income, and Total Working Years move together. The code below computes the correlation matrix:
Correlation Matrix – Table
This table shows the pairwise correlation values between core HR metrics. The most striking result is the strong link between Monthly Income and Total Working Years (0.77), followed by Age and Total Working Years (0.68). Number of Companies Worked has a moderate positive correlation with Age (0.30), while Distance from Home shows negligible correlation with any other metric.
Business Insight: Salary progression is clearly tied to tenure rather than age alone. This finding supports experience-based pay structures — HR could implement milestone bonuses or tenure-based raises to incentivize retention and reward loyalty.
Correlation Heatmap
To make the relationships more interpretable, I visualized the same matrix as a heatmap with darker shades indicating stronger correlations.
Business Insight: The heatmap visually confirms that tenure (TotalWorkingYears) is the single biggest driver of income. This can help HR set fair salary benchmarks and plan promotions. Features with near-zero correlation (e.g., DailyRate, TrainingTimesLastYear) are likely not useful predictors for attrition models and could be deprioritized.
Pairplot – Key Variables
To explore relationships more deeply, I built a pairplot of MonthlyIncome, Age, and TotalWorkingYears, color-coded by Attrition (Yes/No).
Business Insight:
- Employees with lower MonthlyIncome and fewer TotalWorkingYears show slightly higher attrition density (red areas on the left side).
- This suggests that compensation and career stage are major retention drivers — entry-level employees may need additional engagement strategies.
T-Test on Employee Number
To check whether employee numbers (a proxy for hire sequence) influence attrition, I performed a Welch Two-Sample t-test.
Business Insight: Attrition does not depend on when an employee was hired — newer employees are not quitting at a significantly higher rate than older cohorts. Retention issues are not a “recent hire” problem but systemic.
Boxplot of Employee Number
I visualized employee number distributions with a boxplot to double-check the t-test result.
Business Insight: The visualization confirms no meaningful difference in attrition by employee number. HR can focus resources on role-specific or tenure-based retention efforts rather than targeting new vs. old hires.
Regression Models
Next, I modeled MonthlyIncome first with Age only, then with Age + TotalWorkingYears.
Business Insight:
- Age alone explains only 24% of income variance (R² = 0.2479).
- Adding TotalWorkingYears dramatically improves explanatory power (R² = 0.5983). This shows that experience, not age, drives pay progression — crucial for HR when designing fair compensation plans based on merit and experience rather than age-based seniority.
Regression Line – Age vs. Monthly Income
To visualize the linear model, I plotted Age against MonthlyIncome with a fitted regression line.
Business Insight: The positive slope reinforces that pay increases with age — but combined with the earlier result, HR should consider experience-based pay models to ensure younger employees with strong experience aren’t underpaid.
Attrition by Age Group
I binned Age into groups and calculated attrition rates per group.
Business Insight: Attrition is highest among employees aged 18–25, suggesting that early-career employees are the most likely to leave. This could be due to factors such as seeking better opportunities, continuing education, or career exploration outside IBM. HR should focus on early-career retention strategies — for example, mentorship programs, career development paths, and clearer progression opportunities — to keep young talent engaged and reduce turnover in this critical group.
Attrition by Years at Company
Finally, I calculated attrition rate by each year of tenure.
Business Insight:
- The sharp spike in Year 1 attrition likely reflects onboarding challenges or mismatched expectations for new hires. HR should analyze exit interviews from first-year employees and refine the onboarding process to improve early retention.
- The smaller bump at 5–6 years may signal a career plateau where employees feel stuck. Introducing structured career progression programs, lateral move opportunities, or leadership tracks around this milestone could reduce mid-tenure turnover.
Conclusion & Final Thoughts
This project demonstrates how data-driven HR analytics can move beyond simple reporting to uncover real drivers of employee turnover and pay equity. By combining data cleaning, feature engineering, correlation analysis, hypothesis testing, and regression modeling, I translated raw HR data into strategic insights HR leaders can act on immediately.
Main Analytical Takeaways
- Experience predicts pay better than age — salary structures should be aligned with skill and tenure rather than time in the company.
- Mid-career employees (30–40 yrs) are most at risk of leaving — targeted retention programs and career progression plans could have a strong ROI.
- Early exits may be an onboarding issue — reviewing training programs and early management practices could reduce churn.
- Data-driven decision making works — even a relatively small HR dataset can highlight key intervention points to save costs and improve culture.
Brief Project Summary
This project was both challenging and incredibly rewarding. Starting with a dataset of 1,470 employees, I transformed raw HR records into a powerful narrative about IBM’s workforce. Using R, I carefully cleaned and prepared the data, explored key variables visually, tested hypotheses statistically, and built predictive models to explain salary progression.
The process pushed me to think critically about what truly drives employee attrition and pay — and the result was a set of clear, actionable recommendations that HR leaders could use to improve retention, optimize compensation structures, and strengthen employee engagement. It reminded me why I love working with data: turning complex problems into insights that make a real business impact.
Why THIS Project?
I chose this project because healthcare is one of the most data-rich industries and has a direct impact on people’s lives. Hospitals and clinics often face inefficiencies in how they allocate resources and measure patient outcomes. By applying SQL to healthcare data, I wanted to show how even simple queries can surface powerful insights that can make a real difference in decision-making.
Here’s Why You Should Keep Reading
This project demonstrates how structured SQL analysis can uncover clear, actionable insights from messy healthcare data. In just a few queries, I was able to highlight patterns in readmission rates, treatment costs, and seasonal spikes in visits. If you’re curious how raw numbers can be transformed into business intelligence that improves care and reduces costs, keep reading.
What & Where of Dataset
This project uses a publicly available dataset from Kaggle, covering 10 years of hospital encounters (1999–2008) across 130 U.S. hospitals. It contains over 100,000 anonymized records of patients with diabetes, capturing:
- Patient demographics
- Lab results
- Medications
- Length of stay
- Outcomes
The main goal of this dataset is to support analysis around early hospital readmission (within 30 days of discharge), which is a critical issue in diabetes care. With its depth and scale, the data provides valuable opportunities to uncover patterns in care delivery and patient outcomes.
For this project, I used SQL for querying and aggregating the data, with results visualized and explored further in Tableau/Excel.
1. Readmission Rate Overall
How common are hospital readmissions? Out of 101,766 encounters, 11.2% of patients were readmitted within 30 days, 34.9% after 30 days, and 53.9% had no readmission.
Meaning: This shows that while most patients don’t return, early readmission (within 30 days) still affects 1 in 9 hospital stays, making it a meaningful issue for hospitals to monitor.
2. Average Length of Stay by Readmission Status
Are longer hospital stays tied to higher readmission rates? Patients readmitted within 30 days stayed an average of 4.8 days, compared to 4.3 days for those not readmitted.
Meaning: Longer stays often signal more complex cases — and these patients are more likely to return sooner, pointing to a need for closer follow-up when patients are discharged after extended stays.
3. Medication Count vs Readmission Rate
Does the number of medications affect readmission rates? Patients on 11+ medications had a readmission rate of 11.9%, compared to 7.5% for those with fewer than 5.
Meaning: More medications are associated with higher risk, suggesting polypharmacy may complicate recovery. Reviewing prescriptions and simplifying treatment where possible could help reduce this risk.
4. High-Risk Groups by Demographics & Diagnosis
Do some patient groups face higher readmission risk? Yes — certain combinations of demographics and diagnoses had early readmission rates near 30%, almost triple the baseline. Example: [40–50] Female African American patients with diagnosis 250.6.
Meaning: Risk is not evenly distributed. Some demographic + diagnosis groups show much higher rates, suggesting interventions may need to be tailored to specific patient groups rather than applied broadly.
5. Medications vs Length of Stay
Which matters more — number of medications or length of stay? Both patterns show higher risk on their own. Patients with longer stays (7+ days) had an early readmission rate of 13.2%, and patients on 11+ medications had a rate of 11.9%.
Meaning: Clinical complexity (longer stays) and treatment burden (more medications) both increase risk. Tackling them together — for example, follow-up after long stays plus medication reviews for high counts — could have the biggest impact.
What the Data Tells Us
- Early readmissions (~11%) are a meaningful issue. While most patients don’t return, the subset who do within 30 days represent a critical focus.
- Complexity increases risk. Longer stays and more medications are both associated with higher readmission rates.
- Risk isn’t evenly distributed. Certain demographic + diagnosis groups show nearly triple the baseline risk.
- Multiple drivers stack together. Both treatment burden and hospitalization burden raise readmission risk.
Practical Takeaways:
- Flag discharges after 7+ days for a follow-up call within 48 hours.
- Run a medication review when prescriptions exceed 10 drugs.
What Do You Think?
This project was a great exercise in combining SQL with healthcare data to surface meaningful insights about readmissions. I’d love to hear from others working with healthcare or operations data:
- What other factors would you explore to understand patient readmissions?
- Have you seen different approaches hospitals take to reduce early returns?
If you’re also exploring SQL in healthcare (or any industry), I’d love to connect and swap ideas. Here's the link to the LinkedIn Article:
Basketball has always been about more than just points scored. It’s about chemistry, balance, and the unique mix of skills that each player brings to the floor. In this project, I used Tableau to dig into an NBA dataset and explore how different positions, age groups, and player contributions shape team performance.
📊 Data becomes much more engaging when it’s visual. To bring this project to life, I designed an interactive Tableau Story with charts that highlight how positions, age groups, and player contributions shape the flow of NBA teams.
Why This Project?
There are endless directions for sports analytics — from advanced shot charts to win probability models. I wanted to focus on something different: team chemistry.
This project stood out to me because basketball isn’t just about individual stats. It’s about how positions complement each other, how age affects contribution, and how playmaking is distributed. By framing the analysis this way, I could highlight the bigger picture of how teams actually function on the court.
Here’s Why You Should Keep Reading
It’s easy to get lost in numbers when analyzing sports. What makes this project different is the story behind the stats. By breaking down scoring, playmaking, and shooting by position, age group, and team structure, this project reveals insights that go beyond the box score:
- Which positions drive playmaking across the league.
- How veteran-heavy vs. youth-heavy teams differ in scoring.
- Where outliers like Nikola Jokić reshape positional expectations.If you’re interested in seeing how data can uncover the hidden patterns of teamwork, this story is for you.
What & Where of Dataset
For this project, I used an NBA player statistics dataset that includes points, assists, rebounds, age, and position for every player in the league. The dataset provided the foundation to explore questions like:
- How do positions differ in their contributions?
- What’s the impact of age groups on scoring?
- Which players stand out as outliers compared to their positional norms?By structuring the data in Tableau, I could build comparisons across teams and players, and highlight trends that aren’t visible in traditional box scores.
Analysis 1: Who Really Stretches the Floor?
Question: Which positions are actually contributing from beyond the arc — and where do teams rely most for spacing?
Answer with data:
- Guards unsurprisingly carry most of the outside shooting load.
- Forwards are more inconsistent — some teams use stretch-forwards effectively, while others rely on traditional inside roles.
◾ Centers stand out as clear outliers: most offer almost no shooting threat, but unique players like Nikola Jokić redefine what’s possible at the position.
Takeaway: Teams with shooting bigs can bend defenses in ways others can’t, while non-shooting centers force their teams to depend heavily on guards and wings for spacing.
Analysis 2: Do Teams Rely on One Star, or Shared Scoring?
Question: Are NBA teams built around one dominant scorer, or do they spread the load across multiple players?
Answer with data:
- Some rosters lean on a single high-usage star (like Luka Dončić in Dallas or Trae Young in Atlanta), where one player drives most of the offense.
- Other teams distribute points more evenly across two or three core scorers — offering more balance and less vulnerability if one player is shut down.
◾ Role players’ contributions become especially important on deep playoff teams, where a “next man up” mentality can swing games.
Takeaway: Teams with balanced scoring cores tend to be harder to game-plan against, while star-driven squads rely on individual brilliance to stay competitive.
Analysis 3: Who Impacts the Game Across Multiple Dimensions?
Question: Which players contribute the most all-around value — not just scoring, but also playmaking and rebounding?
Answer with data:
- Nikola Jokić is the clearest outlier, combining elite scoring, rebounding, and assists at a level unmatched by any other center.
- Guards like Chris Paul and Trae Young stand out for their high assists but lower rebounding, showing their playmaking-first roles.
◾ Forwards like Giannis Antetokounmpo blend scoring and rebounding, but contribute less as playmakers compared to elite guards and Jokić.
Takeaway: Different positions create value in different ways — but rare all-around players (like Jokić) completely redefine positional expectations.
Analysis 4: Do Younger or Older Players Drive Team Scoring?
Question: Are teams relying more on young talent, players in their prime, or seasoned veterans to carry the scoring load?
Answer with data:
- Under-25 players drive scoring for developing teams, where young cores are being groomed as future stars.
- 25–29 (prime years) represent the largest share of scoring across the league, showing the sweet spot where athleticism and experience peak.
◾ 30+ veterans still play critical roles, especially on playoff-caliber teams where experience matters, but they rarely dominate total scoring output.
Takeaway: While youth movements are important, the NBA is still largely defined by players in their prime years — balancing explosive ability with basketball maturity.
Analysis 5: Which Positions Truly Drive Playmaking?
Question: Is playmaking still dominated by point guards, or are other positions stepping up as facilitators?
Answer with data:
- Point guards clearly lead in assists, reaffirming their role as primary ball-handlers and offensive organizers.
- Shooting guards and small forwards contribute secondary playmaking on many teams, often as initiators in modern offenses.
◾ Centers are usually low in assists — except for outliers like Nikola Jokić, who redefines what a playmaking big can be.
Takeaway: While playmaking responsibility remains centered on point guards, the evolution of the game shows more shared creation across wings and even select bigs.
Big Picture Reflection
This project showed me how much more there is to basketball than what we see in a box score. By combining stats with visual storytelling in Tableau, I was able to explore team chemistry, positional roles, and age dynamics in a way that raw numbers alone can’t reveal.
The biggest lessons:
- Efficiency matters: Looking at weighted 3-point shooting revealed truths that averages alone would hide.
- Balance vs reliance: Some teams thrive on multiple scorers, while others lean heavily on one superstar.
- Evolving roles: Point guards still dominate assists, but outliers like Jokić prove positions are changing.
Call to Action
This was a fun project that pushed my Tableau skills and deepened my understanding of sports analytics. If you’re interested in how data visualization can uncover hidden patterns in sports, I’d love to connect and talk more.