Saima Abbas

Data Analyst | Machine Learning Enthusiast | Data Storyteller

Applying Python to NHS Dataset

Overview:

As part of the LSE Data Analytics Online Career Accelerator (Course 2), I conducted an in-depth analysis of real NHS datasets to explore how the UK's healthcare system is responding to increased patient demand, digital engagement, and missed appointments. This project combined Python-based exploratory analysis, data visualisation, and public Twitter sentiment to derive actionable healthcare insights.
The analysis focused on:

  1. Utilisation trends across appointment types, appointment modes, healthcare professionals, and the time between booking and appointment

  2. Staff and capacity evaluation across regions

  3. Patient behaviour patterns, including missed appointments

  4. Public sentiment and trend tracking via Twitter hashtags

Based on the gathered insights, I was able to make the following recommendations:

Approach:

For this project, I used Python for data wrangling, web scraping, and visualisation (Pandas, Matplotlib, Seaborn, Beautiful Soup), with APIs as an additional data source.
To answer the NHS's questions, I cleaned, analysed, and visualised four datasets: actual_duration.csv, appointments_regional.csv, national_categories.xlsx, and tweets.csv. Key steps included:

Key Insights:

Appointment volume peaked in October 2021 and March 2022, aligning with seasonal health pressures (e.g., flu season). Despite high appointment volumes, average daily utilisation remained within the NHS guideline of 1.2 million per day, indicating that the NHS has adequate capacity at the national level.

Monthly Capacity Utilisation of Daily Appointments
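
The utilisation check described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`appointment_month`, `count_of_appointments`) are assumed, the rows are made-up toy numbers, and dividing by the calendar days in each month is one possible way to get a daily average.

```python
import pandas as pd

DAILY_CAPACITY = 1_200_000  # NHS guideline quoted in the text

# Toy rows standing in for appointments_regional.csv.
# Column names and values are assumptions for illustration only.
ar = pd.DataFrame({
    "appointment_month": ["2021-10", "2021-10", "2021-11", "2021-11"],
    "count_of_appointments": [20_000_000, 11_000_000, 18_000_000, 18_000_000],
})


def monthly_utilisation(df, capacity=DAILY_CAPACITY):
    """Return monthly totals, average daily appointments and utilisation."""
    monthly = df.groupby("appointment_month", as_index=False)[
        "count_of_appointments"
    ].sum()
    # Number of calendar days in each month (assumed denominator).
    days = pd.to_datetime(monthly["appointment_month"], format="%Y-%m").dt.days_in_month
    monthly["avg_daily"] = monthly["count_of_appointments"] / days
    # A value at or below 1.0 means the month stayed within the guideline.
    monthly["utilisation"] = monthly["avg_daily"] / capacity
    return monthly


util = monthly_utilisation(ar)
print(util)
```

A utilisation of 1.0 or less means the average day stayed within the guideline. Plotting `utilisation` against `appointment_month` would give a chart along the lines of the figure captioned above.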

General Practice dominates appointment volumes, especially in winter months. GPs and Other Practice Staff share the bulk of the workload.


Number of Patient Appointments per month by Service Settings

Face-to-face appointments dominated, but telephone consultations were also heavily used, suggesting a successful hybrid model. Home visits and video consultations remained low but stable, pointing to potential underuse or limited applicability.


Monthly Appointments Counts by Appointment Mode

Most appointments are booked same-day or within a week, highlighting the need for short-notice availability.


Time Between Booking and Appointment Over 11 Month Period

Missed appointments, recorded as "Did Not Attend" (DNA), are highest for bookings made 8–14 days in advance. These bookings may benefit most from targeted reminders.


Missed Appointments by Time Between Booking and Appointment
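
One way to reproduce a breakdown like this is to filter for DNA records and sum the counts per booking interval. The sketch below uses toy data, and the column names and category labels are assumptions about how appointments_regional.csv is laid out.

```python
import pandas as pd

# Toy rows; column names and category labels are assumed, values are made up.
ar = pd.DataFrame({
    "time_between_book_and_appointment": [
        "Same Day", "Same Day", "2 to 7 Days", "8 to 14 Days", "8 to 14 Days",
    ],
    "appointment_status": ["Attended", "DNA", "DNA", "Attended", "DNA"],
    "count_of_appointments": [90, 2, 4, 40, 6],
})

# Keep only missed appointments, then total them per booking interval.
dna_by_lag = (
    ar[ar["appointment_status"] == "DNA"]
    .groupby("time_between_book_and_appointment")["count_of_appointments"]
    .sum()
    .sort_values(ascending=False)
)
print(dna_by_lag)
```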

Appointment mode affects attendance: Face-to-face appointments have the highest DNA (Did Not Attend) rate, while home visits and phone consultations see better follow-through.


Missed Appointments over months by Appointment Mode
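
Because face-to-face appointments are also the most common, comparing modes fairly means looking at DNA rates rather than raw counts. A minimal sketch of that calculation, again with toy data and assumed column names:

```python
import pandas as pd

# Toy rows; column names and values are assumptions for illustration.
ar = pd.DataFrame({
    "appointment_mode": [
        "Face-to-Face", "Face-to-Face",
        "Telephone", "Telephone",
        "Home Visit", "Home Visit",
    ],
    "appointment_status": ["Attended", "DNA"] * 3,
    "count_of_appointments": [900, 100, 970, 30, 98, 2],
})

# One row per mode, one column per status, cells hold summed counts.
status_by_mode = ar.pivot_table(
    index="appointment_mode",
    columns="appointment_status",
    values="count_of_appointments",
    aggfunc="sum",
    fill_value=0,
)

# Share of each mode's appointments that were missed.
dna_rate = (status_by_mode["DNA"] / status_by_mode.sum(axis=1)).sort_values(
    ascending=False
)
print(dna_rate)
```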

Top Twitter hashtags included #Healthcare, #MedTwitter, and #DigitalHealth — showing strong engagement around innovation, education, and systemic critiques.


Most Frequently Used Hashtags After Removing Overrepresented Hashtags
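
A hashtag count of this kind can be sketched with a regular expression and `value_counts`. The tweets below are invented, `tweet_full_text` is an assumed column name for tweets.csv, and the set of "overrepresented" hashtags is a hypothetical placeholder, since the tags actually excluded are not listed here.

```python
import pandas as pd

# Invented tweets; the column name is an assumption.
tweets = pd.DataFrame({
    "tweet_full_text": [
        "Long waits again #healthcare #NHS",
        "New triage app pilot #DigitalHealth #healthcare #NHS",
        "Great thread on burnout #MedTwitter #NHS",
        "No hashtags in this one",
    ]
})

# Pull every hashtag out of each tweet, one hashtag per row, lower-cased
# so that #Healthcare and #healthcare are counted together.
hashtags = (
    tweets["tweet_full_text"]
    .str.findall(r"#\w+")
    .explode()
    .dropna()
    .str.lower()
)
counts = hashtags.value_counts()

# Hypothetical exclusion list for tags that would swamp the chart.
overrepresented = {"#nhs"}
filtered = counts[~counts.index.isin(overrepresented)]
print(filtered.head(10))
```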

General Practice consistently recorded the highest number of appointments across all months and quarters. Care-related encounters were the dominant context type, especially under national categories like General Consultation Acute and Routine. The top five locations based on record counts were found to be in and around London.


Number of Patient Appointments per Service Setting in 1st Quarter (June - August 2021)

For a complete picture, feel free to look at my report and complete Python code in the Jupyter Notebook on GitHub.

Full Report | GitHub