Covid-19 Data Analysis


Objectives
- To calculate the global numbers of Covid-19 cases, deaths and their percentage.
- To calculate the total death in each continent.
- To calculate the percentage of the population infected by country.
- To create a visual representation of all the calculations with the possibility to select each country on the map.
Data Cleansing and Transformation (SQL)
One data source (Covid-19 data) was supplied in Excel format and loaded into the SQL database.
The following tables were extracted using SQL to generate the necessary data model for analysis and fulfilling the criteria.
The SQL statements for cleansing and transforming the required data are provided below.
Data Exploration
-- Select rows from a Table or View '[TableOrViewName]' in schema '[dbo]'
SELECT * FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
ORDER by 3,4
-- Data Selection
SELECT Location, date, total_cases, new_cases, total_deaths, population
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
ORDER by 1,2
--Alter Data type
Alter table PortfolioProject..CovidDeaths alter column total_deaths float
Alter table PortfolioProject..CovidDeaths alter column total_cases float
-- Total Cases vs Total Deaths (Likelihood of dying if you contract covid in your country)
SELECT Location, date, total_cases, total_deaths, (total_deaths/total_cases)*100 AS DeathPercentage
FROM PortfolioProject..CovidDeaths
WHERE location LIKE '%united kingdom'
AND continent IS NOT NULL
ORDER by 1,2
-- Total Cases vs Population (Percentage of the popoulation that got covid)
SELECT Location, date, population, total_cases, (total_cases/population)*100 AS InfectedPercentage
FROM PortfolioProject..CovidDeaths
WHERE location LIKE '%united kingdom'
AND continent IS NOT NULL
ORDER by 1,2
-- Countries with Highest Infection rate vs Population
SELECT Location, population, MAX(total_cases) AS HighestInfectionCount, MAX((total_cases/population))*100 AS PercentPopulationInfected
FROM PortfolioProject..CovidDeaths
--WHERE location LIKE '%united kingdom'
WHERE continent IS NOT NULL
GROUP by Location, population
ORDER by PercentPopulationInfected DESC
-- Countries with Highest Death Count Per Population
SELECT Location, MAX(cast(total_deaths as int)) AS TotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP by Location
ORDER by TotalDeathCount DESC
-- Continent with Highest Death Count Per Population
SELECT continent, MAX(cast(total_deaths as int)) AS TotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP by continent
ORDER by TotalDeathCount DESC
-- World Numbers
SELECT SUM(new_cases) as total_cases, SUM(cast(new_deaths as int))as total_deaths,
SUM(cast(new_deaths as float))/SUM(new_cases)*100 as DeathPercentage
FROM PortfolioProject..CovidDeaths
-- WHERE location LIKE '%united kingdom'
WHERE continent IS NOT NULL
--GROUP by date
ORDER by 1,2
-- Total Population vs Vaccinations
SELECT death.date, death.continent, death.location, death.population, vacs.new_vaccinations
FROM PortfolioProject..CovidDeaths death
JOIN PortfolioProject..CovidVaccination vacs
ON death.location = vacs.location
AND death.date = vacs.date
WHERE death.continent IS NOT NULL
ORDER by 3,1
-- Rolling numbers of vaccinated people
WITH PopvsVac (continent, location, date, population, new_vaccinations, RollingVaccinatedPeople)
AS
(
SELECT death.date, death.continent, death.location, death.population, vacs.new_vaccinations
, SUM(CONVERT(FLOAT, vacs.new_vaccinations))
OVER (Partition by death.location ORDER BY death.location, death.date) as RollingVaccinatedPeople
FROM PortfolioProject..CovidDeaths death
JOIN PortfolioProject..CovidVaccination vacs
ON death.location = vacs.location
AND death.date = vacs.date
WHERE death.continent IS NOT NULL
-- ORDER by 3,1
)
SELECT * , (RollingVaccinatedPeople/Population)*100 AS PercentageRollingVacs
FROM PopvsVac
-- View for visualization
CREATE VIEW PercentPopulationVaccinated AS
WITH PopvsVac (continent, location, date, population, new_vaccinations, RollingVaccinatedPeople)
AS
(
SELECT death.date, death.continent, death.location, death.population, vacs.new_vaccinations
, SUM(CONVERT(FLOAT, vacs.new_vaccinations))
OVER (Partition by death.location ORDER BY death.location, death.date) as RollingVaccinatedPeople
FROM PortfolioProject..CovidDeaths death
JOIN PortfolioProject..CovidVaccination vacs
ON death.location = vacs.location
AND death.date = vacs.date
WHERE death.continent IS NOT NULL
-- ORDER by 3,1
)
SELECT * , (RollingVaccinatedPeople/Population)*100 AS PercentageRollingVacs
FROM PopvsVac
Table Structure for visualization (Tableau)
The SQL queries shown below were used to generate the various tables utilised in the visualisation.
/*
Queries used for Tableau Project
*/
-- 1.
SELECT Sum(new_cases) AS total_cases,
Sum(Cast(new_deaths AS INT)) AS total_deaths,
Sum(Cast(new_deaths AS INT)) / Sum(new_cases) * 100 AS DeathPercentage
FROM portfolioproject..coviddeaths
WHERE continent IS NOT NULL
ORDER BY 1,
2
-- 2.
SELECT location,
Sum(Cast(new_deaths AS INT)) AS TotalDeathCount
FROM portfolioproject..coviddeaths
WHERE continent IS NULL
AND location NOT IN ( 'World', 'European Union', 'International', 'High income', 'Lower middle income', 'Upper middle income', 'Low income')
GROUP BY location
ORDER BY totaldeathcount DESC
-- 3.
SELECT location,
population,
Max(total_cases) AS HighestInfectionCount,
Max(( total_cases / population )) * 100 AS PercentPopulationInfected
FROM portfolioproject..coviddeaths
GROUP BY location,
population
ORDER BY percentpopulationinfected DESC
-- 4.
SELECT location,
population,
date,
Max(total_cases) AS HighestInfectionCount,
Max(( total_cases / population )) * 100 AS PercentPopulationInfected
FROM portfolioproject..coviddeaths
GROUP BY location,
population,
date
ORDER BY percentpopulationinfected DESC
Covid-19 Analysis Dashboard
The finished dashboard visualises the global number of covid cases, total death and death percentage sorted and filtered by country and continent.