Welcome to Covid19 Data Analysis Notebook¶

Let's Import the modules¶

import pandas as pd 
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt 
print('Modules are imported.')

Modules are imported.

Task 2¶

Task 2.1: importing covid19 dataset¶

importing "Covid19_Confirmed_dataset.csv" from "./Dataset" folder.

corona_dataset_csv = pd.read_csv("Datasets/covid19_Confirmed_dataset.csv")
corona_dataset_csv.head(5)

	Province/State	Country/Region	Lat	Long	...	4/21/20	4/22/20	4/23/20	4/24/20	4/25/20	4/26/20	4/27/20	4/28/20	4/29/20	4/30/20
0	NaN	Afghanistan	33.0000	65.0000	...	1092	1176	1279	1351	1463	1531	1703	1828	1939	2171
1	NaN	Albania	41.1533	20.1683	...	609	634	663	678	712	726	736	750	766	773
2	NaN	Algeria	28.0339	1.6596	...	2811	2910	3007	3127	3256	3382	3517	3649	3848	4006
3	NaN	Andorra	42.5063	1.5218	...	717	723	723	731	738	738	743	743	743	745
4	NaN	Angola	-11.2027	17.8739	...	24	25	25	25	25	26	27	27	27	27

5 rows × 104 columns

Let's check the shape of the dataframe¶

Task 2.2: Delete the useless columns¶

corona_dataset_csv.drop(["Lat", "Long"], axis=1, inplace=True)

	Province/State	Country/Region	1/26/20	1/27/20	1/28/20	1/29/20	...	4/21/20	4/22/20	4/23/20	4/24/20	4/25/20	4/26/20	4/27/20	4/28/20	4/29/20	4/30/20
0	NaN	Afghanistan	0	0	0	0	...	1092	1176	1279	1351	1463	1531	1703	1828	1939	2171
1	NaN	Albania	0	0	0	0	...	609	634	663	678	712	726	736	750	766	773
2	NaN	Algeria	0	0	0	0	...	2811	2910	3007	3127	3256	3382	3517	3649	3848	4006
3	NaN	Andorra	0	0	0	0	...	717	723	723	731	738	738	743	743	743	745
4	NaN	Angola	0	0	0	0	...	24	25	25	25	25	26	27	27	27	27
5	NaN	Antigua and Barbuda	0	0	0	0	...	23	24	24	24	24	24	24	24	24	24
6	NaN	Argentina	0	0	0	0	...	3031	3144	3435	3607	3780	3892	4003	4127	4285	4428
7	NaN	Armenia	0	0	0	0	...	1401	1473	1523	1596	1677	1746	1808	1867	1932	2066
8	Australian Capital Territory	Australia	0	0	0	0	...	104	104	104	105	106	106	106	106	106	106
9	New South Wales	Australia	3	4	4	4	...	2969	2971	2976	2982	2994	3002	3004	3016	3016	3025

10 rows × 102 columns

Task 2.3: Aggregating the rows by the country¶

corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum()

corona_dataset_aggregated.head()

	1/22/20	1/23/20	1/24/20	1/25/20	1/26/20	1/27/20	1/28/20	1/29/20	1/30/20	1/31/20	...	4/21/20	4/22/20	4/23/20	4/24/20	4/25/20	4/26/20	4/27/20	4/28/20	4/29/20	4/30/20
Country/Region
Afghanistan	0	0	0	0	0	0	0	0	0	0	...	1092	1176	1279	1351	1463	1531	1703	1828	1939	2171
Albania	0	0	0	0	0	0	0	0	0	0	...	609	634	663	678	712	726	736	750	766	773
Algeria	0	0	0	0	0	0	0	0	0	0	...	2811	2910	3007	3127	3256	3382	3517	3649	3848	4006
Andorra	0	0	0	0	0	0	0	0	0	0	...	717	723	723	731	738	738	743	743	743	745
Angola	0	0	0	0	0	0	0	0	0	0	...	24	25	25	25	25	26	27	27	27	27

5 rows × 100 columns

corona_dataset_aggregated.shape

(187, 100)

visualization always helps for better understanding of our data.

corona_dataset_aggregated.loc["India"]

1/22/20        0
1/23/20        0
1/24/20        0
1/25/20        0
1/26/20        0
           ...  
4/26/20    27890
4/27/20    29451
4/28/20    31324
4/29/20    33062
4/30/20    34863
Name: India, Length: 100, dtype: int64

Task3: Calculating a good measure¶

we need to find a good measure reperestend as a number, describing the spread of the virus in a country.

corona_dataset_aggregated.loc['China'].plot()
corona_dataset_aggregated.loc["India"].plot()
corona_dataset_aggregated.loc["Spain"].plot()
plt.legend()

<matplotlib.legend.Legend at 0x17096d0>

corona_dataset_aggregated.loc["India"][:3].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x17457c0>

task 3.1: caculating the first derivative of the curve¶

corona_dataset_aggregated.loc["India"].diff().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x177d4c0>

task 3.2: find maxmimum infection rate for China¶

corona_dataset_aggregated.loc["India"].diff().max()

1893.0

Task 3.3: find maximum infection rate for all of the countries.¶

countries = list(corona_dataset_aggregated.index)
max_infection_rates = []
for c in countries :
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())
corona_dataset_aggregated["max_infection rate"] = max_infection_rates
corona_dataset_aggregated.head()

	1/22/20	1/23/20	1/24/20	1/25/20	1/26/20	1/27/20	1/28/20	1/29/20	1/30/20	1/31/20	...	4/22/20	4/23/20	4/24/20	4/25/20	4/26/20	4/27/20	4/28/20	4/29/20	4/30/20	max_infection rate
Country/Region
Afghanistan	0	0	0	0	0	0	0	0	0	0	...	1176	1279	1351	1463	1531	1703	1828	1939	2171	232.0
Albania	0	0	0	0	0	0	0	0	0	0	...	634	663	678	712	726	736	750	766	773	34.0
Algeria	0	0	0	0	0	0	0	0	0	0	...	2910	3007	3127	3256	3382	3517	3649	3848	4006	199.0
Andorra	0	0	0	0	0	0	0	0	0	0	...	723	723	731	738	738	743	743	743	745	43.0
Angola	0	0	0	0	0	0	0	0	0	0	...	25	25	25	25	26	27	27	27	27	5.0

5 rows × 101 columns

Task 3.4: create a new dataframe with only needed column¶

corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection rate"])

corona_data.head()

	max_infection rate
Country/Region
Afghanistan	232.0
Albania	34.0
Algeria	199.0
Andorra	43.0
Angola	5.0

Task4:¶

Importing the WorldHappinessReport.csv dataset
selecting needed columns for our analysis
join the datasets
calculate the correlations as the result of our analysis

Task 4.1 : importing the dataset¶

happiness_report_csv = pd.read_csv("Datasets/worldwide_happiness_report.csv")

happiness_report_csv.head()

	Overall rank	Country or region	Score	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption
0	1	Finland	7.769	1.340	1.587	0.986	0.596	0.153	0.393
1	2	Denmark	7.600	1.383	1.573	0.996	0.592	0.252	0.410
2	3	Norway	7.554	1.488	1.582	1.028	0.603	0.271	0.341
3	4	Iceland	7.494	1.380	1.624	1.026	0.591	0.354	0.118
4	5	Netherlands	7.488	1.396	1.522	0.999	0.557	0.322	0.298

Task 4.2: let's drop the useless columns¶

useless_cols = ["Overall rank", "Score", "Generosity", "Perceptions of corruption"]

#happiness_report_csv.drop(useless_cols, axis=1, inplace=True)
happiness_report_csv.head()

	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Country or region
Finland	1.340	1.587	0.986	0.596
Denmark	1.383	1.573	0.996	0.592
Norway	1.488	1.582	1.028	0.603
Iceland	1.380	1.624	1.026	0.591
Netherlands	1.396	1.522	0.999	0.557

Task 4.3: changing the indices of the dataframe¶

happiness_report_csv.set_index("Country or region", inplace =True)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-57-6a716b052f35> in <module>
----> 1 happiness_report_csv.set_index("Country or region", inplace =True)

c:\users\administrator\appdata\local\programs\python\python38-32\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4301 
   4302         if missing:
-> 4303             raise KeyError(f"None of {missing} are in the columns")
   4304 
   4305         if inplace:

KeyError: "None of ['Country or region'] are in the columns"

Task4.4: now let's join two dataset we have prepared¶

Corona Dataset :¶

corona_data.head(
)

	max_infection rate
Country/Region
Afghanistan	232.0
Albania	34.0
Algeria	199.0
Andorra	43.0
Angola	5.0

wolrd happiness report Dataset :¶

happiness_report_csv.shape

(156, 4)

data = corona_data.join(happiness_report_csv, how="inner")
data.head()

	max_infection rate	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Afghanistan	232.0	0.350	0.517	0.361	0.000
Albania	34.0	0.947	0.848	0.874	0.383
Algeria	199.0	1.002	1.160	0.785	0.086
Argentina	291.0	1.092	1.432	0.881	0.471
Armenia	134.0	0.850	1.055	0.815	0.283

Task 4.5: correlation matrix¶

data.corr()

	max_infection rate	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
max_infection rate	1.000000	0.250118	0.191958	0.289263	0.078196
GDP per capita	0.250118	1.000000	0.759468	0.863062	0.394603
Social support	0.191958	0.759468	1.000000	0.765286	0.456246
Healthy life expectancy	0.289263	0.863062	0.765286	1.000000	0.427892
Freedom to make life choices	0.078196	0.394603	0.456246	0.427892	1.000000

Task 5: Visualization of the results¶

our Analysis is not finished unless we visualize the results in terms figures and graphs so that everyone can understand what you get out of our analysis

data.head()

	max_infection rate	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Afghanistan	232.0	0.350	0.517	0.361	0.000
Albania	34.0	0.947	0.848	0.874	0.383
Algeria	199.0	1.002	1.160	0.785	0.086
Argentina	291.0	1.092	1.432	0.881	0.471
Armenia	134.0	0.850	1.055	0.815	0.283

Task 5.1: Plotting GDP vs maximum Infection rate¶

x = data["GDP per capita"]
y = data["max_infection rate"]

sns.scatterplot(x, np.log(y))

<matplotlib.axes._subplots.AxesSubplot at 0xfeaf910>

 sns.regplot(x, np.log(y))

<matplotlib.axes._subplots.AxesSubplot at 0xfd2b388>

Welcome to Covid19 Data Analysis Notebook¶

Let's Import the modules¶

Task 2¶

Task 2.1: importing covid19 dataset¶

Let's check the shape of the dataframe¶

Task 2.2: Delete the useless columns¶

Task 2.3: Aggregating the rows by the country¶

Task3: Calculating a good measure¶

task 3.1: caculating the first derivative of the curve¶

task 3.2: find maxmimum infection rate for China¶

Task 3.3: find maximum infection rate for all of the countries.¶

Task 3.4: create a new dataframe with only needed column¶

Task4:¶

Task 4.1 : importing the dataset¶

Task 4.2: let's drop the useless columns¶

Task 4.3: changing the indices of the dataframe¶

Task4.4: now let's join two dataset we have prepared¶

Corona Dataset :¶

wolrd happiness report Dataset :¶

Task 4.5: correlation matrix¶

Task 5: Visualization of the results¶

Task 5.1: Plotting GDP vs maximum Infection rate¶

Task 5.3: Plotting Healthy life expectancy vs maximum Infection rate¶

Task 5.4: Plotting Freedom to make life choices vs maximum Infection rate¶

Welcome to Covid19 Data Analysis Notebook¶

Let's Import the modules¶

Task 2¶

Task 2.1: importing covid19 dataset¶

Let's check the shape of the dataframe¶

Task 2.2: Delete the useless columns¶

Task 2.3: Aggregating the rows by the country¶

Task 2.4: Visualizing data related to a country for example China¶

Task3: Calculating a good measure¶

task 3.1: caculating the first derivative of the curve¶

task 3.2: find maxmimum infection rate for China¶

Task 3.3: find maximum infection rate for all of the countries.¶

Task 3.4: create a new dataframe with only needed column¶

Task4:¶

Task 4.1 : importing the dataset¶

Task 4.2: let's drop the useless columns¶

Task 4.3: changing the indices of the dataframe¶

Task4.4: now let's join two dataset we have prepared¶

Corona Dataset :¶

wolrd happiness report Dataset :¶

Task 4.5: correlation matrix¶

Task 5: Visualization of the results¶

Task 5.1: Plotting GDP vs maximum Infection rate¶

Task 5.2: Plotting Social support vs maximum Infection rate¶

Task 5.3: Plotting Healthy life expectancy vs maximum Infection rate¶

Task 5.4: Plotting Freedom to make life choices vs maximum Infection rate¶