One type of data that is easy to find on the net is weather data. Many sites provide historical data on meteorological parameters such as pressure, temperature, humidity, wind speed, visibility, etc.
My goal in this internship (on data analysis) is to transform raw data into information and then convert that information into knowledge. The analysis involves a few basic tasks: cleaning the data, normalizing it, testing the hypothesis, and finally inferring certain information.
Today our aim is to perform data analysis on a meteorological dataset, which is available at the source URL below.
Source URL: https://www.kaggle.com/muthuj7/weather-dataset
The dataset contains hourly temperature records covering roughly 10 years, from 2006-04-01 to 2016-09-09. It corresponds to Finland, a country in Northern Europe.
The hypothesis for the analysis is: “Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?”
The hypothesis means we need to find whether the average apparent temperature for a given month, say April, and the average humidity for the same month have increased from 2006 to 2016. This monthly analysis has to be done for all 12 months over the 10-year period. So we are basically resampling the data from hourly to monthly, then comparing the same month across the 10 years. The analysis is supported by appropriate visualizations using the matplotlib and/or seaborn libraries.
We take this statement as the null hypothesis H0: the apparent temperature and humidity, compared monthly across the 10 years of data, indicate an increase due to global warming.
Checking H0 follows the same procedure: compare the monthly averages of both quantities for each calendar month from 2006 to 2016.
1. First, we import all the necessary libraries.
2. Next, we load the downloaded dataset using the pandas library. The head() function returns the first five rows of the data.
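Steps 1 and 2 might look like the following minimal sketch. The column names and the timestamp layout in the in-memory sample are assumptions about the CSV (the post itself only names 'Formatted Date'), and the sample rows are made up so the snippet runs without the download. The helper `load_weather` is a hypothetical name; with the real file you would pass its path instead of the sample.

```python
import io

import pandas as pd


def load_weather(source):
    """Read the weather CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# Made-up rows in the assumed column layout; swap in the path of the
# downloaded Kaggle CSV to work with the real data.
sample_csv = io.StringIO(
    "Formatted Date,Apparent Temperature (C),Humidity\n"
    "2006-04-10 00:00:00.000 +0200,7.4,0.89\n"
    "2006-04-10 01:00:00.000 +0200,7.2,0.86\n"
    "2006-04-10 02:00:00.000 +0200,9.4,0.89\n"
)

df = load_weather(sample_csv)
print(df.head())  # head() shows up to five rows (only three exist here)
```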
3. Check the data types in the dataset.
4. Analyse the data types of the columns to understand the dataset better.
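A short sketch of steps 3 and 4, again on made-up rows with assumed column names. The point it illustrates is that a date column read from CSV is plain text until it is converted, which matters for the resampling step later.

```python
import pandas as pd

# Made-up rows in the assumed column layout
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-10 00:00:00.000 +0200",
        "2006-04-10 01:00:00.000 +0200",
    ],
    "Apparent Temperature (C)": [7.4, 7.2],
    "Humidity": [0.89, 0.86],
})

print(df.dtypes)  # one dtype per column
df.info()         # dtypes plus non-null counts and memory usage

# The date column is still text at this point, not a datetime
date_is_datetime = pd.api.types.is_datetime64_any_dtype(df["Formatted Date"])
print(date_is_datetime)
```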
5. Clean the dataset.
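The post does not spell out which cleaning operations were applied. One common approach, shown here purely as an assumption, is to drop exact duplicate rows and rows that are missing any of the columns used later. `clean_weather` and `KEY_COLUMNS` are hypothetical names, and the rows are made up.

```python
import numpy as np
import pandas as pd

# Columns the later steps rely on (assumed names)
KEY_COLUMNS = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]


def clean_weather(df):
    """Drop exact duplicate rows and rows missing any key column."""
    cleaned = df.drop_duplicates().dropna(subset=KEY_COLUMNS)
    return cleaned.reset_index(drop=True)


raw = pd.DataFrame({
    "Formatted Date": [
        "2006-04-10 00:00:00.000 +0200",
        "2006-04-10 00:00:00.000 +0200",  # exact duplicate of the first row
        "2006-04-10 01:00:00.000 +0200",
        "2006-04-10 02:00:00.000 +0200",
    ],
    "Apparent Temperature (C)": [7.4, 7.4, np.nan, 9.4],  # one missing reading
    "Humidity": [0.89, 0.89, 0.86, 0.89],
})

clean = clean_weather(raw)
print(clean)
```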
6. Resample the data on 'Formatted Date' into monthly periods.
To make resampling and plotting possible, we first change the data type of the 'Formatted Date' column to datetime and set this column as the index of the dataset.
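The conversion and resampling can be sketched as below. The column names, the timestamp layout with a UTC offset, and the helper name `to_monthly` are assumptions; the three readings are made up and placed mid-month so the monthly means are easy to verify by hand.

```python
import pandas as pd

TEMP = "Apparent Temperature (C)"  # assumed column name
HUMIDITY = "Humidity"              # assumed column name


def to_monthly(df):
    """Index by timestamp and average the readings per calendar month."""
    out = df.copy()
    # utc=True puts timestamps with different UTC offsets on one common clock;
    # readings close to a month boundary can shift into the neighbouring month
    out["Formatted Date"] = pd.to_datetime(out["Formatted Date"], utc=True)
    out = out.set_index("Formatted Date").sort_index()
    # "MS" labels each monthly bucket with the first day of the month
    return out[[TEMP, HUMIDITY]].resample("MS").mean()


hourly = pd.DataFrame({
    "Formatted Date": [
        "2006-04-10 12:00:00.000 +0200",
        "2006-04-20 12:00:00.000 +0200",
        "2006-05-15 12:00:00.000 +0200",
    ],
    TEMP: [10.0, 14.0, 18.0],
    HUMIDITY: [0.80, 0.60, 0.70],
})

monthly = to_monthly(hourly)
print(monthly)
```

Averaging with `mean()` matches the "average apparent temperature" wording of the hypothesis; a different aggregate such as the median would only need a change in the last line of `to_monthly`.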
7. Plot the data.
We plot the variation in apparent temperature and humidity over time.
> First, we plot the whole dataset for all months.
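A possible version of this first plot with matplotlib is sketched below. The monthly values are synthetic (a smooth seasonal curve), generated only so the snippet runs on its own. The column names and the helper `plot_monthly` are assumptions, not the code used in the original analysis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

TEMP = "Apparent Temperature (C)"  # assumed column name
HUMIDITY = "Humidity"              # assumed column name


def plot_monthly(monthly):
    """Draw both monthly series against time on one set of axes."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(monthly.index, monthly[TEMP], label=TEMP)
    ax.plot(monthly.index, monthly[HUMIDITY], label=HUMIDITY)
    ax.set_title("Monthly apparent temperature and humidity")
    ax.set_xlabel("Date")
    ax.legend()
    return ax


# Synthetic stand-in for the resampled data: two years of a seasonal cycle
idx = pd.date_range("2006-04-01", periods=24, freq="MS")
phase = 2 * np.pi * (idx.month.to_numpy() - 1) / 12
monthly = pd.DataFrame(
    {TEMP: 10 - 12 * np.cos(phase), HUMIDITY: 0.7 + 0.1 * np.cos(phase)},
    index=idx,
)

ax = plot_monthly(monthly)
# In a script, call plt.show() to display the figure; notebooks render it inline.
```

Humidity is a fraction between 0 and 1 while temperature spans tens of degrees, so on shared axes the humidity line looks almost flat; a secondary y-axis is one way to make its variation visible.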
8. Plot the graph for a specific month (April).
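Selecting one calendar month from the monthly series comes down to filtering on the month number of the index. The sketch below uses synthetic monthly values and assumed column names; `select_month` and `plot_month` are hypothetical helpers.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

TEMP = "Apparent Temperature (C)"  # assumed column name
HUMIDITY = "Humidity"              # assumed column name


def select_month(monthly, month):
    """Keep only the rows whose index falls in the given month (1-12)."""
    return monthly[monthly.index.month == month]


def plot_month(one_month, title):
    """Plot one calendar month across the years, one point per year."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(one_month.index, one_month[TEMP], marker="o", label=TEMP)
    ax.plot(one_month.index, one_month[HUMIDITY], marker="o", label=HUMIDITY)
    ax.set_title(title)
    ax.set_xlabel("Year")
    ax.legend()
    return ax


# Synthetic stand-in for the resampled data: three full years
idx = pd.date_range("2006-01-01", periods=36, freq="MS")
phase = 2 * np.pi * (idx.month.to_numpy() - 1) / 12
monthly = pd.DataFrame(
    {TEMP: 10 - 12 * np.cos(phase), HUMIDITY: 0.7 + 0.1 * np.cos(phase)},
    index=idx,
)

april = select_month(monthly, 4)
ax = plot_month(april, "April averages by year")
```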
9. Plot the variation in apparent temperature and average humidity for each month over the years.
10. Plot apparent temperature and humidity over time (by year).
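The yearly averages behind such a plot can be obtained by resampling once more, this time to yearly buckets. This is a sketch on synthetic monthly values with assumed column names; the resulting frame can be passed to any plotting routine.

```python
import pandas as pd

TEMP = "Apparent Temperature (C)"  # assumed column name
HUMIDITY = "Humidity"              # assumed column name

# Synthetic monthly data: temperature simply counts 0, 1, ..., 23
idx = pd.date_range("2006-01-01", periods=24, freq="MS")
monthly = pd.DataFrame(
    {TEMP: [float(i) for i in range(24)], HUMIDITY: [0.7] * 24},
    index=idx,
)

# "YS" labels each yearly bucket with the first day of the year
yearly = monthly.resample("YS").mean()
print(yearly)
```

Note that averaging monthly means weights every month equally regardless of its number of days; resampling the hourly data directly to years avoids that if the difference matters.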
11. Repeat the monthly analysis for all 12 months over the 10-year period.
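One compact way to compare the same month across years for all 12 months at once is a year-by-month table. The sketch below builds it on synthetic data in which every month warms by 0.5 degrees per year, so the expected result is known in advance. `month_by_year_table` is a hypothetical helper, and the last-minus-first difference is only a crude indicator of change, not a statistical test.

```python
import pandas as pd

TEMP = "Apparent Temperature (C)"  # assumed column name


def month_by_year_table(monthly, column):
    """Rows are years, columns are calendar months, cells are monthly means."""
    series = monthly[column]
    table = series.groupby([series.index.year, series.index.month]).mean().unstack()
    table.index.name = "year"
    table.columns.name = "month"
    return table


# Synthetic monthly data: each month warms by 0.5 degrees per year
idx = pd.date_range("2006-01-01", periods=36, freq="MS")
temps = idx.month.to_numpy() + 0.5 * (idx.year.to_numpy() - 2006)
monthly = pd.DataFrame({TEMP: temps}, index=idx)

table = month_by_year_table(monthly, TEMP)
# Change of each calendar month between the first and the last year
change = table.iloc[-1] - table.iloc[0]
print(table)
print(change)
```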
From this analysis we infer that the apparent temperature shows either sharp rises or sharp falls over the 10 years. Hence we conclude that global warming has caused a major difference in temperatures and made them less predictable. Humidity, on the other hand, has remained almost the same throughout these years.