Wednesday, November 4, 2015

Continuing Research on U.S. Gasoline and Crude Oil Prices, Part Two

Exploratory Analysis, cont'd.

In my previous post, I began to explore the relationship between the spot prices of West Texas Intermediate (WTI) and Brent crude oil and the average spot price of conventional gasoline at the New York and Gulf hubs.
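Comparing crude oil with gasoline requires a common unit. Assuming the crude spot series are quoted in dollars per barrel and the gasoline series in dollars per gallon (as is typical for U.S. spot price data), a minimal sketch of the per-gallon conversion and the two-hub gasoline average might look like this. The numbers are hypothetical, and `avg_conv_gas` simply mirrors the `avgConvGas` variable named later in the post.

```python
# Minimal sketch: put crude and gasoline prices on a common dollars-per-gallon basis.
# Assumption: crude spot prices are in $/barrel and gasoline spot prices are in $/gallon.
GALLONS_PER_BARREL = 42  # one U.S. oil barrel holds 42 U.S. gallons


def per_gallon(price_per_barrel):
    """Convert a crude oil price from $/barrel to $/gallon."""
    return price_per_barrel / GALLONS_PER_BARREL


def average_hub_price(ny_prices, gulf_prices):
    """Month-by-month mean of the New York and Gulf conventional gasoline prices."""
    if len(ny_prices) != len(gulf_prices):
        raise ValueError("hub series must be the same length")
    return [(ny + gulf) / 2 for ny, gulf in zip(ny_prices, gulf_prices)]


# Hypothetical monthly values for illustration only (not the blog's data).
wti_bbl = [84.0, 63.0]
ny_gas = [2.10, 1.60]
gulf_gas = [2.00, 1.50]

wti_gal = [per_gallon(p) for p in wti_bbl]
avg_conv_gas = average_hub_price(ny_gas, gulf_gas)
print(wti_gal)  # [2.0, 1.5]
print(avg_conv_gas)
```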

To continue the analysis, I next need to address whether there is a statistically significant difference between the dollar-per-gallon prices of WTI and Brent crude oil. The first thing I did was create a boxplot of the two price variables:


You can see that the ranges of the prices are about the same, and the centers of the two distributions appear to be only slightly different. Just to be sure, however, I calculated a 95% confidence interval for the difference between the two means, assuming unequal variances (which they are), and then tested the hypothesis that the difference between the mean prices is zero (Welch's two-sample t-test):

Lower 95% C.I. ($/gal) | Upper 95% C.I. ($/gal)
          -0.008423387 |            0.280491840


Critical t (189.208 df, two-sided 95%) | Observed t-statistic | One-sided p-value
                                 1.973 |             1.857953 |        0.03236876
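The unequal-variance test used here is Welch's t-test, which is also what R's `t.test()` does by default (`var.equal = FALSE`); its fractional degrees of freedom (189.208 above) come from the Welch-Satterthwaite formula. A minimal standard-library sketch of the statistic, the degrees of freedom, and the interval follows. The critical value has to be supplied from a t table or a stats library, because Python's standard library has no t quantile function; the toy data and the critical value of 2.0 are placeholders, not values from the post.

```python
import math
from statistics import mean, variance


def welch_test(x, y):
    """Welch's t statistic for mean(x) - mean(y) without assuming equal variances.

    Returns (difference of means, standard error, t statistic, Welch-Satterthwaite df).
    """
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of mean(x); variance() uses n - 1
    vy = variance(y) / ny  # squared standard error of mean(y)
    se = math.sqrt(vx + vy)
    diff = mean(x) - mean(y)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return diff, se, diff / se, df


def confidence_interval(diff, se, t_crit):
    """Interval diff +/- t_crit * se; t_crit must come from a t table or stats library."""
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical toy series, not the blog's price data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
diff, se, t, df = welch_test(x, y)
print(diff, se, t, df)
print(confidence_interval(diff, se, 2.0))  # 2.0 is an arbitrary placeholder critical value
```

With the real series, `x` and `y` would be the monthly WTI and Brent per-gallon prices, and `t_crit` the 1.973 shown in the table.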

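So, you can see that the confidence interval for the difference of the means includes zero, ranging from about -$0.008 to $0.28 per gallon. The observed t-statistic of 1.858 is also smaller than the critical value of 1.973, so I do not reject the null hypothesis: the mean prices of WTI and Brent do not appear to be statistically significantly different. Note that the p-value in the table is one-sided; the corresponding two-sided p-value is about 0.065, which is above 0.05 and consistent with the confidence interval.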
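The table's 1.973 matches the two-sided 95% critical value of a t distribution with 189.208 degrees of freedom, and its p-value of 0.0324 matches the upper-tail (one-sided) probability for t = 1.857953. The sketch below checks both numbers with a Student t CDF built from the regularized incomplete beta function, using only the standard library. It is an illustrative sketch rather than a vetted numerical routine; in practice R's `pt()`/`qt()` or a package such as SciPy would be used.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged on the odd step
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_critical(alpha, df):
    """Two-sided critical value: the t where t_two_sided_p(t, df) == alpha, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > alpha:
            lo = mid  # tail still too large, so the critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2.0


DF, T_OBS = 189.208, 1.857953  # values reported in the post's table
two_sided = t_two_sided_p(T_OBS, DF)
one_sided = two_sided / 2.0  # upper tail only, since T_OBS > 0
crit = t_critical(0.05, DF)
print(f"one-sided p = {one_sided:.5f}, two-sided p = {two_sided:.5f}, critical t = {crit:.3f}")
```

Because the confidence interval is two-sided, the matching test is also two-sided, which is why the decision rests on the roughly 0.065 value rather than the one-sided 0.032.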

However, we know that the two prices are viewed as significantly different in the market, so further exploration is necessary. The first step is to check for randomness, because it is a basic assumption in statistical modelling; if the data are not random, then any statistical tests comparing them may produce questionable results. For this task, I looked at the autocorrelation and partial autocorrelation of the WTI and Brent crude oil spot prices and of the averaged conventional gasoline spot price:
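One way to see why non-random data undermine a test like the one above: if a series behaves roughly like an AR(1) process with lag-1 autocorrelation rho, the variance of its sample mean is inflated by about (1 + rho) / (1 - rho) for large n, so the series carries the information of far fewer independent observations than it has months. This is a standard large-sample approximation, not a calculation from the post, and the inputs below are hypothetical.

```python
def effective_sample_size(n, rho1):
    """Approximate effective sample size of a mean under AR(1) autocorrelation.

    Uses Var(mean) ~ (sigma^2 / n) * (1 + rho1) / (1 - rho1) for large n,
    so n_eff ~ n * (1 - rho1) / (1 + rho1).
    """
    if not -1.0 < rho1 < 1.0:
        raise ValueError("rho1 must lie strictly between -1 and 1")
    return n * (1.0 - rho1) / (1.0 + rho1)


# Hypothetical: 100 monthly observations, first with no autocorrelation, then with 0.9.
print(effective_sample_size(100, 0.0))  # 100.0
print(effective_sample_size(100, 0.9))
```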


The top row of plots shows the autocorrelation functions (ACF), which suggest a moderate autocorrelation between each variable and its own lagged values. The data do not exhibit randomness, so we can infer that the current month's price is correlated with its own past values.
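To illustrate what the top-row plots measure, here is a minimal sample autocorrelation function using the common divide-by-n autocovariance, together with the approximate +/- 1.96/sqrt(n) band usually drawn for a purely random series (similar to the dashed lines on R's `acf()` plots). The trending series is made up for illustration and is not the price data.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (autocovariances divided by n, then by lag 0)."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


def white_noise_band(n, z=1.96):
    """Approximate 95% limit for the autocorrelations of a purely random series of length n."""
    return z / math.sqrt(n)


# Hypothetical upward-trending "price" series (not the blog's data).
prices = [float(v) for v in range(1, 21)]
acf = sample_acf(prices, 5)
band = white_noise_band(len(prices))
outside = [k for k in range(1, len(acf)) if abs(acf[k]) > band]
print([round(r, 3) for r in acf], round(band, 3), outside)
```

Autocorrelations that stay outside the band at several lags, as they do for this trending series, are the visual sign that the observations are not independent.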

The second row of plots shows the partial autocorrelation functions (PACF) of the variables. These can be used to find the appropriate lag order for the modelling in the next step of my analysis. In this case, the first and second lags appear to be statistically significant for all of the variables (wti, brent, and avgConvGas), and only the WTI series has a significant lag beyond that, at lag three.
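The partial autocorrelation at lag k can be computed from the autocorrelations with the Durbin-Levinson recursion. In practice it is fed the sample ACF; in this sketch it is fed the exact ACF of a hypothetical AR(2) process, x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + e_t, so the expected cut-off after lag 2 is known in advance. The `significant_lags` helper and the sample size of 200 are illustrative assumptions.

```python
import math


def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..len(rho)-1 via the Durbin-Levinson recursion.

    rho[0] must be 1.0 and rho[k] is the autocorrelation at lag k.
    """
    pacf, phi_prev = [], []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Order-k coefficients: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then phi_{k,k}.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significant_lags(pacf, n, z=1.96):
    """Lags whose partial autocorrelation falls outside +/- z / sqrt(n)."""
    band = z / math.sqrt(n)
    return [k + 1 for k, v in enumerate(pacf) if abs(v) > band]


# Theoretical ACF of the hypothetical AR(2) process described above.
phi1, phi2 = 0.5, 0.3
rho = [1.0, phi1 / (1.0 - phi2)]
for k in range(2, 7):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

pacf = pacf_from_acf(rho)
print([round(v, 3) for v in pacf])
print(significant_lags(pacf, n=200))  # [1, 2]
```

The PACF of an AR(p) process is zero beyond lag p, which is why the last significant partial autocorrelation is a natural first guess at the model order.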

So it appears that an appropriate course of action is to fit a second-order autoregressive model, AR(2), to all three variables, plus an additional AR(3) model for WTI.
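The post doesn't say which estimator the next step will use. As a minimal sketch, an AR(p) model with an intercept can be fitted by ordinary least squares of each value on its p previous values (conditional least squares); Yule-Walker or maximum likelihood are common alternatives. Here the fit is run on a simulated AR(2) series with hypothetical coefficients 0.5 and 0.3, so an AR(3) fit should return a third coefficient close to zero.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar_ols(x, p):
    """Fit x_t = c + phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t by least squares.

    Returns [c, phi_1, ..., phi_p].
    """
    rows = [[1.0] + [x[t - j] for j in range(1, p + 1)] for t in range(p, len(x))]
    y = [x[t] for t in range(p, len(x))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X'X) beta = X'y


# Simulate a hypothetical stationary AR(2) series (not the blog's data).
rng = random.Random(2015)
true_phi = (0.5, 0.3)
series = [0.0, 0.0]
for _ in range(5000):
    series.append(1.0 + true_phi[0] * series[-1] + true_phi[1] * series[-2] + rng.gauss(0, 1))
series = series[500:]  # drop burn-in from the arbitrary zero start

ar2 = fit_ar_ols(series, 2)
ar3 = fit_ar_ols(series, 3)
print([round(v, 3) for v in ar2])
print([round(v, 3) for v in ar3])
```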

----------------------------------------------------------------------------------------

I'm going to conclude this entry here. I will continue my analysis in my next post, where I will move on to actually modelling the data.

EDIT (05 Nov 2015): I uploaded my R script and dataset to my GitHub account so that they are available to everyone. I will be updating them as I go along, with the final version posted when I conclude my work on this blog series.


Just A Data Geek Blog by Richard Ian Carpenter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.