Answer To: Higher Nationals Assignment Brief – BTEC (RQF) Higher National Diploma in Business...
Pritam Kumar answered on Jul 10 2021
Introduction
In today's world, data is the new oil. Analysing data and making business decisions based on that analysis is common practice in almost every organisation. As technology advances in every sector, statistical tools help us extract key insights from data in a scientific and precise way. A manager can rely on such analyses and make business decisions with confidence. Statistics is used in business areas (David R. Anderson, 2011) such as accounting, finance, marketing, production, and economics, to name a few.
Figure 1 is a snippet extracted from a published source (David R. Anderson, 2011): a data set with information on 25 mutual funds.
The data set has five variables: Fund Type, Net Asset Value ($), 5-Year Average Return (%), Expense Ratio, and Morningstar Rank. In terms of statistical evaluation, this is business data from the finance industry and a typical classification task, with Fund Type as the dependent variable and the remaining variables as independent variables. For the statistical analysis of such classification problems, we often use logistic regression. Descriptive analysis (Nassaji, n.d.) gives us an idea of the distribution of the data, the outliers in the data, and the similarities and associations among variables. The four key kinds of measure in descriptive analysis are frequency, central tendency (mean, median, and mode), dispersion or variation (range, standard deviation), and position.
Figure 1: Illustration of a finance data set
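The four kinds of descriptive measure named above can be illustrated with a minimal Python sketch. The numbers and category labels below are hypothetical values chosen for illustration; they are not taken from the cited mutual fund data set.

```python
import statistics
from collections import Counter

# Hypothetical 5-year average returns (%) for seven funds.
# Illustrative values only, not the figures from the cited data set.
returns = [12.4, 30.5, 3.4, 10.9, 10.9, 15.7, 4.3]

# Hypothetical category labels standing in for a variable such as Fund Type.
fund_types = ["equity", "equity", "bond", "equity", "bond", "equity", "bond"]

# 1. Frequency: how often each category occurs.
frequency = Counter(fund_types)

# 2. Central tendency: mean, median, and mode.
mean_return = statistics.mean(returns)
median_return = statistics.median(returns)
mode_return = statistics.mode(returns)

# 3. Dispersion: range and sample standard deviation.
value_range = max(returns) - min(returns)
std_dev = statistics.stdev(returns)

# 4. Position: quartiles split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(returns, n=4)

print(frequency, mean_return, median_return, mode_return)
print(value_range, std_dev, (q1, q2, q3))
```

Only the standard library is used here so that the sketch stays self-contained; a data analysis library would normally produce the same summaries in one or two calls.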
Exploratory data analysis (EDA) is the next step after descriptive analysis. It is used to summarise the main characteristics of a data set. Data visualisation methods (charts, plots) are used in exploratory analysis to look for patterns in the data. This analysis helps us decide how to prepare, manipulate, or transform the data so that we can check our assumptions about it. The process entails (Notre Dame of Maryland University, n.d.) “figuring out what to make of the data, establishing the questions you want to ask and how you’re going to frame them, and coming up with the best way to present and manipulate the data you have to draw out those important insights.”
Finally, in confirmatory data analysis (CDA), we evaluate the evidence (Notre Dame of Maryland University, n.d.) by challenging the assumptions formed about the data during exploratory analysis. Hypothesis testing, regression analysis, and analysis of variance are some of the CDA techniques.
For the above data set (Figure 1), descriptive and exploratory data analysis can include finding the mean and standard deviation of the continuous variables: Net Asset Value ($), 5-Year Average Return (%), and Expense Ratio. Boxplots can be used to detect outliers in each of these three variables. Similarly, we can count the frequencies of the categorical variables Fund Type and Morningstar Rank. Once we have made some assumptions about the data, we can use hypothesis testing to confirm whether those assumptions are correct. For example, we can test whether the mean Net Asset Value is equal to $28 (using a one-sample t-test).
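As a rough sketch of the two checks just described, the code below flags outliers using the common boxplot convention (points more than 1.5 times the interquartile range beyond the quartiles) and computes a one-sample t statistic for the null hypothesis that the mean is 28. The net asset values are hypothetical, and the quartile method is the default of Python's statistics module, which may differ slightly from what a given plotting tool uses.

```python
import math
import statistics

# Hypothetical net asset values ($); illustrative only, not the cited data.
nav = [24.5, 27.1, 28.3, 29.0, 30.2, 26.8, 31.4, 58.9]

# Boxplot convention: flag points more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(nav, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in nav if x < lower_fence or x > upper_fence]

# One-sample t statistic for H0: population mean = 28.
mu0 = 28.0
n = len(nav)
sample_mean = statistics.mean(nav)
sample_sd = statistics.stdev(nav)  # sample standard deviation (n - 1)
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
degrees_of_freedom = n - 1

print("outliers:", outliers)
print("t statistic:", t_stat, "with", degrees_of_freedom, "degrees of freedom")
```

To complete the test, the t statistic would be compared with a critical value (or converted to a p-value) from the t distribution with n - 1 degrees of freedom; that step needs a statistical table or a statistics library and is not shown here.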
Figure 2: A Feature Dataset Table (ResearchGate)
Figure 2 is a snippet extracted from a published source (MA Jayaram, 2016) that gives information about retinal images, which is valuable for fields such as bioinformatics and digital image processing. Unlike Figure 1, this represents a classic regression problem. We have five variables: No. of Exudates, Area of Largest Span, Largest Spot Major & Minor Axes, and “yellowness.” The dependent variable here is yellowness. In this problem, a regression equation is fitted between the dependent variable and the independent variables No. of Exudates, Area of Largest Span, and Largest Spot Major & Minor Axes. Predictions are then made from the parameter estimates for each independent variable. As with Figure 1, we can also run various hypothesis tests to confirm the assumptions made during the descriptive analysis and EDA steps.
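To make the idea of parameter estimates and prediction concrete, here is a minimal least-squares sketch. It is deliberately simplified to a single independent variable, whereas the problem described above has several, and the values are hypothetical rather than taken from the cited retinal image data.

```python
import statistics

# Hypothetical values: one predictor (for example, a count such as
# No. of Exudates) and a response (for example, a yellowness score).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Least-squares estimates for the line y = intercept + slope * x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def predict(value):
    """Predict the response from the fitted parameter estimates."""
    return intercept + slope * value


# Residuals: observed minus predicted values.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

print("slope:", slope, "intercept:", intercept)
print("prediction at x = 6:", predict(6.0))
```

With several independent variables the same principle applies, but the estimates are obtained by solving a system of equations, which is usually left to a statistics library.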
About the Data Set
For our data analysis task, we will use a data set from Rdatasets. This data set contains sales data (John Wiley and Sons, n.d.) on clothing. Figure 3 gives an overview of the data set.
Figure 3: Sales Data of Men's Fashion Stores
Our data set has 400 observations and 13 variables. These 13 variables are:
· tsales (annual sales in Dutch guilders)
· sales (sales per square meter)
· margin (gross-profit-margin)
· nown (number of owners (managers))
· nfull (number of full-timers)
· npart (number of part-timers)
· naux (number of helpers (temporary workers))
· hoursw (total number of hours worked)
· hourspw (number of hours worked per worker)
· inv1 (investment in shop-premises)
· inv2 (investment in automation)
· ssize (sales floor space of the store)
· start (year start of business)
For a garment production unit, it is very important to keep track of sales numbers on a daily basis. Statistical analysis is also important for understanding the data better.
All the variables are continuous. The dependent variables are tsales and sales, and the others are independent variables.
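The variable descriptions suggest that the two dependent variables are related: sales is described as sales per square meter, tsales as annual sales, and ssize as sales floor space. The sketch below assumes that sales equals tsales divided by ssize and that ssize is measured in square meters. Both are assumptions read off the descriptions, not facts confirmed by the data set documentation here, and the record values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreRecord:
    """A subset of the 13 variables, for illustration only."""
    tsales: float  # annual sales in Dutch guilders
    ssize: float   # sales floor space (assumed to be in square meters)
    sales: float   # sales per square meter


def sales_per_square_meter(tsales: float, ssize: float) -> float:
    # Assumed relationship: sales = tsales / ssize.
    return tsales / ssize


# Hypothetical store; these values are not taken from the data set.
store = StoreRecord(tsales=600000.0, ssize=150.0, sales=4000.0)

derived = sales_per_square_meter(store.tsales, store.ssize)
is_consistent = abs(derived - store.sales) < 1e-6

print("derived sales per square meter:", derived)
print("matches recorded value:", is_consistent)
```

If this relationship holds in the real data, sales is fully determined by tsales and ssize, which is worth keeping in mind when choosing which variables to include together in a model.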
Descriptive...