Hi, can you please let me know how much you will charge for this assignment?
INSY4325 LAB 4 (4 points)
Data Analytics using Python

Resources in Canvas:
· Week 13

This is an open-ended lab. Using Python, run a linear regression analysis on data you have collected from the public domain.

Recommended packages:
· scikit-learn
· numpy
· matplotlib
· pandas

Deliverables:
1. Python code [.py file(s)] – 1.5 points
2. Explanation of work – 2.5 points
Create an original how-to document with step-by-step instructions you followed to create your program. Your document should serve as an adequate tutorial for someone to reproduce your work by following the steps/instructions.

Python for Data Analysis

Python Libraries for Data Science

Many popular Python toolboxes/libraries:
· NumPy
· SciPy
· Pandas
· SciKit-Learn
Visualization libraries:
· matplotlib
· Seaborn
and many more …

NumPy:
· introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
· provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
· many other Python libraries are built on NumPy
Link: http://www.numpy.org/

SciPy:
· collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
· part of the SciPy Stack
· built on NumPy
Link: https://www.scipy.org/scipylib/

Pandas:
· adds data structures and tools designed to work with table-like data (similar to Series and Data Frames in R)
· provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
· allows handling missing data
Link: http://pandas.pydata.org/

SciKit-Learn:
Link: http://scikit-learn.org/
· provides machine learning algorithms: classification, regression, clustering, model validation etc.
· built on NumPy, SciPy and matplotlib

matplotlib:
· Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats
· a set of functionalities similar to those of MATLAB
· line plots, scatter plots, bar charts, histograms, pie charts etc.
· relatively low-level; some effort needed to create advanced visualizations
Link: https://matplotlib.org/

Seaborn:
· based on matplotlib
· provides a high-level interface for drawing attractive statistical graphics
· similar (in style) to the popular ggplot2 library in R
Link: https://seaborn.pydata.org/

numpy

Let’s start with NumPy. Among other things, NumPy contains:
· A powerful N-dimensional array object.
· Sophisticated (broadcasting/universal) functions.
· Tools for integrating C/C++ and Fortran code.
· Useful linear algebra, Fourier transform, and random number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data.

The key to NumPy is the ndarray object, an n-dimensional array of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:
· NumPy arrays have a fixed size. Modifying the size means creating a new array.
· All elements of a NumPy array must be of the same data type, but this can include Python objects.
· Mathematical operations are more efficient than on built-in sequence types.

Numpy datatypes

NumPy supports a wider variety of data types than are built into the Python language by default. They are defined by the numpy.dtype class and include:
· intc (same as a C integer) and intp (used for indexing)
· int8, int16, int32, int64
· uint8, uint16, uint32, uint64
· float16, float32, float64
· complex64, complex128
· bool_, int_, float_, complex_ are shorthand for defaults (float_ and complex_ were removed in NumPy 2.0; use float64 and complex128 there).
These can be used as functions to cast literals or sequence types, as well as arguments to NumPy functions that accept the dtype keyword argument.

Some examples:

>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)
>>> z.dtype
dtype('uint8')

Numpy arrays

There are several mechanisms for creating arrays in NumPy:
· Conversion from other Python structures (e.g., lists, tuples).
· Built-in NumPy array creation (e.g., arange, ones, zeros, etc.).
· Reading arrays from disk, either from standard or custom formats (e.g., reading in from a CSV file).
· and others …

In general, any numerical data stored in an array-like container can be converted to an ndarray with the array() function. The most obvious examples are sequence types like lists and tuples.

>>> x = np.array([2,3,1,0])
>>> x
array([2, 3, 1, 0])
>>> x = np.array([[1,2.0],[0,0],(1+1j,3.)])
>>> x
array([[ 1.+0.j,  2.+0.j],
       [ 0.+0.j,  0.+0.j],
       [ 1.+1.j,  3.+0.j]])

There are several built-in NumPy functions which will create arrays from scratch:
· zeros(shape) -- creates an array filled with 0 values with the specified shape. The default dtype is float64.
· ones(shape) -- creates an array filled with 1 values.
· arange() -- creates arrays with regularly incrementing values.

>>> np.zeros((2, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=np.float64)
array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> np.arange(2, 3, 0.1)
array([ 2. ,  2.1,  2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9])

· linspace() -- creates arrays with a specified number of elements, spaced equally between the specified beginning and end values.
· random.random(shape) -- creates arrays with random floats over the interval [0,1).

>>> np.linspace(1., 4., 6)
array([ 1. ,  1.6,  2.2,  2.8,  3.4,  4. ])
>>> np.random.random((2,3))
array([[ 0.75688597,  0.41759916,  0.35007419],
       [ 0.77164187,  0.05869089,  0.98792864]])

Printing an array can be done with the print function.

>>> import numpy as np
>>> a = np.arange(3)
>>> print(a)
[0 1 2]
>>> a
array([0, 1, 2])
>>> b = np.arange(9).reshape(3,3)
>>> print(b)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
>>> c = np.arange(8).reshape(2,2,2)
>>> print(c)
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]

indexing

Single-dimension indexing is accomplished as usual. Multi-dimensional arrays support multi-dimensional indexing.

>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8
>>> x.shape = (2,5) # now x is 2-dimensional
>>> x[1,3]
8
>>> x[1,-1]
9

Using fewer indices than the array has dimensions returns a subarray. This means that x[i, j] == x[i][j], but the second form is less efficient because it creates a temporary subarray first.

>>> x[0]
array([0, 1, 2, 3, 4])

Slicing is possible just as it is for typical Python sequences.

>>> x = np.arange(10)
>>> x[2:5]
array([2, 3, 4])
>>> x[:-7]
array([0, 1, 2])
>>> x[1:7:2]
array([1, 3, 5])
>>> y = np.arange(35).reshape(5,7)
>>> y[1:5:2,::3]
array([[ 7, 10, 13],
       [21, 24, 27]])

Array operations

Basic operations apply element-wise. The result is a new array with the resultant elements. Operations like *= and += will modify the existing array.

>>> a = np.arange(5)
>>> b = np.arange(5)
>>> a+b
array([0, 2, 4, 6, 8])
>>> a-b
array([0, 0, 0, 0, 0])
>>> a**2
array([ 0,  1,  4,  9, 16])
>>> a>3
array([False, False, False, False,  True], dtype=bool)
>>> 10*np.sin(a)
array([ 0.        ,  8.41470985,  9.09297427,  1.41120008, -7.56802495])
>>> a*b
array([ 0,  1,  4,  9, 16])

Since multiplication is done element-wise, you need to specifically perform a dot product to perform matrix multiplication.
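As a minimal sketch of that distinction (not part of the original slides), the snippet below contrasts * with the @ operator, which for 2-D arrays gives the same result as np.dot. The arrays m and v are made up purely for illustration.

```python
import numpy as np

# Hypothetical 2x2 arrays, chosen only for illustration.
m = np.array([[1, 2], [3, 4]])
v = np.array([[0, 1], [1, 0]])

elementwise = m * v      # multiplies entries at matching positions
matrix_product = m @ v   # matrix multiplication; same as np.dot(m, v) for 2-D arrays

print(elementwise)
print(matrix_product)
```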
>>> a = np.zeros(4).reshape(2,2)
>>> a
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> a[0,0] = 1
>>> a[1,1] = 1
>>> b = np.arange(4).reshape(2,2)
>>> b
array([[0, 1],
       [2, 3]])
>>> a*b
array([[ 0.,  0.],
       [ 0.,  3.]])
>>> np.dot(a,b)
array([[ 0.,  1.],
       [ 2.,  3.]])

There are also some built-in methods of ndarray objects. Universal functions which may also be applied include exp, sqrt, add, sin, cos, etc.

>>> a = np.random.random((2,3))
>>> a
array([[ 0.68166391,  0.98943098,  0.69361582],
       [ 0.78888081,  0.62197125,  0.40517936]])
>>> a.sum()
4.1807421388722164
>>> a.min()
0.4051793610379143
>>> a.max(axis=0)
array([ 0.78888081,  0.98943098,  0.69361582])
>>> a.min(axis=1)
array([ 0.68166391,  0.40517936])

An array's shape can be manipulated by a number of methods. resize(size) will modify an array in place. reshape(size) will return a new array with the new shape (a view of the same data where possible) and leave the original array's shape unchanged.

>>> a = np.floor(10*np.random.random((3,4)))
>>> print(a)
[[ 9. 8. 7. 9