Need help with these 2 assignments. It's best to do assignment 1 first and then assignment 2, because assignment 2 requires you to have assignment 1 completed already.
Assignment 1: Raw Data to Feature Space
Task 1: Download or generate a fruits and vegetables image dataset!
• http://www.ic.unicamp.br/~rocha/pub/downloads/tropical-fruits-DB-1024x768.tar.gz (Download and store them in a folder) – You may use your own image datasets if you wish.
• https://www.vicos.si/Downloads/FIDS30 (Download and store them in a folder)
• Cite the sources as stated on their websites in your document
• Select three types of fruits from FIDS30 to perform an initial study (e.g. cantaloupe, banana, and tomato). You could also use your cell phone to take pictures of fruits and vegetables and generate your own fruit and vegetable image datasets. You must justify the selection of your images with computational reasoning. I will use image0, image1, and image2 as labels to explain the rest of this assignment.
➢ Add the screenshots of these data folders to the report (pdf document) - use a suitable subheading for the section in the report. Also include the images that you selected
Task 2: Write simple code to read your selected images and display them in the environment!
• Select and add suitable Python and OpenCV libraries to your code to perform this task
• Write code to read these color images and display their R, G, and B channel images
• Convert these color images to grayscale and display them while printing their dimensions
➢ Add the code and the results to the document that is being generated - use a suitable subheading
Task 3: Resize the images to reduce their dimensions!
• Write a function (Python) to resize (reduce) the grayscale images such that the output dimensions are divisible by 8 without changing their original aspect ratios
• You must parametrize the function appropriately to generalize the reduction process
• Use this parametric function to reduce the size of the grayscale images to a height of 256 pixels – maintain the aspect ratio for the width, but it must be divisible by 8
➢ Add the code and the results to the document - use a suitable subheading
Task 4: Generate block-feature vectors!
• Write code to divide each image into blocks of 8x8 pixels and transform them into vectors of size 64
• Assign a label to each feature vector with 0, 1, and 2 for the first, second, and third images, respectively
• Generate a spreadsheet for each image by storing one feature vector per row
• A visual illustration is provided in Figure 1 for generating block-feature vectors from a raw image
➢ Add the code and the results to the document - use a suitable subheading
Task 5: Generate sliding block-feature vectors!
• Write code to divide an image into sliding blocks of 8x8 pixels and transform them into vectors of size 64
• Assign a label to each feature vector with 0, 1, and 2 for the first, second, and third images, respectively
• Generate a spreadsheet for each image by storing one feature vector per row
• A visual illustration is shown in Figure 2 for generating sliding block-feature vectors from a raw image
➢ Add the code and the results to the document - use a suitable subheading
Task 6: Derive statistical descriptors!
• Extract statistical information (e.g. number of observations, dimension of the data, mean of each feature, etc.) from these datasets. Also present visual representations (e.g. histogram, scatter plot, etc.) of the data.
• Answer the following questions -- Is the dataset imbalanced, inaccurate, or incomplete? Is it trivial data or possibly big data?
Does it have a scalability problem? Is it high dimensional? Do you need to standardize? Do you need to normalize? How do these operations affect the data characteristics?
• You must think about the above questions/problems and provide your explanation scientifically. You need to write programs that read the data and generate results to explain all of the above, since you need to show and justify your answers.
• You can follow the chapter 3 discussions to answer the above questions. This chapter provides the required details based on the analysis of two sets of images: (a) carpet and hardwood floor and (b) Biltmore and PrismaColors.
➢ Add the code and the results to the document - use a suitable subheading
Task 7: Construct a feature space!
• Merge the feature vectors in image0.csv and image1.csv to create a feature space for these images. The feature columns and the label column must align vertically to generate the correct feature space for these image classes. Name the feature space file (spreadsheet) image01.csv
• Similarly, merge the feature vectors in image0.csv, image1.csv, and image2.csv to create a feature space for these images. The feature columns and the label column must align vertically to generate the correct feature space for these three classes. Name the feature space file (spreadsheet) image012.csv
• Randomize the placement of the feature vectors in the files image01.csv and image012.csv. Note that you don’t randomize the content of a feature vector, only the placement (rows) in the csv files. The labels are then randomized as well, which will help the training of ML models in the later assignment goals
➢ Add the code and the results to the document that is being generated - use a suitable subheading.
Task 8: Display subspaces!
• Select two features and plot the two-dimensional feature space, labeling the observations (vectors) of the fruits or vegetables that you selected, by using the spreadsheets that you generated
• Select three features and plot the three-dimensional feature space, labeling the observations (vectors) of the fruits or vegetables that you selected, by using the spreadsheets that you generated
• You must generate separate plots to show two class labels (meaning two fruits or vegetables) and three class labels (meaning three fruits or vegetables)
• Discuss these figures and describe your observations in terms of their separable features
➢ Add the code and the results to the document that is being generated - use a suitable subheading.
Task 10: Make appropriate changes to your Python code such that it can read any number of images from a folder that consists of many similar images, generate the feature spaces, and generate spreadsheets for the feature spaces!
➢ Add the code and the results to the document that is being generated - use a suitable subheading.
Task 11: Describe the effects of block size on the dimensionality of the feature space and the number of vectors in the domain. Also, describe how these effects may influence the classifier that divides the domain
➢ Add the discussion to the document that is being generated - use a suitable subheading.
Task 12: Submit required documents!
• Prepare a LaTeX document using one of the IEEE, ACM, Springer, or Elsevier LaTeX formats.
However, make sure you select the two-column format
• Submit a zipped folder that consists of subfolders: (a) a Latex subfolder that consists of all the necessary scripts and the pdf output (i.e., the report in two-column format); (b) a Data subfolder that consists of all the images (both input and output), the spreadsheets with feature vectors and feature spaces that you have created based on the assigned tasks, and the answers to all the questions; (c) a Code subfolder that consists of the programs/modules that you developed to complete the tasks; and (d) a Screenshot subfolder that shows the programming environment that you created and the results that you obtained when you ran your code
• This is an evidence-based assessment; hence, it is your responsibility to submit all the required documents that show the completion of all the required tasks. Submit them as a zipped folder via Canvas.
• If you are in doubt or have questions, send me an email:
[email protected] or visit during my virtual office hours. You can also ask questions and clear your doubts during zoom meetings. It is important that you do not make assumptions about assignment/test requirements based on discussions with other students.
Figure 1: Generating block feature vectors from raw image data – only three block-features are shown here, but you need to generate all the possible features from the entire image
Figure 2: Generating sliding block feature vectors from raw image data – only two sets of sliding block-features are shown here, but you need to generate all the possible features from the entire image
Assignment 2: Feature Space to a Classifier
Due Date: March 21st, 2021
In assignment 1, you generated four distinct categories of datasets: non-overlapping feature vectors for two-class classification, overlapping feature vectors for two-class classification, non-overlapping feature vectors for multi-class classification, and overlapping feature vectors for multi-class classification. The main objectives of assignment 2 are to develop classifiers for the four categories of datasets (or feature learning) and to evaluate them using suitable qualitative measures. You must produce a written technical report that includes your work, results, and findings. The report must be prepared using LaTeX by updating your assignment 1 report.
Task 1: Complete and extend the tasks of assignment 1
• Make sure to complete all the tasks in assignment 1. It is important to check that you have already generated all four categories of datasets so that you can perform the machine learning tasks in assignment 2.
• Divide the data domain of the datasets 80:20, where 80% is assigned to the training datasets and 20% to the testing datasets. Save these subsets for all the categories of datasets with suitable file names.
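For the 80:20 split in the bullet just above, here is a minimal sketch using only the Python standard library. It assumes each feature-space spreadsheet is a CSV file without a header row, one feature vector plus label per row. The helper names split_rows and split_csv, the seed, and any output file names are hypothetical and not part of the assignment.

```python
import csv
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def split_csv(in_path, train_path, test_path, train_fraction=0.8, seed=0):
    """Split a header-less feature-space CSV into training and testing files.

    Assumption: every row is one feature vector followed by its label.
    Returns the number of rows written to each output file.
    """
    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))
    train, test = split_rows(rows, train_fraction, seed)
    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(subset)
    return len(train), len(test)
```

A call such as split_csv("image01.csv", "image01_train.csv", "image01_test.csv") would then write the two subsets for one category (the output names here are only placeholders). Whole rows are shuffled, never the values inside a row, so each feature vector stays attached to its label.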
• Select two features in each category of training and testing datasets, and plot their histograms to see whether they follow the same distribution – you may determine that from the shape, mean values, and variance.
• Use the same two features to generate scatter plots for each category of training and testing datasets as well. Highlight the corresponding labels with distinct colors. If a plot is too dense, use subsets.
➢ Create subsections in the report and explain the steps that you performed to achieve these tasks. Include the results (plots) that you obtained. Don’t include the entire source code in the report.
Task 2: Implementing a regression-based model
• Implement and train lasso regression or elastic-net regression as a two-class classifier using the training sets of the overlapping and non-overlapping feature vectors (feature spaces) that you created.
• Apply the trained models to the test sets of all the categories of datasets and add the predicted labels next to their actual labels in the corresponding spreadsheets (i.e., in the 66th column of the test sets).
• Construct confusion matrices using the actual