Python coding homework assignment
Microsoft Word - Project11.docx CSE 231 Spring 2019 Programming Project 11 This assignment is worth 55 points (5.5% of the course grade) and must be completed and turned in before 11:59 on Wednesday, April 24, 2019. Assignment Overview (learning objectives) This assignment will give you more experience on the use of: 1. Lists 2. Dictionaries 3. Iteration 4. String 5. Classes The goal of this project is to do some social network analysis. Assignment Background Social network analysis is the process of investigating social structures through the use of networks and graph theory. One of the common ways to analyze social networks is through the use of Ego Nets. Ego Nets are special kind of networks which consists of a Central Node called “Ego” and other nodes which are connected to the Ego and each other; these other nodes are called “Alters”. Example of an Ego Net. Fig1. http://www.analytictech.com/networks/egonet.htm Some important definitions regarding EgoNets related to this project. Ego - Central node of the EgoNet Alter(s) - Auxiliary nodes connected to the Ego and each other. Each alter node may be connection to some, all or none of other alter nodes in an EgoNet. However, every alter node needs to be connected to the Ego. Feature - Property of a alter or Ego. Examples of properties might be gender, location, or educational institution. Alters and Egos can have different sets of features in an EgoNet, but in our project all the alters and the Ego will have the same set of features. Circle - A collection of alter nodes with their connected Ego. They form a group with one or more common features. In computer science a “graph” is a collection of nodes (Egos and alters in this case) and edges—the connections among them. We are going to use an undirected graph for our EgoNet. That means that there is no ordering among nodes. (The difference between “directed” and “undirected” graph is described here https://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture21/sld014.htm.) There are multiple ways to represent a graph in code. One is to use a dictionary. In this method the key of the dictionary represents a node (Ego or alter) and the value is a list which represents the nodes that the key node is connected to. For example, the EgoNet in Figure 1 can be built like this: social_network ={} social_network[EGO] = [a,b,c,d,e,f,g] social_network[a] = [b,g,EGO] social_network[b] = [a,c,f,g,EGO] social_network[c] = [b,EGO] social_network[d] = [EGO] social_network[e] = [EGO] social_network[f] = [b,g,EGO] social_network[g] = [a,b,f,EGO] You will notice that this representation models our undirected graph well. If we want to see which nodes a is connected to, all we have to do is get the value in our dictionary with a as the key, i.e. a_connections = social_network[a] You will also notice that our EGO node is connected to every other. Circles represent a group of people who form a cohort due to some similar feature. If a social network has the people p1, p2, p3, p4, p5, and p6 in it, and the threesome p1, p2, and p5 are from the same city, then they (p1, p2, and p5) make a circle because they share a feature (“from the same city”). Also, if p2, p3, and p4 go to the same university, then they would make a circle. Notice that each person can be a part of zero, one or more circles (p6 is not part of any circle). Similarity in a circle is defined by how many features the alters in one circle have in common. For example, a circle with alters a1, a2, a3 and a4 have these values for features f1, f2, f3, and f4. A 1 in a column indicates that the alter has that feature; a 0 indicates that is doesn’t have that feature. f1 f2 f3 f4 a1.features = [0 ,1 ,0 ,1] a2.features = [0 ,1 ,1 ,1] a3.features = [1 ,1 ,0 ,0] a4.features = [0 ,1 ,1 ,1] The similarity for this circle can be represented with a dictionary of counts. Looking at each column, how many alters share this feature (i.e. how many ones are there in the column) similarity = {f1:1, f2:4, f3:2, f4: 3} As we can see the strongest feature (highest count) for this circle is f2 followed by f4, f3 and f1. The Krackhardt and Stern’s EI index for the Ego is defined as the ratio of homophily and heterophily for the said Ego. The EI index is defined as: EI = (E − I)/(E + I) Here the E stands for the number of external ties – that is, the number of ties Ego has to alters in a category different from their own. The I stands for the number of internal ties – that is, the number of ties Ego has to alters in the same category as they are in. E+I is the total number of ties (aka degree of ego). Hence, EI is the number of external ties minus the number of internal ties divided by the total number of ties. It varies from –1 to +1. A score of –1 means that Ego only has ties with alters in the same category as themselves (said to show perfect homophily), and a score of +1 means that the Ego only has ties to alters in different categories (said to show perfect heterophily). For example, consider an EgoNet with 4 different types of school features namely school_1, school_2, school_3, and school_4. The Ego is connected to 25 alters, 9 of which attend school_1, 8 attend school_2, 4 attend school_3, and 4 attend school_4. If Ego attends school_3 then the EI would have I=4 (4 internal ties because 4 attend school_3) and E=21 (external ties = 25 total ties to alters – 4 internal ties). Therefore, the EI index would be (21-4)/(21+4) = 0.68 which indicates high degree of heterophily. The effective size of an Ego Net is one of the ways to do a structural analysis of the Ego Net. Effective size of the network (EffSize) is the number of alters that Ego has, minus the average number of ties that each alter has to other alters (called the redundancy). Suppose that an Ego A has ties to three alters. Suppose that none of these three alters has ties to any of the other alters. The EffSize (effective size) of Ego A's network is three (3 – 0). Alternatively, suppose that an Ego B has ties to three others, and that all of the others are tied to one another. B's network has 3 alters and each alter is tied to two other alters so the redundancy is (2+2+2)/3 = 2. So, the effective size of the network is its actual size (3), reduced by ties of the alters (2), to yield an EffSize of 1. EffSize in Figure 2. There are six alters (A,B,C,D,E,F), alter A has 3 ties to other alters (B,E,F), alter B has 2 ties to other alters (A,D), alter C has 0 ties to other alters, alter D has 1 tie to other alters (B), alter E has 1 tie to other alters (A), E has 1 tie to other alters (A). So redundancy = (3+2+0+1+1+1)/6 = 8/6 = 1.33333 EffSize = 6 – 1.33333 = 4.6666 Fig 2. http://www.analytictech.com/e-net/pdwhandout.pdf Efficiency of the Ego Net normalizes the effective size of ego's network by its actual size. That is, what proportion of an Ego's ties to its neighborhood are "non-redundant." The effective size of ego's network may tell us something about ego's total impact; efficiency tells us how much impact ego is getting for each unit invested in using ties. An actor can be effective without being efficient; and actor can be efficient without being effective. Fig 3. http://www.analytictech.com/e-net/pdwhandout.pdf Project Description In this project we are going to be building an EgoNet using object oriented paradigm. We have provided some files that contain the required classes to build an EgoNet. The files contain class definitions for the classes you’re going to need. Some of the functions in the classes are filled out for you. You need to complete the rest of the functions and write ‘proj11.py’ to use the EgoNet. Note: unlike earlier assignments there are two simplifications made for files: 1. Files do not have headers 2. Files do not have erroneous data, i.e. no need to check for bad data Feature.py This file is provided for you. This file contains definitions for class Feature whose objects are going to represent the features of alters and Ego. The Feature class has three attributes, __feature_id - The id of the feature object. type is str __feature - Name of the feature. type is str __feature_value - Value of the feature. type is int whose value is 1 if the alter or Ego has this feature and 0 if not.