MeanNearestNeighbors (MNN) - algorithm for balancing dataset - In progress #1

Image
One of the challenges in classification problems are unbalanced datasets. I was Data Science Intern when the company that I worked for, assigned me such an interesting challenge where the dataset was unbalanced.  However, I realized this type of problem like unbalanced dataset is а common thing in real life. I tried most of the algorithms (undersampling, oversampling) like SMOTE, NearMiss, CondensedNearestNeighbors, RandomUnderSampler, RandomOverSampler,  KMeansSMOTЕ and rest of them. Anyway, they didn't help me in that case, on the contrary, they worsened my model.  I was like: "but, but, you should have been helpful in creating the predictive model" So, I'm trying to create another algorithm based on undersampling concept when it comes to balancing datasets. I called it Mean Nearest Neighbors (MNN). What's the initial idea: It's simple. Actually, the algorithm is just a modification of the other undersampling algorithms. In the data where target labe...

Competitive Programming #9 : Depth-First Search (DFS) & Breadth-First Search (BFS)

Today I was learning  about Graph traversal. And my opinion about graphs is that very interesting but it's a little bit hard . So now I will try to present the Depth-First Search .
We have this graph :

We have this code:

 We use DFS + BackTracking .
The algorithm begins at a starting node (in our case Depth_First_Search(1)) and proceeds to all other nodes that are reachable from the starting node using the edges of the graph.

DFS always follow a single path in the graph as long as it finds new nodes. After this, it returns to previous nodes and begins to explore other parts of graph. The algorithm keeps track of visited nodes, so that is processes each node only once .

The time complexity of DFS is O(n+m) where n is the number of nodes and m is the number of edges, cause the algorithm processes each node and edge once.

So in the code we maintain the adjacency lists in an array and also maintain an array boolean 
 bool visited[N] (the most important part of code);
cause keeps track of the visited nodes. Initially, each array value is false and when the search arrives at node s, the value of visited[s] becomes true. 

Another algorithm , BFS ( Breadth-First Search) which is more difficult to implement than DFS.
BFS goes through the nodes one level after another. First the search explores the nodes whose distance from starting node is 1, then the nodes whose distance is 2 and so on. This process continues until all nodes have been visited.

Like  in DFS , the time complexity of BFS is O(n+m), where n is the number  of nodes and m is the number of edges.
 Using same graph
Here is the  code BFS
 
Its based on a queue that contains nodes. At each step, the next node in the queue will be processed.

The queue q contains nodes to be processed in increasing order of their distance. New nodes are always added to the end of the queue, and the node at the beginning of the queue is the next node to be processed. 

The array visited  indicates which nodes the search has already visited and the array distance will contain the distances from the starting node to all nodes of the graph.

Popular posts from this blog

Math Problem -> Combinatorics: Foreign alphabet

Competitive Programming #29 : [LineUp]

Intro to Quantum Computing: Што ќе ми треба ова сега? #1