Posts

Showing posts from June, 2019

MeanNearestNeighbors (MNN) - algorithm for balancing dataset - In progress #1

One of the challenges in classification problems is unbalanced datasets. I was a Data Science Intern when the company I worked for assigned me an interesting challenge in which the dataset was unbalanced. I soon realized that unbalanced datasets are a common thing in real life. I tried most of the resampling algorithms (undersampling and oversampling), such as SMOTE, NearMiss, CondensedNearestNeighbors, RandomUnderSampler, RandomOverSampler, KMeansSMOTE and the rest of them. They didn't help me in that case; on the contrary, they made my model worse. I was like: "but, but, you should have been helpful in creating the predictive model". So I'm trying to create another algorithm, based on the undersampling concept, for balancing datasets. I call it Mean Nearest Neighbors (MNN). What's the initial idea? It's simple. Actually, the algorithm is just a modification of the other undersampling algorithms. In the data where target labe...

Math Problem -> Combinatorics: Foreign alphabet

You are given 12 letters in a foreign alphabet. Eight of the letters are consonants and the remaining four are vowels. You are asked to create 5-letter words using these 12 letters. Each word must contain exactly two vowels, and no letter may be repeated. How many different arrangements can be made?

Solution: C(5,2) * (4*3) * (8*7*6) = 10 * 12 * 336 = 40320

C(5,2) -> the number of ways to choose which 2 of the 5 positions in the word hold the vowels
(4*3) -> 2-permutation of 4 vowels
(8*7*6) -> 3-permutation of 8 consonants

In the final step we apply the rule of product and multiply the three counts.
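A quick way to sanity-check a counting argument like this one is to enumerate every word by brute force. The sketch below is only an illustration: the letters in VOWELS and CONSONANTS are placeholders standing in for the foreign alphabet, and the function names are made up for this example. It compares the formula with a direct count over all 5-letter arrangements without repetition.

```python
from itertools import permutations
from math import comb, perm  # available in Python 3.8+

# Placeholder letters (an assumption): any 4 "vowels" and 8 "consonants" work.
VOWELS = "AEIO"
CONSONANTS = "BCDFGHJK"


def count_by_formula():
    """C(5,2) vowel positions * 2-permutations of 4 vowels * 3-permutations of 8 consonants."""
    return comb(5, 2) * perm(4, 2) * perm(8, 3)


def count_by_enumeration():
    """Count every 5-letter arrangement (no repeated letters) with exactly two vowels."""
    letters = VOWELS + CONSONANTS
    return sum(
        1
        for word in permutations(letters, 5)
        if sum(ch in VOWELS for ch in word) == 2
    )


print(count_by_formula())
print(count_by_enumeration())
```

Both functions should return the same number. The enumeration walks through all 12*11*10*9*8 = 95040 arrangements, so it finishes almost instantly.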