Newest 'clustering' Questions

3 votes

0 answers

766 views

Rust code implementing cosine similarity

I've been trying to write code that loops through each element of a list of questions, preprocesses it, and then calculates the cosine similarity with the rest of the elements (...

Shodai Thox

31

asked Jun 24, 2022 at 13:04

1 vote

1 answer

79 views

Dividing shared resources for homogeneous multithread processing

I'm trying to implement a homogeneous multithreading example in which multiple threads each process a portion of a huge task. To achieve this, I thought of clustering data/resources and multiple threads ...

YoonSeok OH

111

asked Jan 1, 2022 at 10:03

1 vote

1 answer

413 views

Implement 2D and 1D std::array in opencl kernel

I have been asked to port the following part of the code to kernel code. I have tried, but I am not sure about the std::array. This is the original code for the ...

user247399

11

asked Aug 22, 2021 at 9:38

0 votes

2 answers

467 views

Multithreaded implementation of K-means clustering algorithm in Java

Hello, I have written a multi-threaded implementation of the K-means clustering algorithm. The main goals are correctness and scalable performance on multi-core CPUs. I expect the code to not have race ...

Teodor Dyakov

477

asked Feb 26, 2021 at 18:50

1 vote

1 answer

252 views

Calculation of the Distance Matrix in the K-Means Algorithm in MATLAB

Purpose of the code: to assign the label of each centroid to the points that are close to it. Below is a graphical (2D) example. Variable X is a matrix, rows represent the points, ...

the_illuminated2003

125

asked Jan 1, 2021 at 19:16

2 votes

1 answer

297 views

A Tiny Nearest Neighbor Classification Implementation in C#

I am practicing implementing a KNN classification tool in C#. The basic point structure is defined by the class Point, and there are two members in ...

JimmyHu

7,575

asked Oct 11, 2020 at 10:36

4 votes

1 answer

318 views

K-clustering algorithm using Kruskal MST with Disjoint Set in place to check for cycles

Below is a working implementation that finds the minimal distance between k (set to 4 below) clusters in a graph. I have doubts mainly about the implementation of the ...

Giogre

515

asked Sep 13, 2020 at 15:38

6 votes

0 answers

147 views

K nearest neighbours algorithm

Here is a project that I worked on for a few days in June 2020. Since the algorithm is extremely slow, I looked into methods in order to parallelize operations but did not obtain any satisfactory ...

Aaron John Sabu

163

asked Jul 1, 2020 at 3:53

3 votes

1 answer

279 views

Implementation of K-means

I have recently built a class that is an implementation of kMeans from scratch. I believe there is room for improvement and I would happily receive some feedback. The project can be found at: https://...

Empan

73

asked Feb 9, 2020 at 15:33

6 votes

1 answer

607 views

Schelling's model of Segregation Python implementation with Geopandas

If you don't know what Schelling's model of segregation is, you can read about it here. The Schelling model of segregation is an agent-based model that illustrates how individual tendencies regarding ...

Kartikeya Sharma

265

asked Aug 5, 2019 at 2:37

5 votes

2 answers

5k views

Grouping sorted coordinates based on proximity to each other

I created an algorithm that groups a sorted list of coordinates into buckets based on their proximity (30) to one another. Steps: Create a new key with a list ...

Josh Sharkey

153

asked Jul 22, 2019 at 21:50

1 vote

1 answer

4k views

Clustering using k-medoids

This is the program function code for clustering using k-medoids ...

dodo

21

asked Jun 24, 2019 at 16:34

2 votes

0 answers

134 views

Machine learning, kNN and Naïve Bayes algorithm

This is the task I am working on: In this assignment you will implement the K-Nearest Neighbour and Naïve Bayes algorithms and evaluate them on a real dataset using the stratified cross validation ...

Rafiul Nakib

49

asked Jun 19, 2019 at 16:42

1 vote

1 answer

781 views

Welford's online variance calculation algorithm for vectors

I'm developing a face recognizing application using the face_recognition Python library. The faces are encoded as 128-dimension floating-point vectors. In addition to this, each named known person ...

DannyNiu

320

asked Mar 22, 2019 at 8:18

3 votes

0 answers

557 views

Locality Sensitive Hash (similar to k-Nearest Neighbor), in Python+Numpy

I've tried implementing Locality Sensitive Hash, the algorithm that helps recommendation engines, and powers apps like Shazam that can identify songs you heard at restaurants. LSH is supposed to run ...

Josh.F

187

asked Jan 30, 2019 at 23:50

0 votes

1 answer

67 views

Analyzing distances between clusters of orders [closed]

I wrote the Python class below, which does what I want it to do, but the data structure is a mess. I was wondering if there is a better structure I could use to get the same results but with better ...

Mike Sivalls

11

asked Nov 22, 2018 at 20:10

1 vote

2 answers

347 views

Top k closest pairs in a set of million 128-dimensional points [closed]

I have a set of 1 million points in 128-dimensional space. Among all trillion pairs of the points in the set, I need to get a subset of 100 million pairs whose cosine distances are less than that of ...

jayadeepk

19

asked Oct 23, 2018 at 15:17

1 vote

1 answer

208 views

Efficiently determining maximum allowed euclidean distance between lists of colors

I was recently tasked with determining which hex RGB colors in list color_list are nearest to each hex RGB color in list target_colors, using euclidean distance as the measuring stick, and only ...

Billy Kimble

113

asked Oct 9, 2018 at 1:06

5 votes

1 answer

6k views

k-means using numpy

This is a k-means implementation using Python (numpy). I believe there is room for improvement when it comes to computing distances (given I'm using a list comprehension, maybe I could also pack it in a ...

Adel Redjimi

381

asked Oct 7, 2018 at 10:05

6 votes

1 answer

4k views

Closest Pair algorithm implementation in C++

I have been working on an implementation of the closest pair algorithm in a 2-D plane. My approach is divide and conquer, O(n log n): ...

Ang

131

asked Aug 30, 2018 at 18:18

3 votes

1 answer

106 views

Simple natural language classifier

This program estimates the likelihood for a string to belong to a certain natural language by computing the cosine similarity between an input string's and several natural languages' letter frequency, ...

Passa

33

asked Aug 26, 2018 at 23:54

6 votes

1 answer

526 views

Haskell K-means implementation

The following is a Haskell backend to a K-means visualisation: I have omitted the API code (exists in a separate module), the relevant endpoints simply call ...

danbroooks

247

asked Jul 23, 2018 at 20:55

2 votes

1 answer

169 views

R function to generate predictions from ratings

I am trying to improve the run time of a program I wrote in R. Generally, what I am doing is feeding a function a data frame of values and generating a prediction off of operations on specific columns....

user2355903

157

asked Jul 18, 2018 at 2:12

2 votes

0 answers

122 views

Regression on Pandas DataFrame

I am working on the following assignment and I am a bit lost: Build a regression model that will predict the rating score of each product based on attributes which correspond to some very common ...

Stefano Pozzi

121

asked Jul 14, 2018 at 15:15

4 votes

1 answer

415 views

K-Nearest Neighbors in pure Python

I want a general criticism on this code. Using external modules is not an option, I can only use what comes with CPython. ...

Gabriel

1,053

asked Jul 9, 2018 at 13:53

1 vote

1 answer

1k views

Finding the distance between the two closest points in a 2-D plane

Here is my code: ...

BuluBestTapu

13

asked Jun 29, 2018 at 19:10

0 votes

1 answer

2k views

Determining the similarity between two documents

I've made some code that reads in text files (which hold quite large vectors of word frequencies), which in turn stores each index of a vector within an ArrayList ...

FeelingLikeAJabroni

119

asked Jun 24, 2018 at 15:39

13 votes

2 answers

5k views

Removing neighbors in a point cloud

I have written a program to optimize a point cloud depending on the points' distances to each other. The code works very well for a smaller number of points. For 1700 points it takes about 6 minutes. But I ...

Mohamad Reza Salehi Sadaghiani

345

asked Jun 8, 2018 at 9:30

11 votes

2 answers

3k views

K-Means image segmentation algorithm

I am a new C++ programmer and I have some experience in Python and C but I was almost completely self taught (I learned C++ with OpenClassrooms). I would like to learn the conventions and how things ...

Timéo Pochin

111

asked Apr 2, 2018 at 17:26

1 vote

0 answers

7k views

Fuzzy c Means in Python

This is my implementation of Fuzzy c-Means in Python. In the main section of the code, I compared the time it takes with the sklearn implementation of kMeans. ...

Avisek

195

asked Feb 27, 2018 at 14:53

5 votes

1 answer

336 views

Calculation of clustering metric in Python

When I try to run the following code for arrays with more than 10k elements, it takes hours and I don't know how to make it in the most efficient way. Any ideas? ...

kibs

51

asked Feb 5, 2018 at 21:08

3 votes

0 answers

581 views

Compute distance matrix using DTW acceptable for scipy.cluster.hierarchy

I am new to both data science and Python. I have a dataset of time-dependent samples on which I want to run agglomerative hierarchical clustering. I have found that Dynamic Time Warping (DTW)...

user3933607

211

asked Dec 26, 2017 at 12:47

2 votes

0 answers

149 views

Cluster tweet texts

I am using the following code to cluster tweet texts. The input is a dictionary containing tweet-id and tweet text as key-value pairs. Example: ...

Jay

21

asked Dec 12, 2017 at 21:45

8 votes

1 answer

930 views

k-means implementation in python

This is the first mini-project that I'm working on in Python, where I implement k-means. I'm planning to parallelize it as soon as I've written a good serial version. Code description: Below you will ...

user6321

347

asked Sep 19, 2017 at 13:01

3 votes

1 answer

632 views

R - Outlier Detection Algorithm

I am trying to implement an algorithm for detecting outliers in R and I am pretty new to the language. The outlier algorithm is described in detail in this paper on pages 10-11, but to summarize it ...

guy

133

asked Sep 15, 2017 at 22:33

4 votes

1 answer

641 views

DBSCAN "region query" too slow; implement a tree?

My current DBSCAN in Python works... but its indexing is far too slow; it's a linear scan: ...

pstatix

159

asked Sep 6, 2017 at 14:36

2 votes

0 answers

486 views

A Simple K-Means Cluster Analyzer v0.2

This code is a revision of a previous post and works well. This is my first attempt at creating a robust, computationally lean K-Means Cluster analyzer. I first saw this algorithm in an intermediate ...

Miller

533

asked Sep 2, 2017 at 21:53

4 votes

1 answer

235 views

Grouping orders by similarity (cluster) of their items

My task is to write an algorithm for grouping list of orders into batches, each batch consisting of 4 orders. Orders are grouped by similarity of their items, which means the more items from Order X ...

bigb055

141

asked Aug 24, 2017 at 14:30

3 votes

0 answers

2k views

A Simple K-Means Cluster Analyzer v0.1

[NOTE] This question can be deprecated in favor of version 0.2. This code works well. This is my first attempt at creating a robust, computationally lean K-Means Cluster analyzer. I first saw this ...

Miller

533

asked Aug 22, 2017 at 3:43

5 votes

1 answer

197 views

A Simple Cluster Generator v0.31

[NOTE] This question can be deprecated in favor of version 0.32. This is a code revision of a previous post and works well. The purpose of this code is to produce a universe of points, randomly ...

Miller

533

asked Aug 17, 2017 at 17:39

3 votes

1 answer

179 views

A Simple Cluster Generator v0.3

[NOTE] This question can be deprecated in favor of version 0.31. This is a code revision of a previous post and works well. The purpose of this code is to produce a universe of points, randomly ...

Miller

533

asked Aug 16, 2017 at 19:58

6 votes

2 answers

199 views

A Simple Cluster Generator v0.2

[NOTE] This question can be deprecated in favor of version 0.3. This is a code revision of a previous post and works well. Code has been reworked to be far more clear and concise, thanks to ...

Miller

533

asked Aug 11, 2017 at 21:01

1 vote

1 answer

165 views

A Simple Cluster Generator v0.1

[NOTE] This question can be deprecated in favor of version 0.2. The purpose of this code is to produce a universe of points, randomly generated around predetermined centroids, provided as a vector ...

Miller

533

asked Aug 11, 2017 at 6:08

6 votes

1 answer

174 views

Java find minimum range

Question Description: Given a list of companies, e.g. {ABC, BBC}, and a large list of data that looks like the following: ...

Angela P

299

asked Jul 31, 2017 at 19:42

7 votes

1 answer

1k views

PANDAS nearest site algorithm

I have got CSVs full of property transactions in the UK from 1995 to 2017, separated by year such as "RS2015.csv". I have a 2nd CSV with a list of wind turbines in the UK. Both have coordinates in WGS ...

CTaylor19

173

asked Jul 23, 2017 at 11:48

4 votes

1 answer

100 views

Getting hex colours from a image

I am trying to get the hex colours from an image. The problem I am having is that for some reason randomly the code causes high CPU usage, which times out the browser and I am not sure how to optimise ...

Limpep

141

asked Jun 26, 2017 at 15:15

2 votes

0 answers

64 views

Identifying local peaks among some 2D points

The following MATLAB code takes in multiple peak coordinates and heights and eliminates lesser peaks that are within a certain distance threshold of the highest peak of the vicinity. Is there a better ...

Lord Loh.

339

asked Jun 21, 2017 at 0:28

2 votes

1 answer

303 views

Predicting outliers in network data [closed]

I'm implementing the KNN algorithm to predict outliers in network data which contains the following columns: source IP address, source port number, protocol and total bytes transferred. To achieve this, I'...

iprof0214

151

asked Jun 14, 2017 at 21:59

7 votes

1 answer

178 views

Speeding up maximum self-similarity test for heavy tail-exponents

I am trying to reproduce results from a research paper using python. I've checked my method and it works on relatively small sample datasets. However, the code does not run for my actual dataset, ...

user127168

asked Jun 8, 2017 at 1:51

1 vote

1 answer

5k views

DBSCAN c++ implementation

For work I had to implement the DBSCAN algorithm in 3D space for cluster finding. It works; now I wonder about the quality of the code. I'm especially concerned about incrementing the size of ...

Alberto89

165

asked May 13, 2017 at 10:31
