Bumped by Community user

occurred Oct 2, 2023 at 3:01

Bumped by Community user

occurred Jun 4, 2023 at 2:08

Bumped by Community user

occurred Feb 4, 2023 at 1:07

Bumped by Community user

occurred Oct 7, 2022 at 0:05

Bumped by Community user

occurred Jun 9, 2022 at 0:02

Bumped by Community user

occurred Feb 9, 2022 at 0:01

Bumped by Community user

occurred Oct 11, 2021 at 23:07

Bumped by Community user

occurred Jun 13, 2021 at 23:00

Bumped by Community user

occurred Feb 13, 2021 at 22:05

Notice removed Draw attention by CommunityBot

occurred Jan 17, 2021 at 23:00

Bounty Ended with no winning answer by CommunityBot

occurred Jan 17, 2021 at 23:00

Tweeted twitter.com/StackCodeReview/status/1349868914436665345

occurred Jan 15, 2021 at 0:00

Added some detail and clarity

edited Jan 11, 2021 at 12:56

artemis

203
1
11

While this code works well and outputs results properly, it takes an incredibly large time to parse large datasets. The dataset in particular that I am using is an NLP dataset, particular of TF values, so there are a LOT of zeroes (not sure if that makes a difference). My dataset's size is (550683, 10891). That is estimated to take more than 10 days to finish on my current hardware.

While this code works well and produces correct results, it takes an incredibly long time to process large datasets. The dataset I am using is an NLP dataset of term frequency values, so there are a LOT of zeroes, and not a single feature follows a normal distribution (I am not sure whether that makes a difference). My dataset's shape is (550683, 10891), and a full run is estimated to take more than 10 days on my current hardware.
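
For a sense of scale, here is a rough back-of-the-envelope sketch of what a matrix of shape (550683, 10891) costs in memory. It assumes the data is held as a dense float64 array, which the question does not confirm. The helper names and the 1% density figure are hypothetical and chosen only for illustration; the real proportion of non-zero term frequencies is not stated.

```python
# Rough memory estimate for the dataset shape quoted in the question.
N_ROWS, N_COLS = 550_683, 10_891


def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense array; itemsize=8 corresponds to float64."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, n_cols, density, itemsize=8, index_size=4):
    """Approximate bytes for CSR storage: values, column indices, row pointers."""
    nnz = int(n_rows * n_cols * density)  # number of stored non-zero entries
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


# Hypothetical density (1% non-zero entries); NOT taken from the question.
ASSUMED_DENSITY = 0.01

dense = dense_bytes(N_ROWS, N_COLS)
sparse = csr_bytes(N_ROWS, N_COLS, ASSUMED_DENSITY)
print(f"dense float64: {dense / 1e9:.1f} GB")
print(f"CSR at {ASSUMED_DENSITY:.0%} density: {sparse / 1e9:.2f} GB")
```

The dense figure comes to roughly 48 GB, so if the matrix really is mostly zeroes, a sparse representation may be worth trying before tuning the algorithm itself.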

added 137 characters in body

edited Jan 9, 2021 at 21:11

artemis

How can I optimize this code to improve performance? Using the make_multilabel_classification call above, even that takes a fair amount of time given the feature space.

How can I optimize this code to improve performance? Even the make_multilabel_classification call above takes a fair amount of time given the feature space.

Please note: if you believe the algorithm was implemented incorrectly, please feel free to fix that. I am principally concerned with speed.

Notice added Draw attention by artemis

occurred Jan 9, 2021 at 21:10

Bounty Started worth 50 reputation by artemis

occurred Jan 9, 2021 at 21:10

edited title

edited Jan 7, 2021 at 13:08

artemis

Algorithm Optimization of MLE for-- Automatic Choice of Dimensionality of PCA

asked Jan 6, 2021 at 19:48

artemis
