Tweeted twitter.com/StackCodeReview/status/1349868914436665345
artemis

While this code works well and produces correct results, it takes an incredibly long time to process large datasets. The dataset I am using in particular is an NLP dataset of term frequency values, so there are a LOT of zeroes, and the data does not follow a normal distribution (not a single feature does); I am not sure whether that makes a difference. My dataset's shape is (550683, 10891), and processing it is estimated to take more than 10 days on my current hardware.
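Assuming the implementation follows Minka's formulation, the MLE criterion depends on the data only through the eigenvalues of the sample covariance matrix and the sample count. One thing worth trying is therefore to get that spectrum straight from the sparse term-frequency matrix instead of densifying all 550683 x 10891 entries. The sketch below assumes the data is stored as a `scipy.sparse` matrix; `covariance_spectrum` is a hypothetical helper name. The dense 10891 x 10891 covariance still needs roughly 1 GB in float64, and the eigen-solve on it is O(d^3), but that is one dense solve rather than repeated passes over half a million rows.

```python
import numpy as np
import scipy.sparse as sp


def covariance_spectrum(X):
    """Eigenvalues of the sample covariance of X, sorted in descending order.

    X may be a scipy.sparse matrix; only the d x d Gram matrix is densified.
    """
    n_samples = X.shape[0]
    # Column means as a dense 1-D array (works for sparse and dense input).
    mean = np.asarray(X.mean(axis=0)).ravel()
    # X^T X is cheap to form for very sparse X; densify only the d x d result.
    gram = X.T @ X
    if sp.issparse(gram):
        gram = gram.toarray()
    gram = np.asarray(gram, dtype=np.float64)
    # Centered scatter matrix X^T X - n * mean mean^T, scaled like np.cov (ddof=1).
    cov = (gram - n_samples * np.outer(mean, mean)) / (n_samples - 1)
    # eigvalsh exploits symmetry and never computes eigenvectors.
    eigvals = np.linalg.eigvalsh(cov)
    # Clip tiny negative values from round-off; eigvalsh returns ascending order.
    return np.clip(eigvals, 0.0, None)[::-1]
```

On small inputs, `np.cov` on a dense copy should give the same values, which makes a cheap sanity check before running it on the full matrix.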

How can I optimize this code to improve performance? Even on the data from the make_multilabel_classification call above, it takes a fair amount of time, given the size of the feature space.

Please note: if you believe the algorithm was implemented incorrectly, please feel free to fix that. I am principally concerned with speed.
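The code itself is not shown here, so this is only a guess at where the time goes: a direct transcription of Minka's Laplace-approximation criterion evaluates, for each candidate rank k, a double sum over roughly k*d eigenvalue pairs, which adds up to O(d^3) Python-level work for d = 10891. The sketch below computes the same log-evidence, written term by term in what I believe is the form scikit-learn uses for `PCA(n_components='mle')`, but updates the pair sums incrementally as k grows, so all ranks together cost O(d^2) vectorized work. It assumes the spectrum is sorted in descending order with distinct eigenvalues; `mle_log_evidence` and `infer_dimension` are hypothetical names. Compare its output with your own version on a small spectrum before relying on it.

```python
import numpy as np
from scipy.special import gammaln


def mle_log_evidence(spectrum, n_samples):
    """Log-evidence of every candidate rank k = 1 .. d-1 (ll[k - 1] scores rank k).

    spectrum: covariance eigenvalues in descending order.
    Assumes distinct eigenvalues; exact ties make log(0) terms appear.
    """
    lam = np.asarray(spectrum, dtype=np.float64)
    d = lam.shape[0]
    n = float(n_samples)
    eps = 1e-15
    ks = np.arange(1, d)  # candidate ranks; also reused as the index i = 1 .. d-1

    with np.errstate(divide="ignore", invalid="ignore"):
        # Prior term: running sum over i = 1 .. k.
        pu = -ks * np.log(2.0) + np.cumsum(
            gammaln((d - ks + 1) / 2.0) - np.log(np.pi) * (d - ks + 1) / 2.0
        )
        # Retained eigenvalues: -(n / 2) * sum_{i <= k} log(lam_i).
        pl = -n / 2.0 * np.cumsum(np.log(lam[:-1]))
        # v_k = mean of the discarded eigenvalues lam_{k+1} .. lam_d.
        tail = np.cumsum(lam[::-1])[::-1]  # tail[t] = lam[t] + ... + lam[d-1]
        v = np.maximum(eps, tail[1:] / (d - ks))
        pv = -np.log(v) * n * (d - ks) / 2.0
        m = d * ks - ks * (ks + 1) / 2.0
        pp = np.log(2.0 * np.pi) * (m + ks) / 2.0

        # pa: sum over pairs (i <= k, j > i), maintained incrementally in k.
        pa = np.empty(d - 1)
        inner = 0.0  # both retained: log((lam_i - lam_j)^2 / (lam_i * lam_j))
        cross = 0.0  # i retained, j discarded: log(lam_i - lam_j)
        for k in range(1, d):
            c = k - 1  # 0-based index of the component added at rank k
            prev = lam[:c]
            # Pairs (i, c) with i < c move from the cross set to the inner set.
            cross -= np.sum(np.log(prev - lam[c]))
            inner += np.sum(np.log((prev - lam[c]) ** 2 / (prev * lam[c])))
            # Pairs (c, j) with j > c join the cross set.
            cross += np.sum(np.log(lam[c] - lam[c + 1:]))
            # Cross pairs use v_k instead of lam_j: (d - k) copies of log(1/v_k - 1/lam_i).
            scale = (d - k) * np.sum(np.log(1.0 / v[k - 1] - 1.0 / lam[:k]))
            n_pairs = k * d - k * (k + 1) / 2.0
            pa[k - 1] = inner + cross + scale + n_pairs * np.log(n)

        ll = pu + pl + pv + pp - pa / 2.0 - ks * np.log(n) / 2.0
    # Reject ranks whose last retained eigenvalue is numerically zero.
    ll[lam[:-1] < eps] = -np.inf
    return ll


def infer_dimension(spectrum, n_samples):
    """Rank k in 1 .. d-1 with the highest log-evidence."""
    return int(np.argmax(mle_log_evidence(spectrum, n_samples))) + 1
```

Fed with a descending covariance spectrum and the number of rows (550683 here), `infer_dimension` returns the rank with the highest log-evidence. The loop runs d times with O(d) NumPy work per step, instead of an O(k*d) pure-Python double loop per candidate rank.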

Algorithm Optimization of MLE: Automatic Choice of Dimensionality of PCA
