115 questions
1 vote, 1 answer, 52 views
Pandas keep every nth row with special rule
For example, I want to keep every 3rd row, but I must keep numbers divisible by 3 (or some special rule like that). When I see a number divisible by 3, that restarts the count, meaning I will start ...
0 votes, 2 answers, 65 views
Is there a way to balance data in R without reordering a dataframe?
First, here is some toy data:
df <- data.frame(
"stim" = c("face", "object", "pareidolia", "face", "face", "object", "...
0 votes, 1 answer, 130 views
How to create a loop that subsamples a dataset, runs a specific equation, and gives you a list of p-values for each subsample?
There is a previous question which asks a similar question (Is there a way to create a loop where I provide a function and dataframe and subsample it, and repeat the function with a subsample N times?)...
0 votes, 2 answers, 146 views
Sample from a dataset to generate a subset that has similar properties to another dataset
Let's say I have a large dataset of numeric values:
big_dataset = rnorm(n = 500, mean = 20, sd = 10)
I want to pull out a subset of observations from big_dataset that have similar values (within 5 ...
0 votes, 2 answers, 74 views
Is there a way to create a loop where I provide a function and dataframe and subsample it, and repeat the function with a subsample N times?
I am not sure what the correct word for this would be, so apologies for getting the terminology horribly wrong. Basically I have about 1000 datapoints, and I want to randomly subsample 100 data points ...
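The loop described in this question can be sketched in Python/NumPy (the excerpt does not name a language). `repeat_on_subsamples` is a hypothetical helper; it assumes each subsample is drawn without replacement and that the function being repeated accepts a 1-D array.

```python
import numpy as np

def repeat_on_subsamples(data, func, subsample_size, n_repeats, seed=None):
    """Hypothetical helper: apply `func` to `n_repeats` random subsamples.

    Each subsample holds `subsample_size` distinct elements (drawn without
    replacement); different subsamples may overlap with each other.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    results = []
    for _ in range(n_repeats):
        idx = rng.choice(len(data), size=subsample_size, replace=False)
        results.append(func(data[idx]))
    return results

# Example: mean of 100 points drawn from 1000, repeated 5 times
data = np.arange(1000)
means = repeat_on_subsamples(data, np.mean, subsample_size=100, n_repeats=5, seed=0)
```

Returning a plain list keeps the helper agnostic about what `func` returns (a mean, a p-value, a fitted model).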
0 votes, 1 answer, 179 views
Efficient way to create stratified subsamples of a data frame, depending on frequency of a category
I want to create a sub-sample of data frame df, depending on the frequency of a given category in one of its columns, e.g. a.
Let's assume we have a data frame like this:
df <- data.frame(a = rep(1:...
0 votes, 1 answer, 22 views
How to subsample a data set to determine new detection rates
I have been tasked with subsampling a data set of cameras to determine whether we can get away with fewer cameras in our camera grid. The dataset already has detection rates for each species at each ...
0 votes, 1 answer, 104 views
Reweight observations after subsetting
I have a dataset containing a weight column, which I would like to subset while adjusting these weights to keep it representative of the original dataset.
Let us say I have the dataframe:
data....
0 votes, 1 answer, 92 views
Subsampling from a set with the assumption that each member is picked at least once in R
I need code or an idea for the following case: I have a dataset of 1000 rows, and I want to draw subsamples of 800 rows multiple times (I don't know how many times I should repeat this).
How should I ...
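One way to reason about the number of repetitions: if each draw takes 800 of the 1000 rows uniformly at random without replacement, a given row is missed by one draw with probability 0.2, and by k independent draws with probability 0.2^k. A union bound over all 1000 rows gives P(some row is never picked) <= 1000 * 0.2^k, which first drops below 1% at k = 8. The Python/NumPy sketch below (rather than R; both helper names are hypothetical) computes that bound and, as an alternative, simply keeps drawing until every row has appeared.

```python
import math

import numpy as np

def draws_for_coverage(n_rows, subsample_size, max_fail_prob):
    """Hypothetical helper: smallest k with n_rows * miss**k <= max_fail_prob,
    where miss = 1 - subsample_size / n_rows (union bound over rows)."""
    miss = 1 - subsample_size / n_rows          # chance one draw skips a given row
    return math.ceil(math.log(max_fail_prob / n_rows) / math.log(miss))

def subsample_until_covered(n_rows, subsample_size, seed=None):
    """Hypothetical helper: draw subsamples until every row appeared at least once."""
    rng = np.random.default_rng(seed)
    seen = np.zeros(n_rows, dtype=bool)
    draws = []
    while not seen.all():
        idx = rng.choice(n_rows, size=subsample_size, replace=False)
        seen[idx] = True
        draws.append(idx)
    return draws

print(draws_for_coverage(1000, 800, 0.01))  # 8
```

The bound is conservative; the adaptive loop usually needs no more draws than that, but its count varies from run to run.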
1 vote, 0 answers, 104 views
How to get an even subsample from a dataframe in R with multiple variables
I have a dataframe with 67 items that looks like this:
df <- data.frame("item"= c("item1", "item2", "item3", "item4", "item5"), "...
0 votes, 1 answer, 320 views
subsampling formula skipgram NLP
I'm studying how to implement a Skip-Gram model using PyTorch. I am following this tutorial, and in the subsampling part the author uses this formula:
import random
import math
def subsample_prob(word, t=1e-3):...
2 votes, 1 answer, 74 views
R Obtain quantile and mean from a tailored subset in the dataset
I would like to obtain quantile in a tailored subset. For example in the following dataset:
data = data.table(x=c(rep(1,9),rep(2,9)),y=c(rep(1:6,each=3)),z=1:18)
For each row i, I want to know, in the ...
0 votes, 1 answer, 389 views
How to generate monthly period index with annual frequency?
How would I generate, in the most concise way, a monthly period index that is observed only every 12 months?
I came up with the following solution
pd.period_range(start=pd.Period('1975-07'),
...
0 votes, 2 answers, 262 views
How to subsample time series (bursts of GPS locations)
I have a time series as below:
**Date_time**
2018-06-26 17:19:30
2018-06-26 17:20:40
2018-06-26 17:20:41
2018-06-26 17:20:42
[...]
2018-06-26 17:21:36
2018-06-26 17:21:37
2018-06-26 17:21:38
2018-06-...
3 votes, 1 answer, 652 views
In Matlab, how can I use chroma subsampling to downscale a 4:4:4 image to 4:1:1 when the image is in YCbCr?
Following this exact question
In Matlab, how can I use chroma subsampling to downscale a 4:4:4 image to 4:2:0 when the image is in YCbCr?
where he is performing chroma downscaling from 4:4:4 to 4:2:0, ...
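Both schemes reduce the resolution of the Cb and Cr planes while Y stays at full resolution: 4:2:0 halves chroma both horizontally and vertically, while 4:1:1 quarters it horizontally and keeps full vertical resolution. Below is a NumPy sketch (not MATLAB) that averages each block; averaging rather than simply dropping samples is an assumption, and the hypothetical `subsample_chroma` requires plane dimensions divisible by the block size.

```python
import numpy as np

def subsample_chroma(cb, cr, scheme):
    """Hypothetical helper: downsample Cb and Cr by averaging blocks.

    4:2:0 -> 2x2 blocks (half width, half height)
    4:1:1 -> 1x4 blocks (quarter width, full height)
    Plane height/width must be divisible by the block height/width.
    """
    bh, bw = {"4:2:0": (2, 2), "4:1:1": (1, 4)}[scheme]

    def block_mean(plane):
        h, w = plane.shape
        # Split each axis into (blocks, samples per block), then average within blocks
        return plane.reshape(h // bh, bh, w // bw, bw).mean(axis=(1, 3))

    return block_mean(cb), block_mean(cr)
```

To view the result, the reduced planes can be expanded back to full size (e.g. with `np.repeat` along each axis) before converting to RGB.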
0 votes, 1 answer, 459 views
Statistical reasoning: how and why does tf.keras.preprocessing.sequence skipgrams use sampling_table this way?
The sampling_table parameter is only used in the tf.keras.preprocessing.sequence.skipgrams method once to test if the probability of the target word in the sampling_table is smaller than some random ...
0 votes, 0 answers, 244 views
Create a subsample from a data frame in R
I have five data frames among which I want to run regressions:
df1: stock returns
df2: housing returns
df3: actual inflation rate
df4: expected inflation rate
df5: unexpected inflation rate
...
0 votes, 0 answers, 227 views
Subsample a large Armadillo matrix or vector
I've been skimming through the Armadillo documentation and examples, but it seems there is no real efficient way to subsample (or resample) a large vector or matrix, such that if you had N elements ...
0 votes, 1 answer, 106 views
Drawing a random sub-sample from a df proportionally to categories
I have a dataframe like this
names = ["Patient 1", "Patient 2", "Patient 3", "Patient 4", "Patient 5", "Patient 6", "Patient 7"]
...
0 votes, 1 answer, 107 views
More efficient way of subsampling sound files?
Apologies in advance if this has already been asked and for my wording of this question as I am new to R.
Is there any way of making my code for subsampling sound files more efficient? I have 148 ...
1 vote, 1 answer, 2k views
Retrieve 100 samples closest to the centroids of each cluster after K means clustering using R
I'm trying to reduce the input data size by first performing a K-means clustering in R then sample 50-100 samples per representative cluster for downstream classification and feature selection.
The ...
0 votes, 1 answer, 198 views
How to subsample windows of a DataSet in Spark?
Let's say I have a DataSet that looks like this:
Name | Grade
---------------
Josh | 94
Josh | 87
Amanda | 96
Karen | 78
Amanda | 90
Josh | 88
I would like to create a new DataSet ...
1 vote, 1 answer, 174 views
Loop over left joins
I've been trying to loop over left joins (using R). I need to create a table with columns representing samples from a larger table. Each column of the new table should represent each of these samples.
...
1 vote, 0 answers, 351 views
How to use tf.data.Dataset.interleave to subsample from multi dataset objects in tf2?
I tried to replicate the solution posted here with tf.data.Dataset.interleave, but I am not quite sure how to apply the interleave method to already created dataset objects.
Here is the code:
import ...
1 vote, 1 answer, 344 views
Sample random rows evenly spaced apart in R
I have a df of measurements over 50 years. I am trying to subsample the data to see what patterns I would have found had I only sampled in 2 years, or in 3, 4, 5, etc, instead of in all 50. I made a ...
0 votes, 1 answer, 413 views
How to resample without replacement, treating three consecutive points as one unit for each choice
The goal is to sample the n number of data points from the original population. But the original population has serial correlation (consider it as time series data) and I want to choose neighboring ...
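A sketch of one reading of this: cut the series into non-overlapping runs of three consecutive points and draw whole runs without replacement, so neighbouring points stay together. This is Python/NumPy; `sample_blocks` is a hypothetical name, and fixed, non-overlapping blocks (rather than blocks that may start at any position) are an assumption.

```python
import numpy as np

def sample_blocks(x, n_blocks, block_len=3, seed=None):
    """Hypothetical helper: draw n_blocks non-overlapping runs of block_len
    consecutive points, without replacement, keeping order inside each run."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n_full = len(x) // block_len                        # drop an incomplete tail, if any
    blocks = x[: n_full * block_len].reshape(n_full, block_len)
    chosen = rng.choice(n_full, size=n_blocks, replace=False)
    return blocks[chosen].ravel()
```

Because each run is chosen at most once, no data point can appear twice in the result.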
1 vote, 1 answer, 167 views
Subsampling a 1D array of integers so that the sum hits a target value in Python
I have two 1D arrays of integers whose sums differ, for example:
a = [1,2,2,0,3,5]
b = [0,0,3,2,0,0]
I would like the sum of each array to be equal to that of the smallest of the two. However I want ...
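If each integer is read as a count of units, one way to shrink the larger array to the smaller total is to remove units at random without replacement, which is a multivariate hypergeometric draw; NumPy's `Generator.multivariate_hypergeometric` (NumPy 1.18 or later) does this directly. Since the rest of the requirement is cut off, treat this as a sketch under that reading only; `thin_to_total` is a hypothetical name. No position can exceed its original count, so zeros stay zero.

```python
import numpy as np

def thin_to_total(counts, target, seed=None):
    """Hypothetical helper: randomly remove units from `counts`, without
    replacement, until the total equals `target` (multivariate hypergeometric)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    return rng.multivariate_hypergeometric(counts, target)

a = np.array([1, 2, 2, 0, 3, 5])
b = np.array([0, 0, 3, 2, 0, 0])
target = min(a.sum(), b.sum())          # 5, the smaller of the two sums
a_sub = thin_to_total(a, target, seed=0)
```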
1 vote, 1 answer, 1k views
Gensim word2vec downsampling sample=0
Does sample=0 in Gensim word2vec mean that no downsampling is used during my training? The documentation only says that the
"useful range is (0, 1e-5)"
However, setting the threshold to 0 would ...
2 votes, 2 answers, 163 views
Variable-length df subsampling function in R
I need to write a function involving subsetting a df by a variable n bins. Like, if n is 2, then subsample the df some number of times in two bins (from the first half, then from the second half). If ...
1 vote, 0 answers, 41 views
Get all possible combinations of numpy array elements [duplicate]
I need to get all possible combinations nCr of all possible sizes of a numpy array.
[1,2,3,4,5]
should give us a set of arrays:
[1],[2],[3],[4],[5]
[1,2],[1,3],[1,4],[1,5],[2,3],[2,4],[2,5],[3,4],[3,...
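The standard library's `itertools.combinations` yields every r-element combination in order of position; looping r from 1 to n gives all 2**n - 1 non-empty combinations. A short sketch:

```python
from itertools import combinations

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# All non-empty combinations, grouped by size r = 1 .. len(arr)
all_combos = [np.array(c) for r in range(1, len(arr) + 1)
              for c in combinations(arr, r)]

print(len(all_combos))  # 31, i.e. 2**5 - 1
```

The count grows as 2**n, so for large arrays it is usually better to iterate over the generator than to build the full list.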
1 vote, 1 answer, 2k views
How to convert Y Cb Cr to RGB in MATLAB manually?
I've been tasked with performing a 4:2:0 chroma subsampling (color compression) on a series of JPEGs.
The first step is to ensure that I can generate my Y, Cb, and Cr values and then convert back ...
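For the full-range YCbCr used by JPEG (JFIF), the inverse transform is R = Y + 1.402(Cr - 128), G = Y - 0.344136(Cb - 128) - 0.714136(Cr - 128), B = Y + 1.772(Cb - 128). Below is a NumPy sketch (the question uses MATLAB; `ycbcr_to_rgb` is a hypothetical name). If the values came from MATLAB's `rgb2ycbcr`, which as far as I know produces the limited-range BT.601 variant (Y from 16 to 235), different coefficients and offsets apply, so check which convention the forward step used.

```python
import numpy as np

def ycbcr_to_rgb(ycbcr):
    """Hypothetical helper: convert an HxWx3 uint8 full-range (JFIF) YCbCr image to RGB."""
    ycc = ycbcr.astype(np.float64)
    y = ycc[..., 0]
    cb = ycc[..., 1] - 128.0          # centre chroma around zero
    cr = ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    rgb = np.stack([r, g, b], axis=-1)
    # Round and clip back into the valid 8-bit range
    return np.clip(np.round(rgb), 0, 255).astype(np.uint8)
```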
0 votes, 1 answer, 3k views
In Matlab, how can I use chroma subsampling to downscale a 4:4:4 image to 4:2:0 when the image is in YCbCr?
I have already converted the jpg images from RGB to YCbCr but must now use Chroma Subsampling to make them 4:2:0. I have searched but have not found any information on how to do this (note: I am very ...
0 votes, 1 answer, 531 views
Creating overlapping, square patches for rectangular images
Given a rectangular image img and a patch size s, I would like to cover the whole image with square patches of side length s, so that every pixel in img is in at least one patch, using the minimal ...
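One approach, sketched in Python under the assumption that s is an integer no larger than either image side: along an axis of length L, ceil(L/s) patches is the fewest that can cover that axis, and spreading their start offsets evenly between 0 and L - s keeps neighbouring patches overlapping or touching. Combining both axes gives a grid of ceil(H/s) * ceil(W/s) patches. This is a grid layout, not a proven global minimum over all possible arrangements. The helper names are hypothetical.

```python
import math

def patch_starts(length, s):
    """Hypothetical helper: start offsets of ceil(length / s) windows of size s
    that together cover [0, length). Assumes 1 <= s <= length."""
    n = math.ceil(length / s)
    if n == 1:
        return [0]
    step = (length - s) / (n - 1)       # step <= s, so no pixel is skipped
    return [round(i * step) for i in range(n)]

def square_patches(height, width, s):
    """Hypothetical helper: (top, left) corners of s x s patches covering the image."""
    return [(top, left)
            for top in patch_starts(height, s)
            for left in patch_starts(width, s)]

print(patch_starts(10, 4))  # [0, 3, 6]
```

The first patch starts at 0 and the last ends exactly at the image border, so any overlap is spread evenly rather than piled up at one edge.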
1 vote, 1 answer, 3k views
Chroma Subsampling with ffmpeg
I want to create an .mp4 output. But it doesn't work...
I'm using ffmpeg. My input video is a raw video, and I want to end up with a raw video in an .mp4 file.
The command I use:
ffmpeg.exe -i input.y4m -...
0 votes, 1 answer, 651 views
How does Gensim implement subsampling in Word2Vec?
I am trying to reimplement word2vec in PyTorch. I implemented subsampling according to the code of the original paper. However, I am trying to understand how subsampling is implemented in Gensim. I ...
0 votes, 1 answer, 1k views
Is there an "easy" way to create stratified split of frames using h2o.ai?
Stratified sampling is old, and very significant.
Donald Knuth (high priest of computer science) uses it for evaluating the work of his PhD students, and for teaching his deeply and sincerely held ...
0 votes, 0 answers, 1k views
Two-stage cluster sampling in R
In R, I have a data set with 30 categories (N clusters = 30); each cluster contains an unequal number of units (the ith cluster can have 24, 25, 26, 27, or 28 units). I want to take a two-stage sample, first ...
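A sketch of the basic two-stage design, in Python/NumPy rather than R: first sample clusters without replacement, then sample units without replacement inside each selected cluster. Equal second-stage sizes and simple random sampling at both stages are assumptions (the rest of the question is cut off), and `two_stage_sample` is a hypothetical name.

```python
import numpy as np

def two_stage_sample(cluster_ids, n_clusters, n_per_cluster, seed=None):
    """Hypothetical helper returning row indices of a two-stage sample.

    Stage 1: simple random sample of clusters.
    Stage 2: simple random sample of n_per_cluster units in each chosen cluster.
    """
    rng = np.random.default_rng(seed)
    cluster_ids = np.asarray(cluster_ids)
    chosen = rng.choice(np.unique(cluster_ids), size=n_clusters, replace=False)
    rows = []
    for c in chosen:
        members = np.flatnonzero(cluster_ids == c)   # rows belonging to cluster c
        rows.extend(rng.choice(members, size=n_per_cluster, replace=False))
    return np.array(rows)

# 30 clusters with 24 to 28 units each, as described in the question
rng = np.random.default_rng(42)
sizes = rng.integers(24, 29, size=30)
cluster_ids = np.repeat(np.arange(30), sizes)
sample_rows = two_stage_sample(cluster_ids, n_clusters=5, n_per_cluster=10, seed=1)
```

With unequal cluster sizes, estimates from such a sample generally need weights that account for each unit's selection probability at both stages.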
2 votes, 1 answer, 3k views
Word2Vec Subsampling -- Implementation
I am implementing the Skipgram model, both in PyTorch and TensorFlow 2. I am having doubts about the implementation of subsampling of frequent words. Verbatim from the paper, the probability of ...
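The word2vec paper gives the discard probability P(w) = 1 - sqrt(t / f(w)), where f(w) is the word's share of all tokens and t is a threshold, typically around 1e-5. Below is a plain-Python sketch of that formula, with probabilities clipped at 0 for words rarer than t. `subsample_tokens` is a hypothetical name, and the released word2vec code and libraries such as Gensim may implement a variant of this formula rather than exactly this one.

```python
import math
import random
from collections import Counter

def subsample_tokens(tokens, t=1e-5, seed=None):
    """Hypothetical helper: drop each occurrence of word w with probability
    max(0, 1 - sqrt(t / f(w))), where f(w) = count(w) / total tokens."""
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    discard = {w: max(0.0, 1.0 - math.sqrt(t / (c / total)))
               for w, c in counts.items()}
    # Keep a token when a uniform draw lands at or above its discard probability
    return [w for w in tokens if rng.random() >= discard[w]]
```

Frequent words lose most of their occurrences while words with f(w) <= t are never dropped.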
0 votes, 1 answer, 713 views
Randomly divide df in list of df into equal subsets [duplicate]
Yesterday I asked a similar question: R - Randomly split a dataframe in n equal pieces
The answer I got is nearly what I need, but there are still problems with it. Also I thought about ...
1 vote, 0 answers, 80 views
spark efficient distribution pairing to compare cohorts
How can I efficiently compare matched cohorts in spark?
In Python, sampling k observations from the majority class for each observation of the minority class in a highly imbalanced dataset can be ...
0 votes, 1 answer, 505 views
How to disable sub-sampling when saving jpg image using PHP GD library?
I noticed that each time I save a jpg file in PHP, it is saved with sub-sampling. How to remove that? I'm using GD library.
0 votes, 1 answer, 1k views
How to calculate the size of the video based on given chroma information?
I was asked in a test what the size of a 10-second video displayed at 25 fps would be, assuming each chroma sample takes 4 bits, the luminance component takes 8 bits and 4:2:0 chroma sampling is ...
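Under 4:2:0, each 2x2 block of pixels carries four luma samples but only one Cb and one Cr sample. With 8-bit luma and 4-bit chroma, that averages 8 + (4 + 4)/4 = 10 bits per pixel, and a 10 s clip at 25 fps has 250 frames. The frame resolution is cut off in the question, so the sketch below takes it as a parameter; `raw_video_bits` is a hypothetical name and 640x480 is an illustrative value only.

```python
def raw_video_bits(width, height, seconds=10, fps=25, luma_bits=8, chroma_bits=4):
    """Hypothetical helper: uncompressed size in bits with 4:2:0 sampling
    (full-resolution Y; Cb and Cr each at half width and half height)."""
    luma = width * height * luma_bits
    chroma = 2 * (width // 2) * (height // 2) * chroma_bits   # Cb plane + Cr plane
    return (luma + chroma) * seconds * fps

bits = raw_video_bits(640, 480)       # illustrative resolution only
print(bits, bits // 8)                # 768000000 bits, 96000000 bytes
```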
1 vote, 1 answer, 320 views
Subsampling 3D array using the neighbourhood sum
The title is probably confusing. I have a reasonably large 3D numpy array. I'd like to cut its size by 2^3 by binning blocks of size (2,2,2). Each element in the new 3D array should then contain the ...
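When every axis length is divisible by 2, the (2, 2, 2) binning can be done without loops by reshaping each axis into (n/2, 2) and summing over the length-2 axes. A NumPy sketch; `bin_sum_3d` is a hypothetical name, and trimming or padding for odd sizes is left out.

```python
import numpy as np

def bin_sum_3d(arr, f=2):
    """Hypothetical helper: sum non-overlapping f x f x f blocks.
    Each axis length must be divisible by f."""
    x, y, z = arr.shape
    # Axes 1, 3 and 5 index the position inside each block
    return arr.reshape(x // f, f, y // f, f, z // f, f).sum(axis=(1, 3, 5))
```

The reshape is a view of the original data, so the only real work is the final sum.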
1 vote, 1 answer, 2k views
Python 1:1 stratified sampling per group
How can a 1:1 stratified sampling be performed in python?
Assume the Pandas Dataframe df to be heavily imbalanced. It contains a binary group and multiple columns of categorical sub groups.
df = pd....
1 vote, 1 answer, 856 views
How to subsample different numbers by ID and bootstrap in R
First, I'm trying to subsample a large dataset with many individuals, but each individual requires a different subsample size. I'm comparing across two time periods, so I want to subsample each ...
4 votes, 2 answers, 3k views
How can I subsample an array according to its density? (Remove frequent values, keep rare ones)
My problem is that I want to plot a data distribution where some values occur frequently while others are quite rare. The number of points in total is around 30,000. Rendering such a plot as png ...
1 vote, 1 answer, 887 views
How can I control subsampling such that xgb.cv and cross_validate produce the same results?
xgb.cv and sklearn.model_selection.cross_validate do not produce the same mean train/test error even though I set the same seed/random_state and I make sure both methods use the same folds. The code ...
5 votes, 4 answers, 3k views
R (and dplyr?) - Sampling from a dataframe by group, up to a maximum sample size of n
I have a dataframe which contains multiple samples (1-n) per group. I would like to sample this dataset, without replacement, so that I have a maximum of 5 samples per group (1-5).
This problem has ...
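The question targets R/dplyr; for comparison, the same idea in pandas is to shuffle all rows once and then keep the first n rows of each group, so groups with fewer than n rows are kept whole and no row is drawn twice. `sample_up_to_n` is a hypothetical name and the column names below are made up.

```python
import pandas as pd

def sample_up_to_n(df, group_col, n=5, seed=None):
    """Hypothetical helper: at most n random rows per group, without replacement."""
    shuffled = df.sample(frac=1, random_state=seed)   # random permutation of rows
    return shuffled.groupby(group_col).head(n)        # first n rows of each group

df = pd.DataFrame({"group": ["a"] * 8 + ["b"] * 3, "value": range(11)})
out = sample_up_to_n(df, "group", n=5, seed=0)
```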
1 vote, 0 answers, 222 views
Is there any difference between using WeightedRandomSampler with a large num_samples and running more epochs with a smaller num_samples?
I don't understand when the sampling is done:
Will the first mini-batch be the same in each epoch?
Or is there no difference at all?
1 vote, 1 answer, 2k views
Android zoomable constraintLayout
I'm new to Android and have some questions. The idea is to simulate a book page with some images and text on it, and animations that zoom in on a column and, after clicking a button, zoom in on a different ...