21,952 questions
0
votes
0
answers
20
views
Attribution Error when using Huggingface transformers Trainer with FSDP
I am now trying to use FSDP in Huggingface transformers Trainer. The training script is something like
train_dataset = Mydataset(...)
args = TrainingArguments(...)
model = LlamaForCausalLM....
0
votes
0
answers
19
views
OptimisticLockingException when using multiInstanceLoopCharacteristics for parallel execution of subprocess
I have the following process definition that I am trying to execute on Camunda 7.24 / CibSeven 2.1, which currently logs many OptimisticLockingExceptions during execution. I could already trace it down that it ...
0
votes
1
answer
123
views
Why are items not written to console immediately after being processed?
I have the following C# code :
var rand = new Random(1);
var range = Enumerable.Range(1, 8);
var partition = Partitioner.Create(range, EnumerablePartitionerOptions.NoBuffering);
foreach (var x in ...
-4
votes
0
answers
38
views
Need help running a modified LLM framework [closed]
I am using OpenAI's API to run the following framework called LLM-SR, but the program gets stuck or only one sample result is generated. Is anyone else able to try and run it on their end? Below is the ...
0
votes
0
answers
46
views
Taking advantage of memory contiguousness in HLSL
This is a bit of a slog so bear with me.
I'm currently writing a 3D S(moothed) P(article) H(ydrodynamics) simulation in Unity with a parallel HLSL backend. It's a Lagrangian method of fluid simulation,...
Tooling
0
votes
0
replies
24
views
ComfyUI + Flux 1 dev + limited RAM + same workflow: With 2 GPUs?
I am running Flux 1 dev text to image model through ComfyUI in Kaggle. Everything works but I noticed that Kaggle offers a second GPU inside the notebook. If I try to run two instances of the ComfyUI ...
1
vote
0
answers
74
views
Intuition over TBB parallel scan/parallel prefix requirements
I am reading a paragraph about the tbb::parallel_scan algorithm from the book Intel Threading Building Blocks, and I understand what the operation does serially, but I do not understand what are ...
0
votes
0
answers
70
views
Simple TBB example where tbb::affinity_partitioner gives a measurable speedup
While looking at this TBB guide webpage: https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-9/bandwidth-and-cache-affinity.html, they mention this ...
Best practices
0
votes
2
replies
124
views
Looping Datasets in R
Essentially, I am trying to create a dataset in which the values for any given row depend on prior rows. I would then like to run this loop over many IDs for an entire dataset. Current setup ...
0
votes
0
answers
50
views
Parallel Equations Expansion in TFORM
TFORM is considered a great tool for manipulating large symbolic equations. In this thread, I’d like to share my optimization problem, which concerns a very simple operation: equation expansion.
...
Advice
1
vote
1
replies
31
views
Is there a way to transfer MBTiles formatted maps faster?
I need to transfer MBTiles map tiles from one disk to another. Is there a faster way than just mv? It is huge and takes a long time.
0
votes
1
answer
92
views
C++ segmentation fault when throwing in ordered OMP parallel for
The code below crashes with
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted ...
0
votes
1
answer
50
views
Parallel Processing in Sclang
I am doing some calculations on spectra, doing some maths on each partial. This is taking a lot of time, but since the partials are all calculated independently, I wonder how to do parallel processing ...
1
vote
0
answers
34
views
Run a single notebook with different object lists/dicts in parallel on Spark when called from a master notebook in Synapse
Does anyone know how to call a single notebook with different parameters in parallel, so that all notebook runs appear in the Spark UI to make troubleshooting easier? I have one child notebook, calling from ...
0
votes
0
answers
25
views
Integrate socket.io namespaces with Node Cluster
I am trying to integrate socket.io with Node's HTTP alongside Node's Cluster Module. Consider the reproducible example:
index.js:
let cluster = require('cluster')
let fs = require('fs')
let http = ...
1
vote
3
answers
245
views
How to convert a batch file to PowerShell parallel processing to run over thousands of input files?
I have a large number of files (about 3800) that I want to run a program over. The program reads a WAV file and makes a short .TSV text file containing the WAV's lip-sync data (but that is by-the-by ...
0
votes
0
answers
31
views
How to prepare a set of equations for parallel expansion in TFORM?
I’m working with FORM/TFORM to automatically expand a large set of symbolic equations.
My goal is to make the expansion process run in parallel on multiple CPU cores using TFORM.
Here’s a simplified ...
2
votes
1
answer
84
views
Python multiprocessing parallelization
I have a class with methods to simulate sources across 16 detectors using the Gelsa package. In my main script, I call the method generate.sources. I am trying to use multiprocessing to speed up the ...
15
votes
0
answers
620
views
Why does my C++ N-body simulation have a pulsating performance slowdown?
I've been developing a 2D N-body gravity simulation in C++, and I've run into an interesting performance issue. Instead of a stable frame rate, the application's update time systematically pulsates ...
1
vote
1
answer
98
views
AMP deprecated code doesn't compile with lambdas
I am trying to compile some of my older applications, and compilation fails wherever it encounters lambdas. I am well aware that this API is deprecated, but it compiled OK some time ago on the same ...
0
votes
0
answers
63
views
How to decide the data size handled by each processor/core in SIMD?
I’m learning how to use SIMD (Single Instruction, Multiple Data) for parallel data processing.
Suppose I have a large dataset (e.g., an array of 1 million floats), and I want to process it efficiently ...
8
votes
1
answer
238
views
Does std::find still guarantee first element with std::execution::par?
The parallel policy states that the order of iterator evaluation is not guaranteed. However, std::find*, std::search and std::mismatch all say that they return the first iterator matching the condition. How do those ...
1
vote
1
answer
107
views
Is CPU multithreading affected by divergence?
Building on this question here
The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big ...
0
votes
1
answer
80
views
Spawning multiple tasks but inner function is not executed without error message
I'm trying to spawn multiple parallel (not concurrent) tasks. Every task is running a PUT operation to a custom S3 storage using AWS SDK for Rust.
The function body looks like the following (the different ...
0
votes
1
answer
159
views
For xUnit 3, when using [assembly: CaptureConsole], does that work with parallel tests?
We have several tests which have been switched to using XUnit 3. We have an assembly-level
[assembly: CaptureConsole(CaptureOut = true, CaptureError = true)]
and we have several "more ...
0
votes
0
answers
48
views
What is the correct way to use the Eval monad in Haskell? [duplicate]
I am learning parallel programming in Haskell.
My example code:
module Quicksort where
import Control.Parallel.Strategies
import Control.Parallel
qsort :: [Int] -> [Int]
qsort [] = []
qsort lst@(...
1
vote
0
answers
69
views
How to properly parallelise JFreeChart's PNG generation?
I use JFreeChart to create various kinds of charts en masse, based on huge amounts of data. Those charts are only meant to be written to PNG files on the hard drive; no JavaFX, Swing, AWT or other GUI ...
2
votes
0
answers
80
views
Why are a large number of sparks GC’d? [duplicate]
I have a function (RSA encryption) where the end result m depends on m1 and m2, and m1 and m2 can be computed in parallel. I tried to use par and pseq, but the result is weak.
Total time 20.062s ( 18....
0
votes
2
answers
94
views
Nested slurm jobs in R using future.batchtools
I am trying to test future.batchtools for parallelisation in R.
I have a small test job (run_futurebatchtools_job.R) as:
library(future)
library(future.batchtools)
# Set up the future plan to use ...
4
votes
1
answer
137
views
Setting seed for nested parallel simulation in R and storing the states of the random-number generator
In R, I am parallelising my simulation using packages foreach, doFuture, and doRNG.
I have two nested foreach loops: the inner loop generates and analyses data for each iteration, and the outer loop ...
0
votes
0
answers
110
views
How to make parallel within the parallel?
Right now I am running a model optimization to optimize one set of parameters for several sites (47 sites in total, i.e. the cost function sums over these 47 results). Site computation is independent ...
2
votes
1
answer
190
views
Can `Stream.allMatch()` call the predicate multiple times for the same element?
I'm trying to implement short-circuited processing for an external java.util.Stream input (think Stream.forEach() but with short-circuiting). I do not care about the order of the elements, but if ...
1
vote
1
answer
116
views
How to remove intermediate results after execution of the chord without blocking execution?
Here’s the pattern I want:
Dispatch multiple tasks in parallel.
Aggregate all their results into a final result.
Remove the intermediate results right after the chord result is ready, without ...
-2
votes
1
answer
181
views
Will C++ ever have a keyword such as "for_parallel"? [closed]
While learning about parallelism, I learned that C++ supports parallelism through functions such as std::for_each and std::transform combined with execution policies. So if, for example, we want to divide elements of ...
0
votes
0
answers
49
views
terra::app() error: 'Not compatible with requested type' when applying function returning numeric vector to SpatRaster in R
I'm working on a function to detect positive and negative events using a time series of cumulative anomalies. The function seems to work fine for vectors and produces the correct number of outputs as ...
0
votes
0
answers
50
views
How to parallelize R code that imports a Spark table so you can collect the data within the foreach loop?
I'm trying to parallelize my R code that pulls in data from a Snowflake table, but when I do, I get an error that I have an invalid connection. I don't receive this error when I do NOT have it ...
2
votes
3
answers
110
views
OpenMP parallel for (three-dimensional array)
We are working with the following code:
int i, j, k;
for (i = 2; i < n; i++){ // S1
for (j = 3; j < n - 3; j++){ // S2
for (k = 4; k < n - 4; k++){ // S3
A[...
1
vote
0
answers
76
views
Cache-efficient partitioning for multithreaded processing on ARM
Suppose you are processing a large data set using several cores in parallel. I am looking for the most memory-efficient way to break up the data among the processors.
Specifically, this would be for ...
2
votes
0
answers
60
views
Persistent parallel threads in Panda3D, with args at runtime
I'm writing a physics game, and I'm trying to speed up my motion calculations. Every tick of the update cycle, I call an rk4 routine which calls an ODE function 4 times, passing updated values for dt/...
2
votes
2
answers
353
views
`purrr::in_parallel` extremely slow for rowwise operation on data.frames
I was excited when I read about the latest update of purrr (1.1.0) with its in_parallel capabilities.
I just happen to have a time-consuming data task that runs for several minutes, because:
my data ...
1
vote
0
answers
41
views
Track number of in-use vs. idle workers in MATLAB with fmincon
I'm currently using MATLAB to run a set of fixed-point iterations with fmincon. I use parfor multithreading to do so. However, one of my iterations in the parfor loop runs particularly slowly just because ...
1
vote
1
answer
78
views
CoroutineTestExtension not working when parallel test execution is enabled in JUnit5
I have a multi-module Android project that I am migrating from JUnit 4 to JUnit 5.
For CoroutineTestRule I have added CoroutineTestExtension, but it's not working well when parallel test execution is ...
2
votes
1
answer
250
views
Why is std::execution::par_unseq slow compared with other parallel options
Sorry, I was busy and made a few mistakes. First, the logic of the various implementations was not the same, and I adjusted accordingly; second, there is an overflow with float and I ...
1
vote
1
answer
359
views
Unable to set CPU device count for JAX parallelisation?
I have been trying to generalise this JAX program to run on both CPU and GPU depending on the machine it's running on (essentially need CPU parallelisation to speed up testing versus GPU for ...
0
votes
0
answers
78
views
Adding OpenMP reduction clause to loop inside a function
I have a function that contains an OpenMP-parallelized for loop, which calls a callback at each iteration, similar to this:
template<class Callback>
void iterate(const Callback& callback, ...
0
votes
0
answers
38
views
Generating vertices of binary hypercube
What are the possible ways to generate all the vertices of a Boolean hypercube of dimension n (0 to 2^n - 1 in binary representation) using parallel programming? Also, among all the possible ways, ...
1
vote
2
answers
178
views
How to parallel process an apply() function that iterates through rows of data but calls on a list object?
I am attempting to run a calculation on each row of a large dataset using the carcass package. Each row of the dataset contains the values to feed to the function, and a call number for an associated ...
-1
votes
1
answer
75
views
How to avoid interleaved logs in parallel processing?
Naively parallelizing some tasks that involve logging results in interleaved logs:
import logging
from concurrent.futures import ThreadPoolExecutor
from time import sleep
logger = logging.getLogger(...
2
votes
0
answers
104
views
Disable NumPy parallelization inside Numba JIT
The problem is illustrated by the following script, which works correctly if MKL is used for linear algebra operations:
from numba import njit, prange
from numpy import random, dot, empty
from ...
1
vote
0
answers
138
views
Python processes aren't being evenly distributed over CPU nodes
I wasn't able to find anything regarding this on the internet: I am using multiprocessing (concurrent.futures.ProcessPoolExecutor(max_workers=(...)) as executor) to execute several DRL training ...