
I'm trying to build scientific computing software in Rust, which requires manipulating matrices while the program runs. A typical operation is appending (accumulating) sub-blocks of small matrices at specific positions of a large matrix. Let me demonstrate the process with the following example:

Suppose I have an 8 × 8 matrix M_{master} whose elements are all 0 in the initial state:

[image: M_{master}, an 8 × 8 matrix of zeros]

There are also two 6 × 6 matrices, M_{slave1} and M_{slave2}:

[image: M_{slave1} (all elements 9)] [image: M_{slave2} (all elements 90)]

Step 1) Write the 2 × 2 sub-blocks of M_{slave1} into M_{master}; call the result M_{master1}:

[image: M_{master1}, M_{master} after the sub-blocks of M_{slave1} have been added]

Step 2) Write the 2 × 2 sub-blocks of M_{slave2} into M_{master1}; call the result M_{master12}:

[image: M_{master12}, the final matrix; it matches the last matrix printed below]

The elements highlighted in green in M_{master12} (the cells whose value is 99 in the output below) lie at positions where both M_{slave1} and M_{slave2} are written, so the two contributions are summed there.
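In other words (this is my restatement of the rule implemented by the code below): each 6 × 6 slave matrix is treated as a 3 × 3 grid of 2 × 2 blocks, and a vector cpld of block positions decides where each block row/column lands in the master, so that

    M_{master}[2*cpld[idx] + row][2*cpld[idy] + col] += M_{slave}[2*idx + row][2*idy + col]

for idx, idy in 0..cpld.len() and row, col in {0, 1}.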

For the problem described above, I have implemented a single-threaded solution using nested for loops. Here are my code and its results. (You can run the single-threaded version directly in the Rust Playground and see the results: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=cc5db512a4445ae3a92b1668e0611eba)

use rayon::prelude::*; // not used in the single-threaded version; the goal is a Rayon port
use std::time::Instant;

fn main() {
    const R: usize = 8;
    const C: usize = 8;

    const R_SUB1: usize = 6;
    const C_SUB1: usize = 6;

    const R_SUB2: usize = 6;
    const C_SUB2: usize = 6;

    // The master matrix; all elements start at 0.
    let mut arr1: [[u8; C]; R] = [[0; C]; R];
    arr2d_print(arr1);

    let sub1: [[u8; C_SUB1]; R_SUB1] = [[9; C_SUB1]; R_SUB1];
    arr2d_print(sub1);
    let cpld1: Vec<usize> = vec![0, 1, 3]; // block-position indices (see the mapping above), not direct element positions

    let sub2: [[u8; C_SUB2]; R_SUB2] = [[90; C_SUB2]; R_SUB2];
    arr2d_print(sub2);
    let cpld2: Vec<usize> = vec![0, 2, 3];

    let subs = vec![sub1, sub2];
    let cplds = vec![cpld1, cpld2];

    let t1 = Instant::now();
    cplds
        .iter()
        .zip(subs.iter())
        .for_each(|(cpld, sub)| sqare_matrix_block_fill(&mut arr1, sub, cpld));
    let tot1 = t1.elapsed();
    arr2d_print(arr1);
    println!("The single-threaded version takes time: {:?}\n", tot1);
}

fn sqare_matrix_block_fill<const D1: usize, const D2: usize>(
    master: &mut [[u8; D1]; D1],
    slave: &[[u8; D2]; D2],
    cplds: &[usize],
) {
    // Each entry of `cplds` maps one 2x2 block row/column of the slave
    // to a 2x2 block row/column of the master.
    if D2 != 2 * cplds.len() {
        panic!(
            "slave dimension {} must equal 2 * cplds.len() = {}",
            D2,
            2 * cplds.len()
        );
    }

    for idx in 0..cplds.len() {
        for idy in 0..cplds.len() {
            for row in 0..2 {
                for col in 0..2 {
                    // Add element (row, col) of slave block (idx, idy) onto
                    // master block (cplds[idx], cplds[idy]).
                    master[cplds[idx] * 2 + row][cplds[idy] * 2 + col] +=
                        slave[idx * 2 + row][idy * 2 + col];
                }
            }
        }
    }
}

fn arr2d_print<const D: usize>(arr: [[u8; D]; D]) {
    for i in 0..D {
        for j in 0..D {
            print!("{:4} ", arr[i][j]);
        }
        println!();
    }
    println!();
}

The result of running the above code:

Standard Output
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 
   0    0    0    0    0    0    0    0 

   9    9    9    9    9    9 
   9    9    9    9    9    9 
   9    9    9    9    9    9 
   9    9    9    9    9    9 
   9    9    9    9    9    9 
   9    9    9    9    9    9 

  90   90   90   90   90   90 
  90   90   90   90   90   90 
  90   90   90   90   90   90 
  90   90   90   90   90   90 
  90   90   90   90   90   90 
  90   90   90   90   90   90 

  99   99    9    9   90   90   99   99 
  99   99    9    9   90   90   99   99 
   9    9    9    9    0    0    9    9 
   9    9    9    9    0    0    9    9 
  90   90    0    0   90   90   90   90 
  90   90    0    0   90   90   90   90 
  99   99    9    9   90   90   99   99 
  99   99    9    9   90   90   99   99 

The single-threaded version takes time: 4.561µs

Now my question is: how can I build a parallel version of the sqare_matrix_block_fill function using Rayon?

I know that the positions written by more than one slave matrix mean a naive parallelization would have conflicting writes, but hey, this is exactly why I came to Stack Overflow, sincerely begging for help!

  • The reason I have this requirement: when a large number of append operations must be performed on sub-blocks at different positions of a large matrix, the single-threaded program takes a lot of time. I hope to use Rayon to implement a parallel matrix-assembly function for such large-scale sub-block operations.
  • My first thought would be to use Rayon with an array of AtomicU8 elements. You'd have to test whether the overhead of threads is worth it. I would also consider breaking the main matrix into chunks, each protected by a Mutex, and locking a whole chunk at a time. (See the sketch after these comments.)
  • @BallpointBen Great idea! Could you please provide some code that solves the example problem above, so I can understand your idea more concretely?
  • You can use split_at_mut to split the data into non-overlapping parts that you can distribute to scoped threads.
  • See e.g. this answer for an example of combining split_at_mut() with Rayon parallel processing using safe code.
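To make the comment suggestions concrete, here is a minimal sketch of the AtomicU8 idea, written against the example above. The flat row-major Vec<AtomicU8> layout, the name parallel_block_fill, and its signature are my assumptions, not code from the post; only the block-placement logic is taken from sqare_matrix_block_fill. Because every element is atomic, the overlapping writes accumulate safely without locks:

use rayon::prelude::*;
use std::sync::atomic::{AtomicU8, Ordering};

// Sketch: the master matrix is stored as a flat, row-major slice of
// AtomicU8 so overlapping sub-blocks can be accumulated concurrently.
fn parallel_block_fill<const D2: usize>(
    master: &[AtomicU8],     // rows * cols atomics, row-major
    cols: usize,             // number of columns in the master matrix
    subs: &[[[u8; D2]; D2]], // slave matrices
    cplds: &[Vec<usize>],    // placement parameters, as in the question
) {
    cplds
        .par_iter()
        .zip(subs.par_iter())
        .for_each(|(cpld, sub)| {
            for idx in 0..cpld.len() {
                for idy in 0..cpld.len() {
                    for row in 0..2 {
                        for col in 0..2 {
                            let m = (cpld[idx] * 2 + row) * cols + cpld[idy] * 2 + col;
                            // fetch_add is an atomic `+=`, so two slave
                            // matrices hitting the same cell cannot race.
                            master[m].fetch_add(
                                sub[idx * 2 + row][idy * 2 + col],
                                Ordering::Relaxed,
                            );
                        }
                    }
                }
            }
        });
}

Hypothetical usage with the example data from the question:

let master: Vec<AtomicU8> = (0..R * C).map(|_| AtomicU8::new(0)).collect();
parallel_block_fill(&master, C, &subs, &cplds);
// Read the plain values back out, e.g. for printing:
let flat: Vec<u8> = master.iter().map(|a| a.load(Ordering::Relaxed)).collect();

Note that for an 8 × 8 master with two slaves, the thread overhead will almost certainly exceed the saved work; the approach only pays off at the large scales described in the first comment. The split_at_mut route from the later comments instead requires partitioning the master into disjoint regions up front, which is awkward here precisely because the sub-blocks overlap.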
