Introduction
There are a couple of questions to answer before offering a final solution (e.g. the ordering of the elements in the colleagues array after replacing some — see the note at the end of the Solution section), but I don't want to drag this out too long. Let's have a look at a very common approach to problems like this.
Solution
Since the colleagues column is an array column (and Spark is very effective at queries over rows), you should first explode (or posexplode) it. With one row per array element you can make the necessary changes and, at the end, use collect_list to get the array column back.
explode(e: Column): Column
Creates a new row for each element in the given array or map column.
posexplode(e: Column): Column
Creates a new row for each element with position in the given array or map column.
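To see the difference between the two, here's a quick illustration (a sketch assuming the same spark-shell session; both functions live in org.apache.spark.sql.functions, which has to be imported first):
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> Seq(Array("a", "b")).toDF("xs").select(posexplode($"xs")).show
+---+---+
|pos|col|
+---+---+
|  0|  a|
|  1|  b|
+---+---+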
Let's use the following names dataset:
val names = Seq((Array("guy1", "guy2", "guy3"), "Thisguy")).toDF("colleagues", "name")
scala> names.show
+------------------+-------+
| colleagues| name|
+------------------+-------+
|[guy1, guy2, guy3]|Thisguy|
+------------------+-------+
scala> names.printSchema
root
|-- colleagues: array (nullable = true)
| |-- element: string (containsNull = true)
|-- name: string (nullable = true)
Let's explode, make the changes, and finally collect_list the results.
val elements = names.withColumn("elements", explode($"colleagues"))
scala> elements.show
+------------------+-------+--------+
| colleagues| name|elements|
+------------------+-------+--------+
|[guy1, guy2, guy3]|Thisguy| guy1|
|[guy1, guy2, guy3]|Thisguy| guy2|
|[guy1, guy2, guy3]|Thisguy| guy3|
+------------------+-------+--------+
That's the kind of data Spark SQL can handle with ease. Let's use regexp_replace (What? regexp?! And now you've got two problems :)).
val replaced = elements.withColumn("replaced", regexp_replace($"elements", "guy2", "guy10"))
scala> replaced.show
+------------------+-------+--------+--------+
| colleagues| name|elements|replaced|
+------------------+-------+--------+--------+
|[guy1, guy2, guy3]|Thisguy| guy1| guy1|
|[guy1, guy2, guy3]|Thisguy| guy2| guy10|
|[guy1, guy2, guy3]|Thisguy| guy3| guy3|
+------------------+-------+--------+--------+
In the end, let's group by the initial array column and use the collect_list aggregate function. Since name takes no part in the grouping, first carries it through the aggregation.
val solution = replaced
  .groupBy($"colleagues" as "before")
  .agg(
    collect_list("replaced") as "after",
    first("name") as "name")
scala> solution.show
+------------------+-------------------+-------+
| before| after| name|
+------------------+-------------------+-------+
|[guy1, guy2, guy3]|[guy1, guy10, guy3]|Thisguy|
+------------------+-------------------+-------+
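A caveat that goes back to the introduction: collect_list gives no guarantee about the order of the collected elements after a shuffle (it merely happens to match here). If the ordering matters, that's where posexplode pays off, since it keeps every element's position. Here's a sketch (a variation on the solution above, not something the query otherwise needs) using the position-in-a-struct trick with sort_array:
// posexplode produces two extra columns: pos (position) and col (element)
val solutionOrdered = names
  .select($"colleagues" as "before", $"name", posexplode($"colleagues"))
  .withColumn("replaced", regexp_replace($"col", "guy2", "guy10"))
  .groupBy($"before", $"name")
  // collect (pos, replaced) pairs and sort them by pos (the first struct field)
  .agg(sort_array(collect_list(struct($"pos", $"replaced"))) as "sorted")
  // keep only the values, dropping the positions
  .withColumn("after", $"sorted.replaced")
  .drop("sorted")
Grouping by name as well spares us the first aggregation from before.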
Alternative Solutions
User-Defined Function (UDF)
Alternatively, you could write a custom user-defined function, but a UDF is a black box to the Spark optimizer, so it would not benefit from as many optimizations as the solution above and I'd not recommend it.
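For the record, here is what a minimal sketch could look like (replaceColleague is a made-up name, and the UDF hard-codes the same replacement as above):
import org.apache.spark.sql.functions.udf

// The function body is opaque to the Catalyst optimizer, which is exactly
// why the relational solution above is preferable.
val replaceColleague = udf { (colleagues: Seq[String]) =>
  colleagues.map(c => if (c == "guy2") "guy10" else c)
}
val viaUdf = names.withColumn("colleagues", replaceColleague($"colleagues"))
On the upside, it trivially preserves the order of the elements since it maps over the array in place.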
Custom Logical Operator
The best approach would be to write a custom logical operator (a LogicalPlan) that would do all of this in one pass and participate in optimizations, yet avoid the exchanges (shuffles) that groupBy introduces. That would, however, be fairly advanced Spark development that I have not done yet.