1

I have a DataFrame and I want to count the unique rows across two columns of this DataFrame. For example:

a x
a x
a y
b y 
b y
b y

should become:

a x 2
a y 1
b y 3

I know how to do this with a pandas DataFrame, but now I want to do it directly in Java (preferably Java 8).

2 Answers

3

I am not sure what input type you have, but assuming you have a List<DataFrame> list and DataFrame implements equals/hashCode as expected, you could combine two collectors:

Map<DataFrame, Long> count = list.stream().collect(groupingBy(x -> x, counting()));

which requires the following static imports:

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
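
As a minimal sketch of the same collector approach applied to the example data from the question, assuming the rows are available as plain Java lists of strings rather than as a DataFrame. The class name PairCount and the method countPairs are hypothetical names chosen for illustration, not part of any DataFrame API. A List<String> is used as the grouping key because it already implements equals/hashCode by value:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class PairCount {

    // Counts how often each distinct row (pair of column values) occurs.
    // LinkedHashMap keeps the groups in first-seen order.
    static Map<List<String>, Long> countPairs(List<List<String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The six example rows from the question.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "x"),
                Arrays.asList("a", "x"),
                Arrays.asList("a", "y"),
                Arrays.asList("b", "y"),
                Arrays.asList("b", "y"),
                Arrays.asList("b", "y"));

        // Prints:
        // a x 2
        // a y 1
        // b y 3
        countPairs(rows).forEach((row, n) ->
                System.out.println(row.get(0) + " " + row.get(1) + " " + n));
    }
}
```

This only uses Java 8 library classes, so it runs on the driver JVM over an in-memory list; it does not distribute the work the way a DataFrame operation would.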

1 Comment

My input is a DataFrame, and the columns are of type String.
0

I found the following solution myself. I am posting it here in case anyone is interested:

DataFrame df2 = df.groupBy("Column_one", "Column_two").count();
df2.show();

2 Comments

Where’s the relationship to Java 8?
If you have a shorter solution in Java 8, you are welcome to post it.
