How can I remove duplicate lines based on specific strings or characters?
For example, I have a file which contains the following:
https://example.com/?first=one&second=two&third=three
https://example.com/?first=only&second=cureabout&third=theparam
https://example.com/?fourth=four&fifth=five
https://stack.com/?sixth=six&seventh=seven&eighth=eight
https://stack.com/?sixth=itdoesnt&seventh=matter&eighth=something
I want to make the lines unique based on their parameter names, printing only one URL per set of parameters, and of course taking the domain into account. The values are not important.
The desired result:
https://example.com/?first=one&second=two&third=three
https://stack.com/?sixth=six&seventh=seven&eighth=eight
UPDATE
In the following code I'm trying to extract the characters before each = and, if lines share those, keep only one of them and print the result.
The actual goal is to make the file unique when lines share the same set of parameter names.
declare -A seen
while IFS= read -r url; do
    # STRIP THE VALUES: KEEP DOMAIN/PATH PLUS PARAMETER NAMES ONLY
    key=$(printf '%s\n' "$url" | sed 's/=[^&]*//g')
    # PRINT THE URL ONLY THE FIRST TIME THIS KEY IS SEEN
    if [[ -z ${seen[$key]} ]]; then
        seen[$key]=1
        printf '%s\n' "$url"
    fi
done < "$1"
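For comparison, the whole job can also be done in a single awk pass: build a key from the domain plus the parameter names (dropping the values) and print only the first URL seen for each key. This is a minimal sketch, not a definitive solution; the filename urls.txt is just a placeholder, and it assumes the parameters always appear in the same order (sort the names first if they can be reordered).

```shell
# Sample input file with the URLs from the question
cat > urls.txt <<'EOF'
https://example.com/?first=one&second=two&third=three
https://example.com/?first=only&second=cureabout&third=theparam
https://example.com/?fourth=four&fifth=five
https://stack.com/?sixth=six&seventh=seven&eighth=eight
https://stack.com/?sixth=itdoesnt&seventh=matter&eighth=something
EOF

# Split each line on "?", then build a signature from the domain ($1)
# plus each parameter name (the part of every key=value pair before "=")
awk -F'?' '{
    n = split($2, parts, "&")
    sig = $1
    for (i = 1; i <= n; i++) {
        split(parts[i], kv, "=")
        sig = sig "&" kv[1]
    }
    # print the URL only the first time this signature appears
    if (!seen[sig]++) print
}' urls.txt
```

Note that under this rule the fourth=four&fifth=five line also survives, since its parameter names differ from the other example.com URLs.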
Thanks.