I don't usually work in bash, but grep could be a really fast solution in this case. I have read a lot of questions on grep and variable assignment in bash, yet I do not see the error. I have tried several flavours of double quotes around $pattern and used both `...` and $(...), but nothing worked.

So here's what I am trying to do: I have two files. The first contains several names, each of which I want to use as a grep pattern to search for in the other file. I therefore loop through the lines of the first file and assign the name to the variable pattern. This step works, as the variable is printed out properly. But somehow grep does not recognize/interpret the variable. When I substitute "$pattern" with an actual name, everything works as well. So I don't think the variable assignment is the problem, but rather the interpretation of "$pattern" as the string it should represent.

Any help is greatly appreciated!

#!/bin/bash

while IFS='' read -r line || [[ -n $line ]]; do
    a=( $line )
    pattern="${a[2]}"
    echo "Text read from file: $pattern"
    var=$(grep "$pattern" 9606.protein.aliases.v10.txt)
    echo "Matched Line in Alias is: $var"
done < "$1"


> bash match_Uniprot_StringDB.sh ~/Chromatin_Computation/.../KDM.protein.tb

output:

Text read from file: "UBE2B" 
Matched Line in Alias is: 
Text read from file: "UTY"
Matched Line in Alias is: 

EDIT: The solution drvtiny suggested works. The double quotes have to be removed for the string to match. Adding the following lines after pattern is assigned makes the script work:

pattern="${pattern#\"}"
pattern="${pattern%\"}"
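
For reference, here is a minimal sketch of the loop with the two lines in place. The sample files are made up to stand in for KDM.protein.tb and 9606.protein.aliases.v10.txt, and it assumes the name is the third whitespace-separated field. Reading the fields directly with read (instead of the array) and using grep -F for a literal match are changes of mine, not part of the original script.

```bash
#!/bin/bash
# Sketch only: the sample files below are invented stand-ins for the real inputs.
workdir=$(mktemp -d)
printf 'id1\tx\t"UBE2B"\nid2\ty\t"UTY"\n' > "$workdir/names.tb"
printf '9606.A\tUBE2B\tsrc\n9606.C\tUTY\tsrc\n' > "$workdir/aliases.txt"

results=()
# Read the third whitespace-separated field of each line into pattern.
while read -r _ _ pattern _ || [[ -n $pattern ]]; do
    [[ -n $pattern ]] || continue      # skip lines with fewer than three fields
    pattern="${pattern#\"}"            # drop a leading double quote, if any
    pattern="${pattern%\"}"            # drop a trailing double quote, if any
    # -F treats the name as a fixed string; -- guards against names starting with "-".
    results+=("$(grep -F -- "$pattern" "$workdir/aliases.txt")")
done < "$workdir/names.tb"

printf 'Matched Line in Alias is: %s\n' "${results[@]}"
rm -rf "$workdir"
```

Note that grep -F matches substrings, so a name like UTY would also match a longer alias containing it; adding -w restricts the match to whole words if that matters.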
  • Without seeing the pattern and a sample of your input files (KDM.protein.tb and 9606.protein.aliases.v10.txt), it's difficult to determine where the problem is. What I see in your question doesn't look particularly wrong, so the issue may be in how you're interpreting the regular expression that is in $pattern. You know that grep uses a regex, right? Commented Jul 23, 2015 at 10:17
  • Also, are the lines in KDM.protein.tb perhaps only two fields long (i.e. $pattern is the last "word" on the line) and the file was generated in Microsoft Windows? If that sounds true, then you may be dealing with Ctrl-M characters on the ends of each line, which Windows considers part of the "newline" but unix considers part of the last word of each line. You haven't mentioned what platform you're using or where the files came from, so there's no way for us to know if this is the problem. Commented Jul 23, 2015 at 10:21
  • The lines in KDM.protein.tb are three fields long; it is a tab-separated text file generated in ubuntu. Interesting point, though. Commented Jul 23, 2015 at 13:40

1 Answer

Please look at the "-f FILE" option in man grep. I think this option does exactly what you need, without any bash loops or other such "hacks" :)
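
A rough sketch of how the -f approach could look, under some assumptions: the sample files are invented, the names are assumed to sit in the third tab-separated column, and the surrounding double quotes are removed with tr before the list is handed to grep. The -F flag (fixed strings) is my addition so the names are not interpreted as regular expressions.

```bash
#!/bin/bash
# Sketch only: invented sample data standing in for the two real files.
workdir=$(mktemp -d)
printf 'id1\tx\t"UBE2B"\nid2\ty\t"UTY"\n' > "$workdir/names.tb"
printf '9606.A\tUBE2B\tsrc\n9606.B\tOTHER\tsrc\n9606.C\tUTY\tsrc\n' > "$workdir/aliases.txt"

# Build the pattern list: third column, with every double quote deleted.
# grep -f reads one pattern per line from the given file (here a process substitution).
matches=$(grep -F -f <(cut -f3 "$workdir/names.tb" | tr -d '"') "$workdir/aliases.txt")

echo "$matches"
rm -rf "$workdir"
```

This prints the matching lines in the order they appear in the searched file, not grouped per name, so the loop is still the better fit if you need to know which name produced which line.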

And yes, according to the output of your code, you are reading the pattern with the double quotes included literally. In other words, you read this string from the file ~/Chromatin_Computation/.../KDM.protein.tb:

"UBE2B"

and not

UBE2B

as you probably expect.

Maybe you need to remove the double quotes at the boundaries of your $pattern?

Try doing this after reading pattern:

pattern=${pattern#\"}
pattern=${pattern%\"}
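
To illustrate what these two expansions do: ${var#\"} removes one double quote from the start of the value if it is there, and ${var%\"} does the same at the end; anything else is left alone. The helper name strip_quotes below is hypothetical and only used for this demonstration.

```bash
#!/bin/bash
# strip_quotes is a made-up helper name for illustration.
strip_quotes() {
    local s=$1
    s=${s#\"}    # remove a leading double quote, if present
    s=${s%\"}    # remove a trailing double quote, if present
    printf '%s\n' "$s"
}

quoted=$(strip_quotes '"UBE2B"')   # quotes on both ends are removed
plain=$(strip_quotes 'UTY')        # a value without quotes is unchanged
inner=$(strip_quotes '"a"b"')      # a quote inside the value is kept

echo "$quoted" "$plain" "$inner"   # prints: UBE2B UTY a"b
```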
1 Comment

That was exactly the problem. Thanks for the hint. A backslash after # and % is additionally needed, though.
