Here is the first input file, which contains 20,000 lines:

X   10063445    10098579    X:10063445|10098579 
X   101020487   101021315   X:101020487|101021315   
X   101041317   101042312   X:101041317|101042312   
X   101120402   101120784   X:101120402|101120784   
X   101126709   101148161   X:101126709|101148161   
X   107088436   107088839   X:107088436|107088839   
X   110020352   110067396   X:110020352|110067396

Second input file:

X   10063445    10098579    2
X   11055936    11110981    2
X   13666317    13680598    5
X   14843660    14859334    13
X   14850505    14859334    5
X   16818574    16829770    2
X   19541925    19546050    4
X   19683823    19695741    4
X   19965044    19970298    2
X   20188497    20204103    2
X   24073601    24074959    11
X   24172715    24179770    9
X   24179183    24179770    2
X   24540246    24546477    2
X   24809898    24843677    4
X   24809898    24888122    3
X   38666121    38687674    2
X   44524002    44527365    8
X   45010961    45020730    3
X   45010961    45037689    2
X   46984884    46998277    2
X   47222261    47228644    2

So far I have used bedtools intersect to intersect the two files, but the result contains only the intervals that intersect. I also want the intervals that do not intersect to appear in the same result file. The command I use is:

bedtools intersect -wa -wb -a input1 -b input2 -f 1 -r >intersect.bed

Is there any way to include both the intersecting and the non-intersecting intervals in the same intersect.bed file? This is the result I want:

X   10063445    10098579    X:10063445|10098579     X   10063445    10098579    2
X   101020487   101021315   X:101020487|101021315   
X   101041317   101042312   X:101041317|101042312   X   101041317   101042312   3
X   101120402   101120784   X:101120402|101120784   
X   101126709   101148161   X:101126709|101148161   X   101126709   101148161   4
X   107088436   107088839   X:107088436|107088839   X   107088436   107088839   4
X   110020352   110067396   X:110020352|110067396   
X   110020352   110109146   X:110020352|110109146   X   110020352   110109146   3
X   110067347   110109146   X:110067347|110109146   X   110067347   110109146   4
X   11055936    11110981    X:11055936|11110981 

So the expected output includes both the intersecting and the non-intersecting intervals. Thanks.

  • Please read How do I format my code blocks and then edit your question. Commented May 17, 2020 at 13:22
  • Am I right in thinking that the sample output cannot be achieved with your sample input? Commented May 18, 2020 at 12:40
  • What is bedtools? Can you link to some documentation? Commented May 18, 2020 at 12:40

1 Answer

I'm quite sure it can be done with awk. Anyway, I liked the problem. This is not the most time-efficient solution.

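file1='file1'
file2='file2'
file_new='new_file'
file_not_matched='not_matched'
delimiter=$'\t' # used when joining lines in the new file

# truncate (or create) both output files
: > "$file_new"
: > "$file_not_matched"

# walk file1
while IFS= read -r line1; do
        [ -n "$line1" ] || continue    # skip empty lines
        # comparison key: start and end columns (the chromosome is not compared)
        line1_match=$(awk '{print $2 FS $3}' <<< "$line1")
        printf '%s' "$line1" >> "$file_new"

        # walk file2
        while IFS= read -r line2; do
                line2_match=$(awk '{print $2 FS $3}' <<< "$line2")

                # test lines
                if [ "$line1_match" = "$line2_match" ]; then
                        # match: append the file2 line and move on to the next file1 line
                        printf '%s%s\n' "$delimiter" "$line2" >> "$file_new"
                        continue 2
                fi
        done < "$file2"

        # no match: end the line and also record it separately
        printf '\n' >> "$file_new"
        printf '%s\n' "$line1" >> "$file_not_matched"
done < "$file1"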
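
Since the answer suspects awk can do this, here is a minimal single-pass sketch of that idea. It is not the answer author's code. The function name left_join_intervals is hypothetical, and the sketch assumes whitespace-separated columns, an exact match on chromosome, start and end, and at most one matching line per interval in the second file (if there are duplicates, the last one wins).

```shell
# left_join_intervals A_FILE B_FILE   (hypothetical helper name)
# Prints every line of A_FILE. When B_FILE contains a line with the same
# chromosome, start and end, that line is appended after a tab.
left_join_intervals() {
    awk '
        { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace
        FILENAME == ARGV[1] {                    # first file given to awk: B_FILE
            if (NF >= 3) seen[$1, $2, $3] = $0   # index by chrom, start, end
            next
        }
        NF >= 3 {                                # second file given to awk: A_FILE
            key = $1 SUBSEP $2 SUBSEP $3
            if (key in seen) print $0 "\t" seen[key]
            else             print $0
        }
    ' "$2" "$1"
}
```

It could be called as left_join_intervals input1 input2 > intersect.bed. The second file is held in memory as a lookup table, so each file is read only once instead of rescanning file2 for every line of file1. Separately, bedtools intersect documents a -loj (left outer join) option that reports features of -a without a match alongside a NULL feature for -b; that may be closer to what the question asks for, but it is not tested here.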
