I have a data frame df with a single column, GO. Each row of df contains no terms (an empty string), one term, or several terms separated by "; ". Every term has the same format: it starts with P, C or F, followed by a colon (:) and then the actual term.
```r
df <- data.frame(
  GO = c("C:mitochondrion; C:kinetoplast", "",
         "F:calmodulin binding; C:cytoplasm; C:axoneme", "",
         "P:cilium movement; P:inner dynein arm assembly; C:axoneme", "",
         "F:calcium ion binding")
)
```
```
                                                         GO
1                            C:mitochondrion; C:kinetoplast
2
3              F:calmodulin binding; C:cytoplasm; C:axoneme
4
5 P:cilium movement; P:inner dynein arm assembly; C:axoneme
6
7                                     F:calcium ion binding
```
I want to split this column into three columns, BP, CC and MF, according to whether each term starts with P, C or F, respectively. The new columns should contain only the terms themselves, without the identifiers (P, C, F and the colon). Terms of the same type from one row should stay together, separated by "; ".
This is what I want my new data frame to look like:
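|   | BP | CC | MF |
|---|----|----|----|
| 1 |  | mitochondrion; kinetoplast |  |
| 2 |  |  |  |
| 3 |  | cytoplasm; axoneme | calmodulin binding |
| 4 |  |  |  |
| 5 | cilium movement; inner dynein arm assembly | axoneme |  |
| 6 |  |  |  |
| 7 |  |  | calcium ion binding |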
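One possible approach, sketched in base R under a few assumptions: terms are always separated by semicolons (surrounding spaces are trimmed), and every prefix is a single letter followed by a colon, so the term itself starts at the third character. The helper name `collect` is hypothetical. It uses only base functions (`strsplit`, `trimws`, `startsWith`, `substring`, `vapply`), and rows with no term of a given type get an empty string rather than `NA`, matching the desired output.

```r
df <- data.frame(
  GO = c("C:mitochondrion; C:kinetoplast", "",
         "F:calmodulin binding; C:cytoplasm; C:axoneme", "",
         "P:cilium movement; P:inner dynein arm assembly; C:axoneme", "",
         "F:calcium ion binding"),
  stringsAsFactors = FALSE  # keep GO as character on R < 4.0 too
)

# Split each GO string on ";" and trim the surrounding spaces,
# giving one character vector of "X:term" entries per row.
terms <- lapply(strsplit(df$GO, ";", fixed = TRUE), trimws)

# Hypothetical helper: keep the entries that start with "<prefix>:",
# drop the two-character prefix, and join what is left with "; ".
# A row with no matching entries yields "".
collect <- function(x, prefix) {
  hits <- x[startsWith(x, paste0(prefix, ":"))]
  paste(substring(hits, 3), collapse = "; ")
}

# One output column per prefix: P -> BP, C -> CC, F -> MF.
out <- data.frame(
  BP = vapply(terms, collect, character(1), prefix = "P"),
  CC = vapply(terms, collect, character(1), prefix = "C"),
  MF = vapply(terms, collect, character(1), prefix = "F"),
  stringsAsFactors = FALSE
)
print(out)
```

If a prefix could ever be longer than one letter, the fixed `substring(hits, 3)` would need to become something like `sub("^[^:]*:", "", hits)`.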