6

I have a text file containing data like this:

This is just text
-------------------------------
Username:          SOMETHI           C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEGG            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEYY            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        202223DB          Start time:        15-DEC-2010 20:27:15.30

Is there a way to extract Username, Finish time, and Start time from this kind of data? I'm looking for a starting point using R or PowerShell.

4 Answers

8

R may not be the best tool for processing text files, but you can proceed as follows: read the file as a fixed-width file to identify the two columns, split each string on the colon to separate the field names from their values, add an "id" column, and put everything back in order.

# Read the file
d <- read.fwf("A.txt", c(37,100), stringsAsFactors=FALSE)

# Separate fields and values
d <- d[grep(":", d$V1),]
d <- cbind( 
  do.call( rbind, strsplit(d$V1, ":\\s+") ), 
  do.call( rbind, strsplit(d$V2, ":\\s+") ) 
)

# Add an id column
d <- cbind( d, cumsum( d[,1] == "Username" ) )

# Stack the left and right parts
d <- rbind( d[,c(5,1,2)], d[,c(5,3,4)] )
colnames(d) <- c("id", "field", "value")
d <- as.data.frame(d)
d$value <- gsub("\\s+$", "", d$value)

# Convert to a wide data.frame
library(reshape2)
d <- dcast( d, id ~ field, value.var = "value" )

3 Comments

What would be your tool for working with text files? Perl, Ruby perhaps?
@RomanLuštrik: I would personally use Perl, because I am familiar with it, but Python or Ruby should prove equally good solutions. I usually prefer to do all the preprocessing separately, so that R only has to read csv files or tables in a database.
R is insanely slow at parsing text files; go with Perl or Python instead.
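For anyone who wants to try the preprocessing route suggested in these comments, here is a rough Python sketch that turns the blocks into CSV text R could then load with read.csv(). The names `FIELD`, `parse_records` and `to_csv` are made up for illustration. It assumes every block starts with a "Username:" line and that values are separated from the next label by at least two spaces, as in the sample from the question.

```python
import csv
import io
import re

# Matches "Label:   value"; a value is assumed to end at a run of 2+ spaces
# or at the end of the line (so the single space inside a timestamp is kept).
FIELD = re.compile(r"(Username|Finish time|Start time):\s+(.*?)(?=\s{2,}|$)")
FIELDNAMES = ["Username", "Finish time", "Start time"]

def parse_records(lines):
    """Return one dict per block; a new block is assumed to start at 'Username:'."""
    records = []
    for line in lines:
        for label, value in FIELD.findall(line):
            if label == "Username":
                records.append({})
            if records:
                records[-1][label] = value.strip()
    return records

def to_csv(records):
    """Serialise the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# First block of the sample data from the question.
sample = """\
This is just text
-------------------------------
Username:          SOMETHI           C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30
"""

records = parse_records(sample.splitlines())
print(to_csv(records), end="")
```

To run it on a real file, pass an open file object to `parse_records` instead of `sample.splitlines()` and write the result of `to_csv` to disk.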
2

These are just guidelines for how I would approach the problem. I'm sure there's a fancier way of doing it, possibly involving plyr. :)

rara <- readLines("test.txt") # you could also use readLines(textConnection("text"))

# find usernames
usn <- rara[grepl("Username:", rara)]
# you can find a fancy way to split or weed out spaces
# I crudely do it like this:
unlist(lapply(strsplit(usn, "      "), "[", 2)) # 2 means "extract the second element"

# and accounts
acc <- rara[grepl("Account:", rara)]
unlist(lapply(strsplit(acc, "      "), "[", 2))

You can use str_trim() from the stringr package to remove whitespace before and after the word. Hopefully that's enough pointers to get you going.
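One possible refinement of the crude split above is to split on any run of two or more whitespace characters rather than on exactly six spaces, so nothing is left to trim afterwards. Here is a minimal sketch of that idea, written in Python only for illustration. `split_pairs` is a hypothetical helper, and it assumes that labels and values never contain two consecutive spaces and that every label has a non-empty value.

```python
import re

def split_pairs(line):
    """Split 'Label: value   Label: value' into a list of (label, value) tuples."""
    # Runs of 2+ whitespace characters separate the chunks; the single spaces
    # inside "Finish time:" or inside a timestamp are left alone.
    chunks = re.split(r"\s{2,}", line.strip())
    # Chunks are assumed to alternate: label, value, label, value, ...
    return [(label.rstrip(":"), value)
            for label, value in zip(chunks[0::2], chunks[1::2])]

line = "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91"
pairs = split_pairs(line)
print(pairs)
```

The same regular expression should also work as the split pattern in strsplit(), although I have not tested that here.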

2

Here's a PowerShell solution:

$result = @()

get-content c:\somedir\somefile.txt |
foreach {
    if ($_ -match '^Username:\s+(\S+)'){
        $rec = ""|select UserName,FinishTime,StartTime
        $rec.UserName = $matches[1]
        }
    elseif ($_ -match '^Account.+Finish\stime:\s+(.+)'){
        $rec.FinishTime = $matches[1]
        }
    elseif ($_ -match '^Process\sID:\s+\S+\s+Start\stime:\s+(.+)'){
        $rec.StartTime = $matches[1]
        $result += $rec
        }
}
$result

0

Do you have your file in a data frame, with column names like Username, Process ID, Start time, and so on? If so, you can easily extract a column with

df$Username (where df is your data frame and if you want to see all your usernames)
df$FinishTime

If you want to know everything about a user with a certain name, use this

df[df$Username == "SOMETHI",]

If you want to find a user with a particular finish time, filter on that column in the same way.

Hope this can be a starting point. Let me know if something is not clear.

1 Comment

I think he's trying to extract the data so that he can put it in a data.frame.
