
I'm new to PowerShell and trying hard to extract some data from a text file.

Here is sample data:

RDFC1111 Z MED 22 23:18:39 MPHSFHFKD OF THE AAAOAAAY GAAAAAA~ PAGE; 1 
DEFSDLSERD FSGHS CONRFGL CEERTE 
ASDF DFGF ASDA ERDFG REEVHT 
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS 
JHGTR ------- ASDV---------- FIELD IN ERROR ERROR DESCRIPTION 
TYPE JINYNGT HJUBGD N[ID7BER CRT JUR YR INFORMATION NJHGJVGFW -------------------- --------------------- 
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH 
„ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH 
AKASHDEEP „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH 
AMANDEEP „ ERROR MESSAGE LOPJUT PKJH IS MFiNDATORY 
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH 
AJAYKARPN „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
~#~ 3~ ~•~5~ 9~4 -~o aa<a•„webnuai u~,~~~L Lt~«<.uc ~, 
- w 
J, _ 
si~_o_. ,.._,~r~':L _ ~r.;~~n< r~~ n~in:~r~~P t+m5'1' P~ EQ~rc~. 'r0 ~S~ta`~I~ 7 
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH 

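I'm trying to extract two fields from each line (shown in bold in my original post): the token just before "3160" and the ticket number field that follows it.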

The output I'm looking for:

MK12001 C2100123
MK13103 C2100124
MRDOOP C21005237
JPPK C2123133

What I'm getting currently:

MK12001 3160
C2100123
MK13103 3160
C2100124
MRDOOP 3160
C21005237
JPPK 3160
C2123133

Problem:

Because I'm using "3160" in the match criteria for my first field, it shows up in the results as well. My second field is a ticket number (C1234567), and because I use the pipe (alternation) operator to add a second search/match criterion, it goes onto the next line. If someone can help me keep the ticket number on the same line, I can live with having "3160" in between, so it would look like:

MK12001 3160 C2100123

Or, even better, if someone can show me how to display only the two wanted fields, without the 3160 in between, that would be awesome:

MK12001 C2100123

P.S.: My script already changes the leading "0" of the ticket number field to "C" (C1234567).

Here is the code so far:

#Location of original file
$Location = "C:\Temp\Ap8.txt"
#Location of file where the "0" is replaced with "C"
$Location2 = "C:\Temp\results9.txt"
#Final results
$Location3 = "C:\Temp\tickets9.txt"

#get the original file
$Change = Get-Content $Location

# replace 0 with C
$Change | ForEach-Object {$_ -Replace "3160 311 50 0",  "3160 311 50 C"} |

#write the results to staging file
Set-Content $Location2

#get the staging/updated file
Get-Content $Location2 -Raw |

#look up the specific fields; I have to fetch two fields from each line, so I'm using the pipe (alternation) operator in between
Select-String "\s\w{1,8}\s3160| C\d{7}" -AllMatches |


  % { $_.Matches.Groups.Value } |
  Out-File $Location3 -Encoding ascii -Force
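A note on why the two fields end up on separate lines: with an alternation pattern "A|B" and -AllMatches, each alternative produces its own match, and $_.Matches.Groups.Value writes each match to its own line. A minimal sketch (not taken from any of the answers) that instead captures both fields with groups in one pattern per line. It assumes the ticket always follows "3160 311 50 0" and ends with a three-digit suffix such as 001, and it uses a few sample lines in place of Get-Content $Location:

```powershell
# Sample lines from the question; with the real file use: $lines = Get-Content $Location
$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH'
    'ERROR MESSAGE LOPJUT PKJH IS MANDATORY'
    'PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH'
)

# One pattern with two capture groups instead of "A|B":
#   (\S+)  group 1 = the token just before " 3160"
#   (\d+)  group 2 = ticket digits after the leading 0, without the 3-digit suffix
$pattern = '(\S+) 3160 311 50 0(\d+)\d{3}\b'

$result = $lines | Select-String -Pattern $pattern | ForEach-Object {
    $m = $_.Matches[0]
    # both fields go into ONE output string; the leading 0 becomes C
    '{0} C{1}' -f $m.Groups[1].Value, $m.Groups[2].Value
}

$result
```

Because each line yields a single match, the two captured values can be combined into one string before anything is written to the file.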
  • how many lines are supposed to be in the data? the way you posted it seems to have an extra line break in each "line" of data. Commented Apr 8, 2022 at 23:04
  • Can we assume that RD32 and 3160 are constant values and that what you're looking for in the first match is between these constants? And for the second match, can we assume it's always an 8-digit numeric value followed by 001? Commented Apr 8, 2022 at 23:31
  • What created that text file? Are we seeing it exactly as-is (including the typo MFiNDATORY)? Please open it in Notepad, copy a couple of lines, and paste them into your question. Commented Apr 9, 2022 at 9:10
  • Hi @Lee_Dailey, the original text file has around 500-600 lines; what I showed here is a sample set. Commented Apr 9, 2022 at 15:37
  • Hi @SantiagoSquarzon, for the first field we can't assume RD32 is constant: as you can see in the 6th line, there is another field between RD32 and my field, which is why I was trying to match it from behind. And yes, the second match is always alphanumeric, and I was able to fetch that item successfully. Commented Apr 9, 2022 at 15:39

4 Answers


i am truly bad with complex regex patterns, so this is done with string operators and only very simple regex patterns. [grin]

the code ...

#region >>> fake reading in a text file
#    when ready to do this with real data, use Get-Content
$InStuff = @'
RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME PETER „ ERROR MESSAGE SERVER NAME IS MANDATORY 
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME SUNNY „ ERROR MESSAGE SERVER NAME IS MFiNDATORY
'@ -split '\r?\n'   # split on CRLF or LF, whichever the here-string contains
#endregion >>> fake reading in a text file

$Result = foreach ($IS_Item in $InStuff)
    {
    $TempBlock = ($IS_Item -split 'server')[0].trim()
    $First = (($TempBlock -split '3160')[0].Trim().Split())[-1]
    $Second = (($TempBlock -split '3160')[1].Trim().Split())[-1] -replace '\d{3}$' -replace '^0', 'C'

    '{0} {1}' -f $First, $Second
    }

$Result

output ...

MK12001 C2100123
MK13103 C2100124
MRDOOP C21005237
JPPK C2123133

what it does ...

  • fakes reading in a text file
    when doing this with real data, use Get-Content.
  • iterates thru the collection of lines
  • grabs the block that has the wanted data
  • splits out, trims, and saves the 1st target value
  • splits out, trims, and saves the 2nd target value
  • builds the output string from the 2 above values
  • sends that to the $Result collection
  • shows that collection on screen
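Applying the steps above to a real file (replacing the fake here-string with Get-Content and saving the result with Set-Content) could look roughly like this. It is a variant, not the answer's exact code: because the question's newer sample has no SERVER token, it skips lines without " 3160 " and splits around that marker instead of on 'server', and it assumes the ticket is the third token after 3160 (after 311 and 50). The temporary sample file only stands in for C:\Temp\Ap8.txt:

```powershell
# Hypothetical sample file standing in for C:\Temp\Ap8.txt
$inFile = Join-Path ([System.IO.Path]::GetTempPath()) 'tickets-sample.txt'
Set-Content -Path $inFile -Value @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH'
    'ERROR MESSAGE LOPJUT PKJH IS MANDATORY'
    'PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH'
)

$Result = foreach ($line in Get-Content -Path $inFile) {
    # skip lines that don't contain the 3160 marker
    if ($line -notmatch ' 3160 ') { continue }

    # split into the part before and after " 3160 " (at most 2 pieces)
    $left, $right = $line -split ' 3160 ', 2

    # 1st value: last token before 3160
    $First = ($left.Trim() -split '\s+')[-1]
    # 2nd value: third token after 3160 (after 311 and 50),
    # minus its 3-digit suffix, with the leading 0 turned into C
    $Second = ($right.Trim() -split '\s+')[2] -replace '\d{3}$' -replace '^0', 'C'

    '{0} {1}' -f $First, $Second
}

$Result
# to save instead: $Result | Set-Content -Path 'C:\Temp\tickets9.txt'

Remove-Item -Path $inFile
```

Splitting on the marker rather than on a trailing word keeps the approach working even when the text after the ticket varies from line to line.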

6 Comments

Thank you so much @Lee_Dailey for your help, but I'm getting an error message; maybe I'm not using Get-Content properly. Below is the code I'm trying:
$Location = "C:\Temp\Ap8.txt"
#region >>> fake reading in a text file
# when ready to do this with real data, use Get-Content
Get-Content $Location
$InStuff = @'
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$Result = foreach ($IS_Item in $InStuff)
    {
    $TempBlock = ($IS_Item -split 'server')[0].trim()
    $First = (($TempBlock -split '3160')[0].Trim().Split())[-1]
    $Second = (($TempBlock -split '3160')[1].Trim().Split())[-1] -replace '\d{3}$' -replace '^0', 'C'
    '{0} {1}' -f $First, $Second
    }
$Result
I also want to write the results to an output file.
@GR - [1] test the code i posted FIRST as is. then, after it works as needed, switch to your own data source. ///// [2] remove the ENTIRE region block and replace it with the command to load your text file. ///// [3] send the $Result collection to a file with Set-Content.
@GR - you are welcome! [grin] ///// test things at each step to find out where the failure is. if you suspect that split is not working correctly, check that step to see what is actually happening as opposed to what you expected.

I suggest a regex approach based on the -match and -replace operators:

# The substring that the lines of interest must contain
# Note:
#  [regex]::Escape() escapes the literal string so that the regex
#  engine uses it literally - which isn't strictly necessary in this case.
#  Alternatively, omit [regex]::Escape() and formulate the string
#  *as a regex* to begin with.
$searchStr = [regex]::Escape(' 3160 311 50 ')

# Filter the lines down to those of interest with -match,
# then use -replace to extract the tokens on either side of the search string.
@(Get-Content $Location) `
  -match $searchStr `
  -replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2'

The above outputs to the display; pipe to Out-File (or, with text input, preferably, Set-Content) as needed.

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
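To see how the pieces of the pattern map onto a line, here is a small sketch that applies the same regex to one line from the sample data, first with -match (to inspect the capture groups in $Matches) and then with -replace:

```powershell
# Same search string as in the answer; [regex]::Escape() escapes the spaces as "\ "
$searchStr = [regex]::Escape(' 3160 311 50 ')
$pattern   = "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$"
# ^.+ (\w+)    : any text, a space, then the token before " 3160" (group 1)
# ${searchStr} : the literal " 3160 311 50 "
# 0(\w+)\d{3}  : skip the leading 0, capture the ticket body (group 2), drop the 3-digit suffix
#  .+$         : a space and the rest of the line

$line = 'PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH'

if ($line -match $pattern) {
    $first  = $Matches[1]   # NUMJ13
    $ticket = $Matches[2]   # 5501950
}

# -replace rewrites the whole line to just the two fields
$replaced = $line -replace $pattern, '$1 C$2'
$replaced   # NUMJ13 C5501950
```

Since the pattern is anchored with ^ and $, -replace substitutes the entire line, which is why only the two captured fields remain.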

Output with your sample data:

MK56502 C2156334
CD15622 C3214114
AK12102 C2652224
HJPOOL C6585527
RAAK C2197133
NUMJ13 C5501950
AY51619 C2177107

Alternatively, use a switch statement, which requires only one regex operation, but requires calling via a script block (& { ... }) in order to be able to pipe to a file-writing cmdlet:

& {
  switch -Regex -File $Location {
    ' (\w+) 3160 311 50 0(\w+)\d{3} ' { '{0} C{1}' -f $Matches[1], $Matches[2] }
  }
} # | Set-Content ...

11 Comments

Hello @mklement0, I really appreciate you taking the time to help. I tried the code, but it doesn't display anything, and with both Set-Content and Out-File it generates an empty file:
$Location = "C:\Temp\ap8.txt"
$Location2 = "C:\Temp\file00.txt"
$searchStr = ' 3160 211 50 '
@(Get-Content $Location) `
  -match $searchStr `
  -replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2' | Set-Content $Location2 #Out-File $Location2 -Encoding ascii -Force
You are correct; I saved the above file and it worked smoothly. I want to clarify a few things: 1. The match string is ' 3160 211 50 '; the reason I'm using '3160 311 50 0' in one place is that I have to replace all the zeros coming at that spot to C. 2. I added a shorter version of my text file, but the real file contains more data as well. Let me add a snapshot of the data here, and I will see if I can update the question and post it there as well. When I try to use this smaller version of the data, the code doesn't work; this data still has the same match query, but for some reason it's not working.
@GR, scratch my previous comment: I am confused as to what your real requirements are: What does "zeros coming at that spot to C 2" mean? Do you need to match both 211 and 311, selectively? Your current sample data contains only 311, so I've adjusted my answer for now, but please clarify your requirements and also update the expected output part of your question.
Thank you so much; that totally makes sense. I will probably open a separate question.
@GR, please follow the regex101.com link in the answer to learn how the regex works. There you can also experiment with it interactively. If you need assistance after that, please ask a new question.

Thanks, everyone, for your help. Sometimes it's hard to post the original data (for privacy reasons) and to fully express the problem, but I'm glad to be part of this community, where everyone tries their best to help.

Original problem: We receive a hard copy of a PDF, scan it, run it through the printer's OCR functionality, and then convert the result into a text file. During this process some content is lost and some typos are introduced. I want to fetch two fields out of these text files, which appear in random files.

  1. The first solution fetches all the cleanly matching tickets but doesn't include the ones with typos in the output.

  2. The second solution gives me all the data, including the cleanly matching tickets as well as tickets that don't match, but it doesn't let me add an extra exception, i.e. a condition that would reduce the number of bad/typo tickets.

#Solution 1:

#Location of original file
$Location = "C:\Temp\Ap8.txt"
#Location of file where the "0" is replaced with "C"
$Location2 = "C:\Temp\file2.txt"
#Location of the file with the raw matches (still containing the in-between string ' 3160 311 50 ')
$Location3 = "C:\Temp\file3.txt"
#Final results: ' 3160 311 50 ' is replaced with nothing so that you are left with only the fields you need
$Location4 = "C:\Temp\file4.txt"

#get the original file
$Change = Get-Content $Location

# replace 0 with C
$Change | ForEach-Object {$_ -Replace "3160 311 50 0",  "3160 311 50 C"} |

#write the results to staging file
Set-Content $Location2

#get the staging/updated file
Get-Content $Location2 -Raw |

#look up the specific fields; one match now spans both fields, so they stay on the same line
Select-String "\s\w{1,8} 3160 311 50 \w{1,8}" -AllMatches |
#Select-String "\s\w{1,8} 3160 311 50 \w{7}" -AllMatches |

  % { $_.Matches.Groups.Value } |

  # Set-Content $Location3 also works here instead of the Out-File line below
  Out-File $Location3 -Encoding ascii -Force

$PlateTicket = Get-Content $Location3

# remove the in-between string, leaving only the two fields
$PlateTicket | ForEach-Object {$_ -Replace " 3160 311 50",  ""} |

Set-Content $Location4

#Solution 2:

$Location = "C:\Temp\Ap8.txt"
$Location2 = "C:\Temp\Results.txt"
# The substring that the lines of interest must contain
$searchStr = [regex]::Escape(' 3160 311 50 ')


@(Get-Content $Location) `
  -match $searchStr `
  -replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2'|

    Set-Content $Location2
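Regarding the extra condition mentioned above: when -match is applied to an array it acts as a filter, so one more -match can be chained onto the -replace output to drop results whose ticket doesn't look valid. A minimal sketch, assuming a valid ticket is C followed by 7 or 8 digits (the expected output in the question contains both lengths); the second sample line is a hypothetical OCR error with the letter O in place of a zero:

```powershell
$searchStr = [regex]::Escape(' 3160 311 50 ')

# Sample lines; the second one is a hypothetical OCR error (letter O instead of zero)
$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH'
    'PD31 CD15622 3160 311 50 0321O114001 LOPJUT ADDRESS - STREET'
    'ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY'
)

# 1. keep lines containing the search string
# 2. reduce each one to "<field> C<ticket>"
# 3. extra condition (assumption): the ticket must be C plus 7 or 8 digits
$clean = @(
    $lines `
        -match $searchStr `
        -replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2' `
        -match '^\S+ C\d{7,8}$'
)

$clean   # MK56502 C2156334
```

With real data, $lines would be @(Get-Content $Location), and the result can be piped to Set-Content $Location2 as in Solution 2. Rejected lines could be collected separately with -notmatch for manual review.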

Credit goes to all the contributors/original posters.

Thanks again, everyone. I'm just posting a summary; it's not my original work.


You could use a "simpler" one-line command.

So, given both of your examples as input.txt:

RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET 
„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY 
„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME 
PETER „ ERROR MESSAGE SERVER NAME IS MANDATORY 
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME 
SUNNY „ ERROR MESSAGE SERVER NAME IS MFiNDATORY
RDFC1111 Z MED 22 23:18:39 MPHSFHFKD OF THE AAAOAAAY GAAAAAA~ PAGE; 1 
DEFSDLSERD FSGHS CONRFGL CEERTE 
ASDF DFGF ASDA ERDFG REEVHT 
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS 
JHGTR ------- ASDV---------- FIELD IN ERROR ERROR DESCRIPTION 
TYPE JINYNGT HJUBGD N[ID7BER CRT JUR YR INFORMATION NJHGJVGFW -------------------- --------------------- 
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH 
„ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH 
AKASHDEEP „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH 
AMANDEEP „ ERROR MESSAGE LOPJUT PKJH IS MFiNDATORY 
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH 
AJAYKARPN „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
~#~ 3~ ~•~5~ 9~4 -~o aa<a•„webnuai u~,~~~L Lt~«<.uc ~, 
- w 
J, _ 
si~_o_. ,.._,~r~':L _ ~r.;~~n< r~~ n~in:~r~~P t+m5'1' P~ EQ~rc~. 'r0 ~S~ta`~I~ 7 
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH 

We can use a cmd file. You could pre-filter out the bad lines, but here I am accepting every line with a valid 3160. You don't need the second line (the one starting with echo:); it's only there to explain the search result.

@echo off & SETLOCAL EnableDelayedExpansion
echo: & echo      Filtered lines with 3160 & echo: & findstr /c:3160 input.txt & echo: & echo      Modified filtered output & echo:
for /f "tokens=1,2,3,4,5,6,7 usebackq" %%A in (`type input.txt ^|findstr " 3160"`) do @if %%C==3160 (set "num=%%F" & echo %%B C!num:~1,10! ) ELSE ( if %%D==3160 (set "num=%%G" & echo %%C C!num:~1,10!))

Result (including what seems to be one rogue line and one bad one). NOTE: I left the substring length in both second-field cases as 10 (!num:~1,10!) to highlight the erroneous one, but you will need to change both to 7. :-)


     Filtered lines with 3160

RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH

     Modified filtered output

MK12001 C2100123001
MK13103 C2100124001
MRDOOP C2100523700
JPPK C2123133001
SDFGTE: CGHJINCIAL
MK56502 C2156334001
CD15622 C3214114001
AK12102 C2652224001
HJPOOL C6585527001
RAAK C2197133001
NUMJ13 C5501950001
AY51619 C2177107001
