I have a batch file that processes scanned PDFs using ghostscript. One of the user prompts is for the resolution of the desired output. I wrote a crude autodetect routine like this:

for /f "delims=" %%a in ('findstr /C:"/Height 1650" %1') do set resdect=150
for /f "delims=" %%a in ('findstr /C:"/Height 3300" %1') do set resdect=300
for /f "delims=" %%a in ('findstr /C:"/Height 6600" %1') do set resdect=600
echo %resdect% DPI detected.

%1 is the filename passed to the batch script.

This should return the highest resolution detected among some common sizes we see. My question to the community: is there a faster or more efficient way to do this than searching the file multiple times?

  • 1. It's %%a, not %%aa. 2. Write "%~1" instead of %1. 3. resdect is the /Height value divided by 11, right? Commented Apr 5, 2018 at 19:32
  • @aschipfl - "%~1" is not needed - %1 will simply preserve any quotes that may or may not be there. If the file path contains spaces or poison characters, then the value will already be quoted, so it should work. If there is no space or poison character, then it works either way, with or without quotes. Commented Apr 5, 2018 at 20:10
  • @aschipfl the %%aa was a typo (I manually transcribed the batch from a different machine). Edited code above Commented Apr 5, 2018 at 20:32
  • @dbenham, there might be cases where %1 and "%~1" differ: if a file foo&bar.ext is provided as an unquoted argument, hence foo^&bar.ext, the & is going to appear unquoted when using %1; that is why I recommended "%~1"; I have to admit it's a constructed case though... Commented Apr 5, 2018 at 23:24
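
As a rough illustration of the %1 versus "%~1" point discussed in these comments, here is a minimal sketch. The script name showarg.bat is hypothetical and not part of the original batch file:

```batch
@echo off
rem showarg.bat (hypothetical name): compare two ways of referencing the first argument.
rem %1 is substituted exactly as typed, including any quotes the caller supplied.
echo Raw:      %1
rem %~1 strips surrounding quotes; the explicit quotes then add exactly one pair back.
echo Requoted: "%~1"
```

Called with a quoted name such as "my scan.pdf", both lines show the quoted name. Called with an unquoted name, only the second line adds quotes. In the constructed foo^&bar.ext case from the last comment, the unquoted & reaches the first echo line and is treated as a command separator, which is the difference being pointed out.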

4 Answers

4

Assuming that the value of RESDECT is the /Height value divided by 11, and that no line contains more than one /Height token, the following code might work for you:

@echo off
for /F delims^=^ eol^= %%A in ('findstr /R /I /C:"/Height  *[0-9][0-9]*" "%~1"') do (
    set "LINE=%%A"
    setlocal EnableDelayedExpansion
    set "RESDECT=!LINE:*/Height =!"
    set /A "RESDECT/=11"
    echo/!RESDECT!
    endlocal
)

If you only want to match the dedicated /Height values 1650, 3300, 6600, you could use this:

@echo off
for /F delims^=^ eol^= %%A in ('findstr /I /C:"/Height 1650" /C:"/Height 3300" /C:"/Height 6600" "%~1"') do (
    set "LINE=%%A"
    setlocal EnableDelayedExpansion
    set "RESDECT=!LINE:*/Height =!"
    set /A "RESDECT/=11"
    echo/!RESDECT!
    endlocal
)

To gather the greatest /Height value appearing in the file, you can use this script, respecting the aforementioned assumptions:

@echo off
set "RESDECT=0"
for /F delims^=^ eol^= %%A in ('findstr /R /I /C:"/Height  *[0-9][0-9]*" "%~1"') do (
    set "LINE=%%A"
    setlocal EnableDelayedExpansion
    set "HEIGHT=!LINE:*/Height =!"
    for /F %%B in ('set /A HEIGHT/11') do (
        if %%B gtr !RESDECT! (endlocal & set "RESDECT=%%B") else endlocal
    )
)
echo %RESDECT%

Of course, you can again swap in the findstr command line from the second script to match only the dedicated heights.


Here is another approach to get the greatest /Height value, using (pseudo-)arrays, which might be faster than the above method, because there are no extra cmd instances created in the loop:

@echo off
setlocal
set "RESDECT=0"
for /F delims^=^ eol^= %%A in ('findstr /R /I /C:"/Height  *[0-9][0-9]*" "%~1"') do (
    set "LINE=%%A"
    setlocal EnableDelayedExpansion
    set "HEIGHT=!LINE:*/Height =!"
    set /A "HEIGHT+=0, RES=HEIGHT/11" & set "HEIGHT=0000000000!HEIGHT!"
    for /F %%B in ("$RESOLUTIONS[!HEIGHT:~-10!]=!RES!") do endlocal & set "%%B"
)
for /F "tokens=2 delims==" %%B in ('set $RESOLUTIONS[') do set "RESDECT=%%B"
echo %RESDECT%
endlocal

First, all heights and their related resolutions are collected in a (pseudo-)array called $RESOLUTIONS[], where the /Height values serve as indexes and the resolutions as values. The heights are left-padded with zeros to a fixed number of digits, so that set $RESOLUTIONS[ returns them in ascending order. The second for /F loop then keeps the last array element, whose value is the greatest resolution.
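
To see why the zero-padding matters, here is a small stand-alone sketch of the same idea. The sample heights and the $DEMO[ and MAXRES names are made up for illustration; it does not read any PDF:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample heights, deliberately out of order.
for %%H in (3300 825 6600 1650) do (
    set "PADDED=0000000000%%H"
    set /A "RES=%%H/11"
    rem Keep only the last 10 characters so every index has the same width.
    set "$DEMO[!PADDED:~-10!]=!RES!"
)
rem "set $DEMO[" lists the matching variables sorted by name, so the last
rem one listed belongs to the numerically largest height.
for /F "tokens=2 delims==" %%B in ('set $DEMO[') do set "MAXRES=%%B"
echo !MAXRES! DPI is the greatest resolution.
endlocal
```

Without the padding, the index 825 would sort after 6600 as a string, and the last element would no longer be the largest height. With these sample values the last entry listed is the one for height 6600, so MAXRES ends up as 600.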

I do have to admit that this was inspired by Aacini's nice answer.

3 Comments

This will output every Height. According to the question, only the highest one is desired. Since it is not certain whether there are one or more occurrences, or whether and how they are sorted, findstr with multiple strings needs some postprocessing.
You're right, @Stephan, obviously I didn't read carefully enough; see my updated answer...
I am always amazed by what can be accomplished with a batch file! I couldn't believe it until I saw the data, but for a complex PDF (color scans, lots of 1-bit overlays on 8-bit background), your 1st method is the fastest @ 5.9s, your 4th method (modified to look for specific heights) is the second fastest @ 6.5s, followed by 2nd method@ 14s. My code that I assumed to be slow clocked in at 1.6s (checking for 4 resolutions)
2

Get the matching lines into a variable and work with that instead of the whole file. Instead of your three for loops, you can use just one if you change the logic a bit:

@echo off
setlocal enabledelayedexpansion
for /f "delims=" %%a in ('findstr /C:"/Height " %1') do (
  set "line=%%a"
  set "line=!line:*/Height =!"
  for /f "delims=/ " %%b in ("!line!") do set "hval=!hval! %%b" 
)
for %%a in (1650,3300,6600) do @(
  echo " %hval% " | find " %%a " >nul && set /a resdect=%%a/11
)
echo %resdect% DPI detected.

A solution with jrepl.bat could look something like:

for /f %a in ('type t.txt^|find "/Height "^|jrepl ".*/Height ([0-9]{4}).*" "$1"^|sort') do set /a dpi=%a / 11

(assuming all valid heights have four digits)
Note: for use in batch files, use %%a instead of %a.
I have barely scratched the surface of jrepl; I'm quite sure there is a much more elegant (and probably faster) solution.

13 Comments

I'm afraid this processes only the last line containing a /Height token, we don't know how many may occur though; anyway, I'd change to find "/Height %%a" in order not to match something like /Width 1650...
@aschipfl: you are (were) completely right. Shouldn't code late at night... Corrected.
I see... ;-) Alright, so you process the entire file now (given it is not bigger than 8 KiB), but still something like /Width 6600 /Height 3300 would result in 6600 (resdect=600) erroneously...
@Stephan Your code so far clocked the fastest by a large margin with a simple test file (125 pages, all at 300 dpi), but I get a "The input line is too long. The syntax of the command is incorrect." error when trying a more complex file (60 pages scanned in color, which usually results in a base image for each page and a number of overlay images where the PDF encoder tries to overlay a 1-bit image). In this file, the line returned by findstr is over 160 characters long in some places.
Ah - I assumed, there would be a space after the number, but obviously, there is a / (is it reliable?) The for /f %%b loop takes the first token, so adding delims=/ should solve it.
2

You may directly convert the Height value into the highest resolution in a single operation using an array. However, to do that we need to know the format of the line that contains the Height value. In the code below I assumed that the format of such a line is /Height xxxx, that is, that the height is the second token in the line. If this is not true, just adjust the "tokens=2" value in the for /F command.

EDIT: Code modified as requested in comments

In this modified code the Height value may appear anywhere in the line.

@echo off
setlocal EnableDelayedExpansion

rem Initialize "resDect" array
for %%a in ("1650=150" "3300=300" "6600=600") do (
   for /F "tokens=1,2 delims==" %%b in (%%a) do (
      set "resDect[%%b]=%%c"
   )
)

set "highResDect=0"
for /F "delims=" %%a in ('findstr "/Height" %1') do (
   set "line=%%a"
   set "line=!line:*/Height =!"
   for /F %%b in ("!line!") do set /A "thisRectDect=resDect[%%b]"
   if !thisRectDect! gtr !highResDect! set "highResDect=!thisRectDect!"
)

echo %highResDect% DPI detected.

1 Comment

Unfortunately, I can't make any assumptions about the line containing height. In one file you might have /Height 3300 on its own line, other times you might see something like 6 0 obj<< /Type /XObject /Subtype /Image /Name /Obj4 /Width 2550 /Height 3300 /ColorSpace /DeviceGray /BitsPerComponent 1 [...]. It all depends on what scanner was used. Open a few PDFs with images in a text editor to see what I mean
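
For lines like the one quoted in this comment, one possible approach is to strip everything up to each /Height token in turn and keep the largest number found. The following is only a sketch under assumptions: the names MAXH, REST, H and the :scanLine label are made up for illustration, the page is assumed to be 11 inches tall, and lines containing exclamation marks are not handled correctly because delayed expansion is enabled throughout. It aims at robustness rather than speed, since it still processes every matching line in a loop.

```batch
@echo off
setlocal EnableDelayedExpansion
set "MAXH=0"
rem One findstr pass; each matching line is then scanned in memory.
for /F delims^=^ eol^= %%A in ('findstr /C:"/Height " "%~1"') do (
    set "LINE=%%A"
    call :scanLine
)
rem Assumption: pages are 11 inches tall, so DPI = height in pixels / 11.
set /A "RES=MAXH/11"
echo %RES% DPI detected.
endlocal
goto :EOF

:scanLine
rem Nothing left to scan in this line.
if not defined LINE goto :EOF
rem Remove everything up to and including the next "/Height ".
set "REST=!LINE:*/Height =!"
rem If nothing was removed, the line holds no further /Height token.
if "!REST!"=="!LINE!" goto :EOF
set "LINE=!REST!"
set "H=0"
rem The first token before a space or slash should be the number; anything
rem that set /A cannot evaluate leaves H at 0.
for /F "delims=/ " %%B in ("!LINE!") do set /A "H=%%B" 2>nul
if !H! gtr !MAXH! set "MAXH=!H!"
goto scanLine
```

Because only the text following /Height is examined, a /Width 6600 earlier in the same line is never compared against the maximum.
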
0

For the record, the final code was:

setlocal enabledelayedexpansion
set resdetc=0
for /f "delims=" %%a in ('findstr /C:"/Height " %1') do (
  set "line=%%a"
  set "line=!line:*/Height =!"
  for /f "delims=/ " %%b in ("!line!") do set "hval=!hval! %%b" 
)
for %%a in (1650,3300,6600) do @(
  echo " %hval% " | find " %%a " >nul && set /a resdetc=%%a/11
)
if %resdetc%==0   SET resDefault=3
if %resdetc%==150 SET resDefault=1
if %resdetc%==300 SET resDefault=3
if %resdetc%==600 SET resDefault=6

ECHO.
ECHO Choose your resolution
ECHO ----------------------
ECHO 1. 150    4. 400
ECHO 2. 200    5. 500
ECHO 3. 300    6. 600
ECHO.
IF NOT %RESDETC%==0 ECHO 7. Custom    (%resdetc% DPI input detected)
IF     %RESDETC%==0 ECHO 7. Custom
ECHO ----------------------
choice /c 1234567 /T 3 /D %resDefault% /N /M "Enter 1-7 (defaults to %resDefault% after 3 sec.): "
IF errorlevel==7 goto choice7
IF errorlevel==6 set "reschoice=600" & goto convert
IF errorlevel==5 set "reschoice=500" & goto convert
[...]

Thanks everyone for the help!
