Find string after "X" date - string

I have a folder with 300+ text files; I am trying to create a batch script that will find anything after a certain date with the following lines within each text:
---------- \SC####SVR####\E$\USERS\SC####POS####\E2ELOGS\PED_20141116_110913.DBG: 1
As indicated date format would be YYYYMMDD
For example:
set filedatetime=10/11/2014 12:26
set filedatetime=%filedatetime:~6,4%%filedatetime:~3,2%%filedatetime:~0,2%
echo "%filedatetime%"
FINDSTR "%FILEDATETIME%" C:\RESULTS\*.TXT
And if the date in the findstr result is greater than 20141110, echo the line out to another text file.

If the date portion of the string is always in the same location, you can simply use the string handling functions to parse it out. At that point, convert it to an integer if necessary, then compare the result with your target date.
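As a rough cross-check of that idea outside cmd, here is a minimal Python sketch (Python rather than batch, purely as an illustration). It assumes the stamp always appears as `_YYYYMMDD_HHMMSS.` as in the question's sample; the helper name `lines_on_or_after` and the pattern are hypothetical.

```python
import re

# Assumed layout: ..._YYYYMMDD_HHMMSS.<ext>, as in PED_20141116_110913.DBG
STAMP = re.compile(r"_(\d{8})_\d{6}\.")

def lines_on_or_after(lines, cutoff):
    """Return the lines whose embedded date is >= cutoff (both YYYYMMDD)."""
    kept = []
    for line in lines:
        match = STAMP.search(line)
        # Convert both sides to integers so the comparison is numeric.
        if match and int(match.group(1)) >= int(cutoff):
            kept.append(line)
    return kept

sample = [
    r"-S\PED_20141029_110913.DBG: 1",
    r"-S\PED_20141129_110913.DBG: 1",
]
print(lines_on_or_after(sample, "20141116"))
```

Because YYYYMMDD sorts the same way as a number and as a string, the integer conversion is a safety measure rather than a requirement.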

@ECHO OFF
SETLOCAL
SET filedatetime=20141116
SET "sourcedir=U:\sourcedir\t w o"
(
FOR /f "tokens=1-3delims=_" %%a IN ('findstr /r ".*_.*_.*" "%sourcedir%\*.txt"') DO (
IF %%b geq %filedatetime% ECHO %%a_%%b_%%c
)
)>newfile.txt
TYPE newfile.txt
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances.
Produces a new file newfile.txt
I used a fixed date for convenience of testing.
geq produces lines equal to or greater than the date selected. gtr would yield strictly greater than.
Here's my test data (I've abbreviated it)
-S\PED_20140129_110913.DBG: 1
-S\PED_20140229_110913.DBG: 1
-S\PED_20140329_110913.DBG: 1
-S\PED_20140429_110913.DBG: 1
-S\PED_20140529_110913.DBG: 1
-S\PED_20140629_110913.DBG: 1
-S\PED_20140729_110913.DBG: 1
-S\PED_20140829_110913.DBG: 1
-S\PED_20140929_110913.DBG: 1
-S\PED_20141029_110913.DBG: 1
-S\PED_20141129_110913.DBG: 1
-S\PED_20141229_110913.DBG: 1
And results:
U:\sourcedir\t w o\extra.txt:-S\PED_20141129_110913.DBG: 1
U:\sourcedir\t w o\extra.txt:-S\PED_20141229_110913.DBG: 1
I would suggest that the problem is in the filenames section: if underscores appear there, then an incorrect string would be taken for comparison.
You could test this with (replacement if statement)
IF %%b geq %filedatetime% ECHO "%%b" geq "%filedatetime%"
which would show which strings are being compared.
This should fix that problem:
@ECHO OFF
SETLOCAL
SET filedatetime=20141116
SET "sourcedir=U:\sourcedir\t w o"
(
FOR /f "tokens=1,2,*delims=:" %%p IN ('findstr /r ".*_.*_.*" "%sourcedir%\*.txt"') DO (
FOR /f "tokens=1-3delims=_" %%a IN ("%%r") DO (
IF %%b geq %filedatetime% ECHO %%p:%%q:%%a_%%b_%%c
)
)
)>newfile.txt
TYPE newfile.txt
GOTO :EOF
which separates-out the filename from the data, then processes the data alone.

@echo off
setlocal
set /a filedatetime=20141124
set sourcedir=
set outfile=E:\outfile.txt
REM There are 3 underscores per record in the input files, creating 4 tokens. The third token (%%c) is the date in the filename.
REM Due to the presence of at least one backup file named "BACK_*", which adds a fourth underscore, we filter that out.
for /f "tokens=1-4 delims=_" %%a in ('findstr /r ".*_.*_.*" "%sourcedir%\*.txt"') DO (
if %%c geq %filedatetime% echo %%a_%%b_%%c_%%d | find /I /V "BACK" >>%outfile%
)
rem type newfile.txt
goto :EOF
exit

Related

Windows Batch File - Find String In File And Append To It

So I got VERY close to what I wanted to achieve, but I couldn't seem to get past the points I am currently stuck on.
First off, I have a .ini file that has a header along with a list of items separated by a comma:
[Items]
ListOfItems = Item01, Item02, Item03
What I need to do is find that specific section (ListOfItems) and append a new item to it that doesn't exist in a reference file.
Here's what I have so far:
SET /P NewItem=
SET ListOfItems=
FOR /F delims^=^"^ tokens^=2 %%G IN ('type File.ini') DO (set ListOfItems=%%G)
echo ListOfItems ^=^"%ListOfItems%, %NewItem%^">File.ini
The issue I have is that I am using a prompt instead of taking a new line from a file (List.txt) because the last time I tried it kept repeating the same line over and over, instead of only taking one of each line, provided it isn't a duplicate.
The other issue I have is that, with the current code above, it doesn't preserve the current state of the .ini file. So you can have the example as per above, but once you add a new item via the prompt you end up with:
ListOfItems =", Item04"
The header is gone, the original values are gone, and it starts with a comma, instead of only adding a comma after each value.
So the expected result was:
[Items]
ListOfItems = Item01, Item02, Item03, Item04
Why is it not preserving the original data when first run, but subsequent runs perfectly copies the original data and adds to it?
How can we address the issue of it starting with a comma, with no previous data in front of it?
How can we have it pull the list of items to append from List.txt instead of manually entering them one at a time?
EDIT: I've managed to address the 2nd issue with the following:
(
ECHO [Items]
ECHO !ArchiveList! = !RequiredItems!
)>>File.ini
Where !RequiredItems! contain the base items Item01, Item02, Item03 which I do when the ListOfItems are not populated, as per John Kens' suggestion.
As for the 3rd issue, I solved it with:
FOR /F "delims=" %%A IN (List.txt) DO (
SET "ListAddition=!ListAddition!%%A, "
)
SET "ListAddition=!ListAddition:~0,-2!"
Which takes each item in the List.txt file, which is on a new line, and then adds it to a list, each item separated by a comma.
Alright, first things first: your current for loop needs some tweaking, and there are a few things we can do here. If we just wanted to grab what ListOfItems equals from the second line and ignore the fact that there may be other objects in this file, then the proper way to grab this data would be the following:
FOR /F "skip=1 tokens=2,*" %%A IN ('type File.ini') DO (set "ListOfItems=%%B")
skip=1 - Will skip the first line
tokens=2,* - With the default space/tab delimiters, %%A receives the second token (here the = sign) and %%B receives everything after it (the item list).
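To see how tokens=2,* divides the sample line, the following Python sketch (an illustration only, not part of the batch solution) mimics the split with the default space/tab delimiters. The function name `tokens_2_star` is hypothetical.

```python
def tokens_2_star(line):
    """Mimic FOR /F "tokens=2,*": return (second token, rest of line)."""
    # Split on runs of whitespace into at most three pieces:
    # token 1, token 2, and the unsplit remainder.
    parts = line.split(None, 2)
    second = parts[1] if len(parts) > 1 else ""
    rest = parts[2] if len(parts) > 2 else ""
    return second, rest

second, rest = tokens_2_star("ListOfItems = Item01, Item02, Item03")
print(repr(second), repr(rest))
```

Here the second token is the = sign (what %%A holds) and the remainder is the item list (what %%B holds), which is why the loop stores %%B.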
To be more proper, however, the correct way is to use a find /i statement to look for the ListOfItems object within the .ini file. Let's take the following text file below:
[Items]
ListOfItems = Item01, Item02, Item03
[Armor]
ListOfArmor = Iron Helmet, Iron Brestplate, Iron Pants
[Weapons]
ListOfWeapons = Sword, Dagger, Holy Water
If we ran that basic for loop against this file, we would get the value from the last line of the text file:
Sword, Dagger, Holy Water
Below, the find statement is combined with another loop so that only the data after ListOfItems is extracted from the whole document.
Rem | Get .ini To String
FOR /F "tokens=*" %%A IN ('type File.ini') DO (
Rem | Look For Line With Items "ListOfItems"
FOR /F "tokens=2,*" %%B IN ('echo %%A^| find /i "ListOfItems"') DO (
Rem | Echo Result
echo %%C
)
)
However, correcting only that one line and adding a new object to the end is where it gets tricky! Keep in mind that batch is primitive and losing support; it is limited compared to its successor, PowerShell. In raw batch there is no true command for editing just one line in the middle of a document. However, this does not make it impossible.
To get around this we will have to take the entire .ini file and use the type command to break down the document line by line and save each line as a string. From there we can use syntax-replace to edit the string and save it to a new document. Then we just delete the original and rename the new file.
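The read, edit, write-to-new-file, then swap pattern can be sketched in Python as follows. This is an illustration of the approach under stated assumptions, not the batch implementation: the function name `append_items`, the `key = a, b, c` layout, and the choice to skip items already present are all assumptions.

```python
import os
import tempfile

def append_items(ini_path, section, key, new_items):
    """Append new_items to 'key = a, b, c' inside [section]; copy other lines as-is."""
    in_section = False
    out_lines = []
    with open(ini_path, encoding="utf-8") as src:
        for raw in src:
            line = raw.rstrip("\n")
            stripped = line.strip()
            if stripped.startswith("[") and stripped.endswith("]"):
                # Track which section header we are currently under.
                in_section = stripped == f"[{section}]"
            elif in_section and "=" in stripped and stripped.split("=", 1)[0].strip() == key:
                values = stripped.split("=", 1)[1].split(",")
                merged = [v.strip() for v in values if v.strip()]
                for item in new_items:
                    if item not in merged:      # assumed: skip duplicates
                        merged.append(item)
                line = f"{key} = {', '.join(merged)}"
            out_lines.append(line)
    # Write the edited copy beside the original, then swap it in.
    folder = os.path.dirname(os.path.abspath(ini_path))
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "w", encoding="utf-8") as dst:
        dst.write("\n".join(out_lines) + "\n")
    os.replace(tmp_path, ini_path)
```

Calling `append_items("File.ini", "Items", "ListOfItems", ["Item04"])` would rewrite only the ListOfItems line under [Items]. Building the list from parsed values also avoids the leading-comma problem when the key is empty.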
To further expand on this we will need to check if ListOfItems is actually populated. A basic if statement will work great here. Now, because your statement has an = in it, simple syntax-replace will not work without further complications. In my previous edit, I changed the simple function to a script that we will call. This script will be called Replace.bat. All you need to do is make a new .bat file and paste it in from below. This file will never need to be modified.
Below is the entire project that should solve all your issues:
Replace.Bat:
(This entire script is the equivalent of a single 15 character command in powershell lol!)
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "FILE_I=%~1"
set "SEARCH=%~2"
set "REPLAC=%~3"
set "FILE_O=%~4"
set "FLAG=%~5"
if not defined FILE_I exit /B 1
if not defined SEARCH exit /B 1
if not defined FILE_O set "FILE_O=con"
if defined FLAG set "FLAG=#"
for /F "delims=" %%L in ('
findstr /N /R "^" "%FILE_I%" ^& break ^> "%FILE_O%"
') do (
set "STRING=%%L"
setlocal EnableDelayedExpansion
set "STRING=!STRING:*:=!"
call :REPL RETURN STRING SEARCH REPLAC %FLAG%
>> "%FILE_O%" echo(!RETURN!
endlocal
)
endlocal
exit /B
:REPL rtn_string ref_string ref_search ref_replac flag
setlocal EnableDelayedExpansion
set "STR=!%~2!"
set "SCH=!%~3!"
set "RPL=!%~4!"
if not defined SCH endlocal & set "%~1=" & exit /B 1
set "SCH_CHR=!SCH:~,1!"
if not "%~5"=="" set "SCH_CHR="
if "!SCH_CHR!"=="=" set "SCH_CHR=" & rem = terminates search string
if "!SCH_CHR!"==""^" set "SCH_CHR=" & rem " could derange syntax
if "!SCH_CHR!"=="%%" set "SCH_CHR=" & rem % ends variable expansion
if "!SCH_CHR!"=="^!" set "SCH_CHR=" & rem ! ends variable expansion
call :LEN SCH_LEN SCH
call :LEN RPL_LEN RPL
set /A RED_LEN=SCH_LEN-1
set "RES="
:LOOP
call :LEN STR_LEN STR
if not defined STR goto :END
if defined SCH_CHR (
set "WRK=!STR:*%SCH_CHR%=!"
if "!WRK!"=="!STR!" (
set "RES=!RES!!STR!"
set "STR="
) else (
call :LEN WRK_LEN WRK
set /A DFF_LEN=STR_LEN-WRK_LEN-1,INC_LEN=DFF_LEN+1,MOR_LEN=DFF_LEN+SCH_LEN
for /F "tokens=1,2,3 delims=," %%M in ("!DFF_LEN!,!INC_LEN!,!MOR_LEN!") do (
rem set "RES=!RES!!STR:~,%%M!"
if defined WRK set "WRK=!WRK:~,%RED_LEN%!"
if "!STR:~%%M,1!!WRK!"=="!SCH!" (
set "RES=!RES!!STR:~,%%M!!RPL!"
set "STR=!STR:~%%O!"
) else (
set "RES=!RES!!STR:~,%%N!"
set "STR=!STR:~%%N!"
)
)
)
) else (
if "!STR:~,%SCH_LEN%!"=="!SCH!" (
set "RES=!RES!!RPL!"
set "STR=!STR:~%SCH_LEN%!"
) else (
set "RES=!RES!!STR:~,1!"
set "STR=!STR:~1!"
)
)
goto :LOOP
:END
if defined RES (
for /F delims^=^ eol^= %%S in ("!RES!") do (
endlocal
set "%~1=%%S"
)
) else endlocal & set "%~1="
exit /B
:LEN rtn_length ref_string
setlocal EnableDelayedExpansion
set "STR=!%~2!"
if not defined STR (set /A LEN=0) else (set /A LEN=1)
for %%L in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
if defined STR (
set "INT=!STR:~%%L!"
if not "!INT!"=="" set /A LEN+=%%L & set "STR=!INT!"
)
)
endlocal & set "%~1=%LEN%"
exit /B
Main.Bat:
@ECHO OFF
@setlocal EnableDelayedExpansion
Rem | Configuration
set "CustomINI=File.ini"
set "ListHeader=[Items]"
set "Object=ListOfItems"
set "ReplaceScript=Replace.bat"
SET "ItemList=List.txt"
Rem | Check If "CustomINI" Exists
if not exist "%CustomINI%" (
echo File "%CustomINI%" Not Found!
pause
goto :EOF
)
Rem | Check If "ItemList" Exists
if not exist "%ItemList%" (
echo File "%ItemList%" Not Found!
pause
goto :EOF
)
goto StartFunction
:StartFunction
Rem | Generate the list of items from textfile
FOR /F "delims=" %%A IN (%ItemList%) DO (
set "ListAddition=!ListAddition!%%A, "
)
set "ListAddition=!ListAddition:~0,-2!"
Rem | Get .ini To String
set HeaderFound=false
FOR /F "tokens=*" %%A IN ('type !CustomINI!') DO (
Rem | First Find The Header "[Items]" & Extract "ListOfItems" Line Data
for /f "tokens=*" %%B in ('echo %%A') do (
set "item=%%B"
if /i "!item!"=="!ListHeader!" (
set HeaderFound=true
) else if not "!item!"=="!item:ListOfItems=!" if "!HeaderFound!"=="true" (
Rem | Turn Items For Line "ListOfItems" To String
for /f "tokens=2,*" %%C in ('echo %%B') do (
Rem | Set String
set "SEARCHTEXT=%%D"
)
set HeaderFound=false
)
)
)
Rem | Check If "ListOfItems" Is Actually Populated
If "%SEARCHTEXT%"=="" (
Rem | Not Populated
set "SEARCHTEXT=!Object! = "
set "REPLACETEXT=!Object! = !ListAddition!"
goto EditString
) ELSE (
Rem | Populated
set "REPLACETEXT=!SEARCHTEXT!, !ListAddition!"
goto EditString
)
:EditString
Rem | Edit Only "ListOfItems" Line
Rem | Usage: call "1" "2" "3" "4" "5"
Rem | call - Calls external script
Rem | "1" - Name of External script
Rem | "2" - File to Edit
Rem | "3" - Text to replace ex: red apple
Rem | "4" - Text to replace to ex: green apple
Rem | "5" - Output file
call "%ReplaceScript%" "%CustomINI%" "%SEARCHTEXT%" "%REPLACETEXT%" "%CustomINI%.TEMP"
Rem | Delete Original File, Restore New
del "%CustomINI%"
rename "%CustomINI%.TEMP" "%CustomINI%"
goto :EOF
PS - Take note of the following: the above script expects that when ListOfItems = is not populated, it has a space after the =. If this is not how it is in your .ini file, then change set "SEARCHTEXT=!OBJECT! = " to set "SEARCHTEXT=!OBJECT! =" in the script.
EDIT: Since recent requests, The following was updated:
Firstly, since I was unsure of the OP's meaning of ListOfItems = being "blank", I assumed that he/she was referring to it being ListOfItems = with nothing after it, not to it being actually missing under the ListHeader itself, as in the examples below.
My Vision - File.ini:
[Items]
ListOfItems =
[Armor]
ListOfArmor = Iron Helmet, Iron Brestplate, Iron Pants
[Weapons]
ListOfWeapons = Sword, Dagger, Holy Water
OP's Vision - File.ini
[Items]
[Armor]
ListOfArmor = Iron Helmet, Iron Brestplate, Iron Pants
[Weapons]
ListOfWeapons = Sword, Dagger, Holy Water
Since then, I have now updated the script to find [Items] (String) then add a new line under it. This was done using a script by Magoo.
Since there is nothing to replace, we can simply add onto the .ini, so we call a different function.
:EditMissingString
Rem | Export SearchString
echo !ListHeader!>> %~dp0ListHeader.txt
Rem | Add Text Under %ListHeader%
(
FOR /f "delims=" %%i IN (ListHeader.txt) DO (
SET AddAfter=%%i
FOR /f "delims=" %%n IN ('findstr /n "^" %CustomINI%') DO (
SET line=%%n
SET line=!line:*:=!
ECHO(!line!
IF "!line!"=="!AddAfter!" ECHO(%AddTEXT%
)
)
)>>%CustomINI%.TEMP
Rem | Remove ListHeader.txt
DEL %~dp0ListHeader.txt
Rem | Delete Original File, Restore New
DEL %CustomINI%
REN %CustomINI%.TEMP %CustomINI%
goto :EOF
Since we are no longer editing the ListOfItems = line alone but rather adding onto it, we no longer need the Replace.bat script. I also fixed the original script to properly preserve empty lines!
New replace function with line preservation:
:EditExistingString
REM | Make sure we only edit the ListOfItems line.
FOR /F "delims=" %%n IN ('findstr /n "^" %CustomINI%') DO (
SET line=%%n
SET Modified=!line:%SearchText%=%ReplaceText%!
SET Modified=!Modified:*:=!
REM | Output the entire edited INI to a temporary file.
>> %CustomINI%.TEMP ECHO(!Modified!
)
Rem | Delete Original File, Restore New
DEL %CustomINI%
REN %CustomINI%.TEMP %CustomINI%
goto :EOF
Result In:
[Items]
ListOfItems = Item1, Item2, Item3
[Armor]
ListOfArmor = Iron Helmet, Iron Brestplate, Iron Pants
[Weapons]
ListOfWeapons = Sword, Dagger, Holy Water
Result Out:
[Items]
ListOfItems = Item1, Item2, Item3, Item4, Item5, Item6
[Armor]
ListOfArmor = Iron Helmet, Iron Brestplate, Iron Pants
[Weapons]
ListOfWeapons = Sword, Dagger, Holy Water
Final Batch Script:
@ECHO OFF
@setlocal EnableDelayedExpansion
Rem | Configuration
set "CustomINI=File.ini"
set "ListHeader=[Items]"
set "Object=ListOfItems"
SET "ItemList=List.txt"
Rem | Check If "CustomINI" Exists
if not exist "%CustomINI%" (
echo File "%CustomINI%" Not Found!
pause
goto :EOF
)
Rem | Check If "ItemList" Exists
if not exist "%ItemList%" (
echo File "%ItemList%" Not Found!
pause
goto :EOF
)
goto StartFunction
:StartFunction
Rem | Generate the list of items from textfile
FOR /F "delims=" %%A IN (%ItemList%) DO (
set "ListAddition=!ListAddition!%%A, "
)
if "%ListAddition%"=="" (
echo ERROR: File "%ItemList%" Is Empty!
pause
goto :EOF
) ELSE (set "ListAddition=!ListAddition:~0,-2!")
Rem | Get .ini To String
set HeaderFound=false
FOR /F "tokens=*" %%A IN ('type !CustomINI!') DO (
Rem | First Find The Header "[Items]" & Extract "ListOfItems" Line Data
for /f "tokens=*" %%B in ('echo %%A') do (
set "item=%%B"
if /i "!item!"=="!ListHeader!" (
set HeaderFound=true
) else if "!HeaderFound!"=="true" (
Rem | Turn Items For Line "ListOfItems" To String
for /f "tokens=2,*" %%C in ('echo %%B') do (
Rem | Set String
set "SearchText=%%D"
)
Rem | Header Was Found, End Loop & goto HeaderContinue
set HeaderFound=false
goto HeaderContinue
)
)
)
Rem | Header Was Not Found
echo ERROR: The Header "%ListHeader%" Was Not Found!
pause
goto :EOF
:HeaderContinue
Rem | Check If "ListOfItems" Is Actually Populated
If "%SearchText%"=="" (
Rem | Not Populated
set "SearchText=!ListHeader!"
set "AddTEXT=!Object! = !ListAddition!"
goto EditMissingString
) ELSE (
Rem | Populated
set "REPLACETEXT=!SearchText!, !ListAddition!"
goto EditExistingString
)
:EditExistingString
REM | Make sure we only edit the ListOfItems line.
FOR /F "delims=" %%n IN ('findstr /n "^" %CustomINI%') DO (
SET line=%%n
SET Modified=!line:%SearchText%=%ReplaceText%!
SET Modified=!Modified:*:=!
REM | Output the entire edited INI to a temporary file.
>> %CustomINI%.TEMP ECHO(!Modified!
)
Rem | Delete Original File, Restore New
DEL %CustomINI%
REN %CustomINI%.TEMP %CustomINI%
goto :EOF
:EditMissingString
Rem | Add Text Under %ListHeader%
(
FOR /f "delims=" %%i IN ('Echo !ListHeader!') DO (
SET AddAfter=%%i
FOR /f "delims=" %%n IN ('findstr /n "^" %CustomINI%') DO (
SET line=%%n
SET line=!line:*:=!
ECHO(!line!
IF "!line!"=="!AddAfter!" ECHO(%AddTEXT%
)
)
)>>%CustomINI%.TEMP
Rem | Delete Original File, Restore New
DEL %CustomINI%
REN %CustomINI%.TEMP %CustomINI%
goto :EOF
PS: I know your find command is in a different location or something; just change the command find to %WINDIR%\System32\FIND.exe in the script.
DEBUG/CHANGES:
Scrapped.
For help on any of the commands do the following:
call /?
set /?
for /?
if /?
find /?
So on.

Batch Script - Find words that contain a certain string within a file

I have a file with a lot of text.
EG
Hello
This is my file
this is the end of the file
I need a script that will search the file and pull out all words (just the words and not the line into another file) that contain for example the letter e
In this case the new file would look like
Hello
file
the
end
the
file
It may also need to search for, as another example, bh. (including the full stop), so a file with the following
hello
bh.ah1
my file
the end
would produce a file with
bh.ah1
hope this is enough detail
@ECHO OFF
SETLOCAL
SET "target=%~1"
FOR /f "delims=" %%a IN (q22560073.txt) DO CALL :findem %%a
GOTO :EOF
:findem
SET candidate=%1
IF NOT DEFINED candidate GOTO :EOF
ECHO %1|FIND /i "%target%" >NUL
IF NOT ERRORLEVEL 1 ECHO(%1
shift
GOTO findem
I used a file named q22560073.txt for my testing.
To find the text string, use
thisbatch text
so
thisbatch e
would find the first list and
thisbatch bh.
the second.
(I combined both sample testfiles as q22560073.txt)
the /i in the find command makes the test case-insensitive.
To output to a file, simply use
thisbatch text >"filename"
where the "rabbit's ears" are only required if the filename contains spaces or other problematic characters, but do no harm in any case.
This should work for a target of any alphabetic or numeric combination plus full stop. It will not work with characters that have a special meaning to cmd.
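For comparison, the same word-by-word test can be sketched in Python (an illustration only; the function name `words_containing` is hypothetical). Like FIND /i, it ignores case, and it treats any run of whitespace as a word separator.

```python
def words_containing(lines, target):
    """Yield each whitespace-separated word that contains target, ignoring case."""
    needle = target.lower()
    for line in lines:
        for word in line.split():
            if needle in word.lower():
                yield word

text = ["Hello", "This is my file", "this is the end of the file"]
for word in words_containing(text, "e"):
    print(word)
```

Since the target is compared as a plain substring, characters such as the full stop in bh. need no escaping here.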
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "target=%~1"
FOR /f "delims=" %%a IN (q22560073.txt) DO (
SET "line=%%a"
SET "line=!line:(= !"
SET "line=!line:)= !"
CALL :findem !line!
)
GOTO :EOF
:findem
SET candidate=%1
IF NOT DEFINED candidate GOTO :EOF
ECHO %1|FIND /i "%target%" >NUL
IF NOT ERRORLEVEL 1 ECHO(%1
shift
GOTO findem
revised on further information.
@echo off
set "searchfor=bh."
for /f "delims=" %%i in (t.t) do (
for %%j in (%%i) do (
echo %%j|find "%searchfor%" >nul && echo %%j
)
)
for every line (%%i) do
for every word in this line (%%j) do
if searchstring found then echo word
EDIT to your comment: replace the ( with a space in the line, before processing words
@echo off
setlocal enabledelayedexpansion
set "searchfor=bh."
for /f "delims=" %%i in (t.t) do (
set t=%%i
set t=!t:(= !
for %%j in (!t!) do (
echo %%j|find "%searchfor%" >nul && echo %%j
)
)
You can do this for more characters with additional lines like set t=!t:(= ! (which replaces ( with a space).

Batch Script - multiple duplicate strings in a row

The script below outputs each occurrence of the token 1 field, but I need to add two more conditions.
a. Output should be produced only when a value occurs more than once, since I have millions of records in the file.
b. If there are multiple strings, i.e. a combination of key fields in a row, they need to be checked across all lines in the file for duplicates.
@ECHO OFF
SETLOCAL enabledelayedexpansion
FOR %%c IN ($ #) DO FOR /f "delims==" %%i IN ('set %%c 2^>nul') DO SET "%%i="
SET /a count=0
FOR /f "tokens=1delims=|" %%i IN (fscif.txt) DO (
SET /a count+=1
IF DEFINED $%%i (SET "$%%i=!$%%i! & !count!") ELSE (SET "$%%i=!count!")
SET /a #%%i+=1 )
FOR /f "tokens=1*delims=$=" %%i IN ('set $ 2^>nul') DO ( ECHO %%i;!#%%i! times;line no %%j
)
For Example:
Original File (Considering token 1 & 3 are key fields)
123|12|Jack
124|23|John
123|14|Jack
125|15|Sam
125|66|Sam
125|66|Sam
Output file:
123|Jack;2 times;line no 1 & 3
125|Sam;3 times;line no 4 & 5 & 6
@ECHO OFF
SETLOCAL enabledelayedexpansion
:: Temporary filename
:tloop
SET "temppfx=%temp%\%random%"
IF EXIST "%temppfx%*" GOTO tloop
:: Hold that tempfile name...
ECHO.>"%temppfx%_"
:: A long string of spaces used for padding; the closing quote marks the end of the string
SET "spaces=                    "
SET /a count=0
(
FOR /f "tokens=1,3 delims=|" %%a IN (fscif.txt) DO (
SET /a count+=1
SET "field1=%%a%spaces%"
SET "field3=%%b%spaces%"
SET "fieldc=%spaces%!count!"
ECHO(!field1:~0,10!!field3:~0,12!^|!fieldc:~-8!^|!count!^|%%a^|%%b
)
)>"%temppfx%1"
:: Now report
SET "key=x"
SET /a count=0
(
FOR /f "tokens=1,3* delims=|" %%a IN ('sort "%temppfx%1" ') DO (
IF "!key!"=="%%a" (
SET "line=!line! %%b"
SET /a count+=1
) ELSE (IF !count! neq 0 CALL :output
SET key=%%a
SET line=%%b
SET "data=%%c"
SET /a count=1
)
)
CALL :output
)>report.txt
del "%temppfx%*"
GOTO :eof
:output
ECHO(!data!;%count% times;line nos %line: = ^& %
GOTO :eof
As I explained earlier, with millions of records, you are likely to run out of environment space. As posted above, I reckon you may still run out because the report of line numbers may be huge - no idea - you are familiar with your real data.
Essentially, the first thing to do is to establish a temporary file.
Starting with the tokens required in the input file - I followed 1 and 3 but no doubt there may be more - just follow the bouncing ball...
The selected fields are padded - on the right for text fields and on the left for the count field using the spaces variable.
Then the tempfile output is generated. I randomly chose a maximum length of 10 for the first field and 12 for the second. These two are combined to give the key field. The leading-filled count field is output as the second column so that after SORTing, the data will appear grouped by key, then line number. The remaining columns of interest are then reproduced.
The data is then sorted as input to the next for/f loop - only tokens 1 (the key), 3 (the raw line number) and "the rest" (the key without the padding) are of interest
Then it's simply a matter of counting matching keys and accumulating the line number in line and reporting when the key changes. One last output is required to report the very last data item, and we're done.
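The sort-then-group idea described above can be sketched in Python as follows. This is an illustration, not a replacement for the batch job: it holds all keys in memory, and the function name `duplicate_keys` and its defaults (fields 1 and 3 as the key, | as the separator) are assumptions taken from the question's example.

```python
from itertools import groupby

def duplicate_keys(lines, key_fields=(0, 2), sep="|"):
    """Report keys seen more than once, with their 1-based line numbers."""
    keyed = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        key = sep.join(fields[i] for i in key_fields)
        keyed.append((key, number))
    # Sorting groups equal keys together, line numbers ascending within a key.
    keyed.sort()
    report = []
    for key, group in groupby(keyed, key=lambda pair: pair[0]):
        numbers = [str(n) for _, n in group]
        if len(numbers) > 1:            # only report real duplicates
            report.append(f"{key};{len(numbers)} times;line no {' & '.join(numbers)}")
    return report

data = ["123|12|Jack", "124|23|John", "123|14|Jack",
        "125|15|Sam", "125|66|Sam", "125|66|Sam"]
for row in duplicate_keys(data):
    print(row)
```

The batch version pads the line counter with leading spaces because SORT compares text; the sketch sorts integers directly, so no padding is needed.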
For this ugly batch job I recommend using sed and uniq from the GnuWin project:
@echo off&setlocal enabledelayedexpansion
set "inputfile=file"
set "outputfile=out"
set "tempfile=%temp%\%random%"
<"%inputfile%" sed "s/|.*|/|.*|/"|sort|uniq -d>"%tempfile%"
(for /f "usebackqtokens=1-3delims=|" %%i in ("%tempfile%") do (
set /a cnt=0&set "line="
for /f "delims=:" %%a in ('findstr /nr "%%i|%%j|%%k" "%inputfile%"') do set "line=!line!%%a & "&set /a cnt+=1
echo(%%i^|%%k;!cnt! times;line no !line:~0,-3!
))>"%outputfile%"
del "%tempfile%"
type "%outputfile%"
.. output is:
123|Jack;2 times;line no 1 & 3
125|Sam;3 times;line no 4 & 5 & 6
The Batch file below does what you want:
@echo off
setlocal EnableDelayedExpansion
rem Assemble "tokensValues" and "lastToken" variables from the parameters
set letters=0abcdefghijklmnopqrstuvwxyz
set tokensValues=%%!letters:~%1,1!
set lastToken=%1
:nextArg
shift
if "%1" equ "" goto endArgs
set "tokensValues=!tokensValues!#%%!letters:~%1,1!"
set lastToken=%1
goto nextArg
:endArgs
rem Accumulate duplicated strings
set line=0
for /F "tokens=1-%lastToken% delims=|" %%a in (fscif.txt) do (
set /A line+=1
if not defined lines[%tokensValues%] (
set lines[%tokensValues%]=!line!
) else (
set "lines[%tokensValues%]=!lines[%tokensValues%]! & !line!"
)
set /A times[%tokensValues%]+=1
)
rem Show the result
for /F "tokens=2* delims=[]=" %%a in ('set lines[ 2^>NUL') do (
if !times[%%a]! gtr 1 (
set string=%%a
set "string=!string:#=|!"
echo !string!;!times[%%a]! times;line no %%b
)
)
You must provide the number of the desired key fields in the parameters. For example, to consider 1 & 3 as key fields:
prog.bat 1 3
You may provide a maximum of 26 key fields with positions from 1 to 26; this limit may be easily increased up to 52.
This Batch file does not use any external command and works over the original file, so it should run fast. If the file is large, a sort or findstr command over it will take too long (even a simple copy, for that matter).
If we take your example data as representative of the real data, the lines variable should store about 2500-3000 lines (that is, the number of different lines where the same key fields appear), and with a total environment space of 64 MB I think this program will be capable of processing your large files.

Getting Distinct Values From Text File

I'm working with very large FIX message log files. Each message represents a set of tags separated by SOH characters.
Unlike MQ messages, individual FIX tags (and overall messages) do not feature fixed length or position. Log may include messages of different types (with a different number & sequence of tags).
Sample (of one of many types of messages):
07:00:32 -SEND:8=FIX.4.0(SOH)9=55(SOH)35=0(SOH)34=2(SOH)43=N(SOH)52=20120719-11:00:32(SOH)49=ABC(SOH)56=XYZ(SOH)10=075
So the only certain things are as follows: (1) tag number with equal sign uniquely identifies the tag, (2) tags are delimited by SOH characters.
For specific tags (just a few of them at a time, not all of them), I need to get a list of their distinct values - something like this:
49=ABC 49=DEF 49=GHI...
Format of the output doesn't really matter.
I would greatly appreciate any suggestions and recommendations.
Kind regards,
Victor O.
Option 1
The batch script below has decent performance. It has the following limitations
It ignores case when checking for duplicates.
It may not properly preserve all values that contain = in the value
EDIT - My original code did not support = in the value at all. I lessened that limitation by adding an extra SOH character in the variable name, and changed the delims used to parse the value. Now the values can contain = as long as unique values are differentiated before the =. If the values differentiate after the = then only one value will be preserved.
Be sure to fix the definition of the SOH variable near the top.
The name of the log file is passed as the 1st parameter, and the list of requested tags is passed as the 2nd parameter (enclosed in quotes).
@echo off
setlocal disableDelayedExpansion
:: Fix the definition of SOH before running this script
set "SOH=<SOH>"
set LF=^


:: The above 2 blank lines are necessary to define LF, do not remove.
:: Make sure there are no existing tag_ variables
for /f "delims==" %%A in ('2^>nul set tag_') do set "%%A="
:: Read each line and replace SOH with LF to allow iteration and parsing
:: of each tag/value pair. If the tag matches one of the target tags, then
:: define a tag variable where the tag and value are incorporated in the name.
:: The value assigned to the variable does not matter. Any given variable
:: can only have one value, so duplicates are removed.
for /f "usebackq delims=" %%A in (%1) do (
set "ln=%%A"
setlocal enableDelayedExpansion
for %%L in ("!LF!") do set "ln=!ln:%SOH%=%%~L!"
for /f "eol== tokens=1* delims==" %%B in ("!ln!") do (
if "!!"=="" endlocal
if "%%C" neq "" for %%D in (%~2) do if "%%B"=="%%D" set "tag_%%B%SOH%%%C%SOH%=1"
)
)
:: Iterate the defined tag_nn variables, parsing out the tag values. Write the
:: values to the appropriate tag file.
del tag_*.txt 2>nul
for %%A in (%~2) do (
>"tag_%%A.txt" (
for /f "tokens=2 delims=%SOH%" %%B in ('set tag_%%A') do echo %%B
)
)
:: Print out the results to the screen
for %%F in (tag_*.txt) do (
echo(
echo %%F:
type "%%F"
)
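The idea behind Option 1 (split each line on SOH, split each pair on the first =, and keep one entry per distinct value) can be sketched in Python as follows. This is an illustration under assumptions, not a port of the script: the function name `distinct_tag_values` is hypothetical, the comparison is case-sensitive (unlike the batch version), and, as in the batch version, a tag glued to the line prefix (such as 8 after -SEND:) is not matched.

```python
from collections import defaultdict

SOH = "\x01"  # the real delimiter byte; the question's sample shows it as (SOH)

def distinct_tag_values(lines, wanted_tags, soh=SOH):
    """Map each wanted tag to its distinct values, in first-seen order."""
    seen = defaultdict(dict)            # inner dict acts as an ordered set
    for line in lines:
        for pair in line.rstrip("\n").split(soh):
            # partition splits on the first = only, so values may contain =.
            tag, eq, value = pair.partition("=")
            if eq and value and tag in wanted_tags:
                seen[tag][value] = None
    return {tag: list(values) for tag, values in seen.items()}

log = [
    "07:00:32 -SEND:8=FIX.4.0\x019=55\x0149=ABC\x0156=XYZ",
    "07:00:33 -SEND:8=FIX.4.0\x019=60\x0149=DEF\x0156=XYZ",
    "07:00:34 -SEND:8=FIX.4.0\x019=55\x0149=ABC\x0156=XYZ",
]
print(distinct_tag_values(log, {"49", "56"}))
```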
Option 2
This script has almost no limitations, but it is significantly slower. The only limitation I can see is that it will not allow a value to start with = (the leading = will be discarded).
I create a temporary "search.txt" file to be used with the FINDSTR /G: option. I use a file instead of a command line search string because of FINDSTR limitations. Command line search strings cannot match many characters > decimal 128. Also the escape rules for literal backslashes are inconsistent on the command line. See What are the undocumented features and limitations of the Windows FINDSTR command? for more info.
The SOH definition must be fixed again, and the 1st and 2nd arguments are the same as with the 1st script.
@echo off
setlocal disableDelayedExpansion
:: Fix the definition of SOH before running this script
set "SOH="
set lf=^


:: The above 2 blank lines are necessary to define LF, do not remove.
:: Read each line and replace SOH with LF to allow iteration and parsing
:: of each tag/value pair. If the tag matches one of the target tags, then
:: check if the value already exists in the tag file. If it doesn't exist
:: then append it to the tag file.
del tag_*.txt 2>nul
for /f "usebackq delims=" %%A in (%1) do (
set "ln=%%A"
setlocal enableDelayedExpansion
for %%L in ("!LF!") do set "ln=!ln:%SOH%=%%~L!"
for /f "eol== tokens=1* delims==" %%B in ("!ln!") do (
if "!!"=="" endlocal
set "search=%%C"
if defined search (
setlocal enableDelayedExpansion
>search.txt (echo !search:\=\\!)
endlocal
for %%D in (%~2) do if "%%B"=="%%D" (
findstr /xlg:search.txt "tag_%%B.txt" || >>"tag_%%B.txt" echo %%C
) >nul 2>nul
)
)
)
del search.txt 2>nul
:: Print out the results to the screen
for %%F in (tag_*.txt) do (
echo(
echo %%F:
type %%F
)
Try this batch file. Add the log file name as parameter. e.g.:
LISTTAG.BAT SOH.LOG
It will show all tag id and its value that is unique. e.g.:
9=387
12=abc
34=asb73
9=123
12=xyz
Files named tagNNlist.txt (where NN is the tag id number) will be made for finding unique tag id and values, but are left intact as reports when the batch ends.
The {SOH} text shown in the code below stands for the SOH character (ASCII 0x01), so after you copy and paste the code, change it to an actual SOH character. I had to substitute that character since it's stripped by the server. Use WordPad to generate the SOH character by typing 0001 and then pressing ALT+X. Then copy and paste that character into Notepad with the batch file code.
One thing to note is that the code only processes each line starting at column 16. The 07:00:32 -SEND: in your example line will be ignored. I'm assuming that all lines start with that fixed-length text.
Changes:
Changed generated tag list file into separate files by tag IDs. e.g.: tag12list.txt, tag52list.txt, etc.
Removed tag id numbers in generated tag list file. e.g.: 12=abc become abc.
LISTTAG.BAT:
@echo off
setlocal enabledelayedexpansion
if "%~1" == "" (
echo No source file specified.
goto :eof
)
if not exist "%~1" (
echo Source file not found.
goto :eof
)
echo Warning! All "tagNNlist.txt" file in current
echo directory will be deleted and overwritten.
echo Note: The "NN" is tag id number 0-99. e.g.: "tag99list.txt"
pause
echo.
for /l %%a in (0,1,99) do if exist tag%%alist.txt del tag%%alist.txt
for /f "usebackq delims=" %%a in ("%~1") do (
rem *****below two lines strip the first 15 characters (up to "-SEND:")
set x=%%a
set x=!x:~15,99!
rem *****9 tags per line
for /f "tokens=1,2,3,4,5,6,7,8,9 delims={SOH}" %%b in ("!x!") do (
call :dotag "%%b" %*
call :dotag "%%c"
call :dotag "%%d"
call :dotag "%%e"
call :dotag "%%f"
call :dotag "%%g"
call :dotag "%%h"
call :dotag "%%i"
call :dotag "%%j"
)
)
echo.
echo Done.
goto :eof
rem dotag "{id=value}"
:dotag
for /f "tokens=1,2 delims==" %%p in (%1) do (
set z=0
if exist tag%%plist.txt (
call :chktag %%p "%%q"
) else (
rem>tag%%plist.txt
)
if !z! == 0 (
echo %%q>>tag%%plist.txt
echo %~1
)
)
goto :eof
rem chktag {id} "{value}"
:chktag
for /f "delims=" %%y in (tag%1list.txt) do (
if /i "%%y" == %2 (
set z=1
goto :eof
)
)
goto :eof

Displaying lines from text file in a batch file

I'm trying to find a script that will let me display specific line numbers, as well as ranges of lines, from a text file in a batch file. I found this script here on this site:
@echo off
setlocal enabledelayedexpansion
if [%1] == [] goto usage
if [%2] == [] goto usage
SET /a counter=0
for /f "usebackq delims=" %%a in (%2) do (
if "!counter!"=="%1" goto exit
echo %%a
set /a counter+=1
)
goto exit
:usage
echo Usage: head.bat COUNT FILENAME
:exit
And it works great :) But it grabs the number of lines from the top of the text file. I want to be able to display certain lines in the text file, as well as a range if possible..
EG: I have a text file with 30 lines, and I want to display lines 1-4; 7-11; 13; 17-20; 22; 26 & 29
Here's a simple modification of the sample batch file above. Copy the code below to a file and name it "LineDisplay.bat". It takes the file name, FirstLineNumber and LastLineNumber as parameters. Example: LineDisplay test.txt 12 30 (to read lines 12-30)
@echo off
setlocal enabledelayedexpansion
if [%1] == [] goto usage
if [%2] == [] goto usage
if [%3] == [] goto usage
set /a FirstLineNumber = %2
set /a LastLineNumber = %3
echo Reading from Line !FirstLineNumber! to !LastLineNumber!
SET /a counter=1
for /f "usebackq delims=" %%a in (%1) do (
if !counter! GTR !LastLineNumber! goto exit
if !counter! GEQ !FirstLineNumber! echo !counter! %%a
set /a counter+=1
)
goto exit
:usage
echo Usage: LineDisplay.bat FILENAME FirstLineNumber LastLineNumber
:exit
Here's a link to a nice tutorial on creating batch files: http://vtatila.kapsi.fi/batch_tutorial.html
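For the list of ranges in the question, one option is to call LineDisplay.bat once per range. This is only a sketch: it assumes the script above is saved as LineDisplay.bat in the current directory, and test.txt is a hypothetical input file. A single line N is written as the range "N N".

```batch
@echo off
rem Each quoted pair is "FirstLineNumber LastLineNumber".
rem %%~r removes the quotes so the pair is passed as two separate arguments.
for %%r in ("1 4" "7 11" "13 13" "17 20" "22 22" "26 26" "29 29") do (
    call LineDisplay.bat test.txt %%~r
)
```

Each call rereads the file from the top and prints its own "Reading from Line" header, so this is simple rather than fast; for a 30-line file that should not matter.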
Seems to work:
@ECHO OFF
REM Show usage and quit if no file name was given
IF [%1]==[] GOTO USAGE
REM Show entire file if no range was given
IF [%2]==[] TYPE %1 & GOTO :EOF
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
SET FILENAME=%1
SET LASTLINE=0
REM Build the array of lines to display
SHIFT
:RANGEREADLOOP
CALL :BUILDARRAY %1
SHIFT
IF NOT [%1]==[] GOTO RANGEREADLOOP
REM Loop through the file and keep track of the lines to display
SET CURRENTLINE=0
SET INDEX=1
FOR /F "delims=" %%l in (%FILENAME%) DO (
SET LINE=%%l
CALL :PRINTLINE
)
GOTO :EOF
REM Adds the lines from the specified range to the array of lines to display
:BUILDARRAY
REM Find out whether we have a single line or a range
SET TEST=%1
SET /A TEST1=%TEST%
SET /A TEST2=%TEST:-=%
IF %TEST1%==%TEST2% (
REM Single line
SET /A LASTLINE+=1
SET LINES[!LASTLINE!]=%1
) ELSE (
REM Range
FOR /F "tokens=1,2 delims=-" %%x IN ("%1") DO (SET RANGESTART=%%x&SET RANGEEND=%%y)
REM Some sanity checking
IF !RANGESTART! GTR !RANGEEND! (
ECHO.Problem with range !RANGESTART!-!RANGEEND!:
ECHO. Ranges must have a start value smaller than the end value.
EXIT /B 1
) ELSE (
FOR /L %%i IN (!RANGESTART!,1,!RANGEEND!) DO (
SET /A LASTLINE+=1
SET LINES[!LASTLINE!]=%%i
)
)
)
GOTO :EOF
REM Prints the specified line if the current line should be printed
:PRINTLINE
SET /A CURRENTLINE+=1
IF %CURRENTLINE%==!LINES[%INDEX%]! (
REM We have a line to print, so do this
ECHO !LINE!
SET /A INDEX+=1
)
GOTO :EOF
REM Prints usage and exits the batch file
:USAGE
ECHO.%~n0 - Displays selected lines from a text file.
ECHO.
ECHO.Usage:
ECHO. %~n0 ^<filename^> ^<range^> ...
ECHO.
ECHO. ^<filename^> - the text file from which to read
ECHO. ^<range^> - the line range(s) to display.
ECHO.
ECHO.Example:
ECHO. %~n0 foo.txt 1-4 13 15 18-20
ECHO.
ECHO. will display the first four lines from the file "foo.txt",
ECHO. the 13th and 15th as well as the lines 18 to 20.
ECHO.
ECHO.Line ranges are separated by comma, semicolon or space. If no range is given,
ECHO.the entire file is displayed.
EXIT /B 1
GOTO :EOF
The whole script could use some better error checking. Examples of what not to do, or where the error checking is a bit wonky:
dl foo.txt 1-2-4 (prints lines 1-2 but no error message)
dl foo.txt -1 (error message that the range 1- isn't correct)
Other limitations:
Ranges must be sorted. With the current implementation there is no way to do something like dl foo.txt 7-8,1-2.
No line may be selected twice. Something like dl foo.txt 1,2,2-8,11-15 will stop after the second line.
No support for UNIX-style line endings (U+000A)
EDIT: Fixed an issue with files that contain parentheses.
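If the sorted-range restriction is a problem, a different approach (not the method used in the script above) is to let findstr /n number every line and test each line number against every range. The following is a rough sketch under that assumption; the script name showlines.bat and the variable names are hypothetical. Output always follows file order, each matching line is printed once, and ranges may be given in any order.

```batch
@echo off
setlocal enabledelayedexpansion
rem Hypothetical usage: showlines.bat FILE RANGE [RANGE ...]
rem e.g. showlines.bat foo.txt 7-8 1-2 13
if "%~2"=="" (echo Usage: %~n0 FILE RANGE [RANGE ...] & exit /b 1)
set "file=%~1"
rem Collect the ranges as start-end pairs; a single number N becomes N-N.
set "ranges="
:collect
shift
if "%~1"=="" goto scan
for /f "tokens=1,2 delims=-" %%x in ("%~1") do (
    if "%%y"=="" (set "ranges=!ranges! %%x-%%x") else (set "ranges=!ranges! %%x-%%y")
)
goto collect
:scan
rem findstr /n prefixes every line, including blank ones, with "N:".
for /f "tokens=1* delims=:" %%n in ('findstr /n "^" "%file%"') do (
    set "show="
    for %%r in (%ranges%) do for /f "tokens=1,2 delims=-" %%x in ("%%r") do (
        if %%n geq %%x if %%n leq %%y set "show=1"
    )
    if defined show echo(%%o
)
```

Known caveats of this sketch: lines that begin with a colon lose their leading colons, lines containing exclamation marks are altered because delayed expansion is enabled, and the ranges are not validated.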

Resources