Delete special characters from multiple csv files using batch file

I want to delete all the special characters in my CSV file using a batch file. My CSV file has one column containing only keywords to be entered into Google.
For example:
1. Ecommerce
2. dentist Melbourne cbd?
3. dentists Melbourne %
4. best dentist in Melbourne!
Sometimes there are Arabic/Chinese characters as well, and so on.
When I add these files to the Google AdWords Keyword Planner, it shows an error; if I ignore the error, I get the wrong number of hits for the keywords. To avoid the error I need to remove all the special characters from my CSV files.
I have hundreds of CSV files and want to save each updated file (without special characters) over the existing file.
I tried:
@echo off
set source_folder=C:\Users\Username\Documents\iMacros\Datasources\a
set target_folder=C:\Users\Username\Documents\iMacros\Datasources\keyfords-csv-file
if not exist %target_folder% mkdir %target_folder%
for /f %%A in ('dir /b %source_folder%\*.csv') do (
for /f "skip=1 tokens=1,2* delims=," %%B in (%source_folder%\%%A) do (
echo %%B>>%target_folder%\%%A
)
)
timeout /t 20
But it ended up deleting all the records from the CSV file.
Is there any way I can either:
1. Accept only standard characters, i.e. A-Z, a-z and 0-9,
2. Or delete every character that appears in a string of special characters that I define, like
string1="?%!@#$^&*<>"
3. Or specify in the CSV file that only standard English characters are accepted?
Is there any way to achieve this using a batch file or any other framework?
Thanks

I think this is much cleaner in PowerShell.
$sourceFolder = "C:\Users\Username\Documents\iMacros\Datasources\a"
$targetFolder = "C:\Users\Username\Documents\iMacros\Datasources\keyfords-csv-file"
MkDir $targetFolder -ErrorAction Ignore
$fileList = Get-ChildItem -Path $sourceFolder -Filter *.csv
ForEach ($file in $fileList)
{
    # Delete every character that is not a word character, whitespace, comma, double quote or period
    Get-Content -LiteralPath $file.FullName |
        ForEach-Object { $_ -replace '[^\w\s,\"\.]', '' } |
        Set-Content -Path (Join-Path $targetFolder $file.Name)
}
I take every file from the source folder, get its contents, replace any unwanted character, and save the result to a file of the same name in the target folder. The work is done by a small regex, '[^\w\s,\"\.]', passed to the -replace operator. The caret ^ at the start of the brackets negates the character class, so anything that is not a word character \w, a whitespace character \s, a comma ,, a double quote \", or a period \. is replaced with nothing.
Someone may find a better regex for your needs, but I think you get the idea.
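Since the question also allows "any framework", here is a minimal Python sketch of the same whitelist idea that writes the cleaned text back over the original files, as the asker wanted. One detail matters for the Arabic/Chinese case: in .NET regexes, and in Python 3 `str` patterns, `\w` also matches non-Latin letters, so this sketch spells out `A-Za-z0-9` instead. It assumes the files are UTF-8; `clean_text` and `clean_folder` are hypothetical names, and because the files are overwritten in place it is worth trying on a copy first.

```python
import re
from pathlib import Path

# Whitelist: ASCII letters, digits, whitespace (including newlines), commas,
# double quotes and periods. Everything else is deleted. A-Za-z0-9 is spelled
# out because \w in Python 3 str patterns also matches Arabic/Chinese letters.
NOT_ALLOWED = re.compile(r'[^A-Za-z0-9\s,".]')


def clean_text(text: str) -> str:
    """Return text with every non-whitelisted character removed."""
    return NOT_ALLOWED.sub("", text)


def clean_folder(folder: str, pattern: str = "*.csv", encoding: str = "utf-8") -> int:
    """Overwrite each matching file with its cleaned text; return the file count.

    Assumption: the files are encoded as `encoding` (UTF-8 by default).
    """
    count = 0
    for path in Path(folder).glob(pattern):
        cleaned = clean_text(path.read_text(encoding=encoding))
        path.write_text(cleaned, encoding=encoding)
        count += 1
    return count
```

Usage, with the folder from the question: `clean_folder(r"C:\Users\Username\Documents\iMacros\Datasources\a")`. Commas and newlines are kept so the CSV structure survives; a keyword such as `dentists Melbourne %` keeps its trailing space.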

Technically you could have a series of:
set variable=%variable:"=%
set variable=%variable:(=%
set variable=%variable:)=%
set variable=%variable:&=%
set variable=%variable:%=%
And so on. I know it would be annoying to write out every special character.
Since there are fewer letters in the alphabet than "special characters", you could instead check each character of the file/folder name with findstr: if it is a letter from a-z, write it, then move on to the next character.
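The blacklist approach above (one substitution per unwanted character, i.e. option 2 in the question) collapses to a single call in Python: `str.maketrans` with a third argument builds a table of characters to delete, and `str.translate` applies it in one pass. This is only a sketch; the character set is a sample in the spirit of the question's `string1`, and `remove_chars` is a hypothetical helper name. Unlike a whitelist, a blacklist lets through anything you did not list, including Arabic or Chinese characters.

```python
# Sample set of characters to delete, in the spirit of the question's string1.
SPECIAL = '?%!@#$^&*<>'

# The third argument of str.maketrans lists characters that translate() deletes.
DELETE_TABLE = str.maketrans("", "", SPECIAL)


def remove_chars(text: str, table: dict = DELETE_TABLE) -> str:
    """Return text with every character from the table's delete set removed."""
    return text.translate(table)
```

For example, `remove_chars("best dentist in Melbourne!")` gives `best dentist in Melbourne`.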
_Arescet

Related

Searching for files whose names partially match strings in a variable

First time poster. I'm seeking some help, since I very rarely dabble in scripting and only at a very basic level.
I was asked to write a script that searches a backup files folder for each word in a list stored in a txt file. The file names don't exactly match what's in the list, and there are three different patterns to match. If a backup file isn't there, the script is supposed to generate another txt file listing the missing ones.
So I wrote this very basic .bat script:
@echo off
del missing backups.txt
FOR /F %%i IN (list.txt) do (@echo %%i
IF NOT EXIST D:\Backups\*XmlConf**%%i* (echo %%i >> missing backups.txt XML)
IF NOT EXIST D:\Backups\*user**%%i* (echo %%i >> missing backups.txt USERS)
IF NOT EXIST D:\Backups\*IVS**%%i* (echo %%i >> missing backups.txt CONFIGURACION)
)
This works pretty well. The problem is that I need to add a condition: the files must have been created in the last 24 hours. This is where I got into trouble. Since this seems to be complicated in batch, I tried my hand at PowerShell for the first time.
This is where I'm currently at, after many modifications:
$list = Get-Content -Path 'list.txt'
$bakups24 = Get-ChildItem -name -Path 'C:\BACKUPS' | Where-Object { $_.CreationTime -le (Get-Date).AddDays(-1) }
foreach ($f in $list) {
if-not ($backups24 -like "*$f*.*" {
$f | Out-File -Append -FilePath 'C:\Backup Check\Missing backups.txt'
}
}
Clearly the -like expression isn't working for some reason, because I have set up the files in C:\BACKUPS so that some names include elements of the list and some don't. For the moment I'm leaving out the three different patterns.
Any help would be appreciated; I don't mind whether it's in batch or PowerShell.
Regards.
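A few things in the PowerShell attempt above would stop it from working: `-Name` makes Get-ChildItem return plain strings, which have no CreationTime property; the variable is assigned as `$bakups24` but read as `$backups24`; `-le (Get-Date).AddDays(-1)` keeps files older than a day rather than newer; and `if-not (` is not valid syntax (it would be `if (-not (...))`). The same check, including the three patterns, can be sketched in Python. Assumptions: modification time stands in for creation time, matching ignores case as file names do on Windows, and `recent_files`, `find_missing` and `write_report` are hypothetical names.

```python
import fnmatch
import time
from pathlib import Path

# The three name patterns from the batch script; {name} becomes each list entry.
PATTERNS = {
    "XML": "*XmlConf*{name}*",
    "USERS": "*user*{name}*",
    "CONFIGURACION": "*IVS*{name}*",
}


def recent_files(folder: str, max_age_hours: float = 24) -> list[str]:
    """Names of the files in folder modified within the last max_age_hours.

    Assumption: modification time stands in for creation time, because
    os.stat's st_ctime is the creation time only on Windows.
    """
    cutoff = time.time() - max_age_hours * 3600
    return [p.name for p in Path(folder).iterdir()
            if p.is_file() and p.stat().st_mtime >= cutoff]


def find_missing(names: list[str], filenames: list[str]) -> list[tuple[str, str]]:
    """Return (name, label) pairs for which no file name matches the pattern.

    Matching ignores case, mirroring IF EXIST on Windows file systems.
    """
    lowered = [f.lower() for f in filenames]
    missing = []
    for name in names:
        for label, pattern in PATTERNS.items():
            glob = pattern.format(name=name).lower()
            if not any(fnmatch.fnmatchcase(f, glob) for f in lowered):
                missing.append((name, label))
    return missing


def write_report(list_file: str, backup_dir: str, report_file: str) -> None:
    """Hypothetical driver: check every entry of list_file, write the misses."""
    names = Path(list_file).read_text().split()
    with open(report_file, "w") as out:
        for name, label in find_missing(names, recent_files(backup_dir)):
            out.write(f"{name} {label}\n")
```

Keeping `find_missing` free of file-system access makes it easy to try out on a hand-written list of file names before pointing `write_report` at the real folders.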

How to add sequential numbers say 1,2,3 etc. to each file name and also for each line of the file content in a directory?

I want to add a sequential number to each file and its contents in a directory. The number should be prefixed to the file name, and each line of the file's contents should have the same number prefixed. The numbers should be generated in this way for all the files (names and contents) in the subfolders of the directory.
I have tried using maxdepth, rename and the print function, but it throws an error saying that "-maxdepth" is not a valid option.
I already have part of the code (printing the names and contents of the text files in a directory), and this logic should be added to it.
#!bin/bash
cd home/TESTING
for file in home/TESTING;
do
find home/TESTING/ -type f -name *.txt -exec basename {} ';' -exec cat {} \;
done
P.S. print, rename and maxdepth are not working.
If the first file is named File1.txt and its content is "Louis", the output for the file name should be 1File1.txt and the content should be "1Louis". The second file should get 2, and so on. It has to traverse all the subfolders in the directory and print accordingly.

There should be a fail-safe whenever you run cd in a script; otherwise, if cd fails, the following commands run in the wrong directory.
In your attempt, the output would be the same even without the for loop: for file in home/TESTING passes only home/TESTING to the for loop, so it runs only once. With for file in home/TESTING/* it would instead iterate over each entry in the directory.
I used find without -maxdepth, so it also looks in all subdirectories for *.txt files. If you want only the current directory, $(find /home/TESTING/* -type f -name "*.txt") could be replaced with $(ls *.txt); as long as you don't have a directory whose name ends in .txt, there will be no problem.
#!/bin/bash
# Try to cd to the directory; do the work only if that succeeds.
if cd /home/TESTING ;then
    # set sequence number
    let "x = 1"
    # pass every file that find matches to the for loop; subdirectories are included because there is no -maxdepth.
    # note: $(find ...) is split on whitespace, so file names containing spaces will break this loop.
    for file in $(find /home/TESTING/* -type f -name "*.txt") ; do
        # print sequence number and base file name, extracted by variable substitution.
        # basename could be used as well, but this is a bash built-in.
        echo "${x}${file##*/}"
        # print file content, with the sequence number put before each line by the stream editor.
        sed 's#^#'"${x}"'#g' "${file}"
        # increase sequence number by one.
        let "x++"
    done
    # unset sequence number
    unset 'x'
else
    # print error on stderr
    echo 'cd to /home/TESTING failed' >&2
fi
Variable Substitution:
There are more; I picked only these four for now, as they are similar.
${var#pattern} - the value of var with the shortest match of pattern removed from the beginning
${var##pattern} - same as above, but the longest match is removed instead of the shortest
${var%pattern} - the value of var with the shortest match of pattern removed from the end
${var%%pattern} - same as above, but the longest match is removed instead of the shortest
So ${file##*/} takes the value of $file and drops everything (*) up to and including the last slash (/), because ## removes the longest match. The $file variable itself is not modified, so it still contains the path and file name.
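The four expansions differ only in which end they trim and whether the shortest or the longest match wins. To make that concrete, here is a small Python emulation built on `fnmatch.fnmatchcase`, whose `*`, `?` and `[...]` wildcards behave like the shell's here (`*` also matches `/`). It is an illustration rather than a full reimplementation: bash features such as backslash escapes and extglob patterns are not handled, and `remove_prefix` and `remove_suffix` are hypothetical names.

```python
from fnmatch import fnmatchcase


def remove_prefix(value: str, pattern: str, longest: bool = False) -> str:
    """Emulate ${var#pattern} (shortest) and ${var##pattern} (longest)."""
    # Try prefix lengths from short to long, or from long to short for ##.
    ends = range(len(value), -1, -1) if longest else range(len(value) + 1)
    for end in ends:
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value  # no match: the value comes back unchanged, as in bash


def remove_suffix(value: str, pattern: str, longest: bool = False) -> str:
    """Emulate ${var%pattern} (shortest) and ${var%%pattern} (longest)."""
    # Try suffix start positions so the shortest (or longest) suffix is tested first.
    starts = range(len(value) + 1) if longest else range(len(value), -1, -1)
    for start in starts:
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value
```

With `path = "/home/TESTING/sub/File1.txt"`, `remove_prefix(path, "*/", longest=True)` returns `File1.txt`, the same result as `${file##*/}`.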
sed 's#^#'"${x}"'#g' ${file}: sed is a stream editor; there are whole books about its usage. For this particular command: the sed script is usually placed in single quotes, so 's#^#1#g' adds 1 at the beginning of every line in the file. s is substitution, ^ is the beginning of the line, 1 is the replacement text, and g means global; without g only the first match on each line is affected.
# is the separator; it can be something else as well, such as /. I close the single quote to let the variable be expanded and then reopen it.
If you want to replace text, for example .txt with .php, you can use sed 's#\.txt#\.php#g' file. The . has a special meaning (it matches any single character), so it needs to be escaped with \ to be used as literal text; otherwise not only file.txt would match but file1txt as well.
sed can also read from a pipe, in which case you don't need to specify a file name; otherwise you have to provide at least one file name, which in our case is the ${file} variable. As mentioned, variable substitution does not modify the variable's value, so it still contains the file name with its path.
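For comparison, the same numbering can be sketched in Python with `pathlib`: `rglob` plays the role of `find` without `-maxdepth`, `glob` the role of `ls *.txt`, and each file's name and lines get the counter prepended, as `echo` and `sed` do above. `number_file` and `number_tree` are hypothetical names; the sorted visiting order is an assumption (find does not sort), and the files are assumed to be UTF-8 text. Because paths are handled as objects rather than as words split from `$(find ...)`, file names containing spaces stay intact.

```python
from collections.abc import Iterable
from pathlib import Path


def number_file(n: int, name: str, lines: Iterable[str]) -> list[str]:
    """Prefix the file name and every content line with the sequence number n."""
    output = [f"{n}{name}"]  # like echo "${x}${file##*/}"
    # like sed 's#^#'"${x}"'#': put the number in front of every line
    output.extend(str(n) + line.rstrip("\n") for line in lines)
    return output


def number_tree(root: str, pattern: str = "*.txt", recursive: bool = True) -> list[str]:
    """Number every matching file under root (subfolders too when recursive).

    Files are visited in sorted path order; find itself does not sort.
    Assumption: the files are UTF-8 text.
    """
    paths = Path(root).rglob(pattern) if recursive else Path(root).glob(pattern)
    output = []
    for n, path in enumerate(sorted(p for p in paths if p.is_file()), start=1):
        with path.open(encoding="utf-8") as fh:
            output.extend(number_file(n, path.name, fh))
    return output
```

Usage: `print("\n".join(number_tree("/home/TESTING")))`. For the example in the question, `number_file(1, "File1.txt", ["Louis\n"])` gives `["1File1.txt", "1Louis"]`.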

Get numeric value from file name

I am new to Linux. I have a question:
I have a bunch of files in a directory, like:
abc-188_1.out
abc-188_2.out
abc-188_3.out
How can I get the number 188 from those names?

Assuming (since you are on Linux and working with files) that you will use a shell/bash script... (If you use something different, say Python, the solution will of course be different.)
... this will work:
for file in *; do out=$(echo "${file//[!0-9]/ }" | xargs | cut -d' ' -f1); echo "$out"; done
Explanation
The basic problem is extracting a number from a string in a bash script (search Stack Overflow for this and you will find dozens of different solutions).
In the command above this is done with (the string from which the numbers are extracted being stored in the variable file):
${file//[!0-9]/ }
or, without spaces
${file//[!0-9]/}
It is complicated here by two things:
It has to be done for every file in a directory. This is done here with a bash for loop (note that the variable file takes as its value the name of each file in the current working directory, one after another):
for file in *; do (commands you want run for every file in the CWD, separated by ";"); done
There are multiple numbers in the file names, and you want only the first one.
Therefore, we leave the spaces in and pipe the result (only the digits and spaces from the current file name) into two other commands: xargs (which strips leading and trailing whitespace and squeezes runs of spaces into one) and cut -d' ' -f1 (which returns only the part of the string before the first remaining space, i.e. the first number in the file name).
We save the resulting string in a variable "out" and print it with echo "$out":
out=$(echo "${file//[!0-9]/ }" | xargs | cut -d' ' -f1); echo "$out"
Note that the number is still stored as a string. You can convert it to an integer if you want by using double parentheses preceded by $: out_int=$((out))
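The bash approach blanks out every non-digit and then keeps the first group; a regular-expression search for the first run of digits does the same in one step. A minimal Python sketch, where `first_number` is a hypothetical name and the `[0-9]` class mirrors the one used above; it returns an `int` directly, which corresponds to the `$((out))` conversion.

```python
import re
from typing import Optional

# First run of ASCII digits, matching the [0-9] class used in the bash answer.
FIRST_NUMBER = re.compile(r"[0-9]+")


def first_number(name: str) -> Optional[int]:
    """Return the first number in name as an int, or None if it has no digits."""
    match = FIRST_NUMBER.search(name)
    return int(match.group()) if match else None
```

To print the number for every `.out` file in the current directory: `for path in sorted(Path(".").glob("*.out")): print(first_number(path.name))` (with `from pathlib import Path`).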

Batch file to find all files in a directory containing an HTML string then output list to a text file

I've made several attempts at this but get nothing but "can't open..." errors, so I'm asking here:
I want to find all instances of the string "SOME TEXT" within a directory full of HTML files. Then the search results should be written to a file in that same directory (D:\myfiles).

Here's a sample batch file that'll do the trick.
@echo off
setlocal
pushd D:\myfiles
rem case-insensitive search (/i) for the string "SOME TEXT" in all html files
rem in the current directory, skipping files with non-printable characters (/p),
rem redirecting the output to the results.txt file in the same directory
findstr /ip /c:"SOME TEXT" *.html > results.txt
popd
endlocal
Update: some caveats to using the findstr command:
If your string contains angle brackets, you have to escape them using the CMD escape character - ^. So, if you want to search for <TITLE>, you have to specify it as /c:"^<TITLE^>".
If you want only file names, change /ip to /im. Also, you can add /s to search subfolders. In general, you can play with the different findstr options as listed in findstr /?.
Findstr will find the text only in UTF-8 encoded files. If the HTML files are UTF-16 encoded (i.e., each character takes two bytes), findstr will not find the text.
I would also suggest running the command without the redirection to results.txt first, to get the right findstr options and make sure it outputs what you need.
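If findstr's options or encoding limits get in the way, the same case-insensitive search can be sketched in Python. This is not a findstr clone: it returns `name:line` entries (similar to findstr's output when several files are searched) or just the file names (roughly what `/m` does), assumes UTF-8 files with undecodable bytes replaced, and needs `encoding="utf-16"` for UTF-16 files. `matching_lines` and `search_folder` are hypothetical names.

```python
from collections.abc import Iterable
from pathlib import Path


def matching_lines(lines: Iterable[str], needle: str) -> list[str]:
    """Lines containing needle, ignoring case (similar to findstr /i /c:)."""
    wanted = needle.casefold()
    return [line for line in lines if wanted in line.casefold()]


def search_folder(folder: str, needle: str, pattern: str = "*.html",
                  names_only: bool = False, encoding: str = "utf-8") -> list[str]:
    """Search every matching file in folder for needle.

    Returns "name:line" entries, or only the file names when names_only is True.
    Assumption: files use `encoding`; undecodable bytes are replaced.
    """
    results = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding=encoding, errors="replace")
        hits = matching_lines(text.splitlines(), needle)
        if not hits:
            continue
        if names_only:
            results.append(path.name)
        else:
            results.extend(f"{path.name}:{line}" for line in hits)
    return results
```

To write the results next to the HTML files: `Path(r"D:\myfiles\results.txt").write_text("\n".join(search_folder(r"D:\myfiles", "SOME TEXT")) + "\n")`. No escaping of angle brackets is needed, since no shell parses the search string.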

for %%f in (*.html) do findstr /i /m /p /c:"SOME TEXT" "%%f" >> results.txt

Delete certain lines in a txt file via a batch file

I have a generated txt file. This file has certain lines that are superfluous and need to be removed. Each line that requires removal contains one of two strings: "ERROR" or "REFERENCE". These tokens may appear anywhere in the line. I would like to delete these lines while retaining all other lines.
So, if the txt file looks like this:
Good Line of data
bad line of C:\Directory\ERROR\myFile.dll
Another good line of data
bad line: REFERENCE
Good line
I would like the file to end up like this:
Good Line of data
Another good line of data
Good line
TIA.

Use the following:
type file.txt | findstr /v ERROR | findstr /v REFERENCE
This has the advantage of using standard tools in the Windows OS, rather than having to find and install sed/awk/perl and such.
See the following transcript for it in operation:
C:\>type file.txt
Good Line of data
bad line of C:\Directory\ERROR\myFile.dll
Another good line of data
bad line: REFERENCE
Good line
C:\>type file.txt | findstr /v ERROR | findstr /v REFERENCE
Good Line of data
Another good line of data
Good line

You can accomplish the same result as @paxdiablo's solution using findstr by itself. There's no need to pipe multiple commands together:
findstr /V "ERROR REFERENCE" infile.txt > outfile.txt
Details of how this works:
/v finds lines that don't match the search string (the same switch @paxdiablo uses)
if the search string is in quotes and contains spaces, findstr performs an OR search using each word (the separator is a space)
findstr can take an input file, so you don't need to feed it the text using the "type" command
"> outfile.txt" sends the results to the file outfile.txt instead of printing them to the console. (Note that it will overwrite the file if it exists. Use ">> outfile.txt" instead if you want to append.)
You might also consider adding the /i switch to do a case-insensitive match.
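The same OR-style inverse match can be sketched in Python, for cases where a script is easier to use than installing sed or Perl. `filter_lines` and `filter_file` are hypothetical names, UTF-8 input is assumed, and `ignore_case=True` plays the role of `/i`. Because Python strings are Unicode, non-ASCII search words work as well.

```python
from pathlib import Path

# Lines containing any of these substrings are dropped.
BAD_WORDS = ("ERROR", "REFERENCE")


def filter_lines(lines: list[str], bad_words: tuple[str, ...] = BAD_WORDS,
                 ignore_case: bool = False) -> list[str]:
    """Keep only the lines that contain none of bad_words (like findstr /v)."""
    if ignore_case:
        words = [w.casefold() for w in bad_words]
        return [line for line in lines
                if not any(w in line.casefold() for w in words)]
    return [line for line in lines if not any(w in line for w in bad_words)]


def filter_file(src: str, dst: str, **options) -> None:
    """Write the surviving lines of src to dst (dst is overwritten, like >)."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    kept = filter_lines(lines, **options)
    Path(dst).write_text("".join(line + "\n" for line in kept), encoding="utf-8")
```

Usage: `filter_file("infile.txt", "outfile.txt")`. On the sample from the question this keeps exactly the three good lines.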

If you have sed:
sed -e '/REFERENCE/d' -e '/ERROR/d' [FILENAME]
Where FILENAME is the name of the text file with the good & bad lines

If you have Perl installed, then perl -i -n -e"print unless m{(ERROR|REFERENCE)}" [FILENAME] should do the trick (-i edits the file in place).

It seems that using FIND instead of FINDSTR also supports Unicode characters.
E.g.: type file.txt | find /v "Ω"
