How to use a text list to generate SVG files? - svg

I made an SVG in Inkscape. It contains a single Hiragana character as text.
Is there a way to batch-export SVG files from a list of Hiragana characters?
That would be 46 Hiragana SVG files.
The relevant text element in the SVG looks like this:
id="tspan849">あ</tspan></text>

I have written a Python script to accomplish the task.
I hope this script can help people who need it.
import os

# Create the output folder if it does not exist.
OutFileFolder = 'Output'
if not os.path.isdir(OutFileFolder):
    os.mkdir(OutFileFolder)

# Read the list of Hiragana characters, one per line.
ListFileName = 'HiraganaList.txt'
with open(ListFileName, mode="r", encoding="utf-8") as f:
    lines = f.readlines()

# Read the sample file contents into a list.
SampleFileName = 'Hiragana_01.svg'
with open(SampleFileName, mode="r", encoding="utf-8") as s:
    sLines = s.readlines()

# Output file name prefix.
OutPixFileName = 'Hiragana_'
# The placeholder string to find in the sample SVG.
sTokenString = 'あ'
iNum = 1

# Cycle through the list from first line to last line.
for line in lines:
    newChar = line.rstrip('\n')
    # Collect the output lines here.
    OutputContext = []
    # Zero-padded two-digit numbering.
    sNum = str(iNum).zfill(2)
    # Output file name + path.
    OutFileName = os.path.join(OutFileFolder, OutPixFileName + sNum + '.svg')
    # Save a new file.
    with open(OutFileName, mode="w", encoding="utf-8") as w:
        # Cycle through the sample contents.
        for sLine in sLines:
            # Check whether the line contains sTokenString.
            # (!= -1 also catches a match at position 0, which find() > 0 would miss.)
            if sLine.find(sTokenString) != -1:
                print('old->' + sLine)
                # Replace the placeholder with the new character.
                sNew = sLine.replace(sTokenString, newChar)
                print('New->' + sNew)
                OutputContext.append(sNew)
            else:
                OutputContext.append(sLine)
        print(line)
        w.writelines(OutputContext)
    iNum += 1
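
For reference, a minimal HiraganaList.txt in the format this script assumes holds one character per line, for example:

あ
い
う
え
お

For these five lines the script would produce Output/Hiragana_01.svg through Output/Hiragana_05.svg, each a copy of the sample SVG with あ swapped for the listed character.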

Related

search and replace using a file for computer name

I've got to search for the computer name and replace it with another name in Python. These are stored in a file, separated by a space.
xerox fj1336
mongodb gocv1344
ec2-hab-223 telephone24
I know this can be done in linux using a simple while loop.
What I've tried is
# input file
fin = open("comp_name.txt", "rt")
# output file to write the result to
fout = open("comp_name.txt", "wt")
# for each line in the input file
for line in fin:
    # replace the string and write to the output file
    fout.write(line.replace('xerox ', 'fj1336'))
# close input and output files
fin.close()
fout.close()
But the output doesn't really work, and even if it did, it would only replace the one name.
You can try this way:
with open('comp_name.txt', 'r+') as file:
    content = file.readlines()
    for i, line in enumerate(content):
        content[i] = line.replace('xerox', 'fj1336')
    file.seek(0)
    print(str(content))
    file.writelines(content)
    # drop any leftover bytes in case the new content is shorter
    file.truncate()
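
The snippet above hard-codes a single replacement. Since the question stores old/new pairs in the file itself, a minimal sketch of a general version could read every pair and apply it to a separate target file (the names hosts.txt and hosts_new.txt for the file being rewritten are assumptions for illustration):

# Read "old new" pairs, one per line, then apply each replacement to a
# target file, writing the result to a new file so the input is never
# truncated while it is still being read.
with open('comp_name.txt', encoding='utf-8') as f:
    pairs = [line.split() for line in f if line.strip()]

with open('hosts.txt', encoding='utf-8') as fin:
    content = fin.read()

for old, new in pairs:
    content = content.replace(old, new)

with open('hosts_new.txt', 'w', encoding='utf-8') as fout:
    fout.write(content)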

How to read a file as .dat and write it as a .txt

So I'm making a thing that reads data from a .dat file and saves it as a list, then takes that list and writes it to a .txt file (basically a .dat-to-.txt converter). However, whenever I run it and it makes the file, it is a .txt file but it contains the .dat data. After troubleshooting, the variable that gets written out contains normal, legible text, not weird .dat data...
Here is my code (pls don't roast I'm very new I know it sucks and has lots of mistakes just leave me be xD):
# import dependencies
import sys
import pickle
import time

# define constants and get the file path
data = []
index = 0
path = input("Absolute file path:\n")
# check if the last character is a space (common when copy-pasting) and remove it if so
if path.endswith(' '):
    path = path[:-1]
# load the .dat file into a list named bits
with open(path, 'rb') as fp:
    bits = pickle.load(fp)
# convert the data from bits into a new list called data
while index < len(bits):
    print("Decoding....\n")
    storage = bits[index]
    print("Decoding....\n")
    str(storage)  # note: str() returns a new value, so this line has no effect
    print("Decoding....\n")
    data.append(storage)
    print("Decoding....\n")
    index += 1
    print("Decoding....\n")
    time.sleep(0.1)
# remove the .dat extension from the file name
split = path[:-4]
# create the new txt file with _convert.txt added to the end
with open(f"{split}_convert.txt", "wb") as fp:
    pickle.dump(data, fp)
# tell the user where the file has been created
close_file = str(split) + "_convert.txt"
print(f"\nA decoded txt file has been created. Run this command to open it: cd {close_file}\n\n")
Quick review: I'm setting a variable named data which contains all of the data from the .dat file; then I want to save that variable to a .txt file, but whenever I do, the .txt file has the contents of the .dat file, even though print(data) shows the data as normal, legible text. Thanks for any help.
with open(f"{split}_convert.txt", "wb") as fp:
pickle.dump(data, fp)
When you open the file in wb mode, pickle.dump writes the same binary pickle data to it. To write plain text to a .txt file, use
with open(f"{split}_convert.txt", "w") as fp:
fp.write(data)
Since data is a list, you can't write it straight away either. You'll need to write each item, using a loop.
with open(f"{split}_convert.txt", "w") as fp:
for line in data:
fp.write(line)
For more details on file writing, check this article as well: https://www.tutorialspoint.com/python3/python_files_io.htm
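Putting the pieces together, a minimal end-to-end sketch of the converter (assuming, as above, that the .dat file holds a pickled list; the str() call guards against items that are not already strings, and the input path is hypothetical):

import pickle

path = "data.dat"  # hypothetical input path
with open(path, "rb") as fp:
    data = pickle.load(fp)

out_path = path[:-4] + "_convert.txt"
with open(out_path, "w", encoding="utf-8") as fp:
    for item in data:
        fp.write(str(item) + "\n")  # one item per line, as plain text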

Read multiple text files, search a few strings, replace and write in Python

I have tens of text files in my local directory named something like test1, test2, test3, and so on. I would like to read all these files, search for a few strings in each, replace them with other strings, and finally save them back to my directory with names like newtest1, newtest2, newtest3, and so on.
For instance, if there was a single file, I would have done following:
# Read the file
with open('H:\\Yugeen\\TestFiles\\test1.txt', 'r') as file:
    filedata = file.read()
# Replace the target string
filedata = filedata.replace('32-83 Days', '32-60 Days')
# Write the file out again
with open('H:\\Yugeen\\TestFiles\\newtest1.txt', 'w') as file:
    file.write(filedata)
Is there any way that I can achieve this in python?
If you use Python 3, you can use scandir from the os library.
Python 3 docs: os.scandir
With that you can get the directory entries.
with os.scandir('H:\\Yugeen\\TestFiles') as it:
Then loop over these entries; your code could look something like this. Notice I changed the path in your code to the entry object's path.
import os

# Get the directory entries
with os.scandir('H:\\Yugeen\\TestFiles') as it:
    # Iterate over directory entries
    for entry in it:
        # If the entry is not a file, continue to the next iteration.
        # This is not needed if you are 100% sure the directory only contains files.
        if not entry.is_file():
            continue
        # Read the file
        with open(entry.path, 'r') as file:
            filedata = file.read()
        # Replace the target string
        filedata = filedata.replace('32-83 Days', '32-60 Days')
        # Write the file out again
        with open(entry.path, 'w') as file:
            file.write(filedata)
If you use Python 2, you can use listdir (also applicable to Python 3).
Python 2 docs: os.listdir
In this case the code structure is the same, but you also need to build the full path to each file, since listdir only returns the filename; a sketch follows below.
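
A minimal sketch of that listdir variant, which also writes each result to a new newtest... file as the question asked, instead of overwriting the original (the directory path and replacement string are taken from the question):

import os

directory = 'H:\\Yugeen\\TestFiles'
for name in os.listdir(directory):
    src = os.path.join(directory, name)
    # listdir returns bare names, so skip anything that is not a file
    if not os.path.isfile(src):
        continue
    with open(src, 'r') as f:
        filedata = f.read()
    filedata = filedata.replace('32-83 Days', '32-60 Days')
    # test1.txt -> newtest1.txt, and so on
    with open(os.path.join(directory, 'new' + name), 'w') as f:
        f.write(filedata)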

How do I check whether a file already contains the text I want to append?

I am currently working on a project. So I want to read all the *.pdf files in a directory, extract their text and append it to a text file. So far so good. I was able to do this, yeah.
Now the problem: if I am reading the same directory again, it appends the same files again. Is there a way to check whether the extracted text is already in the file and thus, skip the whole thing?
My code for this looks like this right now (I created the directory variable already):
for filename in os.listdir(directory):
    if filename.endswith(".pdf"):
        file = os.path.join(directory, filename)
        print(file)
        # parse data from file
        file_data = parser.from_file(file)
        # get the file's text content
        text = file_data['content']
        # print(type(text))
        print("len ", len(text))
        # print(text)
        # save to text file
        f = open("test2.txt", "a+", encoding='utf-8')
        f.write(text)
        f.close()
    else:
        continue
Thanks in advance!
One thing you could do is load the file's contents and check whether the text is already in it:
if text not in open("test2.txt", encoding="utf-8").read():
    # text is not in the file yet, write it here
else:
    # text is already in the file, don't write
However, this is very inefficient. A better way is to keep a file with the names of the files you have already processed, and check that:
(at the beginning of your code):
files = [line.strip() for line in open("files.txt", encoding="utf-8")]
(strip the trailing newlines that readlines would keep, otherwise the membership check below can never match)
(before parser.from_file(file)):
if file in files:
    continue  # don't read or write
(after f.close()):
files.append(file)
(after the whole loop has finished):
with open("files.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(files))
Putting it all together:
files = open("files.txt").readlines()
for filename in os.listdir(directory):
if filename.endswith(".pdf"):
file = os.path.join(directory, filename)
if file in files:
continue # don't read or write
print(file)
#parse data from file
file_data = parser.from_file(file)
#get files text content
text = file_data['content']
#print(type(text))
print("len ", len(text))
#print(text)
#save to textfile
f = open("test2.txt", "a+", encoding = 'utf-8')
f.write(text)
f.close()
files.append(file)
else:
continue
with open("files.txt", "a+") as f:
f.write("\n".join(files))
Note that you need to create a file named files.txt in the current directory.
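If you would rather not create files.txt by hand, a small sketch of a guard that starts from an empty set when the file does not exist yet (a set also makes the membership check O(1)):

import os

seen = set()
if os.path.exists("files.txt"):
    with open("files.txt", encoding="utf-8") as f:
        seen = {line.strip() for line in f}
# ...then use "file in seen" and seen.add(file) in the loop above...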

I want to create a corpus in python from multiple text files

I want to do text analytics on some text data. The issue is that so far I have worked with a CSV file or just one file, but here I have multiple text files. So my approach is to combine them all into one file and then use nltk to do some text pre-processing and further steps.
I tried to download the gutenberg package from nltk, and I am not getting any error in the code. But I am not able to see the content of the 1st text file in the 1st cell, the 2nd text file in the 2nd cell, and so on. Kindly help.
filenames = [
    "246.txt",
    "276.txt",
    "286.txt",
    "344.txt",
    "372.txt",
    "383.txt",
    "388.txt",
    "392.txt",
    "556.txt",
    "665.txt"
]
with open("result.csv", "w") as f:
    for filename in filenames:
        f.write(nltk.corpus.gutenberg.raw(filename))
Expected result: I should get one CSV file with the contents of these 10 text files listed in 10 different rows.
filenames = [
    "246.txt",
    "276.txt",
    "286.txt",
    "344.txt",
    "372.txt",
    "383.txt",
    "388.txt",
    "392.txt",
    "556.txt",
    "665.txt"
]
with open("result.csv", "w") as f:
    for index, filename in enumerate(filenames):
        f.write(nltk.corpus.gutenberg.raw(filename))
        # Append a comma to the file content when
        # filename is not the content of the
        # last file in the list.
        if index != (len(filenames) - 1):
            f.write(",")
Output:
this,is,a,sentence,spread,over,multiple,files,and,the end
Code and .txt files are available at https://github.com/michaelhochleitner/stackoverflow.com-questions-57081411.
Using Python 2.7.15+ and nltk 3.4.4. I had to move the .txt files to /home/mh/nltk_data/corpora/gutenberg.
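
Note that writing raw commas breaks as soon as a text itself contains a comma or newline. A minimal sketch using the csv module instead, which quotes each text and puts one file per row (matching the expected result; the two file names shown stand in for the full list above):

import csv
import nltk

filenames = ["246.txt", "276.txt"]  # extend with the remaining files
with open("result.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename in filenames:
        # each file's raw text becomes one quoted cell in its own row
        writer.writerow([nltk.corpus.gutenberg.raw(filename)])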
