How to read many files whose names follow a specific format in Python

I am a little confused about how to read all the lines in many files whose names run from "datalog.txt.98" to "datalog.txt.120".
This is my code:
import json
file = "datalog.txt."
i = 97
for line in file:
    i+=1
    f = open (line + str (i),'r')
    for row in f:
        print (row)
Here, you will find an example of one line in one of those files:
I really need your help.

I suggest using a loop to open multiple files whose names differ only in the numeric suffix.
To better understand this project, I would recommend researching the following topics:
for loops,
String manipulation,
Opening a file and reading its content,
List manipulation,
String parsing.
This is one of my favourite beginner guides.
To set the range of the integers at the end of the file name, I would look into Python for loops.
I think this is what you are trying to do:
# create a list to store all your file content
files_content = []
# the prefix is of type string
filename_prefix = "datalog.txt."
# loop over the suffixes 98 to 120 (the end value of range is exclusive)
for i in range(98, 121):
    # make the filename variable with the prefix and
    # the integer i, which you need to convert to a string type
    filename = filename_prefix + str(i)
    # open the file and read all the lines into a variable
    with open(filename) as f:
        content = f.readlines()
    # append the file content to the files_content list
    files_content.append(content)
To get rid of surrounding whitespace (including the trailing newline) on each line, add the following line just before the append:
    content = [x.strip() for x in content]
    files_content.append(content)
Here's an example of printing out files_content:
for file in files_content:
    print(file)
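The loop above can also be wrapped in a small helper. This is only a sketch: the function name `read_numbered_files`, its parameters, and the choice to skip missing files instead of raising an error are assumptions, not part of the original answer.

```python
from pathlib import Path


def read_numbered_files(prefix, first, last, folder="."):
    """Return {filename: [lines]} for prefix+first .. prefix+last (inclusive).

    Assumption: gaps in the numbering are acceptable, so files that
    do not exist are skipped instead of raising FileNotFoundError.
    """
    contents = {}
    for number in range(first, last + 1):  # range() excludes its end value
        path = Path(folder) / f"{prefix}{number}"
        if not path.is_file():
            continue
        with path.open() as f:
            # Drop only the trailing newline of each line
            contents[path.name] = [line.rstrip("\n") for line in f]
    return contents
```

For the files in the question, the call would be `read_numbered_files("datalog.txt.", 98, 120)`. Keying the result by file name makes it easy to tell which lines came from which file.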

Related

Extract n characters for the first match of a word in a file

I am a beginner in Python. I have a file containing a single line of data. I need to extract the "n" characters that follow certain words, for the first occurrence of each word only. Those words are also not sequential.
Data file: {"id":"1234566jnejnwfw","displayId":"1234566jne","author":{"name":"abcd#xyz.com","datetime":15636378484,"displayId":"23423426jne","datetime":4353453453}
I want to fetch the value after the first match of "displayId" and before "author", i.e. 1234566jne, and similarly for "datetime".
I tried slicing the line from the index of the word and writing the rest into another file for further cleanup to get the exact value.
tmpFile = "tmpFile.txt"
tmpFileOpen = open(tmpFile, "w+")
with open("data file") as openfile:
    for line in openfile:
        tmpFileOpen.write(line[line.index(displayId) + len(displayId):])
However, I am sure this is not a good solution to build on.
Can anyone please help me with this?

This answer should work for any displayId with a format similar to the one in your question. I decided not to load the JSON file for this answer, because it wasn't needed to accomplish the task.
import re
tmpFile = "tmpFile.txt"
with open('data_file.txt', 'r') as infile:
    lines = infile.read()
# Use regex to find the displayId element
# example: "displayId":"1234566jne
# \W matches non-word characters, such as " and :
# \d matches digits
# {6,8} matches between 6 and 8 digits
# [a-z] matches lowercase ASCII letters
# {3} matches exactly 3 lowercase ASCII letters
# the parentheses capture only the value, so no stripping is needed
# (str.strip removes a set of characters, not a prefix, so it is fragile here)
id_pattern = re.compile(r'\WdisplayId\W{3}(\d{6,8}[a-z]{3})')
id_results = id_pattern.findall(lines)
with open(tmpFile, "w") as tmpFileOpen:
    # loop through the captured ids
    for display_id in id_results:
        # Write each id to the temp file on a separate line
        tmpFileOpen.write('{}\n'.format(display_id))
# output in tmpFile.txt
# 1234566jne
# 23423426jne
This version does load the JSON file, but it will fail if the JSON file format changes.
import json
tmpFile = 'tmpFile.txt'
# Load the JSON file
with open('data_file.txt') as infile:
    jdata = json.load(infile)
# Find the first ID
first_id = jdata['displayId']
# Find the second ID
second_id = jdata['author']['displayId']
with open(tmpFile, 'w') as tmpFileOpen:
    # Write the first ID to the temp file
    tmpFileOpen.write('{}\n'.format(first_id))
    # Write the second ID to the temp file
    tmpFileOpen.write('{}\n'.format(second_id))
# output in tmpFile.txt
# 1234566jne
# 23423426jne

If I understand your question correctly, you can achieve this by doing the following:
import json
tmpFile = "tmpFile.txt"
with open("data.txt") as openfile, open(tmpFile, "w") as tmpFileOpen:
    for line in openfile:
        # Load the JSON into a dict in order to manipulate it easily
        data = json.loads(line)
        # Write to the temp file only the first 3
        # characters of the field `displayId`
        tmpFileOpen.write(data['displayId'][:3])
This works because the data in your file is JSON; if the format changes, it won't work.
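Since the question asks only for the first occurrence of each key, `re.search` (which stops at the first match) may be enough on its own. The sketch below is an illustration, not the method of any answer above; the helper name `first_value` and its pattern are assumptions, and the pattern assumes values contain no quotes, commas or braces.

```python
import re


def first_value(text, key):
    """Return the value after the first occurrence of "key": in text, or None."""
    # Optional opening quote, then everything up to the next quote, comma or brace
    pattern = r'"' + re.escape(key) + r'":"?([^",}]+)'
    match = re.search(pattern, text)  # re.search returns the first match only
    return match.group(1) if match else None


# The sample line from the question, split across string literals for readability
line = ('{"id":"1234566jnejnwfw","displayId":"1234566jne",'
        '"author":{"name":"abcd#xyz.com","datetime":15636378484,'
        '"displayId":"23423426jne","datetime":4353453453}')

print(first_value(line, "displayId"))  # 1234566jne
print(first_value(line, "datetime"))   # 15636378484
```

Unlike the `json` approaches, this does not need the line to be valid JSON, but it also knows nothing about nesting, so it simply returns whichever occurrence comes first in the text.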

Issue saving a list of strings to a text file

I am trying to save strings to a text file and read them back.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'D:\\Trails\\test.txt'
# writing list to txt file
thefile = open(file,'w')
for item in a:
    thefile.write("%s\n" % item)
thefile.close()
# reading list from txt file
readfile = open(file,'r')
data = readfile.readlines()
print(a[0][0])
print(data[0][1]) # display data read
The output:
str1
'
Both a[0][0] and data[0][0] should have the same value, but reading back what I saved returns empty. What is the mistake in saving the file?
Update:
The list 'a' contains strings of different lengths. What changes can I make when saving the file so that the output is the same?
Update:
I have made it work by saving the file as CSV instead of text using this link. In the case of text, how should I save the data?

You can write the list directly to the file and use the eval function to turn the saved text back into a list. This isn't recommended, because eval executes whatever expression the file contains, but the following code works.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'test.txt'
# writing list to txt file
thefile = open(file,'w')
thefile.write("%s" % a)
thefile.close()
#reading list from txt file
readfile = open(file,'r')
data = eval(readfile.readline())
print(data)
print(a[0][0])
print(data[0][1]) # display data read
print(a)
print(data)

a and data will not have the same value, because a is a list of three lists, whereas data is a list of three strings.
readfile.readlines() or list(readfile) reads all lines into a list.
So, when you perform data = readfile.readlines(), Python treats "['str1', 'str2', 'str3']\n" as a single string and not as a list.
So, to get your desired output you can slice the characters out of that string with the following print statement.
print(data[0][2:6])
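If the goal is to get the same nested list back regardless of string length, one option is to store the list as JSON in the text file. Neither answer above uses this; it is a swapped-in technique shown as a minimal sketch, and the helper names `save_rows` and `load_rows` are hypothetical.

```python
import json


def save_rows(path, rows):
    """Write a list of lists of strings to a text file as JSON."""
    with open(path, "w") as f:
        json.dump(rows, f)


def load_rows(path):
    """Read the list of lists back; the nested structure is preserved."""
    with open(path) as f:
        return json.load(f)
```

After `save_rows('test.txt', a)`, the value of `load_rows('test.txt')[0][0]` equals `a[0][0]`, because the file is parsed back into lists instead of being read as raw lines.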

How do I replace the 4th item in a list that is in a file that starts with a particular string?

I need to search for a name in a file and, in the line starting with that name, replace the fourth item in the list, which is separated by commas. I have begun trying to program this with the following code, but I have not got it to work.
with open("SampleFile.txt", "r") as f:
    newline=[]
    for word in f.line():
        newline.append(word.replace(str(String1), str(String2)))
with open("SampleFile.txt", "w") as f:
    for line in newline :
        f.writelines(line)
#this piece of code replaced every occurrence of String1 with String2
f = open("SampleFile.txt", "r")
for line in f:
    if line.startswith(Name):
        if line.contains(String1):
            newline = line.replace(str(String1), str(String2))
#this came up with a syntax error

You could give some dummy data, which would help people to answer your question. I suggest you back up your data: you can save the edited data to a new file, or you can copy the old file to a backup folder before working on the data (think about using "from shutil import copyfile" and then "copyfile(src, dst)"). Otherwise a mistake could easily ruin your data with no easy way to restore it.
You can't replace the string with "newline = line.replace(str(String1), str(String2))", because that replaces every occurrence in the line. Think about "strong" as your search term and a line like "Armstrong,Paul,strong,44": if you replace "strong" with "weak" you would get "Armweak,Paul,weak,44".
I hope the following code helps you:
filename = "SampleFile.txt"
filename_new = filename.replace(".", "_new.")
search_term = "Smith"
with open(filename) as src, open(filename_new, 'w') as dst:
    for line in src:
        if line.startswith(search_term):
            # Split the line into its items and change only the fourth one
            items = line.split(",")
            items[4-1] = items[4-1].replace("old", "new")
            line = ",".join(items)
        # Write every line, changed or not, to the new file
        dst.write(line)
If you work with a CSV file, you should have a look at the csv module.
PS: My files contain the following data (the filenames are not in the files!):
SampleFile.txt           SampleFile_new.txt
Adams,George,m,old,34    Adams,George,m,old,34
Adams,Tracy,f,old,32     Adams,Tracy,f,old,32
Smith,John,m,old,53      Smith,John,m,new,53
Man,Emily,w,old,44       Man,Emily,w,old,44
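The same edit can be sketched with the csv module mentioned above. The function name `replace_field` and its parameters are assumptions for illustration. Note two deliberate differences from the code above: it compares the whole first field to the name (so "Smith" does not match "Smithson") and it replaces the field only when it equals the old value exactly.

```python
import csv


def replace_field(src_path, dst_path, name, column, old, new):
    """Copy src to dst, replacing row[column] in rows whose first field is name.

    column is zero-based, so the fourth item is column 3.
    """
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Guard against empty or short rows before indexing
            if row and row[0] == name and len(row) > column and row[column] == old:
                row[column] = new
            writer.writerow(row)
```

With the sample data, `replace_field("SampleFile.txt", "SampleFile_new.txt", "Smith", 3, "old", "new")` would change only the "Smith" row. Writing to a new file keeps the original as a backup, in line with the advice above.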

Python - Spyder 3 - Open a list of .csv files and remove all double quotes in every file

I've read everything I can find and tried about 20 examples from SO and Google, and nothing seems to work.
This should be very simple, but I cannot get it to work. I just want to point to a folder and replace every double quote in every file in the folder. That is it. (And I don't know Python well at all, hence my issues.) I have no doubt that some of the scripts I've tried to retask must work, but my lack of Python skill is getting in the way. This is as close as I've gotten, and I get errors. If I don't get errors, it seems to do nothing. Thanks.
import glob
import csv
mypath = glob.glob('\\C:\\csv\\*.csv')
for fname in mypath:
    with open(mypath, "r") as infile, open("output.csv", "w") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            writer.writerow(item.replace("""", "") for item in row)

You don't need to use csv-specific file opening and writing; I think that makes it more complex. How about this instead:
import os
mypath = r'\path\to\folder'
for file in os.listdir(mypath): # This will loop through every file in the folder
    if file.endswith('.csv'): # Check if it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = fpath + '_output' # Create an output file with a similar name to the input file
        with open(fpath) as infile:
            lines = infile.readlines() # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines: # One line at a time
                outfile.write(line.replace('"', '')) # Remove each " and write the line
Let me know if this works, and respond with any error messages you may have.

I found the solution to this based on the original answer provided by u/Jeff. The characters were actually smart quotes (u'\u201d' to be exact), not straight quotes. That is why I could get nothing to work. That is a great way to spend like two days, now if you'll excuse me I have to go jump off the roof. But for posterity, here is what I used that worked. (And note: there is the left curving smart quote as well, which is u'\u201c'.)
import os
mypath = 'C:\\csv\\'
myoutputpath = 'C:\\csv\\output\\'
for file in os.listdir(mypath): # This will loop through every file in the folder
    if '.csv' in file: # Check if it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = os.path.join(myoutputpath, file) # Same file name, written to the output folder
        with open(fpath) as infile:
            lines = infile.readlines() # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines: # One line at a time
                outfile.write(line.replace(u'\u201d', '')) # Remove each right smart quote and write the line
# No explicit close() calls are needed: each with block closes its file
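To remove straight quotes and both smart quotes in one pass, `str.translate` with a deletion table is one option. This is a sketch under assumptions, not the code the poster used: the names `QUOTES_TO_REMOVE`, `strip_quotes` and `strip_quotes_in_file` are hypothetical, and the files are assumed to be UTF-8 encoded.

```python
# The third argument of str.maketrans lists characters to delete:
# the straight quote plus the left (\u201c) and right (\u201d) smart quotes.
QUOTES_TO_REMOVE = str.maketrans("", "", '"\u201c\u201d')


def strip_quotes(text):
    """Return text with straight and smart double quotes removed."""
    return text.translate(QUOTES_TO_REMOVE)


def strip_quotes_in_file(src_path, dst_path, encoding="utf-8"):
    """Copy src to dst line by line, removing the quote characters.

    Assumption: both files use the given encoding (UTF-8 by default).
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(strip_quotes(line))
```

Passing the encoding explicitly matters here, because smart quotes are non-ASCII characters and the platform default encoding may not decode them as expected.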

Automatically naming output txt file in Python

I have 4 lists called I_list, Itiso, ItHDKR and Itperez, and I would like to write their data to .txt output files. I am trying to make Python name the .txt output files automatically based on some of my input data, so that the output files always have different names.
This is my current code:
Horizontal_radiation = []
Isotropic_radiation = []
HDKR_radiation = []
Perez_radiation = []
Horizontal = open("outputHorizontal.txt", 'w')
Isotropic = open("outputIsotropic.txt", 'w')
HDKR = open("outputHDKR.txt", 'w')
Perez = open("outputPerez.txt", 'w')
for i in I_list:
    Horizontal_radiation.append(i)
for x in Itiso:
    Isotropic_radiation.append(x)
for y in ItHDKR:
    HDKR_radiation.append(y)
for z in Itperez:
    Perez_radiation.append(z)
Horizontal.write(str(Horizontal_radiation))
Isotropic.write(str(Isotropic_radiation))
HDKR.write(str(HDKR_radiation))
Perez.write(str(Perez_radiation))
Horizontal.close()
Isotropic.close()
HDKR.close()
Perez.close()
As you can see, the name of each .txt output file is fixed, for example "outputHorizontal.txt" (the first one). Is there any way to build this name from an input? For example, one of my inputs is the latitude, 'lat'. I would like the output file name to include 'lat', so that the name is different every time I run the program; at the moment I always get the same name and the file is overwritten.
Thank you very much, kind regards.

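You can pass a string variable as the output file name. For example, you could move the open() calls to after you add elements to the lists (and before you write them) and use
Horizontal = open(str(Horizontal_radiation[0]), 'w')
Or just add a timestamp to the file name if the goal is only to avoid overwriting files. Use str.format with a {} placeholder (%s is not substituted by format), and format the timestamp without colons so the name is also valid on Windows:
from datetime import datetime
Horizontal = open("horizontal-{}.txt".format(datetime.today().strftime("%Y%m%d-%H%M%S")), 'w')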
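To put the latitude itself into the name, as the question asks, the file name can be built from 'lat' with an f-string. A minimal sketch: the helper name `output_name`, the name layout, and the optional timestamp suffix are assumptions, not something either the question or the answer specifies.

```python
from datetime import datetime


def output_name(model, lat, when=None):
    """Build a name such as outputHorizontal_lat40.4_20200102-030405.txt.

    when defaults to the current time; the timestamp keeps two runs
    with the same latitude from overwriting each other.
    """
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d-%H%M%S")  # no colons, valid on Windows too
    return f"output{model}_lat{lat}_{stamp}.txt"
```

It could then be used as `Horizontal = open(output_name("Horizontal", lat), 'w')`, and likewise for the other three files.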
