I have a text file where most of the text is boilerplate, with around two dozen variables that I'm changing in Python depending on the room that the file pertains to. Which method of replacing the text is "better": wrapping the entire text file in one big triple-quoted f-string, or stacking up a bunch of .replace() calls?
The file isn't very big and there are only about 300 rooms, so in my case milliseconds don't really matter. I'm thinking that for readability and future edits the .replace() way would be better, but I don't want to create a bad habit if doing it that way is a bad idea. Thanks in advance for any help.
simplified pseudo code:
class Thing:
    def __init__(self, name, var1, var2, var3):
        self.name = name
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3

def doing_it_the_replace_way(thing):
    with open('template.txt', 'r') as file:
        file_data = file.read()
    # swap each placeholder for the matching attribute of thing
    file_data = file_data.replace('placeholder_name', thing.name)
    file_data = file_data.replace('placeholder1', thing.var1)
    file_data = file_data.replace('placeholder2', thing.var2)
    file_data = file_data.replace('placeholder3', thing.var3)  # etc.
    with open('output file.txt', 'w') as file:
        file.write(file_data)

def doing_it_the_f_string_way(thing):
    file_data = f"""This is the entire template text from {thing.var1} about the time I got a
{thing.var2} stuck in my {thing.var3} at band camp."""
    with open('output file.txt', 'w') as file:
        file.write(file_data)
I'd use neither.
Using a regex is safer (i.e. you don't need to turn the entire file into an f-string and eval it) and more scalable (you don't need 30 calls to str.replace if you have 30 variables, just one entry per variable in the mapping dict).
import re

table = {'<time>': '12:00',
         '<date>': '1.1.1970'}

# imagine this being read from a file
string = '<time>, fixed text, <date>'

print(re.sub(r'(<.+?>)', lambda s: table.get(s.group(1), s.group(1)), string))
outputs
12:00, fixed text, 1.1.1970
Adapting to your case (where the values are attributes of an object)
All you have to do is use the object as the values for mapping dict.
...
thing = Thing('name', 'a', 'b', 'c')
table = {'<time>': thing.var1,
         '<date>': thing.var2}
...
This can become cumbersome if you need to do something more complex (like if you have multiple objects) but of course it can be improved depending on your exact use-case.
For example, if the names of the placeholders coincide with the names of the attributes in the object, you can just use vars as the mapping (don't forget to remove the < and > from the regex capturing group):
import re

class Thing:
    def __init__(self, name, var1, var2, var3):
        self.name = name
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3

thing = Thing('name', 'a', 'b', 'c')
string = '<var1>, fixed text, <var2>'

# group(1) is the bare name; group(0) keeps an unknown placeholder intact, brackets included
print(re.sub(r'<(.+?)>', lambda s: vars(thing).get(s.group(1), s.group(0)), string))
outputs
a, fixed text, b
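Putting the pieces together, a minimal sketch of a reusable helper might look like the following. The function name render_template, the sample template text and the <unknown> placeholder are assumptions made for illustration; they do not come from the question. Values are passed through str() so that non-string attributes do not break re.sub.

```python
import re

def render_template(text, mapping):
    """Replace each <name> placeholder that appears in mapping; leave the rest alone."""
    def substitute(match):
        # group(1) is the bare name, group(0) the whole placeholder with brackets
        return str(mapping.get(match.group(1), match.group(0)))
    return re.sub(r'<(.+?)>', substitute, text)

class Thing:
    def __init__(self, name, var1, var2, var3):
        self.name = name
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3

thing = Thing('101', 'a', 'b', 'c')
# hypothetical template text; in practice this would be read from template.txt
template = 'Room <name>: <var1>, fixed text, <var2>, <unknown>'
result = render_template(template, vars(thing))
print(result)  # Room 101: a, fixed text, b, <unknown>
```

Adding a new variable then only means adding an attribute (or a dict entry); the substitution code itself does not change.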
For instance, the .txt file contains 2 lines of comma-separated names:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    with open(path) as f:
        f_list = f.read().split('\n')
    # drop empty lines (building a new list avoids removing items while iterating)
    f_list = [line for line in f_list if line != '']
    res2 = []
    for line in f_list:
        res2 += line.split(', ')
    # remove the trailing comma left on the last name of a line
    res3 = [i.strip(',') for i in res2]
    # a set keeps one copy of each name; sorted() returns them as a sorted list
    return sorted(set(res3))

print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default mode; the documentation for open confirms this.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration, which is O(n^2) in time, use a data structure that guarantees uniqueness: a set. With a set you do not have to perform membership checks at all.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed a generator to sorted that drops empty strings.
from io import StringIO
from typing import TextIO, List


def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )


def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)


def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]

    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]


if __name__ == '__main__':
    test()
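The regular-expression splitting mentioned above could be sketched roughly as follows. The helper name split_names and the sample lines are assumptions for illustration; the pattern treats a comma together with any surrounding whitespace as one separator.

```python
import re

def split_names(line):
    # split on a comma plus any whitespace around it, then drop empty pieces
    parts = re.split(r'\s*,\s*', line.strip())
    return [part for part in parts if part]

# hypothetical in-memory lines, matching the shape of the sample file
lines = ['John, George , Tom', 'Mark, James, Tom, ']
unique = sorted({name for line in lines for name in split_names(line)})
print(unique)  # ['George', 'James', 'John', 'Mark', 'Tom']
```

Whether this is clearer than split(',') followed by strip() is a matter of taste; both give the same result on this input.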
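I came up with a very simple solution, in case anyone needs it:
def unique_sorted_words(x):
    # x is an open text file; treat commas as whitespace so "Tom," and "Tom" compare equal
    lines = x.read().replace(',', ' ').split()
    lines.sort()
    new_list = []
    for word in lines:
        if word not in new_list:
            new_list.append(word)
    return new_list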
with open("text.txt", "r") as fl:
    list_ = set()
    for line in fl:
        for word in line.strip("\n").split(","):
            word = word.strip()  # drop the space that follows each comma
            if word != '':
                list_.add(word)
print(sorted(list_))
I think that you missed a comma after Jim in the first line.
You can avoid the use of a loop by using the split method:
content=file.read()
my_list=content.split(",")
To remove the duplicates from your list, you can convert it to a set:
my_list=list(set(my_list))
Then you can sort it using sorted.
So the final code:
with open("file.txt", "r") as file:
    content=file.read()
# treat line breaks as separators too, so the last name of one line
# does not merge with the first name of the next
my_list=content.replace("\n",",").replace(" ", "").split(",")
# a trailing comma or line break leaves an empty string behind; drop it
result=sorted(set(my_list) - {''})
You can also pass a key to sorted if you need a custom sort order.
I have a text file. I want to remove punctuation and save it as a new file, but it is not removing anything. Any idea why?
code:
def punctuation(string):
    punctuations = '''!()-[]{};:'"\,<>./?##$%^&*_~'''

    for x in string.lower():
        if x in punctuations:
            string = string.replace(x, "")

    # Print string without punctuation
    print(string)

file = open('ir500.txt', 'r+')
file_no_punc = (file.read())

punctuation(l)

with open('ir500_no_punc.txt', 'w') as file:
    file.write(file_no_punc)
def punctuation(string):
    punctuations = '''!()-[]{};:'"\,<>./?##$%^&*_~'''

    for x in string.lower():
        if x in punctuations:
            string = string.replace(x, "")

    # return string without punctuation
    return string

file = open('ir500.txt', 'r+')
file_no_punc = (file.read())

file_no_punc = punctuation(file_no_punc)

with open('ir500_no_punc.txt', 'w') as file:
    file.write(file_no_punc)
Explanation:
I changed only punctuation(l) to file_no_punc = punctuation(file_no_punc) and print(string) to return string
1) what is l in punctuation(l) ?
2) you are calling punctuation() - which works correctly - but do not use its return value
3) because it is not currently returning a value, just printing it ;-)
Please note that I made only the minimal change to make it work. You might want to post it to our code review site, to see how it could be improved.
Also, I would recommend that you get a good IDE. In my opinion, you cannot beat PyCharm community edition. Learn how to use the debugger; it is your best friend. Set breakpoints, run the code; it will stop when it hits a breakpoint; you can then examine the values of your variables.
Leaving out the file reading/writing, you could remove the punctuation from a string like this:
table = str.maketrans("", "", r"!()-[]{};:'\"\,<>./?##$%^&*_~")
# # or maybe even better
# import string
# table = str.maketrans("", "", string.punctuation)
file_with_punc = r"abc!()-[]{};:'\"\,<>./?##$%^&*_~def"
file_no_punc = file_with_punc.lower().translate(table)
# abcdef
Here I use str.maketrans and str.translate.
Note that Python strings are immutable. There is no way to change a given string; every operation you perform on a string returns a new instance.
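A tiny illustration of that immutability, using made-up sample text: the original string is unchanged after the call, so the result has to be assigned (or returned) to be of any use.

```python
original = "Hello, world!"

# replace() builds and returns a new string; original is left untouched
cleaned = original.replace(",", "").replace("!", "")

print(original)  # Hello, world!
print(cleaned)   # Hello world
```

This is the same reason a function that only prints the cleaned text has no effect on the caller's variable.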
I have a big chunk of JSON. I assign the values I need to more than 10 variables. Now I want to print each of them as variable_name = value using print. How can I accomplish this?
The expected output is as follows:
variable_name_1 = car
variable_name_2 = house
variable_name_3 = dog
Updated my code example
import json

leagues = open("../forecast/endpoints/leagues.txt", "r")
leagues_json = json.load(leagues)
data_json = leagues_json["api"]["leagues"]
for item in data_json:
    league_id = item["league_id"]
    league_name = item["name"]
    coverage_standings = item["coverage"]["standings"]
    coverage_fixtures_events = item["coverage"]["fixtures"]["events"]
    coverage_fixtures_lineups = item["coverage"]["fixtures"]["lineups"]
    coverage_fixtures_statistics = item["coverage"]["fixtures"]["statistics"]
    coverage_fixtures_players_statistics = item["coverage"]["fixtures"]["players_statistics"]
    coverage_players = item["coverage"]["players"]
    coverage_topScorers = item["coverage"]["topScorers"]
    coverage_predictions = item["coverage"]["predictions"]
    coverage_odds = item["coverage"]["odds"]
    print("leagueName:", league_name,
          "coverageStandings:", coverage_standings,
          "coverage_fixtures_events:", coverage_fixtures_events,
          "coverage_fixtures_lineups:", coverage_fixtures_lineups,
          "coverage_fixtures_statistics:", coverage_fixtures_statistics,
          "coverage_fixtures_players_statistics:", coverage_fixtures_players_statistics,
          "coverage_players:", coverage_players,
          "coverage_topScorers:", coverage_topScorers,
          "coverage_predictions:", coverage_predictions,
          "coverage_odds:", coverage_odds)
Since you have the JSON data loaded as Python objects, you should be able to use regular loops to deal with at least some of this.
It looks like you're adding underscores to indicate nesting levels in the JSON object, so that's what I'll do here:
import json

leagues = open("../forecast/endpoints/leagues.txt", "r")
leagues_json = json.load(leagues)
data_json = leagues_json["api"]["leagues"]

def print_nested_dict(data, *, sep='.', context=''):
    """Print a dict, prefixing all values with their keys,
    and joining nested keys with 'sep'.
    """
    for key, value in data.items():
        if context:
            key = context + sep + key
        if isinstance(value, dict):
            print_nested_dict(value, sep=sep, context=key)
        else:
            print(key, ': ', value, sep='')

# data_json is a list of league dicts (the question iterates over it),
# so print each entry in turn
for item in data_json:
    print_nested_dict(item, sep='_')
If there is other data in data_json that you do not want to print, the easiest solution might be to add a variable listing the names you want, then add a condition to the loop so it only prints those names.
def print_nested_dict(data, *, sep='.', context='', only_print_keys=None):
    ...
    for key, value in data.items():
        if only_print_keys is not None and key not in only_print_keys:
            continue  # skip ignored elements
        ...
That should work fine unless there is a very large amount of data you're not printing.
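As a rough, self-contained sketch of the filtering idea: the variant below collects the lines in a list instead of printing directly, and it filters on the full joined name of each leaf value rather than on the raw key. Both choices, the helper name nested_lines and the sample item are assumptions made for illustration, not part of the answer above.

```python
def nested_lines(data, *, sep='.', context='', only_print_keys=None):
    """Return 'name: value' strings for the leaf values of a nested dict."""
    lines = []
    for key, value in data.items():
        name = context + sep + key if context else key
        if isinstance(value, dict):
            # recurse, carrying the joined name down as the new context
            lines.extend(nested_lines(value, sep=sep, context=name,
                                      only_print_keys=only_print_keys))
        elif only_print_keys is None or name in only_print_keys:
            lines.append(f'{name}: {value}')
    return lines

# hypothetical sample shaped like one league entry from the question
item = {
    'league_id': 1,
    'name': 'Example League',
    'coverage': {'standings': True, 'fixtures': {'events': True, 'lineups': False}},
}
wanted = {'name', 'coverage_standings', 'coverage_fixtures_events'}
result = nested_lines(item, sep='_', only_print_keys=wanted)
print('\n'.join(result))
```

With this sample, only the three wanted names are printed, in the order they appear in the data.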
If you really need to store the values in variables for some other reason, you could assign to global variables if you don't mind polluting the global namespace.
def print_nested_dict(...):
    ...
        else:
            # key already holds the full joined name at this point
            print(key, ': ', value, sep='')
            globals()[key] = value
    ...
I'm kind of on a time crunch, but this was one of my problems in my homework assignment. I am stuck, and I don't know what to do or how to proceed.
Our assignment was to open various text files and within each of the text files, we are supposed to add each word into a dictionary in which the key is the document number it came from, and the value is the word.
For example, one text file would be:
1
Hello, how are you?
I am fine and you?
Each of the text files begins with a number corresponding to its title (for example, "document1.txt" begins with "1", "document2.txt" begins with "2", etc.).
My teacher gave us this code to help with stripping the punctuation and the line breaks, but I am having a hard time figuring out where to use it.
data = re.split("[ .,:;!?\s\b]+|[\r\n]+", line)
data = filter(None, data)
I don't really understand where the filter(None, data) part comes into play, because printing its result only shows the filter object's representation in memory rather than the words.
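For context, in Python 3 filter returns a lazy iterator, which is why printing it shows an object rather than the words; wrapping it in list() materialises the values. The sketch below uses a made-up sample line and a simplified raw-string version of the teacher's pattern (the \b and the second alternative are left out), so treat it as an illustration only.

```python
import re

line = "Hello, how are you?\n"

# splitting leaves an empty string where the line ends with a delimiter
data = re.split(r"[ .,:;!?\s]+", line)
print(data)   # ['Hello', 'how', 'are', 'you', '']

# filter(None, ...) keeps only truthy items, i.e. it drops the empty strings
words = list(filter(None, data))
print(words)  # ['Hello', 'how', 'are', 'you']
```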
Here's my code so far:
def invertFile(list_of_file_names):
    import re
    diction = {}
    emplist = []
    fordiction = []
    for x in list_of_file_names:
        afile = open(x, 'r')
        with afile as f:
            for line in f:
                savedSort = filterText(f)

def filterText(line):
    import re
    word_delimiters = [' ', ',', ';', ':', '.','?','!']
    data = re.split("[ .,:;!?\s\b]+|[\r\n]+", f)
    key, value = data[0], data[1:]
    diction[key] = value
How do I make it so each word is appended into a dictionary, where the key is the document it comes from, and the value are the words in the document? Thank you.
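One possible shape for this, as a hedged sketch only: split the whole text of a document at once, take the first token as the key and the remaining tokens as the value. The helper name parse_document, the in-memory sample texts and the simplified delimiter pattern are all assumptions; with real files, each text would come from reading the file.

```python
import re

def parse_document(text):
    """Return (document_number, words) for the full text of one document."""
    # split on spaces, punctuation and line breaks, then drop empty strings
    tokens = [t for t in re.split(r"[ .,:;!?\s]+", text) if t]
    return tokens[0], tokens[1:]

# hypothetical contents of two documents, standing in for f.read() on each file
documents = [
    "1\nHello, how are you?\nI am fine and you?\n",
    "2\nGood morning.\n",
]

diction = {}
for text in documents:
    key, words = parse_document(text)
    diction[key] = words

print(diction)
```

Reading the entire file before splitting avoids the line-by-line bookkeeping of working out which line holds the document number.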
My application lets the user export its results as text files named Exp_Txt_1, Exp_Txt_2, etc. If a file with the same name already exists on the Desktop, I want the numbering to continue upwards from that number. For example, if a file named Exp_Txt_3 is already on the Desktop, I want the new file to be named Exp_Txt_4.
This is my code:
if len(str(self.Output_Box.get("1.0", "end"))) == 1:
    self.User_Line_Text.set("Nothing to export!")
else:
    import os.path
    self.txt_file_num = self.txt_file_num + 1
    file_name = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt" + "_" + str(self.txt_file_num) + ".txt")
    file = open(file_name, "a")
    file.write(self.Output_Box.get("1.0", "end"))
    file.close()
    self.User_Line_Text.set("A text file has been exported to Desktop!")
You likely want os.path.exists:
>>> import os
>>> help(os.path.exists)
Help on function exists in module genericpath:

exists(path)
    Test whether a path exists.  Returns False for broken symbolic links

A very basic example would be to create a file name with a formatting mark where the number is inserted, so it can be reused for multiple checks:
import os

name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
# the "{}" is a formatting mark so we can do name_to_format.format(num)

num = 1
while os.path.exists(name_to_format.format(num)):
    num += 1

new_file_name = name_to_format.format(num)
This checks each filename in turn, starting with Exp_Txt_1.txt, then Exp_Txt_2.txt, etc., until it finds one that does not exist.
However the format mark may cause a problem if curly brackets {} are part of the rest of the path, so it may be preferable to do something like this:
import os

def get_file_name(num):
    return os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_" + str(num) + ".txt")

num = 1
while os.path.exists(get_file_name(num)):
    num += 1

new_file_name = get_file_name(num)
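To try the counting loop without touching the real Desktop, here is a small sketch that runs it against a temporary directory. The helper name next_free_name and its prefix/suffix parameters are assumptions for illustration; the loop itself is the same os.path.exists check as above.

```python
import os
import tempfile

def next_free_name(directory, prefix="Exp_Txt_", suffix=".txt"):
    """Return the first path prefix<N>suffix in directory that does not exist yet."""
    num = 1
    while os.path.exists(os.path.join(directory, f"{prefix}{num}{suffix}")):
        num += 1
    return os.path.join(directory, f"{prefix}{num}{suffix}")

with tempfile.TemporaryDirectory() as tmp:
    # pretend three exports already exist
    for n in (1, 2, 3):
        open(os.path.join(tmp, f"Exp_Txt_{n}.txt"), "w").close()
    new_name = os.path.basename(next_free_name(tmp))

print(new_name)  # Exp_Txt_4.txt
```

Note that the loop finds the first unused number, so if Exp_Txt_2.txt were deleted later, the next export would reuse that number.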
EDIT: Why don't we need the get_file_name function in the first example?
First off, if you are unfamiliar with str.format, you may want to look at the Python docs (Common string operations) and/or this simple example:
text = "Hello {}, my name is {}."
x = text.format("Kotropoulos","Tadhg")
print(x)
print(text)
The path string is figured out with this line:
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
But it has {} in the place of the desired number (since we don't know what the number should be at this point). So if the path was, for example:
name_to_format = "/Users/Tadhg/Desktop/Exp_Txt_{}.txt"
then we can insert a number with:
print(name_to_format.format(1))
print(name_to_format.format(2))
This does not change name_to_format: str objects are immutable, so .format returns a new string without modifying name_to_format. However, we would run into a problem if our path was something like one of these:
name_to_format = "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt"
Since the formatting mark we want to use is no longer the only pair of curly brackets, we can get a variety of errors:
KeyError: 'Cat'
IndexError: tuple index out of range
ValueError: unmatched '{' in format spec
So you only want to rely on str.format when you know it is safe to use. Hope this helps, have fun coding!
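The three failure cases above can be reproduced with a short sketch like the one below. It only records the exception type for each path, since the exact error messages can differ between Python versions.

```python
paths = [
    "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt",
    "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt",
    "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt",
]

errors = []
for path in paths:
    try:
        path.format(1)
    except (KeyError, IndexError, ValueError) as exc:
        # keep just the exception class name for comparison
        errors.append(type(exc).__name__)

print(errors)  # ['KeyError', 'IndexError', 'ValueError']
```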