I am trying to extract a set of alphanumeric characters from a text file.
Below are some lines from the file. I want to extract the '#' as well as anything that follows it.
im trying to pull #bob from a file.
this is a #line in the #file
#bob is a wierdo
The code below is what I have so far:

```python
def getAllPeople(fileName):
    # give empty list
    allPeople = []
    # open TweetsFile.txt
    with open(fileName, 'r') as f1:
        lines = f1.readlines()
    # split all words into strings
    for word in lines:
        char = word.split("#")
        print(char)
    # close the file
    f1.close()
```
What I am trying to get is:
['#bob','#line','#file', '#bob']
If you do not want to use re, take Andrew's suggestion (here tweet is one line of the file):

```python
mentions = list(filter(lambda x: x.startswith('#'), tweet.split()))
```

Otherwise, see the marked duplicate.

Since you apparently cannot use filter or lambda, the same thing works as a list comprehension:

```python
mentions = [w for w in tweet.split() if w.startswith('#')]
```
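Putting the pieces together, a minimal sketch of a complete getAllPeople that reads the whole file and collects every word starting with '#' might look like this. The helper extract_hashtags and its use_regex flag are hypothetical names. Because split() breaks only on whitespace, a tag followed by punctuation (for example "#bob,") keeps the comma; the regular expression #\w+ from the re module stops at the first character that is not a letter, digit or underscore instead.

```python
import re

def extract_hashtags(lines, use_regex=False):
    """Return every '#'-prefixed word found in lines, in file order."""
    tags = []
    for line in lines:
        if use_regex:
            # #\w+ stops at the first non-word character, so "#bob," gives "#bob"
            tags.extend(re.findall(r"#\w+", line))
        else:
            # split() breaks on any whitespace, so punctuation stays attached
            tags.extend(w for w in line.split() if w.startswith("#"))
    return tags

def getAllPeople(fileName):
    # Iterating over the file object yields one line at a time
    with open(fileName) as f1:
        return extract_hashtags(f1)
```

For the three sample lines in the question, getAllPeople returns ['#bob', '#line', '#file', '#bob'] with either setting of use_regex.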
Related
I am a little confused about how to read all lines in many files whose names range from "datalog.txt.98" to "datalog.txt.120".
This is my code:

```python
import json
file = "datalog.txt."
i = 97
for line in file:
    i += 1
    f = open(line + str(i), 'r')
    for row in f:
        print(row)
```
Here, you will find an example of one line in one of those files:
I really need your help.
I suggest using a loop to open the multiple files, whose names differ only in the trailing number.
To better understand this project, I would recommend researching the following topics:
- for loops
- string manipulation
- opening a file and reading its content
- list manipulation
- string parsing
This is one of my favourite beginner guides.
To generate the integers at the end of the file names, I would look into Python for loops and range().
I think this is what you are trying to do:

```python
# create a list to store all your file content
files_content = []
# the prefix is of type string
filename_prefix = "datalog.txt."
# loop from 98 to 120 (range's stop value is exclusive)
for i in range(98, 121):
    # make the filename variable with the prefix and
    # the integer i, which you need to convert to a string
    filename = filename_prefix + str(i)
    # open the file and read all the lines into a variable
    with open(filename) as f:
        content = f.readlines()
    # append the file content to the files_content list
    files_content.append(content)
```
To get rid of the surrounding whitespace (including the trailing newline) on each line, add the missing line inside the loop, just before the append:

```python
    content = [x.strip() for x in content]
    files_content.append(content)
```
Here's an example of printing out files_content:

```python
for file in files_content:
    print(file)
```
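As an alternative sketch, the standard-library fileinput module can stream the lines of several files as if they were one. The function name read_datalogs and its parameters are hypothetical; the range 98 to 120 comes from the question. Skipping numbers that have no file, rather than raising FileNotFoundError, is an assumption about what you want.

```python
import fileinput
import os

def read_datalogs(directory=".", prefix="datalog.txt.", first=98, last=120):
    """Return all lines (newline stripped) of prefix+first .. prefix+last, in numeric order."""
    # range() excludes its stop value, so add 1 to include `last`
    names = [os.path.join(directory, prefix + str(i)) for i in range(first, last + 1)]
    # Assumption: numbers without a file are skipped instead of raising an error
    existing = [name for name in names if os.path.isfile(name)]
    if not existing:
        # FileInput treats an empty file list as "read standard input"
        return []
    with fileinput.FileInput(files=existing) as lines:
        return [line.rstrip("\n") for line in lines]
```

Building the names from a numeric range keeps them in numeric order; sorting the output of a directory listing would put "datalog.txt.100" before "datalog.txt.98".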
This is for an assignment: I need to read a file and remove the lines that start with a specific word.
```python
fajl = input("File name:")
rec = input("Word:")

def delete_lines(fajl, rec):
    with open(fajl) as file:
        text = file.readlines()
        print(text)
        for word in text:
            words = word.split(' ')
            first_word = words[0]
            for first in word:
                if first[0] == rec:
                    text = text.pop(rec)
                    return text
                print(text)
    return text

delete_lines(fajl, rec)
```
At the last for loop I completely lost track of what I am doing. First, I can't use pop, so once I locate the word I need some other way to delete the lines that start with it. There is also a minor problem with my approach: first_word gives me the first word, but it also includes the comma if one is present.
Example text from a file (file.txt):
This is some text on one line.
The text is irrelevant.
This would be some specific stuff.
However, it is not.
This is just nonsense.
Input for rec = input("Word:"): This
Output:
The text is irrelevant.
However, it is not.
Removing items from a list while you iterate over it makes the loop skip elements, but you can iterate over a copy and modify the original list:
```python
fajl = input("File name:")
rec = input("Word:")

def delete_lines(fajl, rec):
    with open(fajl) as file:
        text = file.readlines()
    print(text)
    # let's iterate over a copy to modify
    # the original one without restrictions
    for word in text[:]:
        # compare in lowercase to erase both This and this
        if word.lower().startswith(rec.lower()):
            # Remove the line
            text.remove(word)
    newtext = "".join(text)  # join all the text
    print(newtext)  # to see the results in console
    # we should now save the file to see the results there
    with open(fajl, "w") as file:
        file.write(newtext)
    return newtext

delete_lines(fajl, rec)
```
I tested this with your sample text, erasing the lines that start with "this". Because startswith only looks at the beginning of the line, it wipes lines starting with "this" and "this," alike. It only deletes the matching lines and leaves any blank lines alone; if you don't want those either, you can also compare each line with "\n" and remove the blank ones.
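Because startswith compares characters rather than words, rec = "This" would also remove a line beginning with "Thistle". A minimal sketch that compares the whole first word instead, ignoring trailing punctuation such as the comma mentioned in the question, could look like this (keep_line and delete_lines_starting_with are hypothetical names):

```python
import string

def keep_line(line, word):
    """True unless the first word of line equals word, ignoring case and trailing punctuation."""
    parts = line.split(maxsplit=1)
    if not parts:
        return True  # keep blank lines untouched
    first_word = parts[0].rstrip(string.punctuation)
    return first_word.lower() != word.lower()

def delete_lines_starting_with(path, word):
    with open(path) as f:
        lines = f.readlines()
    # Build a new list instead of removing items from the list being iterated
    kept = [line for line in lines if keep_line(line, word)]
    with open(path, "w") as f:
        f.writelines(kept)
    return kept
```

With the sample file.txt and the word "This", only "The text is irrelevant." and "However, it is not." remain.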
I am a beginner in Python. I have a file containing a single line of data. I need to extract "n" characters after certain words, for their first occurrence only. Those words are also not adjacent to each other.
Data file: {"id":"1234566jnejnwfw","displayId":"1234566jne","author":{"name":"abcd#xyz.com","datetime":15636378484,"displayId":"23423426jne","datetime":4353453453}
I want to fetch the value after the first match of "displayId" and before "author", i.e. 1234566jne. The same goes for "datetime".
I tried splitting the line at the index of the word and writing the rest into another file, to clean it up further and get the exact value.
```python
displayId = "displayId"  # the word to search for
tmpFile = "tmpFile.txt"
tmpFileOpen = open(tmpFile, "w+")
with open("data file") as openfile:
    for line in openfile:
        tmpFileOpen.write(line[line.index(displayId) + len(displayId):])
```
However, I am fairly sure this is not a good basis for further work.
Can anyone please help me with this?
This answer should work for any displayId with a format similar to the one in your question. I decided not to load the JSON file for this answer, because it wasn't needed to accomplish the task.
```python
import re

tmpFile = "tmpFile.txt"

with open('data_file.txt', 'r') as infile, open(tmpFile, "w") as tmpFileOpen:
    lines = infile.read()
    # Use regex to find the displayId element
    # example: "displayId":"1234566jne
    # \W matches non-word characters, such as " and :
    # \d{6,8} matches between 6 and 8 digits
    # [a-z]{3} matches 3 lowercase ASCII letters
    # The parentheses capture just the id, so no cleanup is needed afterwards
    # (str.strip() removes a *set* of characters, not a prefix, and could
    # eat letters such as "a" or "d" from the end of an id)
    id_pattern = re.compile(r'\WdisplayId\W{3}(\d{6,8}[a-z]{3})')
    # With one group in the pattern, findall returns only the captured text
    for display_id in id_pattern.findall(lines):
        # Write each id to the temp file on a separate line
        tmpFileOpen.write('{}\n'.format(display_id))

# output in tmpFile.txt
# 1234566jne
# 23423426jne
```
This second answer does load the JSON file, but it will fail if the structure of the JSON changes.
```python
import json

tmpFile = 'tmpFile.txt'

# Load the JSON file
with open('data_file.txt') as infile:
    jdata = json.load(infile)

with open(tmpFile, "w") as tmpFileOpen:
    # Find the first ID and write it to the temp file
    first_id = jdata['displayId']
    tmpFileOpen.write('{}\n'.format(first_id))
    # Find the second ID (nested under "author") and write it too
    second_id = jdata['author']['displayId']
    tmpFileOpen.write('{}\n'.format(second_id))

# output in tmpFile.txt
# 1234566jne
# 23423426jne
```
If I understand your question correctly, you can achieve this as follows:
```python
import json

tmpFile = "tmpFile.txt"
with open("data.txt") as openfile, open(tmpFile, "w") as tmpFileOpen:
    for line in openfile:
        # Load the JSON into a dict in order to manipulate it easily
        data = json.loads(line)
        # Write to the tmp file only the first 3
        # characters of the field `displayId`
        tmpFileOpen.write(data['displayId'][:3])
```
This works because the data in your file is JSON; however, if the format changes, it won't work.
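Two details of the sample data matter for the JSON answers: as shown, the line has two opening braces but only one closing brace, so json.loads will raise an error unless the real file is complete; and "datetime" appears twice inside "author", where Python's json module keeps the last value of a duplicated key, not the first. Since the question asks for the first occurrence only, a sketch that searches the raw text with re.search (which returns the first match) may fit better. first_value is a hypothetical helper, and it assumes each value is either a quoted string without embedded quotes or a bare number.

```python
import re

def first_value(text, key, n=None):
    """Return the value after the first "key": in text, optionally only its first n characters."""
    # Optional opening quote, then capture everything up to a quote, comma or closing brace
    pattern = '"' + re.escape(key) + r'"\s*:\s*"?([^",}]*)'
    match = re.search(pattern, text)
    if match is None:
        return None
    value = match.group(1)
    return value if n is None else value[:n]

sample = ('{"id":"1234566jnejnwfw","displayId":"1234566jne","author":{"name":"abcd#xyz.com",'
          '"datetime":15636378484,"displayId":"23423426jne","datetime":4353453453}')
print(first_value(sample, "displayId"))     # 1234566jne
print(first_value(sample, "datetime"))      # 15636378484
print(first_value(sample, "displayId", 3))  # 123
```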
I have a text file, say storyfile.txt.
The content of storyfile.txt is:
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe
I have another file, hashfile.txt, that contains some words separated by commas (,).
The content of hashfile.txt is:
All,mimsy,were,the,borogoves,raths,outgrabe
My objective
My objective is to:
1. Read hashfile.txt.
2. Add a hashtag to each of the comma-separated words.
3. Read storyfile.txt, search for the same words as in hashfile.txt, and add a hashtag to them.
4. Update storyfile.txt with the hashtagged words.
My Python code so far
```python
import in_place

hashfile = open('hashfile.txt', 'w+')
n1 = hashfile.read().rstrip('\n')
print(n1)
checkWords = n1.split(',')
print(checkWords)
repWords = ["#"+i for i in checkWords]
print(repWords)
hashfile.close()

with in_place.InPlace('storyfile.txt') as file:
    for line in file:
        for check, rep in zip(checkWords, repWords):
            line = line.replace(check, rep)
        file.write(line)
```
The output
It can be seen here: https://dpaste.de/Yp35
Why am I getting this kind of output?
Why does the last sentence have no newlines in it?
Where am I going wrong?
The code that currently works for a single word:
```python
import in_place

with in_place.InPlace('somefile.txt') as file:
    for line in file:
        line = line.replace('mome', 'testZ')
        file.write(line)
```
See if this helps. It fulfills the objective you mentioned, though I have not used the in_place module.
```python
hash_list = []
with open("hashfile.txt", 'r') as f:
    for i in f.readlines():
        for j in i.split(","):
            hash_list.append(j.strip())

with open("storyfile.txt", "r") as f:
    for i in f.readlines():
        for j in hash_list:
            i = i.replace(j, "#"+j)
        # each line already ends with "\n", so don't let print add another
        print(i, end="")
```
Let me know if you need further clarification.
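In the question's code, open('hashfile.txt', 'w+') truncates the file before reading it, so n1 is the empty string, checkWords is [''], and replace('', '#') inserts '#' between every character ("ab".replace("", "#") gives "#a#b#"). A minimal sketch that opens the word list for reading, tags whole words only, and writes the result back (step 4 of the objective) could look like this; hashtag_words and hashtag_story are hypothetical names, and the sketch assumes the hashtag words consist of letters and digits.

```python
import re

def hashtag_words(text, words):
    """Prefix each whole-word occurrence of any entry in words with '#'."""
    words = [w for w in words if w]  # drop empty entries, e.g. from a trailing comma
    if not words:
        return text
    # \b...\b matches whole words only, so "the" does not match inside "there";
    # (?<!#) skips words already tagged, so running twice adds no second '#'
    pattern = re.compile(r"(?<!#)\b(" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub(r"#\1", text)

def hashtag_story(hash_path="hashfile.txt", story_path="storyfile.txt"):
    # Open the word list for reading: mode 'w+' would truncate it first
    with open(hash_path) as f:
        words = [w.strip() for w in f.read().split(",")]
    with open(story_path) as f:
        story = f.read()
    tagged = hashtag_words(story, words)
    # Write the tagged text back to the story file
    with open(story_path, "w") as f:
        f.write(tagged)
    return tagged
```

With the sample files, the third line becomes "#All #mimsy #were #the #borogoves," and "the" is tagged wherever it appears as a separate word.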
I need to search for a name in a file and, in the line starting with that name, replace the fourth item of the comma-separated list. I have begun trying to program this with the following code, but I have not got it to work.
```python
with open("SampleFile.txt", "r") as f:
    newline = []
    for word in f.readlines():
        newline.append(word.replace(str(String1), str(String2)))
with open("SampleFile.txt", "w") as f:
    for line in newline:
        f.writelines(line)
# this piece of code replaced every occurrence of String1 with String2

f = open("SampleFile.txt", "r")
for line in f:
    if line.startswith(Name):
        if line.contains(String1):
            newline = line.replace(str(String1), str(String2))
# this came up with a syntax error
```
You could post some dummy data; that would help people answer your question. I suggest you back up your data: you can save the edited data to a new file, or copy the old file to a backup folder before working on it (think about using "from shutil import copyfile" and then "copyfile(src, dst)"). Otherwise, a single mistake could ruin your data with no easy way to restore it.
You can't simply replace the string with "newline = line.replace(str(String1), str(String2))"! Consider "strong" as your search term and a line like "Armstrong,Paul,strong,44": if you replace "strong" with "weak", you get "Armweak,Paul,weak,44".
I hope the following code helps you:
```python
filename = "SampleFile.txt"
filename_new = filename.replace(".", "_new.")
search_term = "Smith"

with open(filename) as src, open(filename_new, 'w') as dst:
    for line in src:
        if line.startswith(search_term):
            items = line.split(",")
            items[4-1] = items[4-1].replace("old", "new")
            line = ",".join(items)
        dst.write(line)
```
If you are working with a CSV file, have a look at the csv module.
PS: My files contain the following data (the file names are not part of the files):
| SampleFile.txt | SampleFile_new.txt |
|---|---|
| Adams,George,m,old,34 | Adams,George,m,old,34 |
| Adams,Tracy,f,old,32 | Adams,Tracy,f,old,32 |
| Smith,John,m,old,53 | Smith,John,m,new,53 |
| Man,Emily,w,old,44 | Man,Emily,w,old,44 |
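Following the suggestion to use the csv module, a minimal sketch might look like this. replace_field and its parameters are hypothetical names. Unlike line.startswith(search_term), it compares the whole first field, so a row for "Smithson" would not be touched, and it replaces the fourth field only when it is exactly "old".

```python
import csv

def replace_field(src_path, dst_path, name, column, old, new):
    """Copy src_path to dst_path, replacing field `column` (1-based) when it equals `old`,
    but only in rows whose first field is exactly `name`."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        # csv.writer ends rows with "\r\n" by default; use "\n" to match the input
        writer = csv.writer(dst, lineterminator="\n")
        for row in reader:
            # Whole-field comparisons avoid the "Armstrong" substring problem
            if row and row[0] == name and len(row) >= column and row[column - 1] == old:
                row[column - 1] = new
            writer.writerow(row)
```

Calling replace_field("SampleFile.txt", "SampleFile_new.txt", "Smith", 4, "old", "new") produces the same SampleFile_new.txt as the table above.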