Split the file and further split specific index - python-3.x

I have a big configuration file that needs to be split per hierarchy based on specific syntax; that part is done. Among the results, on a specific string match I pull out a specific index, and that index needs to be split further on the regex match "\s{8}exit", which is not working.
import re

def config_readFile(filename):
    # read the whole file and split it into sections on the "#---..." separator
    with open(filename, "r") as F:
        s = F.read()
    return re.split(r"#\-{50}", s)

C = config_readFile('bng_config_input.txt')
print('printing C:', C)
print(len(C))

vAAA_FIXED = []
for i in C:
    if "CRS-1-SUB1" in i:
        vAAA_FIXED.append(i)

print(len(vAAA_FIXED))
print(vAAA_FIXED)
# this fails: vAAA_FIXED is a list, not a string
vAAA_FIXED = vAAA_FIXED.split(" ")
print(len(vAAA_FIXED))
I need to get a new list from the original list.
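For the follow-up split, a minimal sketch of the idea: use a raw string so `\s{8}` reaches the regex engine intact, and split each matched element individually (the `section` string below is a made-up stand-in for one element of `vAAA_FIXED`):

```python
import re

# Hypothetical sample section, standing in for one element of vAAA_FIXED;
# each "exit" is preceded by exactly eight spaces
section = ("interface CRS-1-SUB1\n"
           " sub one\n"
           "        exit\n"
           " sub two\n"
           "        exit")

# r"..." keeps the backslash literal, so \s{8} means
# "exactly eight whitespace characters" to the regex engine
parts = re.split(r"\s{8}exit", section)
print(parts)
```

Without the raw-string prefix, `"\s{8}exit"` still happens to work in current Python, but any escape the regex engine does not recognize can silently change meaning, so `r"..."` is the safe habit.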


Instead of printing to console create a dataframe for output

I am currently comparing the text of one file to that of another file.
The method: for each row in the source text file, check each row in the compare text file.
If the word is present in the compare file then write the word and write 'present' next to it.
If the word is not present then write the word and write not_present next to it.
So far I can do this fine by printing to the console output, as shown below:
import sys

filein = 'source.txt'
compare = 'compare.txt'
source = 'source.txt'

# change to lower case
with open(filein, 'r+') as fopen:
    string = ""
    for line in fopen.readlines():
        string = string + line.lower()
with open(filein, 'w') as fopen:
    fopen.write(string)

# search and list
with open(compare) as f:
    searcher = f.read()
if not searcher:
    sys.exit("Could not read data :-(")

# search and output the results
with open(source) as f:
    for item in (line.strip() for line in f):
        if item in searcher:
            print(item, ',present')
        else:
            print(item, ',not_present')
the output looks like this:
dog ,present
cat ,present
mouse ,present
horse ,not_present
elephant ,present
pig ,present
What I would like is to put this into a pandas DataFrame, preferably with 2 columns: one for the word and the second for its state. I can't seem to get my head around doing this.
I am making several assumptions here to include:
Compare.txt is a text file consisting of a list of single words 1 word per line.
Source.txt is a free flowing text file, which includes multiple words per line and each word is separated by a space.
When comparing to determine if a compare word is in source, it is found if, and only if, no punctuation marks (e.g. " ' , . ?) are appended to the word in source.
The output dataframe will only contain the words found in compare.txt.
The final output is a printed version of the pandas dataframe.
With these assumptions:
import pandas as pd
from collections import defaultdict

compare = 'compare.txt'
source = 'source.txt'
rslt = defaultdict(list)

def getCompareTxt(fid: str) -> list:
    clist = []
    with open(fid, 'r') as cmpFile:
        for line in cmpFile.readlines():
            clist.append(line.lower().strip('\n'))
    return clist

cmpList = getCompareTxt(compare)

if cmpList:
    with open(source, 'r') as fsrc:
        items = []
        for item in (line.strip().split(' ') for line in fsrc):
            items.extend(item)
    print(items)
    for cmpItm in cmpList:
        rslt['Name'].append(cmpItm)
        if cmpItm in items:
            rslt['State'].append('Present')
        else:
            rslt['State'].append('Not Present')
    df = pd.DataFrame(rslt, index=range(len(cmpList)))
    print(df)
else:
    print('No compare data present')
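A more compact alternative sketch, building the DataFrame directly from a list of tuples (the two word lists here are made-up stand-ins for the file contents):

```python
import pandas as pd

# In-memory stand-ins for the two files, just for illustration
compare_words = ['dog', 'cat', 'horse']
source_words = {'dog', 'cat', 'mouse'}  # a set makes membership tests O(1)

# one (word, state) tuple per compare word, fed straight to the constructor
df = pd.DataFrame(
    [(w, 'Present' if w in source_words else 'Not Present') for w in compare_words],
    columns=['Name', 'State'],
)
print(df)
```

Using a set for the source words also avoids the substring pitfall of `item in searcher` on a raw string, where e.g. "cat" would match inside "catalog".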

Splitting a list entry in Python

I am importing a CSV file into a list in Python. When I split it into list elements and then print an index, the entry is printed like this:
2000-01-03,3.745536,4.017857,3.631696,3.997768,2.695920,133949200
How would I split this list so that I could print just a single element, like this?
2000-01-03
Here is my code so far.
def main():
    list = []
    filename = "AAPL.csv"
    with open(filename) as x:
        for line in x.readlines():
            val = line.strip('\n').split(',')
            list.append(val)
    print(list[2])
Your current code builds a list of lists: precisely, a list (of rows) of lists (of fields).
To extract one single element, say first field of third row, you could do:
...
print(list[2][0])
But except for trivial tasks, you should use the csv module when processing CSV files, because it is robust to corner cases like newlines or field separators contained in fields. Your code could become:
import csv

def main():
    list = []
    filename = "AAPL.csv"
    with open(filename) as x:
        rd = csv.reader(x)
        for val in rd:  # the reader is an iterator of lists of fields
            list.append(val)
    print(list[2][0])
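If the file is uniformly tabular, pandas is another option; a sketch assuming pandas is available, with `csv_text` as an in-memory stand-in for AAPL.csv (assumed to have no header row):

```python
import io
import pandas as pd

# Two sample rows standing in for the CSV file
csv_text = """2000-01-03,3.745536,4.017857,3.631696,3.997768,2.695920,133949200
2000-01-04,3.866071,3.950893,3.613839,3.660714,2.468623,128094400"""

# header=None because the sample has no header line;
# with a real file, pass the filename instead of the StringIO
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df.iloc[0, 0])  # first field of the first row
```

`df.iloc[row, col]` then replaces the `list[2][0]` style of indexing.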

Is there a way to pass variable as counter to list index in python?

Sorry if I am asking a very basic question, but I am new to Python and need help with the question below.
I am trying to write a file parser that counts the number of occurrences (modified programs) mentioned in the file.
I then store all the occurrences in an initially empty list, keeping a counter for each occurrence.
Up to here all is fine.
Now I am trying to create files based on the names captured in the list, and to store the non-matching lines in separate files, but I am getting an "index out of range" error: when I pass `el[count]`, it seems to take count as a string and not count's value.
Can someone help?
import sys
import re

count = 1
j = 0
k = 0
el = []
f = open("change_programs.txt", 'w+')
data = open("oct-released_diff.txt", encoding='utf-8', errors='ignore')
for i in data:
    if len(i.strip()) > 0 and i.strip().startswith("diff --git"):
        count = count + 1
        el.append(i)
        fl = []
    else:
        filename = "%s.txt" % el[int(count)]  # <-- fails here with IndexError
        h = open(filename, 'w+')
        fl.append(i)
        print(fl, file=h)
el = '\n'.join(el)
print(el, file=f)
print(filename)
data.close()
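The error is most likely an off-by-one rather than count being read as a string: `count` starts at 1 and is incremented before it is used, so `el[count]` always points one past the last appended name. A minimal sketch of the fix, using an in-memory list of lines and a dict instead of real files:

```python
# Made-up diff lines standing in for oct-released_diff.txt
lines = [
    "diff --git a/prog1.py b/prog1.py",
    "changed line in prog1",
    "diff --git a/prog2.py b/prog2.py",
    "changed line in prog2",
]

el = []
grouped = {}
for line in lines:
    if line.startswith("diff --git"):
        el.append(line)
        grouped[el[-1]] = []
    else:
        # el[-1] (equivalently el[len(el) - 1] or el[count - 1]) is the
        # most recent header; el[count] overshoots because count starts
        # at 1 and is incremented before it is used as an index
        grouped[el[-1]].append(line)
print(grouped)
```

In the original code, `el[int(count)]` would become `el[count - 1]` or simply `el[-1]`.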

Forming a target string using minimal substrings of words from a word list

I have a text file with first names but there are new names added every year.
I need a program in Python that takes parts of names from the text file and finds some combination of substrings of these names that can be concatenated to create a string that matches a user's input.
The program should do this using the fewest possible available names from the text file.
For example, if the text file contains this:
Joppe
Fien
Katrijn
Sven
Kobe
The program asks for a name that isn't already in the text file. For example:
Please fill in a name: Katrien
Then it should print this:
Katri => Katrijn
ien => Fien
Not like this; it builds the name correctly, but there is a better solution that uses fewer names:
K => Kobe
a => Katrijn
tr => Katrijn
ien => Fien
If the text file contains this:
Joppe
Fien
Makatrijn
Sven
Kobe
It could also print this:
Katr => Makatrijn
ien => Fien
I tried this but with no result:
name_input = input('Fill in a name: ')
with open('namen.txt', 'r') as file:
for name in file.readlines():
for letter_name_input in name_input:
for letter in name:
if letter == letter_name_input:
print(letter)
You can use a generator function that takes a target name and a set of names as input. It tries matching a prefix of the target name against each name in the set, from the longest prefix to the shortest. For each matching name, it recursively finds the names that would form the rest of the target (the target with the prefix removed), using the set with the matching name removed, and yields each resulting combination with the current (prefix, name) pair prepended as a tuple:
def form_name(target, names):
    if target:
        for i in range(len(target), 0, -1):
            prefix = target[:i]
            matching_names = [name for name in names if prefix.lower() in name.lower()]
            if matching_names:
                for name in matching_names:
                    for fragments in form_name(target[i:], names - {name}):
                        yield [(prefix, name), *fragments]
    else:
        yield []
so that you can use the min function with len as the key function to obtain the combination with the fewest names:
from io import StringIO

file = StringIO('''Joppe
Fien
Katrijn
Sven
Kobe''')

for fragment, name in min(form_name('Katrien', set(file.read().split())), key=len):
    print(fragment, '=>', name)
outputs:
Katri => Katrijn
en => Fien
Demo: https://repl.it/repls/IllustriousTrustingIntegrationtesting
Note that both Fien and Sven in your example input would match the en fragment and make for valid answers with the fewest names, so the min function would arbitrarily return one of them (which is fine per your requirement). Also note that you shouldn't expect the fragments of the target name to overlap, so instead of ien the second fragment should be en after the first fragment Katri is removed from the target name Katrien.
If you're interested in seeing all the valid answers, you can calculate the minimum length of all the combinations first and then output all the combinations with the minimum length:
file.seek(0)  # rewind, since the StringIO was fully read above
combinations = list(form_name('Katrien', set(file.read().split())))
min_len = min(map(len, combinations))
for combination in combinations:
    if len(combination) == min_len:
        for fragment, name in combination:
            print(fragment, '=>', name)
        print()
This outputs:
Katri => Katrijn
en => Sven
Katri => Katrijn
en => Fien
Katr => Katrijn
ien => Fien
Assuming you'd want to stop searching as soon as you find a shortest answer, here's my solution:
First you need a function to break the word into all possible parts starting from the biggest possible set:
def breakWord(word, n):
    list = []
    for k in range(len(word)):
        subword = word[k:]
        out = [subword[i:i + n] for i in range(0, len(subword), n)]
        if k > 0:
            out.append(word[:k])
        list.append(out)
    return list
Notice that if you use:
breakWord(yourWord, len(yourWord)-1)
It will break the word into all possible sets of two parts.
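For instance, a quick check of what the helper returns on a tiny input (the function is repeated here so the block runs standalone):

```python
def breakWord(word, n):
    # all ways to cut `word` into chunks of length n, rotating the start point;
    # the skipped leading part word[:k] is appended as its own chunk
    parts = []
    for k in range(len(word)):
        subword = word[k:]
        out = [subword[i:i + n] for i in range(0, len(subword), n)]
        if k > 0:
            out.append(word[:k])
        parts.append(out)
    return parts

print(breakWord('abc', 2))  # → [['ab', 'c'], ['bc', 'a'], ['c', 'ab']]
```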
Then a function to check if a given string is in the list of names:
def isInNames(word):
    # name_list is assumed to be the global list of names read from the file
    for name in name_list:
        if word in name:
            return True
    return False
Finally, iterate over all possible combinations of characters:
def findWordCombination(word):
    resultList = []
    resultSize = 50  # something large to ensure it gets changed
    for i in range(len(word) - 1, 0, -1):  # will go from max to minimum
        testSet = breakWord(word, i)
        for candidate in testSet:
            isValid = True  # assume True at first
            for part in candidate:
                if not isInNames(part):
                    isValid = False
            # once all parts of the candidate are checked we know
            # whether it is a valid combination
            if isValid and len(candidate) < resultSize:
                resultSize = len(candidate)
                resultList = candidate
    return resultList
This returns the first set it finds with the minimum possible combination of subwords from your search query. You can tweak it to also store the names from the list that yielded the resulting set.
Yet another approach (I upvoted @blhsing's recursive solution already; very elegant, I love it):
import itertools as it
from collections import defaultdict

def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]

names = ['Joppe', 'Fien', 'Katrijn', 'Sven', 'Kobe']

# each key is a substring of any of the names and the value is
# the list of names that contain it
d = defaultdict(list)
for name in names:
    for subname in get_all_substrings(name):
        d[subname].append(name)

input_name = 'Katrien'
input_subs = get_all_substrings(input_name)
sub_combs = [it.combinations(input_subs, n) for n in range(1, len(input_name))]
# those combinations that can form the input name
whole_combs = [el for co in sub_combs for el in co if ''.join(el) == input_name]
# those whole combinations whose parts all actually appear in a name
saved = [wc for wc in whole_combs if all(c in d for c in wc)]
shortest_comb = min(saved, key=len)
shortest_sub_and_name = [(s, d[s]) for s in shortest_comb]
for s, ns in shortest_sub_and_name:
    print(f"{s} => {ns}")
produces
Katr => ['Katrijn']
ien => ['Fien']
Note: as you can see, the output shows all the names that can contribute to each specific substring.
You could try:
import difflib

name = input('Please fill in a name: ')
with open('namen.txt', 'r') as file:
    file_data = file.readlines()

# either you are looking for
print([i for i in file_data if difflib.SequenceMatcher(a=i, b=name).ratio() >= 0.5])
# or you are looking for
print(difflib.get_close_matches(name, file_data, len(file_data), 0.5))
['Katrijn\n', 'Fien\n']
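A self-contained version of the second call, with the names in an in-memory list rather than a file:

```python
import difflib

names = ['Joppe', 'Fien', 'Katrijn', 'Sven', 'Kobe']

# return every name whose similarity ratio to the input is at least 0.5,
# best matches first (n caps how many results may be returned)
matches = difflib.get_close_matches('Katrien', names, n=len(names), cutoff=0.5)
print(matches)  # → ['Katrijn', 'Fien']
```

Note this finds names similar to the whole input, which is different from the fragment-assembly answers above; it is useful when a single close match is enough.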

How to Read Multiple Files in a Loop in Python and get count of matching words

I have two text files and two lists (FIRST_LIST, SCND_LIST). For each file, I want to find the count of words matching FIRST_LIST and SCND_LIST individually.
FIRST_LIST =
"accessorizes","accessorizing","accessorized","accessorize"
SCND_LIST=
"accessorize","accessorized","accessorizes","accessorizing"
text File1 contains:
This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.
text File2 contains:
is more applied,using accessorize accessorized,accessorizes,accessorizing
output
File1 first list count=2
File1 second list count=0
File2 first list count=0
File2 second list count=4
I tried this code to achieve that functionality, but I am not able to get the expected output.
Any help is appreciated.
import os
import glob
import re

files = []
for filename in glob.glob("*.txt"):
    files.append(filename)

# remove punctuation
def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)

two_files = []
for filename in files:
    for line in open(filename):
        print(remove_punctuation(line), end='')
        two_files.append(remove_punctuation(line))

FIRST_LIST = "accessorizes", "accessorizing", "accessorized", "accessorize"
SCND_LIST = "accessorize", "accessorized", "accessorizes", "accessorizing"

c = []
for match in FIRST_LIST:
    if any(match in value for value in two_files):
        print(match)
        c.append(match)
print(c)
len(c)

d = []
for match in SCND_LIST:
    if any(match in value for value in two_files):
        print(match)
        d.append(match)
print(d)
len(d)
Using Counter and some list comprehensions is one of many approaches to solve your problem.
I assume your sample output is wrong, since some words are part of both lists and both files but are not counted. In addition, I added a second line to each sample string in order to show how this works with multi-line strings, which might be the typical contents of a given file.
io.StringIO objects emulate your files, but working with real files from your file system works exactly the same, since both provide a file-like interface:
import io
from collections import Counter

list_a = ["accessorizes", "accessorizing", "accessorized", "accessorize"]
list_b = ["accessorize", "accessorized", "accessorizes", "accessorizing"]

# added a second line to each string just for the sake of it
file_contents_a = 'This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.\nThis is the second line in file a'
file_contents_b = 'is more applied,using accessorize accessorized,accessorizes,accessorizing\nThis is the second line in file b'

# using io.StringIO to simulate a file input (--> file-like object);
# you should use `with open(filename) as ...` for real file input
file_like_a = io.StringIO(file_contents_a)
file_like_b = io.StringIO(file_contents_b)

# read file contents and split lines into a list of strings
lines_of_file_a = file_like_a.read().splitlines()
lines_of_file_b = file_like_b.read().splitlines()

# iterate through all lines of each file (for file a here)
for line_number, line in enumerate(lines_of_file_a):
    words = line.replace('.', ' ').replace(',', ' ').split(' ')
    c = Counter(words)
    in_list_a = sum(v for k, v in c.items() if k in list_a)
    in_list_b = sum(v for k, v in c.items() if k in list_b)
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# iterate through all lines of each file (for file b here)
for line_number, line in enumerate(lines_of_file_b):
    words = line.replace('.', ' ').replace(',', ' ').split(' ')
    c = Counter(words)
    in_list_a = sum(v for k, v in c.items() if k in list_a)
    in_list_b = sum(v for k, v in c.items() if k in list_b)
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# actually, your two lists are the same
lists_are_equal = sorted(list_a) == sorted(list_b)
print(lists_are_equal)
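Since both lists contain the same four words, a compact sketch that tokenizes with a regex (an alternative to the replace-chains above) gives each file's total directly:

```python
import re
from collections import Counter

WORDS = ["accessorizes", "accessorizing", "accessorized", "accessorize"]

def count_matches(text, word_list):
    # lowercase, split on non-word characters, then sum the counts
    # of every token that appears in word_list
    counts = Counter(re.findall(r"\w+", text.lower()))
    return sum(counts[w] for w in word_list)

# the two sample file contents from the question, inlined for the sketch
file1 = ("This is a very good question, and you have received good answers "
         "which describe interesting topics accessorized accessorize.")
file2 = "is more applied,using accessorize accessorized,accessorizes,accessorizing"

print(count_matches(file1, WORDS))  # 2
print(count_matches(file2, WORDS))  # 4
```

`re.findall(r"\w+", ...)` handles commas, periods, and any other punctuation in one pass, so no per-character replace chain is needed.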
