How to achieve below situation using python list comprehension? - python-3.x

rows = [(d = re.split("\s{2,}|\|", line)) for line in lines if len(d) > 5 and d[0]!='' ]
As the snippet above shows, I am splitting each line in a list of lines on runs of two or more whitespace characters (or on |). I am trying to assign the result of the split to a variable d so that I can use it in the if condition and avoid repeating the split.
Is there a way to achieve this?

rows = [d for d in (re.split(r"\s{2,}|\|", line) for line in lines) if len(d) > 5 and d[0] != '']
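On Python 3.8 or later, an assignment expression (the := operator) is another way to bind the split result inside the comprehension itself. The following is a minimal sketch, not the original poster's code: the sample lines are made up because the question does not show its input, and the name pattern is introduced here for illustration.

```python
import re

# Hypothetical sample input; the question does not show the real data.
lines = [
    "a  b  c  d  e  f",
    "too  short",
    "|x  y  z  w  v  u",
]

# Same pattern as in the question, written as a raw string and compiled once.
pattern = re.compile(r"\s{2,}|\|")

# d is bound by := in the condition, so each line is split only once.
rows = [d for line in lines if len(d := pattern.split(line)) > 5 and d[0] != ""]

print(rows)
```

The length test has to come first in the condition, because it is the part that assigns d before d[0] is read.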

Related

Reads a series of lines Python

Can someone enlighten me how to do this?
Write a Python program that reads a series of lines one by one from the keyboard (ending by an empty line) and, at the end, outputs the number of times that the first line occurred. For example, if it reads
hello
world
We say hello
hello
Birkbeck
hello
it would output 3 since the first line ("hello") occurred three times.
You may assume that the user enters at least two non-empty lines before the empty line.
I'm not sure whether you want to separate words by spaces or by the newline character, or whether the first occurrence should be counted.
This is a sample solution that splits a single input line on spaces. It works for the example you provided:
freq_dict = {}
word = input().split()  # all words are entered on a single line
for w in word:
    if w not in freq_dict:
        freq_dict[w] = 0  # the first occurrence itself is not counted
    else:
        freq_dict[w] += 1
print(freq_dict[word[0]])
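If the input really arrives one line at a time, as the exercise describes, whole lines can be compared instead of words. This is a minimal sketch under that reading; the function name count_first_line is made up for illustration, and it counts the first line itself as one occurrence.

```python
def count_first_line(lines):
    """Count how often the first line occurs, stopping at the first empty line."""
    first = None
    count = 0
    for line in lines:
        if line == "":
            break  # an empty line ends the input
        if first is None:
            first = line  # remember the first line
        if line == first:
            count += 1
    return count


# Sample data taken from the question, followed by the terminating empty line.
sample = ["hello", "world", "We say hello", "hello", "Birkbeck", "hello", ""]
print(count_first_line(sample))

# To read from the keyboard instead, iter(input, "") yields lines until an
# empty one is entered:
# print(count_first_line(iter(input, "")))
```

Note that "We say hello" does not match here because the whole line is compared, which is how the exercise arrives at 3.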

How to count strings in specified field within each line of one or more csv files

Writing a Python program (ver. 3) to count strings in a specified field within each line of one or more csv files.
Where the csv file contains:
Field1, Field2, Field3, Field4
A, B, C, D
A, E, F, G
Z, E, C, D
Z, W, C, Q
the script is executed, for example:
$ ./script.py 1,2,3,4 file.csv
And the result is:
A 10
C 7
D 2
E 2
Z 2
B 1
Q 1
F 1
G 1
W 1
ERROR
the script is executed, for example:
$ ./script.py 1,2,3,4 file.csv file.csv file.csv
Where the error occurs:
for rowitem in reader:
    for pos in field:
        pos = rowitem[pos] ##<---LINE generating error--->##
        if pos not in fieldcnt:
            fieldcnt[pos] = 1
        else:
            fieldcnt[pos] += 1
TypeError: list indices must be integers or slices, not str
Thank you!
Judging from the output, I'd say that the fields in the csv file do not influence the count of each string. If string uniqueness is case-insensitive, remember to use yourstring.lower() so that matches in different cases are counted as one. Also keep in mind that if your text is large, the number of unique strings can be very large as well, so some sorting is needed to make sense of it. (Otherwise you may get a long list of counts in arbitrary order, with a large portion of them being just 1.)
An easy way to get a count of unique strings is to use the collections module.
file = open('yourfile.txt', encoding="utf8")
a = file.read()
# if you have some words you'd like to exclude
stopwords = set(line.strip() for line in open('stopwords.txt'))
stopwords = stopwords.union(set(['<media','omitted>','it\'s','two','said']))
# make an empty key-value dict to contain matched words and their counts
wordcount = {}
for word in a.lower().split(): # use the delimiter you want (a comma I think?)
    # replace punctuation so it isn't counted as part of a word
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace("\"","")
    word = word.replace("!","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
That should do it. The wordcount dict should contain each word and its frequency. After that, sort it with collections.Counter and print the most common entries.
import collections
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(20):
    print(word, ": ", count)
I hope this solves your problem. Let me know if you run into problems.
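As for the TypeError in the question: the positions parsed from an argument such as 1,2,3,4 are strings, and a list cannot be indexed with a string, so each one needs to go through int() first. The following is a minimal sketch, not the asker's script. It assumes the positions are 1-based, that each file starts with a header row, and that values are separated by a comma plus a space as in the sample. The function name count_fields is made up, and io.StringIO stands in for a real file.

```python
import csv
import io
from collections import Counter


def count_fields(positions, files):
    """Count the values found at the given field positions in every data row."""
    # Assumption: positions are 1-based, so "1,2,3,4" becomes [0, 1, 2, 3].
    # Converting with int() is what avoids the TypeError.
    indexes = [int(p) - 1 for p in positions.split(",")]
    counts = Counter()
    for f in files:
        # skipinitialspace drops the blank that follows each comma
        reader = csv.reader(f, skipinitialspace=True)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            for i in indexes:
                if i < len(row):  # ignore positions a short row does not have
                    counts[row[i]] += 1
    return counts


# Sample data from the question; StringIO stands in for open("file.csv").
sample = (
    "Field1, Field2, Field3, Field4\n"
    "A, B, C, D\n"
    "A, E, F, G\n"
    "Z, E, C, D\n"
    "Z, W, C, Q\n"
)
counts = count_fields("1,2,3,4", [io.StringIO(sample)])
for value, n in counts.most_common():
    print(value, n)
```

With real files, the list passed in could be built from open(name, newline="") for each file name given on the command line.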

How to process each word in a python program

I want to write a program that reads every word from every line of a text file.
I tried using a nested loop, but the second loop reads each letter. Can someone explain this? I expected it to read the individual words instead of letters.
fh=open("romeo.txt")
d=dict()
c=0
for i in fh:
    for j in i:
        d[c]=j
        c+=1
print(d)
for i in d:
    print(d.get('moon',None))
the output is shown in Picture 1
I wrote code that does what I want, but is there a shorter way to do it?
fh=open("romeo.txt")
d=dict()
c=0
for i in fh:
    i=i.rstrip()
    print("by the first loop ######################", i)
    k=i.split()
    for j in k:
        print("by the second loop ##################", j)
        d[c]=j
        c+=1
print(d)
the output which I want is given in Picture 2
Also, can I use the split() function here?
If so, how? It seems to return only the last line of the file as a list, and I want all the words in a list or dictionary.
Thank you
for i in fh:
This line iterates through each line of text in the file
for j in i:
Since i is a string, this line iterates through each letter in each line. Instead of doing it this way, split() the line over whitespace and then iterate through the resulting list:
for line in fh:
    for word in line.split():
        pass  # do stuff
Anyway, since you wanted a short way to do it, here's a neat one-liner.
To make a list of each word in the file:
[word for line in open("romeo.txt") for word in line.split()]
To make a dict (list is better since your keys are integer indices anyway):
{c: i for c, i in enumerate([word for line in open("romeo.txt") for word in line.split()])}
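The same idea can be wrapped in a small function, which also makes it easy to try without the file. This is a minimal sketch: the name read_words and the sample text are made up, and io.StringIO stands in for the opened romeo.txt.

```python
import io


def read_words(lines):
    """Return every whitespace-separated word from an iterable of lines."""
    return [word for line in lines for word in line.split()]


# Made-up sample text; with a real file, use: with open("romeo.txt") as fh: ...
fh = io.StringIO("But soft what light\nthrough yonder window breaks\n")
words = read_words(fh)

# dict(enumerate(...)) builds the same index -> word mapping as the dict above.
indexed = dict(enumerate(words))

print(words)
print(indexed)
```

Opening the real file in a with block has the added benefit that the file is closed as soon as the words have been read.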

Reading in a file of one-word lines in python

Just curious if there's a cleaner way to do this. I have a list of words in a file, one word per line.
I want to read them in and pass each word to a function.
I've currently got this:
f = open(fileName,"r");
lines = f.readlines();
count = 0
for i in lines:
    count += 1
    print("--{}--".format(i.rstrip()))
    if count > 100:
        return
Is there a way to read them in faster without using rstrip on each line?
with open(fileName) as f:
    lines = (line for _, line in zip(range(100), f.readlines()))
    for line in lines:
        print('--{}--'.format(line.rstrip()))
This is how I would do it. Note the context manager (the with/as statement) and the generator expression, which gives us only the first 100 lines.
Similar to Patrick's answer:
with open(filename, "r") as f:
    for i, line in enumerate(f):
        if i >= 100:
            break
        print("--{}--".format(line[:-1]))
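If you don't want to call .strip() and you know the length of the line terminator, you can slice it off with [:-1] (for a one-character terminator such as "\n"). Note that this also removes the last character of a final line that has no trailing newline.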
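Another option for limiting the read to the first lines is itertools.islice, which stops after the requested number of items without loading the rest of the file. This is a minimal sketch: the function name first_lines and its limit parameter are made up for illustration, and io.StringIO stands in for the opened file.

```python
import io
from itertools import islice


def first_lines(lines, limit=100):
    """Return at most `limit` lines with the trailing newline removed."""
    # rstrip("\n") removes only a newline, so a last line without one stays intact
    return [line.rstrip("\n") for line in islice(lines, limit)]


# Made-up sample; with a real file, use: with open(fileName) as f: first_lines(f)
sample = io.StringIO("alpha\nbeta\ngamma")
result = first_lines(sample, limit=2)

for word in result:
    print("--{}--".format(word))
```

Because islice consumes the file object lazily, lines beyond the limit are never read.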

Print some columns in multiline string

Is there any way to print some columns in a string which is in several lines. For instance, let's suppose we have the following string:
EXAMPLE1
- -- ---
EXAMPLE2
I want to print only the columns that contain a '-'. So the output for this case should be:
EAMLE1
------
EAMLE2
I was thinking of splitting the string, iterating through every column using zip, and printing just those columns that contain a '-', but I don't really know how to do it properly.
Any ideas would be welcome.
Thanks in advance.
Once we split the string into lines, we can use zip(*lines) to transpose the list, getting the columns, search those for -, and then transpose again to get the new lines. Then we can use str.join to assemble the result.
s = '''\
EXAMPLE1
- -- ---
EXAMPLE2'''
columns = (tup for tup in zip(*s.split('\n')) if any('-' in x for x in tup))
lines = (''.join(line) for line in zip(*columns))
print('\n'.join(lines))
Output:
EAMLE1
------
EAMLE2
