In python, how to record only the matched part of the string using regex for data that we read from a file? - python-3.5

How do I print only the part of the string that matched the regex, for each line read from a text file?
I have the following code:
import re

filename = "C:/Users/Desktop/netlist"
pattern = re.compile(r'^[A-Z]{1,2}\d{1,3} ')
with open(filename, "rt") as myfile:
    for line in myfile:
        if pattern.search(line) is not None:
            print(line, end='')
but this prints the entire line containing the match.
Another thing I tried was:
k = []
filename = "C:/Users/Desktop/netlist"
pattern = re.compile(r'^[A-Z]{1,2}\d{1,3}')
with open(filename, "rt") as myfile:
    for line in myfile:
        if pattern.match(line) is not None:
            k.append(pattern.search(line))
But this gives a list of re.Match objects rather than a list of the matched strings:
[<re.Match object; span=(0, 3), match='NM4'>,
<re.Match object; span=(0, 3), match='NM3'>,
<re.Match object; span=(0, 2), match='M9'>,
<re.Match object; span=(0, 2), match='M7'>,
<re.Match object; span=(0, 2), match='M5'>,
<re.Match object; span=(0, 2), match='M2'>]
My input looks like :
NM4 (net19 net19 0 0) nmos1 w=(5.65u) l=410n as=3.39p ad=3.39p ps=12.5u \
NM3 (net28 net19 0 0) nmos1 w=(5.65u) l=410n as=3.39p ad=3.39p ps=12.5u \
M9 (vout\+ net19 0 0) nmos1 w=(12.71u) l=310n as=7.626p ad=7.626p \
M7 (vout\- net19 0 0) nmos1 w=(12.71u) l=310n as=7.626p ad=7.626p \
M5 (net7 net19 0 0) nmos1 w=(2u) l=180n as=1.2p ad=1.2p ps=5.2u pd=5.2u \
M2 (net8 Vin\- net7 0) nmos1 w=(28.25u) l=410n as=16.95p ad=16.95p \
I am expecting my answer to look like :
[NM4 NM3 M9 M7 M5 M2]

Use a capturing group, '([A-Z]{1,2}\d{1,3})', and read the matched text from the match object with group(1).
Ex:
import re

k = []
filename = "C:/Users/Desktop/netlist"
pattern = re.compile(r'([A-Z]{1,2}\d{1,3})')
with open(filename, "rt") as myfile:
    for line in myfile:
        m = pattern.match(line)  # match() only matches at the start of the line
        if m:  # Skip lines without a match.
            k.append(m.group(1))  # Fetch the matched text.
print(k)  # ['NM4', 'NM3', 'M9', 'M7', 'M5', 'M2']
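The same idea can be sketched without a file, assuming the lines are already in memory. The sample lines below are shortened from the question, and the third one (a continuation line starting with "+") is a hypothetical addition to show a line that does not match. Since the whole pattern is the part of interest, group(0) returns it without needing a capturing group.

```python
import re

# Instance name at the start of a line: 1-2 capital letters, then 1-3 digits.
pattern = re.compile(r'[A-Z]{1,2}\d{1,3}')

lines = [
    'NM4 (net19 net19 0 0) nmos1 w=(5.65u) l=410n\n',
    'M9 (vout\\+ net19 0 0) nmos1 w=(12.71u) l=310n\n',
    '+ ps=12.5u pd=12.5u\n',  # hypothetical line without an instance name
]

# pattern.match returns None for non-matching lines, so those are filtered out.
names = [m.group(0) for m in map(pattern.match, lines) if m]
print(names)  # ['NM4', 'M9']
```

To get the output format from the question, the list can be joined with ' '.join(names).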


Alien Dictionary Python

Alien Dictionary
Link to the online judge -> LINK
Given a sorted dictionary of an alien language having N words and k starting alphabets of standard dictionary. Find the order of characters in the alien language.
Note: Many orders may be possible for a particular test case, thus you may return any valid order and output will be 1 if the order of string returned by the function is correct else 0 denoting incorrect string returned.
Example 1:
Input:
N = 5, K = 4
dict = {"baa","abcd","abca","cab","cad"}
Output:
1
Explanation:
Here order of characters is
'b', 'd', 'a', 'c' Note that words are sorted
and in the given language "baa" comes before
"abcd", therefore 'b' is before 'a' in output.
Similarly we can find other orders.
My working code:
from collections import defaultdict

class Solution:
    def __init__(self):
        self.vertList = defaultdict(list)

    def addEdge(self,u,v):
        self.vertList[u].append(v)

    def topologicalSortDFS(self,givenV,visited,stack):
        visited.add(givenV)
        for nbr in self.vertList[givenV]:
            if nbr not in visited:
                self.topologicalSortDFS(nbr,visited,stack)
        stack.append(givenV)

    def findOrder(self,dict, N, K):
        list1 = dict
        for i in range(len(list1)-1):
            word1 = list1[i]
            word2 = list1[i+1]
            rangej = min(len(word1),len(word2))
            for j in range(rangej):
                if word1[j] != word2[j]:
                    u = word1[j]
                    v = word2[j]
                    self.addEdge(u,v)
                    break
        stack = []
        visited = set()
        vlist = [v for v in self.vertList]
        for v in vlist:
            if v not in visited:
                self.topologicalSortDFS(v,visited,stack)
        result = " ".join(stack[::-1])
        return result
#{
# Driver Code Starts
#Initial Template for Python 3
class sort_by_order:
    def __init__(self,s):
        self.priority = {}
        for i in range(len(s)):
            self.priority[s[i]] = i

    def transform(self,word):
        new_word = ''
        for c in word:
            new_word += chr( ord('a') + self.priority[c] )
        return new_word

    def sort_this_list(self,lst):
        lst.sort(key = self.transform)

if __name__ == '__main__':
    t=int(input())
    for _ in range(t):
        line=input().strip().split()
        n=int(line[0])
        k=int(line[1])
        alien_dict = [x for x in input().strip().split()]
        duplicate_dict = alien_dict.copy()
        ob=Solution()
        order = ob.findOrder(alien_dict,n,k)
        x = sort_by_order(order)
        x.sort_this_list(duplicate_dict)
        if duplicate_dict == alien_dict:
            print(1)
        else:
            print(0)
My problem:
The code runs fine for the test cases that are given in the example but fails for ["baa", "abcd", "abca", "cab", "cad"]
It throws the following error for this input:
Runtime Error:
Traceback (most recent call last):
  File "/home/e2beefe97937f518a410813879a35789.py", line 73, in <module>
    x.sort_this_list(duplicate_dict)
  File "/home/e2beefe97937f518a410813879a35789.py", line 58, in sort_this_list
    lst.sort(key = self.transform)
  File "/home/e2beefe97937f518a410813879a35789.py", line 54, in transform
    new_word += chr( ord('a') + self.priority[c] )
KeyError: 'f'
Running in some other IDE:
If I explicitly give this input using some other IDE then the output I'm getting is b d a c
Interesting problem. Your idea is correct: the sorted words define a partial order on the letters, so you can build a directed acyclic graph and obtain an ordered list of vertices with a topological sort.
Your program fails because some letters may never be added to your vertList: a letter that takes part in no edge is never visited, so it is missing from the order you return.
Spoiler: building vlist from all K letters, instead of from the keys of vertList, solves the issue:
vlist = [chr(ord('a') + v) for v in range(K)]
A simple failing example
Consider the input
2 4
baa abd
This will determine the following vertList
{"b": ["a"]}
The only constraint is that b must come before a in this alphabet. Your code returns the alphabet b a; since the letter d is not present, the driver code raises an error when it tries to check your solution. In my opinion it should simply output 0 in this situation.
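The fix can be sketched as a standalone function. This is a minimal sketch rather than the judge's expected solution: the name find_order is hypothetical, it assumes the alphabet is the first k lowercase letters (as the spoiler line does), and it assumes the input is consistent, so there is no cycle detection.

```python
from collections import defaultdict

def find_order(words, k):
    """Return one valid ordering of the first k lowercase letters."""
    graph = defaultdict(list)
    # The first differing position of two adjacent words gives one edge.
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                graph[a].append(b)
                break

    visited, stack = set(), []

    def dfs(letter):
        visited.add(letter)
        for nxt in graph.get(letter, []):
            if nxt not in visited:
                dfs(nxt)
        stack.append(letter)

    # Start from every letter of the alphabet, not only from graph keys,
    # so letters that take part in no edge still end up in the result.
    for letter in (chr(ord('a') + i) for i in range(k)):
        if letter not in visited:
            dfs(letter)
    return ''.join(reversed(stack))

print(find_order(["baa", "abcd", "abca", "cab", "cad"], 4))  # bdac
print(find_order(["baa", "abd"], 4))  # dcba
```

In the second call the only constraint is b before a, so the unconstrained letters c and d may land anywhere; what matters is that all four letters are present.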

Numeric value as a string and convert to actual numeric

I found this thread: how to make a variable change from the text "1m" into "1000000" in python
My string values are in a column of a pandas DataFrame. The string/object values look like 18M, 345K, 12.9K, 0, etc.
values = df5['Values']
multipliers = { 'k': 1e3,
                'm': 1e6,
                'b': 1e9,
              }
pattern = r'([0-9.]+)([bkm])'
for number, suffix in re.findall(pattern, values):
    number = float(number)
    print(number * multipliers[suffix])
Running the code gives this error:
Traceback (most recent call last):
  File "c:/Users/thebu/Documents/Python Projects/trading/screen.py", line 19, in <module>
    for number, suffix in re.findall(pattern, values):
  File "C:\Users\thebu\Anaconda3\envs\trading\lib\re.py", line 223, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Thanks
Here's another way using regex:
import re

def get_word(s):
    # find word
    r = re.findall(r'[a-z]', s)
    # find numbers
    w = re.findall(r'[0-9]', s)
    if len(r) > 0 and len(w) > 0:
        r = r[0]
        v = multipliers.get(r, None)
        if v:
            w = int(''.join(w))
            w *= v
            return round(w)

df['col2'] = df['col'].apply(get_word)
print(df)
   col      col2
0  10k     10000
1  20m  20000000
Sample Data
df = pd.DataFrame({'col': ['10k', '20m']})
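The TypeError in the question comes from passing a whole column to re.findall, which expects a single string. A per-value converter avoids that. The sketch below is one possible variant, not the answer's code: the name to_number is hypothetical, and it assumes values may carry an upper- or lower-case suffix, a decimal point, or no suffix at all (as in 18M, 12.9K and 0 from the question).

```python
import re

MULTIPLIERS = {'k': 1e3, 'm': 1e6, 'b': 1e9}

# Optional decimal number followed by an optional k/m/b suffix, any case.
_PATTERN = re.compile(r'^\s*([0-9]*\.?[0-9]+)\s*([kmb]?)\s*$', re.IGNORECASE)

def to_number(text):
    """Convert '18M', '12.9K' or '0' to a float; return None if unparseable."""
    match = _PATTERN.match(str(text))
    if match is None:
        return None
    number, suffix = match.groups()
    # A missing suffix yields '', which falls back to a multiplier of 1.
    return float(number) * MULTIPLIERS.get(suffix.lower(), 1)

for raw in ['18M', '345K', '12.9K', '0', 'n/a']:
    print(raw, to_number(raw))
```

With pandas, the function could be applied to the column one value at a time, for example df5['Values'].apply(to_number).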

How to convert multiple line string to data frame

My sample string looks like this:
>>> x3 = '\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n'
>>> print(x3)
DST: 10.1.1.1
DST2: 10.1.2.1
DST3: 10.1.3.1
DST: 11.1.1.1
DST2: 11.1.2.1
DST3: 11.1.3.1
I want to convert it to a data frame with DST, DST2 and DST3 as columns.
You could do:
import pandas as pd

# get key, value pairs from string
items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())
# build data
d = {}
for key, value in items:
    d.setdefault(key, []).append(value)
# convert it to a DataFrame
result = pd.DataFrame(d)
print(result)
Output
        DST      DST2      DST3
0  10.1.1.1  10.1.2.1  10.1.3.1
1  11.1.1.1  11.1.2.1  11.1.3.1
The line:
items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())
is a generator expression. For the purposes of this question you can consider it equivalent (but not identical) to the following for loop:
items = []
for line in x3.splitlines():
    if line.strip():
        items.append(line.strip().split(': '))
In addition, splitlines, strip and split are methods of str.
import pandas as pd

if __name__ == '__main__':
    x3 = "\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n"
    #remove spaces
    x3_no_space = x3.replace(" ", "")
    #remove new lines and replace with &
    x3_no_new_line = x3_no_space.replace("\n", "&")
    #split from &
    x3_split = x3_no_new_line.split("&")
    #data array for store values
    DST_data = []
    #dictionary for make dataframe
    DST_TABLE = dict()
    #loop splitted data
    for DST in x3_split:
        #check if data is empty or not if not empty add data to DST_DATA array
        if DST != '':
            DST_data.append(DST)
            #split data from :
            DST_split = DST.split(":")
            #get column names and store it into dictionary with null array
            DST_TABLE[DST_split[0]] = []
    #read dst array
    for COL_DATA in DST_data:
        #split from :
        DATA = COL_DATA.split(":")
        #loop the dictionary
        for COLS in DST_TABLE:
            #check if column name of dictionary equal to splitted data 0 index if equals append the data to column
            if DATA[0] == COLS:
                DST_TABLE[COLS].append(DATA[1])
    # this is dictionary
    print("Python dictionary")
    print(DST_TABLE)
    # convert dictionary to dataframe using pandas
    dataframe = pd.DataFrame.from_dict(DST_TABLE)
    print("DATA FRAME")
    print(dataframe)
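Both answers come down to the same step: turning the "key: value" lines into a dictionary of columns. A compact sketch of that step, using only the standard library so it can be checked without pandas, might look as follows. The function name parse_columns is hypothetical; splitting on the first ':' only is a design choice, made so that the spacing after the colon does not matter and values containing a colon are kept whole.

```python
x3 = ('\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n'
      ' DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n')

def parse_columns(text):
    """Collect 'key: value' lines into a dict mapping key -> list of values."""
    columns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank and whitespace-only lines
        key, value = line.split(':', 1)
        columns.setdefault(key.strip(), []).append(value.strip())
    return columns

columns = parse_columns(x3)
print(columns)
```

The resulting dictionary can be passed to pd.DataFrame(columns) as in the first answer, provided every key occurs the same number of times.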

outputting a list of tuples to a new file in a certain format

Let's say I have a list of tuples my_list=[(who,2), (what,5), (where, 1)]
I want to write it into a new file (new_file.txt) in this format:
who,2
what,5
where, 1
One above the other, with no brackets and only inner commas.
This does not work:
with open('new_file.txt', 'w') as fd:
    a = '\n'.join(str(x) for x in results)
    fd.write('\n'.join(a))
    fd.close()
Will appreciate your help !
results = [('who', 2), ('what', 5), ('where', 1)]

def line(u):
    return ','.join(map(str, u))

with open('new_file.txt', 'w') as fd:
    fd.write('\n'.join([line(u) for u in results]))
One note: you don't have to close the file explicitly, because the with statement closes it for you.
If the results list is very long, you may not want to build the whole file content in one go, but write it line by line instead:
with open('new_file.txt', 'w') as fd:
    for u in results:
        fd.write(line(u) + '\n')
You converted each tuple to a string with str(), which keeps the brackets and quotes, so this does not give the format you want. A quick solution could be something like this, assuming this is the input data you are expecting:
results = [("who",2), ("what",5), ("where", 1)]
with open('new_file.txt', 'w') as fd:
    data_str = ""
    for data in results:
        data_str += str(data[0]) + ',' + str(data[1]) + '\n'
    fd.write(data_str)
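Since the target format is comma-separated values, the standard csv module is another option. The sketch below is an alternative to the answers above, not a restatement of them: the helper name write_rows is hypothetical, and an in-memory buffer stands in for new_file.txt so the example runs anywhere.

```python
import csv
import io

results = [('who', 2), ('what', 5), ('where', 1)]

def write_rows(rows, stream):
    """Write each tuple as one comma-separated line."""
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerows(rows)

# In-memory buffer used in place of a real file for this sketch.
buffer = io.StringIO()
write_rows(results, buffer)
print(buffer.getvalue(), end='')
```

With a real file, the csv documentation recommends opening it with newline='', for example open('new_file.txt', 'w', newline=''), and passing that file object as the stream. Unlike the manual join, csv.writer also quotes any field that itself contains a comma.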

How do I append a text changing the number format?

I'm getting numbers from an HTML page; some are percentages, some have 4 digits and some have 7 digits (37.89%, 3.464, 2,193.813). I would like to save the numbers without the thousands separators ("."), leaving the percentages unchanged.
list_of_rows = []
for row in table.findAll('div', attrs={'class': 'quadrado'}):
    list_of_cells = []
    for cell in row.findAll('span', attrs={'class': 'circulo'}):
        text = cell.text
        # print(text)
    for cell_index in row.findAll('span', attrs={'class': 'triangulo'}):
        text_index = cell_index.text
    list_of_cells_index = [text, text_index]
    list_of_cells_index_clean = ','.join(list_of_cells_index) # remove brackets and ''
    # print(list_of_cells_index_clean)
    list_of_cells.append(list_of_cells_index_clean)
    list_of_rows.append(list_of_cells)
outfile = open("./list.csv", "a")
writer = csv.writer(outfile, lineterminator = '\n')
writer.writerows(list_of_rows)
I would like to get:
37.89%, 3464, 2193,813.
How can I do it?
I don't know all your input parameters, but this works for the ones that you provided.
import re

s = ('37.89%', '3.464', '2,193.813')
for item in s:
    remove_comma = item.replace(',', '')
    # Percentages are kept unchanged.
    keep_percentage = re.findall(r'\d{1,4}\.\d{1,4}%', remove_comma)
    if keep_percentage:
        keep_percentage = ''.join(keep_percentage)
        print (keep_percentage)
    else:
        # Five characters such as '3.464': the dot is a thousands separator.
        if (len(remove_comma)) == 5:
            print (remove_comma.replace('.', ''))
        else:
            print (remove_comma.replace('.', ','))
**OUTPUTS**
37.89%
3464
2193,813
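To use these rules while building the CSV rows, they can be wrapped in a function and called on each cell's text. This is a sketch of the same heuristic, not a general number parser: the name clean_number is hypothetical, and the length-5 rule is only known to hold for the three sample values given in the question.

```python
def clean_number(text):
    """Apply the answer's rules to one scraped value."""
    text = text.strip()
    if text.endswith('%'):
        return text  # percentages are kept unchanged
    without_commas = text.replace(',', '')
    if len(without_commas) == 5:
        # e.g. '3.464': the dot is a thousands separator, so drop it
        return without_commas.replace('.', '')
    # e.g. '2193.813': the dot becomes a decimal comma
    return without_commas.replace('.', ',')

for value in ('37.89%', '3.464', '2,193.813'):
    print(clean_number(value))
```

A value such as '12.50' would also be treated as having a thousands separator under the length rule, so the heuristic should be checked against the real page data before relying on it.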
