I want my program to print the first five characters of a line when it recognizes a string in certain lines of a .txt file. The string is built by concatenating two columns of a pandas DataFrame. As the title says, I get this error when I run the code. Here is the code (the important lines are at the end; I included everything in case you want to see the whole program).
import pandas as pd
import re
import numpy as np
link = "excelfilett.txt"
file = open(link, "r")
frames_load = []
is_count_frames_load = False
for line in file:
    if "[Interface1]" in line:
        is_count_frames_load = True
    if is_count_frames_load== True:
        frames_load.append(line)
    if "[EthernetComNeed]" in line:
        break
number_of_rows_load = len(frames_load) -1
header_load = re.split(r'\t', frames_load[0])
number_of_columns_load = len(header_load)
frame_array_load = np.full((number_of_rows_load, number_of_columns_load), 0)
df_frame_array_load = pd.DataFrame(frame_array_load)
df_frame_array_load.columns= header_load
for row in range(number_of_rows_load):
    frame_row_load = re.split(r'\t', frames_load[row])
    for position in range(len(frame_row_load)):
        df_frame_array_load.iloc[row, position] = frame_row_load[position]
df_frame_array_load["[Name]"] = df_frame_array_load["[End1]"] + " " + df_frame_array_load["[End2]"]
link = "excelfilett.txt"
file = open(link, "r")
frames_path = []
is_count_frames_path = False
for line in file:
    if "[Routing Paths]" in line:
        is_count_frames_path = True
    if is_count_frames_path== True:
        for row in df_frame_array_load["[Name]"].rows:
            if row in line:
                print(line[0:4])
    if "[EthernetComConfig]" in line:
        break
It gives me the AttributeError on "for row in df_frame_array_load["[Name]"].rows:" and it shouldn't be a version problem, so what is wrong? I don't understand.
The line
for row in df_frame_array_load["[Name]"].rows:
fails because a pandas Series object does not have a "rows" attribute. Iterating over a Series directly already loops over its values, so the line should be changed to just:
for row in df_frame_array_load["[Name]"]:
    ...
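A minimal sketch of the difference, assuming a small hypothetical Series in place of df_frame_array_load["[Name]"]:

```python
import pandas as pd

# Hypothetical stand-in for df_frame_array_load["[Name]"]
names = pd.Series(["S1 S2", "S3 S4"])

# A Series has no "rows" attribute, which is what raises the AttributeError
has_rows = hasattr(names, "rows")

# Iterating over the Series directly yields its values one by one
values = [row for row in names]
print(has_rows, values)
```

If the index is needed as well, names.items() yields (index, value) pairs.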
This code reads a CSV file line by line and counts the occurrences of each Unicode code, but I can't understand the two parts of the code shown below. I've already googled but couldn't find the answer. Could you give me some advice?
1) Why should I use numpy here instead of []?
emoji_time = np.zeros(200)
2) What does -1 mean?
emoji_time[len(emoji_list)-1] = 1
This is the code result:
0x100039, 47,
0x10002D, 121,
0x100029, 30,
0x100078, 6,
unicode_count.py
import codecs
import re
import numpy as np
file0 = "./message.tsv"
f0 = codecs.open(file0, "r", "utf-8")
list0 = f0.readlines()
f0.close()
print(len(list0))
len_list = len(list0)
emoji_list = []
emoji_time = np.zeros(200)
for i in range(len_list):
    a = "0x1000[0-9A-F][0-9A-F]"
    if "0x1000" in list0[i]: # 0x and 0x1000: same number
        b = re.findall(a, list0[i])
        # print(b)
        for j in range(len(b)):
            if b[j] not in emoji_list:
                emoji_list.append(b[j])
                emoji_time[len(emoji_list)-1] = 1
            else:
                c = emoji_list.index(b[j])
                emoji_time[c] += 1
print(len(emoji_list))
1) If you use a list instead of a numpy array, the result does not change in this case. You can check this yourself by running the same code with emoji_time = np.zeros(200) replaced by emoji_time = [0]*200.
2) emoji_time[len(emoji_list)-1] = 1 does the following: when an emoji appears for the first time, its count in emoji_time (the list that holds how many times each emoji occurred) is set to 1. len(emoji_list)-1 is the position in emoji_time of the emoji that was just appended to emoji_list (the minus 1 is needed only because list indexing in Python starts at 0).
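Both points can be checked with a small sketch. It assumes a hypothetical list of already extracted codes instead of message.tsv and uses a plain list for the counts. collections.Counter is shown only as a swapped-in alternative, not as the original author's method.

```python
from collections import Counter

# Hypothetical codes, standing in for the re.findall results
codes = ["0x100039", "0x10002D", "0x100039", "0x100039"]

emoji_list = []
emoji_time = [0] * 200  # plain list instead of np.zeros(200)

for code in codes:
    if code not in emoji_list:
        emoji_list.append(code)
        # The new code is the last element, at index len(emoji_list) - 1
        emoji_time[len(emoji_list) - 1] = 1
    else:
        emoji_time[emoji_list.index(code)] += 1

counts = dict(zip(emoji_list, emoji_time))
print(counts)

# Counter gives the same tallies without a fixed-size array
assert counts == dict(Counter(codes))
```

A Counter also avoids the fixed limit of 200 distinct codes that the preallocated array imposes.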
I wrote the following code to create dataframes from files saved in sharefile. It works perfectly for excel files, but fails for csv files with the error EmptyDataError: No columns to parse from file.
tblname = 'test'
fPth = r'Z:\Favorites\test10 (Group D - Custom EM&V)\8 PII\16 - Project Selection Plan\QC\Data\test.csv'
sht = 'Gross_Data'
shtStart = 0
fType = 'csv'
fitem = sfsession.get_io_version(fPth)
if fitem is None:
    print(f'Could not create sharefile item for {fPth}')
else:
    try:
        if fType == 'csv':
            df = pd.read_csv(fitem.io_data, header = shtStart)
        elif fType == 'excel':
            df = pd.read_excel(fitem.io_data, sheet_name = sht, header = shtStart)
        else:
            pass
        print(f'Data import COMPLETE for {fPth}: {str(datetime.now())}')
    except:
        print(f'Data import FAILED for {fPth}')
        logging.critical(f'Data import FAILED for {fPth}')
If I replace fitem.io_data with fPth in df = pd.read_csv, the code works, but I can't use that as a permanent solution. Any suggestions?
Also, sfsession is a sharefile session, and get_io_version(fPth) gets the token and downloads all the file properties, including its data.
Thanks.
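An adaptation of the solution from "StringIO and pandas read_csv" worked for me: I added fitem.io_data.seek(0) before the df = ... line. This rewinds the stream to its start; presumably it had been left positioned at the end, so read_csv found nothing to parse.
Closing the question.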
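The effect of seek(0) can be reproduced with a minimal sketch. It assumes fitem.io_data behaves like an in-memory text stream; an io.StringIO with made-up contents stands in for it here.

```python
import io
import pandas as pd

# Hypothetical in-memory stream standing in for fitem.io_data
buffer = io.StringIO()
buffer.write("a,b\n1,2\n")  # after writing, the position is at the end

try:
    pd.read_csv(buffer)
    failed = False
except pd.errors.EmptyDataError:
    failed = True  # nothing left to read from the current position

buffer.seek(0)  # rewind to the start of the stream
df = pd.read_csv(buffer, header=0)
print(failed, df.shape)
```

The same rewind applies to a binary stream such as io.BytesIO, if that is what the session object actually returns.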
I want my program to read the lines of a txt file and recognize a string using two columns. I tried "for row in column1 and column2", but it isn't working and I don't really know why. Here is the code. (Here I want to print the first 5 letters when it recognizes the string, but later I will put those five letters in a list.)
import pandas as pd
import re
import numpy as np
link = "excelfilett.txt"
file = open(link, "r")
frames_load = []
is_count_frames_load = False
for line in file:
    if "[Interface1]" in line:
        is_count_frames_load = True
    if is_count_frames_load== True:
        frames_load.append(line)
    if "[EthernetComNeed]" in line:
        break
number_of_rows_load = len(frames_load) -1
header_load = re.split(r'\t', frames_load[0])
number_of_columns_load = len(header_load)
frame_array_load = np.full((number_of_rows_load, number_of_columns_load), 0)
df_frame_array_load = pd.DataFrame(frame_array_load)
df_frame_array_load.columns= header_load
for row in range(number_of_rows_load):
    frame_row_load = re.split(r'\t', frames_load[row])
    for position in range(len(frame_row_load)):
        df_frame_array_load.iloc[row, position] = frame_row_load[position]
print(df_frame_array_load)
df_frame_array_load["[Name]"] = df_frame_array_load["[End1]"] + '\t' + df_frame_array_load["[End2]"]
df_frame_array_load["[Name2]"] = df_frame_array_load["[End2]"] + '\t' + df_frame_array_load["[End1]"]
print(df_frame_array_load["[Name]"])
print(df_frame_array_load["[Name2]"])
link = "excelfilett.txt"
file = open(link, "r")
frames_path = []
is_count_frames_path = False
for line in file:
    if "[Routing Paths]" in line:
        is_count_frames_path = True
    if is_count_frames_path== True:
        for row in df_frame_array_load["[Name]"] and df_frame_array_load["[Name2]"]:
            if row in line:
                print(row)
                print(line[0:4])
        if "[EthernetComNeed]" in line:
            break
    if "[EthernetComConfig]" in line:
        break
What I want as output is the first 5 letters of each line of the txt in which a string is recognized. For example, when "S1\tS2" is in a line of the txt, it should print the first 5 letters, so "FL_1". The two columns contain strings such as "S1\tS2" and the reverse ("S2\tS1"). The line that loops over them is where I have an issue; it gives me
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
in the line "for row in column1 and column2:".
I don't think it is possible to iterate over the rows of two columns at once this way, so I made my program read the two columns separately. Here is what I changed:
link = "excelfilett.txt"
file = open(link, "r")
frames_path = []
is_count_frames_path = False
for line in file:
    if "[Routing Paths]" in line:
        is_count_frames_path = True
    if is_count_frames_path== True:
        for row in df_frame_array_load["[Name]"]:
            if row in line:
                print(row)
                print(line[0:4])
        if "[EthernetComNeed]" in line:
            break
        for row in df_frame_array_load["[Name2]"]:
            if row in line:
                print(row)
                print(line[0:4])
    if "[EthernetComConfig]" in line:
        break
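The ValueError comes from the "and": Python evaluates "a and b" by first asking whether a is true, and pandas refuses to reduce a whole Series to a single True or False. A minimal sketch, assuming two small hypothetical Series in place of the "[Name]" and "[Name2]" columns, shows the error and one way to cover both columns in a single loop with itertools.chain (a swapped-in technique, not the original author's code):

```python
from itertools import chain
import pandas as pd

# Hypothetical stand-ins for the "[Name]" and "[Name2]" columns
name = pd.Series(["S1\tS2", "S3\tS4"])
name2 = pd.Series(["S2\tS1", "S4\tS3"])

# "name and name2" first asks for bool(name), which pandas refuses to guess
try:
    bool(name)
    ambiguous = False
except ValueError:
    ambiguous = True

line = "FL_1\tS2\tS1\n"  # hypothetical line from the txt file

# chain() walks through the first column, then the second, in one loop
matches = [row for row in chain(name, name2) if row in line]
print(ambiguous, matches, line[0:4])
```

Note that line[0:4] returns four characters; line[0:5] would be needed for five.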
I'm trying to make a program that picks a name at random from a file. The user is then asked whether they want to pick another one (by pressing 1).
A name can't be picked twice.
Once picked, the names are stored in a list and written to a file.
When all the names have been picked, the program should be able to restart from the beginning.
I checked other similar problems, but I still don't get it...
from random import *
#import a list of name from a txt file
def getL1():
    l1 = open("Employees.txt", "r")
    list1 = []
    x = 0
    for line in l1:
        list1.append(line)
        x = x+1
    list1 = [el.replace('\n', '') for el in list1]
    #print("list" 1 :",list)
    return list1
#import an empty list (that will be filled by tested employees) during
#execution of the program
def getL2():
    l2 = open("tested.txt", "r")
    list2 = []
    for line in l2:
        list2.append(line)
    list2 = [el.replace('\n', '') for el in list2]
    #print("list 2 :",list2)
    l2.close()
    return list2
def listCompare():
    employees = getL1()#acquire first list from employee file
    tested = getL2()#acquire second list from tested file
    notTested = []# declare list to hold the results of the compare
    for val in employees:
        if val not in tested: #find employee values not present in tested
            #print(val)
            notTested.append(val)#append value to the notTested list
    return notTested
def listCount():
    x=0
    employees = getL1()
    tested = getL2()
    for val in employees:
        if val not in tested:
            x = x+1
    return x
#add the names of tested employees to the second file
def addTested(x):
    appendFile = open("tested.txt", "a")
    appenFile.write(x)
    appendFile.write('\n')
    appendFile.close()
def main():
    entry = 1
    while entry == 1:
        pickFrom = listCompare()
        if listCount() > 0:
            y = randint (0, (listCount ()-1))
            print ('\n' + "Random Employee to be tested is: ", pickFrom(y), "\n")
            addTested(pickFrom[y])
            try:
                entry = int(input("Would you like to test another employee? Enter 1:"))
            except:
                print("The entry must be a number")
                entry = 0
        else:
            print("\n/\ new cycle has begun")
            wipeFile = open("tested.txt", "w")
    print ("goodbye")
main()
The last error that I have is :
Traceback (most recent call last):
  File "prog.py", line 78, in <module>
    main()
  File "prog.py", line 65, in main
    print ('\n' + "Random Employee to be tested is: ", pickFrom(y), "\n")
TypeError: 'list' object is not callable
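In your code, pickFrom is a list, so in the print call it must be indexed with [ ] rather than called with ( ). Change pickFrom(y) to pickFrom[y]. Note also that addTested writes to appenFile, a misspelling of appendFile, which will raise a NameError once the first error is fixed.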
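A minimal sketch of the pick-without-repeats logic, assuming in-memory lists in place of Employees.txt and tested.txt. The function name pick_untested and the sample names are hypothetical.

```python
import random

def pick_untested(employees, tested, rng=random):
    """Return a random name not yet in tested, or None when all are done."""
    not_tested = [name for name in employees if name not in tested]
    if not not_tested:
        return None
    # Square brackets index the list; parentheses would try to call it
    return not_tested[rng.randrange(len(not_tested))]

employees = ["Ann", "Bob", "Cid"]  # hypothetical names
tested = []
while (name := pick_untested(employees, tested)) is not None:
    tested.append(name)
print(tested)
```

Each name comes out exactly once, in random order. Clearing tested afterwards would start a new cycle, which is what truncating tested.txt does in the original program.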
I am importing an Excel file in which most cells have trailing whitespace that needs removing. The following script works with sample data:
import pandas as pd
def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text
def num_strip(text):
    try:
        return text.split(" ",1)[0]
    except AttributeError:
        return text
def parse_excel_sheet(input_file, sheet):
    df = pd.read_excel(
        input_file,
        sheetname= sheet,
        parse_cols = 'A,B,C',
        names=['ID', 'name_ITA', 'name_ENG'],
        converters = {
            'ID' : num_strip,
            'name1' : strip,
            'name2' : strip,
        }
    )
    return df
file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/excel_test.xlsx'
df = parse_excel_sheet(file,'1')
print(df)
However, when I try the script on a larger file, parsing the first column 'ID' does not remove the whitespace.
file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/DRS_IL_startingpoint.xlsx'
df = parse_excel_sheet(file,'test')
print(df)
I just ran your code and found that whitespace was correctly removed from the 'ID' column in the larger file:
for i, el in enumerate(df['ID'].values):
    # print(i)
    if " " in el:
        print(el)
This prints no element from the 'ID' column: none of these 28 elements contains a space.
How did you check that this was not the case?
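The two converters can also be checked independently of any Excel file. The sketch below applies them to hypothetical cell values, not values from the real spreadsheets. The non-breaking space case is only one possible way a cell could look padded while the " " test finds nothing; the source does not confirm that this is what happened.

```python
def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

def num_strip(text):
    try:
        return text.split(" ", 1)[0]
    except AttributeError:
        return text

# Hypothetical cell values, not taken from the real spreadsheets
print(repr(num_strip("A12  ")))    # trailing spaces are dropped
print(repr(num_strip(12)))         # non-strings are returned unchanged
print(repr(num_strip("A12\xa0")))  # a non-breaking space is not " ", so it stays
print(repr(strip("A12\xa0")))      # str.strip() removes it
```

Printing values with repr() makes any leftover whitespace visible, which a plain print(df) does not.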