How to remove the quotation marks from a string in Python? - python-3.x

The CSV file returns the column value in dictionary format, but I can't get a value out of the dictionary using dic.get("name"); it raises an error: 'str' object has no attribute 'get'. The actual problem is that the CSV returns the dict with quotes, so Python treats it as a string. How do I remove the quotes, and how can I fix this? Please help!
import csv

with open('file.csv') as file:
    reader = csv.reader(file)
    for idx, row in enumerate(reader):
        dic = row[5]  # this cell comes back as a string, not a dict
        if idx == 0:
            continue
        else:
            print(dic.get("name"))  # AttributeError: 'str' object has no attribute 'get'
filename file_size file_attributes region_count region_id region_shape_attributes region_attributes
adutta_swan.jpg -1 {"caption":"Swan in lake Geneve","public_domain":"no","image_url":"http://www.robots.ox.ac.uk/~vgg/software/via/images/swan.jpg"} 1 0 {"name":"rect","x":82,"y":105,"width":356,"height":207} {"name":"not_defined","type":"unknown","image_quality":{"good":true,"frontal":true,"good_illumination":true}}
wikimedia_death_of_socrates.jpg -1 {"caption":"The Death of Socrates by David","public_domain":"yes","image_url":"https://en.wikipedia.org/wiki/The_Death_of_Socrates#/media/File:David_-_The_Death_of_Socrates.jpg"} 3 0 {"name":"rect","x":174,"y":139,"width":108,"height":227} {"name":"Plato","type":"human","image_quality":{"good_illumination":true}}
wikimedia_death_of_socrates.jpg -1 {"caption":"The Death of Socrates by David","public_domain":"yes","image_url":"https://en.wikipedia.org/wiki/The_Death_of_Socrates#/media/File:David_-_The_Death_of_Socrates.jpg"} 3 1 {"name":"rect","x":347,"y":114,"width":91,"height":209} {"name":"Socrates","type":"human","image_quality":{"frontal":true,"good_illumination":true}}
wikimedia_death_of_socrates.jpg -1 {"caption":"The Death of Socrates by David","public_domain":"yes","image_url":"https://en.wikipedia.org/wiki/The_Death_of_Socrates#/media/File:David_-_The_Death_of_Socrates.jpg"} 3 2 {"name":"ellipse","cx":316,"cy":180,"rx":17,"ry":12} {"name":"Hemlock","type":"cup"}

Use DictReader, which reads the csv rows as dictionaries! The cell value is still a JSON string, so parse it with json.loads:
import csv
import json

with open('graph.csv') as file:
    # Read csv as DictReader
    reader = csv.DictReader(file)
    # Iterate through rows
    for idx, row in enumerate(reader):
        # Load the JSON string as a dictionary
        region_shape_attributes = json.loads(row['region_shape_attributes'])
        print(region_shape_attributes['name'])
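Note that DictReader consumes the header row itself, so the idx == 0 skip from the question is no longer needed.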

import csv
import ast

with open('file.csv') as file:
    # Read csv as DictReader
    reader = csv.DictReader(file)
    # Iterate through rows
    for idx, row in enumerate(reader):
        row_5 = row['region_shape_attributes']
        # Safely evaluate the string as a Python literal
        y = ast.literal_eval(row_5)
        print(y.get("name"))
This code also works for me.
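One caveat: ast.literal_eval works for the region_shape_attributes cells, but the region_attributes column in the sample data contains JSON booleans such as true, which are not valid Python literals, so json.loads is the safer choice there. A quick comparison:

import ast
import json

cell = '{"good":true}'
print(json.loads(cell))  # {'good': True}
# ast.literal_eval(cell) would raise ValueError: 'true' is not a Python literal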

Related

How to convert the 50000 txt file into csv

I have many text files, and I tried to convert them into a single CSV file, but it is taking a huge amount of time. I started the code at night and went to sleep; it had processed only 4500 files and was still running in the morning.
Is there any way to convert the text files into a CSV faster?
Here is my code:
import pandas as pd
import os
import glob
from tqdm import tqdm

# create empty dataframe
csvout = pd.DataFrame(columns=["ID","Delivery_person_ID","Delivery_person_Age","Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle","multiple_deliveries","Festival","City","Time_taken (min)"])
# get list of files
file_list = glob.glob(os.path.join(os.getcwd(), "train/", "*.txt"))

for filename in tqdm(file_list):
    # next file/record
    mydict = {}
    with open(filename) as datafile:
        # read each line and split on the first " " space
        for line in tqdm(datafile):
            # Note: partition yields 3 string parts: "key", " ", "value";
            # the slice [::2] (step=2) keeps only the 1st and 3rd items
            name, var = line.partition(" ")[::2]
            mydict[name.strip()] = var.strip()
    # put dictionary in dataframe
    csvout = csvout.append(mydict, ignore_index=True)

# write to csv
csvout.to_csv("train.csv", sep=";", index=False)
Here is my example text file.
ID 0xb379
Delivery_person_ID BANGRES18DEL02
Delivery_person_Age 34.000000
Delivery_person_Ratings 4.500000
Restaurant_latitude 12.913041
Restaurant_longitude 77.683237
Delivery_location_latitude 13.043041
Delivery_location_longitude 77.813237
Order_Date 25-03-2022
Time_Orderd 19:45
Time_Order_picked 19:50
Weather conditions Stormy
Road_traffic_density Jam
Vehicle_condition 2
Type_of_order Snack
Type_of_vehicle scooter
multiple_deliveries 1.000000
Festival No
City Metropolitian
Time_taken (min) 33.000000
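For reference, the partition-and-slice trick in the loop above splits each line at its first space into a key and a value. A minimal illustration using lines from this sample file (note that a key containing a space, like Weather conditions, is cut at its first space, so such columns need special care):

line = "Delivery_person_ID BANGRES18DEL02"
name, var = line.partition(" ")[::2]
print(name, "->", var)  # Delivery_person_ID -> BANGRES18DEL02

line = "Weather conditions Stormy"
name, var = line.partition(" ")[::2]
print(name, "->", var)  # Weather -> conditions Stormy (key cut at the first space)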
CSV is a very simple data format that doesn't require sophisticated tools to handle: just text and separators.
In your hopefully simple case there is no need for pandas and dictionaries.
The exception is if your data files are corrupt, missing some columns, or carrying extra columns to skip. But even then, you can handle such issues better within your own code, where you have more control, and still get results within seconds.
Assuming your data files are not corrupt and have all columns in the right order, with none missing and none extra (so you can rely on their proper formatting), just try this code:
from time import perf_counter as T
import glob, os

sT = T()
filesProcessed = 0
columns = ["ID","Delivery_person_ID","Delivery_person_Age","Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle","multiple_deliveries","Festival","City","Time_taken (min)"]
file_list = glob.glob(os.path.join(os.getcwd(), "train/", "*.txt"))
csv_lines = []
csv_line_counter = 0

for filename in file_list:
    filesProcessed += 1
    with open(filename) as datafile:
        csv_line = ""
        for line in datafile.read().splitlines():
            # keep only the value after the first space
            var = line.partition(" ")[-1]
            csv_line += var.strip() + ';'
        csv_lines.append(str(csv_line_counter) + ';' + csv_line[:-1])
        csv_line_counter += 1

with open("train.csv", "w") as csvfile:
    csvfile.write(';' + ';'.join(columns) + '\n')
    csvfile.write('\n'.join(csv_lines))

eT = T()
print(f'> {filesProcessed=}, {(eT-sT)=:8.6f}')
I guess you will get the result at a speed beyond your expectations (in seconds, not minutes or hours).
On my computer, extrapolating from the processing time for 100 files, 50,000 files should take about 3 seconds.
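A note on why the original approach is so slow: csvout.append inside the loop copies the entire dataframe on every iteration, so the total work grows quadratically with the number of files. pandas deprecated DataFrame.append for exactly this reason and removed it in pandas 2.0; the recommended pattern is to collect dicts in a list and build the frame once with pd.DataFrame or pd.concat.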
I could not replicate this. I took the example data file and created 5000 copies of it. Then I ran your code with tqdm and without. Below is the version without tqdm:
import time
import csv
import os
import glob
import pandas as pd
from tqdm import tqdm

csvout = pd.DataFrame(columns=["ID","Delivery_person_ID","Delivery_person_Age","Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle","multiple_deliveries","Festival","City","Time_taken (min)"])
file_list = glob.glob(os.path.join(os.getcwd(), "sample_files/", "*.txt"))

t1 = time.time()
for filename in file_list:
    # next file/record
    mydict = {}
    with open(filename) as datafile:
        # read each line and split on the first " " space
        for line in datafile:
            # Note: partition yields 3 string parts: "key", " ", "value";
            # the slice [::2] (step=2) keeps only the 1st and 3rd items
            name, var = line.partition(" ")[::2]
            mydict[name.strip()] = var.strip()
    # put dictionary in dataframe
    csvout = csvout.append(mydict, ignore_index=True)
# write to csv
csvout.to_csv("train.csv", sep=";", index=False)
t2 = time.time()
print(t2-t1)
The times I got were:
tqdm: 33 seconds
no tqdm: 34 seconds
Then I ran using the csv module:
t1 = time.time()
with open('output.csv', 'a', newline='') as csv_file:
    columns = ["ID","Delivery_person_ID","Delivery_person_Age","Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle","multiple_deliveries","Festival","City","Time_taken (min)"]
    mydict = {}
    d_Writer = csv.DictWriter(csv_file, fieldnames=columns, delimiter=',')
    d_Writer.writeheader()
    for filename in file_list:
        with open(filename) as datafile:
            for line in datafile:
                name, var = line.partition(" ")[::2]
                mydict[name.strip()] = var.strip()
            d_Writer.writerow(mydict)
t2 = time.time()
print(t2-t1)
The time for this was:
csv: 0.32231569290161133 seconds
Try it like this:
import glob

with open('my_file.csv', 'a') as csv_file:
    for path in glob.glob('./*.txt'):
        with open(path) as txt_file:
            txt = txt_file.read() + '\n'
            csv_file.write(txt)

Using pd.read_table() multiple times on same open file

I have a data structure of the following form:
**********DATA:0************
name_A name_B
0.16561919 0.03640960
0.39564838 0.66708115
0.60828075 0.95785214
0.68716186 0.92803331
0.80615505 0.96219926
**********data:0************
**********DATA:1************
name_A name_B
0.32474381 0.82506909
0.30934914 0.60406956
0.99519513 0.23425607
0.72210821 0.61141751
0.47362605 0.09892009
**********data:1************
**********DATA:2************
name_A name_B
0.46561919 0.13640960
0.29564838 0.66708115
0.40828075 0.35785214
0.08716186 0.52803331
0.70615505 0.96219926
**********data:2************
I would like to read each block into a separate pandas dataframe with appropriate header titles. When I use the simple function below, only a single data block is stored in the output list. However, when I comment out the data.append(pd.read_table(file, nrows=5)) line, the function prints all the individual headers. The pd.read_table call seems to consume the rest of the open file, so the loop ends after the first block.
import pandas as pd

def read_data(filename):
    data = []
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                print(line)
                data.append(pd.read_table(file, nrows=5))
    return data

read_data("data_file.txt")
How should I change the function to read all blocks?
I suggest a slightly different approach, which avoids read_table and puts the dataframes in a dict instead of a list, like this:
import pandas as pd

def read_data(filename):
    data = {}
    i = 0
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                data[i] = []
                continue
            if "**********data:" in line:
                i += 1
                data[i] = []
                continue
            else:
                data[i].append(line.strip("\n").split(" "))
    return {
        f"data_{k}": pd.DataFrame(data=v[1:], columns=v[0])
        for k, v in data.items()
        if v
    }
And so, with the text file you gave as input:
dfs = read_data("data_file.txt")
print(dfs["data_0"])
# Output
name_A name_B
0 0.16561919 0.03640960
1 0.39564838 0.66708115
2 0.60828075 0.95785214
3 0.68716186 0.92803331
4 0.80615505 0.96219926
print(dfs["data_1"])
# Output
name_A name_B
0 0.32474381 0.82506909
1 0.30934914 0.60406956
2 0.99519513 0.23425607
3 0.72210821 0.61141751
4 0.47362605 0.09892009
print(dfs["data_2"])
# Output
name_A name_B
0 0.46561919 0.13640960
1 0.29564838 0.66708115
2 0.40828075 0.35785214
3 0.08716186 0.52803331
4 0.70615505 0.96219926
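If you would rather keep pandas doing the parsing, here is a minimal sketch of an alternative, assuming each block has exactly one header line and five data rows as in the example: buffer each block and hand pandas a StringIO instead of the real file handle, so the loop keeps control of the file.

import io
import itertools
import pandas as pd

def read_data(filename):
    data = []
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                # take the header line plus 5 data rows from the same iterator;
                # pandas never touches the underlying file handle
                block = "".join(itertools.islice(file, 6))
                data.append(pd.read_csv(io.StringIO(block), sep=r"\s+"))
    return data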

How to read with Pandas txt file with column names in each row

I'm a beginner with Python, and I need to read a txt file where the column name appears in each row, the columns are unordered, and not all columns are present in every row. Is there any way to read this kind of file with Pandas?
This is a example (3 rows):
pepe01#mail.com:{ssha}fiy9XI6d:created="1575487257" fwd="" spf_block="" quota="1024mb" full_name="Full Name" mailaccess="envia" mailstatus="cancelled"
pepe02#mail.com:{ssha}Q0H90Rf9:created="1305323967" mailaccess="1" mailstatus="active" admin_access="" quota="" expire="0" full_name="Full Name" pais="CO"
pepe03#mail.com:{ssha}sCPC3HOE:created="1550680636" fwd="" pass_question="" pass_answer="" disabled="Y" mailstatus="cancelled" full_name="Name"
You can use the re module to parse the file.
For example:
import re
import pandas as pd

all_data = []
with open('<YOUR FILE>', 'r') as f_in:
    for line in f_in:
        # the first two ':'-separated fields are the mail and the password hash
        m = re.search(r'^(.*?):(.*?):', line)
        if not m:
            continue
        # grab every key="value" pair from the rest of the line
        data = dict(re.findall(r'([^\s]+)="([^"]+)"', line.split(':', maxsplit=2)[-1]))
        data['mail'] = m.group(1)
        data['password'] = m.group(2)
        all_data.append(data)

df = pd.DataFrame(all_data).fillna('')
print(df)
Prints the dataframe:
      created   quota  full_name mailaccess mailstatus             mail        password expire pais disabled
0  1575487257  1024mb  Full Name      envia  cancelled  pepe01#mail.com  {ssha}fiy9XI6d
1  1305323967          Full Name          1     active  pepe02#mail.com  {ssha}Q0H90Rf9      0   CO
2  1550680636               Name             cancelled  pepe03#mail.com  {ssha}sCPC3HOE                    Y
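To see what the key="value" pattern extracts, here is its output on a shortened version of the first sample line (a quick check; note that empty values such as fwd="" deliberately don't match, which is why those cells end up blank):

import re

line = 'pepe01#mail.com:{ssha}fiy9XI6d:created="1575487257" fwd="" quota="1024mb"'
pairs = re.findall(r'([^\s]+)="([^"]+)"', line.split(':', maxsplit=2)[-1])
print(pairs)  # [('created', '1575487257'), ('quota', '1024mb')]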

how to read data from csv to feed them to cnn?

I have the following csv structure:
image_id, class_name, color, browneyes, feature2, feature3, feature4
for example:
429759,dog,black,1,0,0,husky
352456,cat,white,0,0,0,any
How can I read the CSV file so that for each row it reads the image file and feeds it to the model? (The image_id is the image filename.)
You haven't mentioned in your question which language you code in, but since you mentioned Python in the question's tags, I'll show you an example in Python.
Python has a nice built-in module named csv that you can use for working with CSV files.
See the code below.
CSV sample
name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March
Code
import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')
Reference: https://realpython.com/python-csv/
Official docs: https://docs.python.org/3/library/csv.html
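To tie this back to actually feeding images to the model, here is a minimal sketch; the images/ directory layout, the .jpg extension, the labels.csv filename, and the use of Pillow and NumPy are all assumptions, not part of the original answer:

import csv
import numpy as np
from PIL import Image  # assumption: Pillow is installed

with open('labels.csv') as csv_file:  # hypothetical filename for the CSV described above
    reader = csv.DictReader(csv_file)
    for row in reader:
        # image_id is the image filename, per the question
        img = Image.open(f"images/{row['image_id']}.jpg")  # assumed path layout
        x = np.asarray(img)
        # feed x (and labels such as row['class_name']) to your model here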

There is a problem in conversion of text file content into csv format using python

I tried to convert text file content into .csv format by reading each line using the python csv module and converting it to a list. But I couldn't get the expected output: the first line is stored in row 1, but the second line ends up in row 3, the next in row 5, and so on. Since I am new to Python I don't know how to skip the blank rows and store the data in the right order.
import csv

def FileConversion():
    try:
        with open('TextToCSV.txt', 'r') as textFile:
            LineStripped = (eachLine.strip() for eachLine in textFile)
            lines = (eachLine.split(" ") for eachLine in LineStripped if eachLine)
            with open('finalReport.csv', 'w') as CSVFile:
                writer = csv.writer(CSVFile)
                writer.writerow(('firstName', 'secondName', 'designation', "age"))
                writer.writerows(lines)
Why don't you try something simpler:
import pandas as pd

aux = pd.read_csv("TextToCSV.txt", sep=" ", header=None)  # header=None: the file has no header row
aux.columns = ['firstName', 'secondName', 'designation', 'age']
aux.to_csv("result.csv")
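For what it's worth, the skipped-row symptom in the question is most likely the classic Windows newline issue: opening the output file without newline='' makes csv.writer emit \r\r\n line endings, which show up as a blank row between records. The csv docs recommend newline=''. A minimal sketch of the original function with that one change:

import csv

def FileConversion():
    with open('TextToCSV.txt', 'r') as textFile:
        stripped = (eachLine.strip() for eachLine in textFile)
        lines = (eachLine.split(" ") for eachLine in stripped if eachLine)
        # newline='' stops csv.writer from writing \r\r\n on Windows,
        # which is what produced the blank rows between records
        with open('finalReport.csv', 'w', newline='') as CSVFile:
            writer = csv.writer(CSVFile)
            writer.writerow(('firstName', 'secondName', 'designation', 'age'))
            writer.writerows(lines)

FileConversion()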
