I have a csv file with data like
Name Cost SKU QTY
Julia 1 13 10
John 5 23 1
Julia 3 40 5
I would like to return a dictionary as:
{'Julia':'10', 'John':'1', 'Julia':'5'....}
My code is returning no duplicates as of now.
Run this:
dict(zip(df.Name, df.QTY))
Check this answer on Stack Overflow (Creating a dictionary from a csv file?). First you have to read the file into a dictionary and change the values accordingly. Remember that a dictionary contains only distinct keys: if a key already exists, its value is replaced by the new value on the fly.
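As a minimal sketch of that duplicate-key behavior, plus a grouping alternative if you need to keep every quantity (the DataFrame below just recreates the sample data from the question):
import pandas as pd

df = pd.DataFrame({'Name': ['Julia', 'John', 'Julia'],
                   'Cost': [1, 5, 3],
                   'SKU': [13, 23, 40],
                   'QTY': [10, 1, 5]})

# a plain dict collapses duplicate names, keeping only the last QTY
print(dict(zip(df.Name, df.QTY)))  # {'Julia': 5, 'John': 1}

# to keep every QTY, group the values into lists instead
print(df.groupby('Name')['QTY'].apply(list).to_dict())
# {'John': [1], 'Julia': [10, 5]}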
I have 2 csv files: one is dictionary.csv, which contains a list of words, and the other is story.csv. story.csv has many columns, and one of them, news_story, contains a lot of words. I want to check whether the words from dictionary.csv exist in the news_story column, and then print every row whose news_story column contains words from dictionary.csv to a new csv file called New.csv.
This is the code I have tried so far:
import csv
import pandas as pd

news = pd.read_csv("story.csv")
dictionary = pd.read_csv("dictionary.csv")
pattern = '|'.join(dictionary)
exist = news['news_story'].str.contains(pattern)
for CHECK in exist:
    if not CHECK:
        news['NEWcolumn'] = 'NO'
    else:
        news['NEWcolumn'] = 'YES'
news.to_csv('New.csv')
I kept getting 'NO' for every row even though there should be some True values.
story.csv
news_url news_title news_date news_story
goog.com functional 2019 This story is about a functional requirement
live.com pbandJ 2001 I made a sandwich today
key.com uAndI 1992 A code name of a spy
dictionary.csv
red
tie
lace
books
functional
New.csv
news_url news_title news_date news_story
goog.com functional 2019 This story is about a functional requirement
First convert the column to a Series: pass header=None (so the first value is not consumed as the header) together with squeeze=True in read_csv:
dictionary=pd.read_csv("dictionary.csv", header=None, squeeze=True)
print (dictionary)
0 red
1 tie
2 lace
3 books
4 functional
Name: 0, dtype: object
pattern = '|'.join(dictionary)
#to avoid matching substrings, use word boundaries
#pattern = '|'.join(r"\b{}\b".format(x) for x in dictionary)
Last, filter by boolean indexing:
exist = news['news_story'].str.contains(pattern)
news[exist].to_csv('New.csv')
Detail:
print (news[exist])
news_url news_title news_date \
0 goog.com functional 2019
news_story
0 This story is about a functional requirement
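Putting the pieces together, a minimal end-to-end sketch, assuming story.csv and dictionary.csv look like the samples above (note that squeeze=True has since been removed from newer pandas versions, where calling .squeeze('columns') on the result does the same thing):
import pandas as pd

news = pd.read_csv("story.csv")
dictionary = pd.read_csv("dictionary.csv", header=None).squeeze("columns")

# word boundaries avoid matching substrings, e.g. 'tie' inside 'entities'
pattern = '|'.join(r"\b{}\b".format(x) for x in dictionary)

exist = news['news_story'].str.contains(pattern)
news[exist].to_csv('New.csv')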
I'm trying to scrape some data from a website. It looks like this:
Person 1
Data 1
Data 2
Data 3
Data 5
Person 2
Data 2
Data 3
Data 7
I would like the output in csv to be like:
Person 1 Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7
data data data data
Person 2 Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7
data data data
However, I don't know how to force the output when data is missing; I suspect it needs a try/except somewhere.
I'm using Python 3.6.7 and Selenium. BTW, in the above example 'data' is any value found for the Data1-Data7 entries.
I hope it is clear.
You simply need to check whether the data is empty and, if it is, replace the empty text with a placeholder:
for row in csvreader:
    try:
        ...  # your code for cells which aren't empty
    except:
        ...  # your code for empty cells
You can also try something like this.
This will fill all empty cells to equal 0:
result.fillna(0, inplace=True)
EDIT--------
Since there is no code posted with your question, here is a basic example of how you could replace the empty cells:
import numpy as np

for cl in list_of_columns:
    df[cl] = df[cl].replace(r'\s+', np.nan, regex=True)
    df[cl] = df[cl].fillna(0)
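For reference, a self-contained toy run of that idea; the DataFrame, column names, and sample values here are made up purely for illustration:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Data1': ['a', ' ', 'c'],
                   'Data2': ['  ', 'b', 'c']})
list_of_columns = ['Data1', 'Data2']

for cl in list_of_columns:
    df[cl] = df[cl].replace(r'\s+', np.nan, regex=True)  # whitespace-only cells -> NaN
    df[cl] = df[cl].fillna(0)                            # NaN -> 0

print(df)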
Thanks to all who tried to help me. I found the solution:
from selenium.common.exceptions import NoSuchElementException

Data = None
try:
    Data = elm.find_element_by_xpath(
        ".//div[@class='whatever_the_class_name'][contains(., 'whatever_data')]"
    ).text
except NoSuchElementException:
    Data = ' '
line.append(Data)
The penultimate line of code is the answer to my question. Well, the simplest things are (sometimes) the hardest ;)
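As a side note, the find_element_by_* helpers were removed in Selenium 4, so on a current install the equivalent would look like the sketch below (elm and line come from the surrounding scraping loop, and the class name and text are the same placeholders as above):
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

try:
    Data = elm.find_element(
        By.XPATH,
        ".//div[@class='whatever_the_class_name'][contains(., 'whatever_data')]"
    ).text
except NoSuchElementException:
    Data = ' '
line.append(Data)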
I have a question whose variations have already been asked, but I'm not able to find an answer among all previous posts to my particular question. So I hope someone can help me ...
I have a csv file as such (in this example, there are a total of 18 rows, and 4 columns, with the first 2 rows containing headers).
"Employees.csv" 17
ID Name Value Location
25-2002 James 1.2919 Finance
25-2017 Matthew 2.359 Legal
30-3444 Rob 3.1937 Operations
55-8988 Fred 3.1815 Research
26-1000 Lisa 4.3332 Research
56-0909 John 3.3533 Legal
45-8122 Anna 3.8887 Finance
10-1000 Rachel 4.1448 Maintenance
30-9000 Frank 3.7821 Maintenance
25-3000 Angela 5.5854 Service
45-4321 Christopher 9.1598 Legal
44-9821 Maddie 8.5823 Service
20-4000 Ruth 7.47 Operations
50-3233 Vera 5.5092 Operations
65-2045 Sydney 3.4542 Executive
45-8720 Vladimir 0.2159 Finance
I'd like to round the values in the 3rd column to 2 decimals, i.e., round(value, 2). So basically, I want to open the file, read column #3 (minus the first 2 rows), round each value, write them back, and save the file. After reading through other similar posts, I've learned that it's best to create a temp file for this kind of work instead of trying to modify the same file in place. So I have the following code:
import csv, os

val = []
with open('path/Employees.csv', 'r') as rf, open('path/tmpf.csv', 'w') as tmpf:
    reader = csv.reader(rf)
    writer = csv.writer(tmpf)
    for _ in range(2):  # skip first 2 rows
        next(reader)
    for line in reader:
        val.append(float(line[2]))  # read 3rd column into list 'val'
        # [... this is where I got stuck!
        # ... how do I round each value of val, and
        # ... write everything back to the tmpf file?]
os.remove('path/Employees.csv')
os.rename('path/tmpf.csv', 'path/Employees.csv')
Thanks!
You could use
rounded_val = [round(v, 2) for v in val]
to generate the list of rounded values.
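A fuller sketch of the write-back step, under the same assumptions as the question (two header rows to copy through, the value in the 3rd column, and a temp file named path/tmpf.csv):
import csv, os

with open('path/Employees.csv', 'r', newline='') as rf, \
     open('path/tmpf.csv', 'w', newline='') as tmpf:
    reader = csv.reader(rf)
    writer = csv.writer(tmpf)
    for _ in range(2):  # copy the 2 header rows through unchanged
        writer.writerow(next(reader))
    for line in reader:
        line[2] = str(round(float(line[2]), 2))  # round the 3rd column
        writer.writerow(line)

os.remove('path/Employees.csv')
os.rename('path/tmpf.csv', 'path/Employees.csv')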
I have a .csv file with 2 columns :
Item Value
A 1.3
B 2.6
D 4.2
E 5.6
F 3.2
A 1.2
C 5.2
D 6.4
I want to compare the values in the Item column and find the duplicates; after that I want to compare the corresponding values from the Value column.
In the example, A and D from Item are duplicated, but they have different values in Value. I would like to clear the duplicates and keep the ones with the lowest value in Value.
This is what I've tried, and it works, but it is SLOW and resource-expensive. I am sure there is a better way; I could use pandas or any other library for that matter, so please give me a suggestion.
file="file.csv"
def items_array(file):
with open(file,"r") as file:
file_reader=csv.DictReader(file,delimiter=";")
for row in file_reader:
items.append(row["Item_title"])
items_set=set(items)
return(items_set)
def find_lowest_value(item,file):
items_and_values=[]
with open(file,"r") as file:
file_reader=csv.DictReader(file,delimiter=";")
for row in file_reader:
items_and_values.append([row["Item"],row["Value"]])
value_for_single_item=[]
for i in items_and_values:
if item == i[0]:
value_for_single_item.append(i[1])
value_for_single_item.sort()
return(value_for_single_item[0])
items=items_array(file)
for i in items:
lv=find_lowest_value(i,file)
print(i,lv)
Since there are about 25k rows in the actual .csv file, the method I am using takes about 30 minutes. I am sure it could be done faster and smarter :)
This is the expected result :
Item Value
B 2.6
D 4.2
E 5.6
F 3.2
A 1.2
C 5.2
If you import your csv into a DataFrame using pandas, you don't have to read the file 25k times, just once. And it'll be a lot quicker.
import pandas as pd

df = pd.read_csv(file, sep=";")
a = df.groupby("Item")["Value"].min()
Pretty much does the trick. Two lines of code and it took 2 seconds to do it. Pandas has to be some kind of magic.
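If you also want the de-duplicated table written back to disk, one more hedged step (as_index=False keeps Item as a regular column; the output filename is made up):
result = df.groupby("Item", as_index=False)["Value"].min()
result.to_csv("file_deduplicated.csv", sep=";", index=False)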
I parse a CSV file into a DataFrame. 10,000 records go in, no problems.
Two columns: one 'ID', one 'Reviews'.
I try to convert the DF into a dictionary where keys = 'ID' and values = 'Reviews'.
For some reason the new dictionary only contains 680 records.
import pandas as pd

# read csv data file
data = pd.read_csv("Movie_reviews.csv",
                   delimiter='\t',
                   header=None, names=['ID', 'Reviews'])
reviews = data.set_index('ID').to_dict().get('Reviews')
len(reviews)
output is 680
If I don't append '.get('Reviews')' everything is one big record.
The DataFrame 'data' looks like this:
ID Reviews
1 076780192X it always amazes me how people can rate the DV...
2 0767821599 This movie is okay, but, its not worth what th...
3 0782008380 If you love the Highlander 1 movie and the ser...
4 0767726227 This is a great classic collection, if you lik...
5 0780621832 This is the second of John Ford and John Wayne...
6 0310263662 I am an evangelical Christian who believes in ...
7 0767809270 Federal law, in one of its numerous unfunded m...
In case it helps anyone else:
The IDs for the movie reviews were not all unique. The .nunique() function revealed that, as suggested by @YOLO.
Assigning only the values (Reviews) to the dictionary automatically added unique keys, as suggested by @JackHoman, resolving my issue.
I think you can do:
Method 1:
reviews = data.set_index('ID')['Reviews'].to_dict()
Method 2: Here we convert reviews to a list for each ID so that we don't lose any information.
reviews = data.groupby('ID')['Reviews'].apply(list).to_dict()
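A toy illustration of how the two methods differ when IDs repeat (the IDs and review texts below are made up):
import pandas as pd

data = pd.DataFrame({'ID': ['a1', 'a1', 'b2'],
                     'Reviews': ['good', 'great', 'bad']})

# Method 1: duplicate IDs collapse, keeping only the last review
print(data.set_index('ID')['Reviews'].to_dict())
# {'a1': 'great', 'b2': 'bad'}

# Method 2: every review is kept, in a list per ID
print(data.groupby('ID')['Reviews'].apply(list).to_dict())
# {'a1': ['good', 'great'], 'b2': ['bad']}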