How to assign number to each value in python - python-3.x

I am comparatively new to python and data science and I was working with a CSV file which looks something like:
value1, value2
value3
value4...
Thing is, I want to assign a unique number to each of these values in the csv file such that the unique number acts as the key and the item in the CSV acts as the value like in a dictionary.
I tried using pandas but if possible, I wanted to know how I can solve this without using any libraries.
The desired output should be something like this:
{
"value1": 1,
"value2": 2,
"value3": 3,
.
.
.
and so on..
}

Was just about to talk about pandas before I saw that you wanted to do it in vanilla Python. I'd do it with pandas personally, but here you go:
You can read in lines from a file, split them by delimiter (','), and then get your word tokens.
master_dict = {}
counter = 1
with open("your_csv.csv", "r") as f:
    for line in f:
        words = line.split(',')
        for word in words:
            # .strip() removes surrounding spaces and the trailing newline
            master_dict[counter] = word.strip()
            counter += 1
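The loop above stores number -> value. The desired output in the question is keyed the other way round (value -> number), so here is a minimal sketch of that variant in plain Python. The function name number_values and the sample lines are made up for illustration, and skipping repeated values is an assumption about what "unique number" should mean.

```python
def number_values(lines):
    """Map each distinct, non-empty comma-separated value to a running number."""
    numbered = {}
    for line in lines:
        for token in line.split(","):
            value = token.strip()
            # Skip blanks and values that already have a number (assumption)
            if value and value not in numbered:
                numbered[value] = len(numbered) + 1
    return numbered


# In-memory sample standing in for the lines of the CSV file
sample_lines = ["value1, value2\n", "value3\n", "value4\n"]
result = number_values(sample_lines)
print(result)  # {'value1': 1, 'value2': 2, 'value3': 3, 'value4': 4}
```

With a real file, passing the open file object (for example number_values(f) inside a with open(...) block) should work the same way, since iterating over a file yields its lines.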

Related

I'm looking for a way to extract strings from a text file using specific criteria

I have a text file containing random strings. I want to use specific criteria to extract the strings that match them.
Example text :
B311-SG-1700-ASJND83-ANSDN762
BAKSJD873-JAN-1293
Example criteria :
All the strings that contain characters separated by hyphens this way : XXX-XX-XXXX
Output : 'B311-SG-1700'
I tried creating a function, but I can't work out how to express criteria for strings specifically or how to apply them.
Based on your comment here is a python script that might do what you want (I'm not that familiar with python).
import re
p = re.compile(r'\b(.{4}-.{2}-.{4})')
results = p.findall('B111-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293\nB211-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293 B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293')
print(results)
Output:
['B111-SG-1700', 'B211-SG-1700', 'B311-SG-1700']
You can read a file as a string like this:
with open("file.txt", "r") as text_file:
    data = text_file.read()
Then use findall over that. Depending on the size of the file it might require a bit more work (e.g. reading line by line).
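For the line-by-line case, a rough sketch could look like the following. It assumes each X in the criterion is a letter or digit and that the groups are 4, 2 and 4 characters long (as the expected output 'B311-SG-1700' suggests), so it uses a stricter character class than the . in the pattern above. The names PATTERN, extract_codes and sample are made up for illustration.

```python
import re

# Assumption: groups of 4, 2 and 4 letters/digits separated by hyphens
PATTERN = re.compile(r"\b[A-Za-z0-9]{4}-[A-Za-z0-9]{2}-[A-Za-z0-9]{4}")


def extract_codes(lines):
    """Collect every match of PATTERN, scanning one line at a time."""
    found = []
    for line in lines:
        found.extend(PATTERN.findall(line))
    return found


# In-memory sample standing in for the lines of the text file
sample = ["B311-SG-1700-ASJND83-ANSDN762\n", "BAKSJD873-JAN-1293\n"]
codes = extract_codes(sample)
print(codes)  # ['B311-SG-1700']
```

Passing an open file object instead of the sample list should avoid loading the whole file into memory at once.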
You can use the re module to extract the pattern from text:
import re
text = """\
B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293
BAKSJD873-JAN-1293 B312-SG-1700-ASJND83-ANSDN762"""
for m in re.findall(r"\b.{4}-.{2}-.{4}", text):
    print(m)
Prints:
B311-SG-1700
B312-SG-1700

Dictionary to Tab Delimited Text File for Particular Schema

I have a dictionary of the form:
data = {'a':'one','b':'two','c':'three'}
I want to convert this to a tab delimited text file such that the file reads as:
a b c one two three.
I tried:
import json
data = {'a':'one','b':'two','c':'three'}
with open('file.txt', 'w') as file:
file.write(json.dumps(data))
However the resulting file just reads as ('a':'one','b':'two','c':'three'). I knew it wouldn't be as simple as that, and I'm sure it's not complex, but I just can't seem to figure this one out.
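data = {'a':'one','b':'two','c':'three'}
s = ""
# First all the keys, each followed by a tab
for x in data.keys():
    s += x
    s += "\t"
# Then all the values, each followed by a tab (this leaves a trailing tab)
for x in data.values():
    s += x
    s += "\t"
print(s)
with open('file.txt', 'w') as file:
    file.write(s)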
A dictionary is a structure designed for cases where each key is associated with a value. Here is a link to further discussion of how it compares with other structures.
It therefore makes sense to print each key:value pair together to preserve that association. That is why print(data) outputs {'a': 'one', 'b': 'two', 'c': 'three'}, and why file.write(json.dumps(data)) in your case writes the same pairs in JSON form, {"a": "one", "b": "two", "c": "three"}.
The key1, key2, ... value1, value2, ... output format you want is not typical for a dictionary, so a more "manual" approach like the one above, with two loops, is required.
As for json, it is not really relevant to the code you provided; it may be used elsewhere in your code base where JSON output is required. You can read more on json here; it is a format independent of the Python programming language.
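As a possible shortening of the two-loop approach, str.join can build the same line without a trailing tab. This is only a sketch and assumes every key and value is already a string; the names fields and line are made up for illustration.

```python
data = {'a': 'one', 'b': 'two', 'c': 'three'}

# Keys first, then values; join() puts a tab between items but not after the last
fields = list(data.keys()) + list(data.values())
line = "\t".join(fields)
print(line)
```

The resulting line can then be written with file.write(line) exactly as in the code above.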

How to loop through a list of dictionaries and write the values as individual columns in a CSV

I have a list of dictionaries
d = [{'value':'foo_1', 'word_list':['blah1', 'blah2']}, ...., {'value': 'foo_n', 'word_list':['meh1', 'meh2']}]
I want to write this to a CSV file with all the 'value' keys in one column, and then each individual word from the "value"'s word_list as its own column. So I have the first row as
foo_1 blah_1 blah_2
and so on.
I don't know how many dictionaries I have, or how many words I have in "word_list".
How would I go about doing this in Python 3?
Thanks!
I figured out a solution, but it's kind of messy (wow, I can't write a bit of code without it being in the "proper format"...how annoying):
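# Note: this assumes d is a dict mapping each value to its word list,
# not the list of dictionaries shown above (a list has no .keys()).
with open('filename', 'w') as f:
    for key in d.keys():
        f.write("%s,"%(key))
        for word in d[key]:
            f.write("%s,"%(word))
        f.write("\n")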
You can loop through the dictionaries one at a time, construct the list and then use the csv module to write the data as I have shown here
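import csv
d = [{'value':'foo_1', 'word_list':['blah1', 'blah2']}, {'value': 'foo_n', 'word_list':['meh1', 'meh2']}]
# newline='' is what the csv docs recommend to avoid blank rows on Windows
with open('test_file.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for val_dict in d:
        # One row per dictionary: the value followed by each of its words
        csv_row = [val_dict['value']] + val_dict['word_list']
        writer.writerow(csv_row)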
It should work for word lists of arbitrary length and as many dictionaries as you want.
It would probably be easiest to flatten each row into a normal list before writing it to the file. Something like this:
import csv
with open(filename, 'w', newline='') as file:
    writer = csv.writer(file)
    for row in data:
        out_row = [row['value']]
        for word in row['word_list']:
            out_row.append(word)
        writer.writerow(out_row)
    # Shorter alternative to the two loops:
    # writer.writerows((row['value'], *row['word_list']) for row in data)

Correctly storing in data structure in Python

alone,1
amazed,10
amazing,10
bad,1
best,10
better,7
excellent,10
These are some of the keywords and their 'values' that I need to store in a
data structure, preferably a list. Each line will be later used to access/extract the word and its 'value'.
The list I made in a while loop was:
line = KeywordFile.readline()
while line != '':
    line=KeywordFile.readline()
    line = line.rstrip()
And I tried to convert it to a list form by doing this:
list=[line]
However, when I print the list, I get this:
['amazed,10']
['amazing,10']
['bad,1']
['best,10']
['better,7']
['excellent,10']
I don't think that I'll be able to extract my 'values' from the lists that easily if they are inside quotation marks.
I'm looking for a better way to store each word and its 'value'.
Thanks in advance!
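A dictionary is what you need here.
You could do something like:
out = {}
line = KeywordFile.readline()
while line != '':
    # Process the current line before reading the next, so the first line is not skipped
    word, value = line.rstrip().split(',')
    out[word] = int(value)  # convert the text '10' to the number 10
    line = KeywordFile.readline()
out will look like
{'alone': 1, 'amazed': 10, 'amazing': 10, 'bad': 1, ...}
and the values can be accessed by key: out['amazed'] will return 10.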
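For reference, here is a self-contained sketch of the same idea that iterates over the file directly and skips blank lines. The io.StringIO object is only an in-memory stand-in for the opened keyword file, and the names keyword_file, scores and pairs are made up for illustration.

```python
import io

# In-memory stand-in for the opened keyword file (sample taken from the question)
keyword_file = io.StringIO("alone,1\namazed,10\namazing,10\nbad,1\n")

scores = {}
for line in keyword_file:
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    word, value = line.split(",")
    scores[word] = int(value)

print(scores["amazed"])  # 10

# If a list is preferred, the same data as (word, value) tuples:
pairs = list(scores.items())
print(pairs[0])  # ('alone', 1)
```

A dictionary gives direct lookup by word, while the list of tuples keeps the file order and can be indexed by position.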

Sort excel worksheet using python

I have an excel sheet like this:
I would like to output the data into an excel file like this:
Basically, for rows with common elements in columns 2, 3 and 4, I want to concatenate the values in the 5th column.
Please suggest how I could do this.
The easiest way to approach an issue like this is exporting the spreadsheet to CSV first, in order to help ensure that it imports correctly into Python.
Using a defaultdict, you can create a dictionary that has unique keys and then iterate through lines adding the final column's values to a list.
Finally you can write it back out to a CSV format:
from collections import defaultdict
results = defaultdict(list)
with open("in_file.csv") as f:
    header = f.readline()
    for line in f:
        # Drop the trailing newline so it does not end up in the last column
        cols = line.rstrip("\n").split(",")
        key = ",".join(cols[0:4])
        results[key].append(cols[4])
with open("out_file.csv", "w") as f:
    f.write(header)
    # items() works in Python 3; iteritems() exists only in Python 2
    for k, v in results.items():
        line = '{},"{}",\n'.format(k, ", ".join(v))
        f.write(line)
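The same grouping can also be sketched with the csv module, which handles quoting of the joined values on its own. The sample rows and column names below are invented, since the original spreadsheet is not shown, and io.StringIO stands in for the real input and output files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the exported spreadsheet
in_file = io.StringIO(
    "id,col2,col3,col4,col5\n"
    "1,a,b,c,x\n"
    "1,a,b,c,y\n"
    "2,d,e,f,z\n"
)

reader = csv.reader(in_file)
header = next(reader)

# Group the fifth column by the first four columns
groups = defaultdict(list)
for row in reader:
    groups[tuple(row[:4])].append(row[4])

out_file = io.StringIO()
writer = csv.writer(out_file, lineterminator="\n")
writer.writerow(header)
for key, values in groups.items():
    # The writer quotes the joined field because it contains a comma
    writer.writerow([*key, ", ".join(values)])

print(out_file.getvalue())
```

With real files, the two StringIO objects would be replaced by open("in_file.csv", newline="") and open("out_file.csv", "w", newline="").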
