I have a text file (Player_hits.txt) that I am trying to pull player batting averages from. Similar to lines 179-189 of the linked code, I want to find an average. However, I do not want to find the average for the entire team; instead, I want to find the average of every individual player on the team.
For instance, the text file is set up as such:
Player_hits.txt
In this file a 1 defines a hit and a 0 means the player did not get a hit. I am trying to pull an individual average for both players. (Alex = 0.500, Riley = 0.666)
If someone could help, that would be greatly appreciated!
Thanks!
Link to original code on repl.it: Baseball Stat-Tracking
[Image: json.decoder.JSONDecodeError traceback]
The json.decoder.JSONDecodeError is coming up because json.loads() doesn't interpret each line (e.g. "[1, 'Riley']\n") as valid JSON: single quotes are not valid JSON string delimiters. You can use ast to read each line in as a literal evaluation, thus storing it as a list element [1, 'Riley'] in your list p_hits.
Then the second part is that you can convert to a dataframe and group by the 'name' column. So jim has the right idea, but there are errors in that too (i.e. colmuns should be columns, and the items in the list need to be strings, ['hit', 'name'], not undeclared variables).
import pandas as pd
import ast

p_hits = []
with open('Player_hits.txt') as hits:
    for line in hits:
        # parse each line, e.g. [1, 'Riley'], as a Python literal
        l = ast.literal_eval(line)
        p_hits.append(l)

df = pd.DataFrame(p_hits, columns=['hit', 'name'])
Output (with an example dataset I made):
print(df.groupby(['name']).mean())

            hit
name
Matt   0.714286
Riley  0.285714
Todd   0.500000
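If you need a single player's average back as a plain number, you can index into the grouped result (a small usage sketch building on the df above):

averages = df.groupby('name')['hit'].mean()
print(averages['Riley'])  # 0.285714 with the example data above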
For reference, here is jim's original snippet with the errors noted above:

import pandas as pd
import json

p_hits = []
with open('Player_hits.txt') as hits:
    for line in hits:
        l = json.loads(line)
        p_hits.append(l)

df = pd.DataFrame.from_records(p_hits, colmuns=[hit, name])
df.groupby(['name']).mean()
I have an Excel file that has multiple rows which contain similar data. For example, an employee name is repeated in multiple rows, but I would like to import such records only once, and not multiple times, into my database to avoid redundancy. I have seen that the skip_row method may help with this, but I still cannot figure out exactly how to use it, since the documentation is very limited. Any help will be appreciated :)
One way to achieve this is to keep a list of already imported values (based on some identifier), and then override skip_row() to ignore any duplicates.
For example:
class _BookResource(resources.ModelResource):

    imported_names = set()

    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        self.imported_names.add(row.get("name"))

    def skip_row(self, instance, original):
        return instance.name in self.imported_names

    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')
Then running this will skip any duplicates:
# set up 2 unique rows and 1 duplicate
rows = [
    ('book1', 'email@example.com', '10.25'),
    ('book2', 'email@example.com', '10.25'),
    ('book1', 'email@example.com', '10.25'),
]
dataset = tablib.Dataset(*rows, headers=['name', 'author_email', 'price'])

book_resource = _BookResource()
result = book_resource.import_data(dataset)
print(result.totals)
This gives the output:
OrderedDict([('new', 2), ('update', 0), ('delete', 0), ('skip', 1), ('error', 0), ('invalid', 0)])
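One caveat with the snippet above: imported_names is a class attribute, so the set of seen names persists for the life of the process. If each import run should start fresh, one option is to reset it in before_import (a sketch; the before_import signature shown matches older django-import-export releases and may differ in yours):

class _BookResource(resources.ModelResource):

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # reset the seen-names set at the start of every import run
        self.imported_names = set()

    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        self.imported_names.add(row.get("name"))

    def skip_row(self, instance, original):
        return instance.name in self.imported_names

    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')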
I found a way to skip rows that are already present in the database. One way to do this is by comparing a particular field in the database with a column in the Excel file.
So in this particular example, I am assuming that ig_username is a field on the Django model InstagramData, and that ig_username is also a column in the Excel file which I want to upload.
A row gets skipped if its ig_username value is already present in the database. I have suffered a lot for this answer; I don't want you to as well 😊
class InstagramResource(resources.ModelResource):

    def skip_row(self, instance, original):
        check = []
        new = InstagramData.objects.all()
        for p in new:
            check.append(p.ig_username)
        if instance.ig_username in check:
            return True
        else:
            print("no")
            return False

    class Meta:
        model = InstagramData


class InstagramDataAdminAdmin(ImportExportModelAdmin, admin.ModelAdmin):
    resource_class = InstagramResource


admin.site.register(InstagramData, InstagramDataAdminAdmin)
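Note that skip_row() runs once per imported row, and the loop above re-reads every InstagramData record each time. A leaner sketch of the same idea, caching the existing usernames once per import (assuming the same model; again, the before_import signature may differ between django-import-export versions):

class InstagramResource(resources.ModelResource):

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # query the existing usernames once, not once per row
        self.existing = set(
            InstagramData.objects.values_list('ig_username', flat=True)
        )

    def skip_row(self, instance, original):
        # skip any row whose ig_username is already in the database
        return instance.ig_username in self.existing

    class Meta:
        model = InstagramData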
I have a Python dictionary like below:
car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298}
}
I need the output like below:
benz,876456,965471
audi,523487,456879
bmw,754235,543298
and also in sorted form, like below:
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Please help me get both outputs.
You could do this:
car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298}
}

cars = []
for car in car_dict:
    cars.append('{},{},{}'.format(
        car,
        car_dict[car]['usa'],
        car_dict[car]['uk']
    ))

cars = sorted(cars)

for car in cars:
    print(car)
Result
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Explanation
Loop through each car and store the model, USA number and UK number as one comma-separated string in a list. Sort the list alphabetically. Print each entry.
To print the data:

# use a list comprehension to build a sorted list of [car, usa, uk] rows
data = [[car] + list(regions.values()) for car, regions in sorted(car_dict.items(), key=lambda x: x[0])]

for row in data:
    print(*row, sep=',')
Output
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Explanation

Sort the items by car:

for car, regions in sorted(car_dict.items(), key=lambda x: x[0])

Each inner list of the comprehension becomes a row of car, USA, UK values:

[car] + list(regions.values())

Print each row comma-delimited:

for row in data:
    print(*row, sep=',')
Are you looking to print the outputs, or just to organize the data in a better format for analysis?
If the latter, I would use pandas and do the following:
import pandas as pd
pd.DataFrame(car_dict).transpose().sort_index()
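Assuming the car_dict above, that one-liner should produce a frame along these lines:

         usa      uk
audi  523487  456879
benz  876456  965471
bmw   754235  543298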
To view the output on the terminal the way you requested,

for index, row in pd.DataFrame(car_dict).transpose().sort_index().iterrows():
    print('{},{},{}'.format(index, row['usa'], row['uk']))
will print this out:
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Find the solution below:

car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298}
}

keys = list(car_dict.keys())
keys.sort()
for i in keys:
    print(i, car_dict[i]['usa'], car_dict[i]['uk'])
I like the list comprehension answer given by @DarrylG; however, you do not really need the lambda expression in this case.
sorted() sorts by key by default, so you can just use:
data = [[car] + list(regions.values()) for car, regions in sorted(car_dict.items())]
I would also make another slight change. If you wanted more explicit control over the region ordering (or wanted a different ordering), you could replace [car] + list(regions.values()) with [car, regions['usa'], regions['uk']], like this:
data = [[car, regions['usa'], regions['uk']] for car, regions in sorted(car_dict.items())]
Of course, that means that if you added more regions you would have to change this, but I prefer setting the order explicitly.
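If regions do grow over time, one way to keep that explicit ordering in a single place is a small region list (a sketch building on the same car_dict):

regions_order = ['usa', 'uk']  # extend this as new regions are added

data = [[car] + [regions[r] for r in regions_order]
        for car, regions in sorted(car_dict.items())]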
I want to create a new column that reflects the Group of the Text based on whether the Text contains keywords
The code for creating the data and the expected output is below. I know of functions that can compare across the same row, but not across multiple rows as this one requires.
Creating a dictionary was one approach I tried, but it would become a massive dictionary beyond these test cases.
import pandas as pd
import numpy as np

data = {'Group': ['Greetings', 'Farewells', 'Requests'],
        'Key': ['Hello hey', 'Goodbye', 'I need help'],
        'Text': ['Hey Bob, goodbye', 'Hey, have you met handsome Terence?',
                 'Hey Omg I really need help']}

data2 = {'Group': ['Greetings', 'Farewells', 'Requests'],
         'Key': ['Hello hey', 'Goodbye', 'I need help'],
         'Text': ['Hey Bob, goodbye', 'Hey, have you met handsome Terence?',
                  'Hey Omg I really need help'],
         'Groupings': ['Greetings, Farewells', 'Greetings', 'Greetings, Requests']}

new_df = pd.DataFrame(data)
print(new_df)

expected_output = pd.DataFrame(data2)
expected_output
Text 1 has groupings Greetings, Farewells because it contains the keywords "Hey" and "Goodbye".
Text 3 has groupings Greetings, Requests because it contains "Hey" and the keywords from "I need help" are found within the Text.
Hope I've provided enough information to go on
EDIT: Also as a side-question, how do I include the tables in the question to make it easier to view?
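A minimal sketch of one way to build the Groupings column, assuming a Text belongs to a Group whenever any single word from that Group's Key appears in the Text (with a larger keyword list this can over-match on common words such as "I"):

import pandas as pd

data = {'Group': ['Greetings', 'Farewells', 'Requests'],
        'Key': ['Hello hey', 'Goodbye', 'I need help'],
        'Text': ['Hey Bob, goodbye', 'Hey, have you met handsome Terence?',
                 'Hey Omg I really need help']}
new_df = pd.DataFrame(data)

# map each Group to the set of lower-cased words in its Key
keywords = {g: set(k.lower().split())
            for g, k in zip(new_df['Group'], new_df['Key'])}

def groupings(text):
    # strip simple punctuation, then intersect with each Group's keyword set
    words = set(text.lower().replace(',', ' ').replace('?', ' ').split())
    return ', '.join(g for g, kw in keywords.items() if words & kw)

new_df['Groupings'] = new_df['Text'].apply(groupings)
print(new_df)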
I learned from the Torchtext documentation that the way to import csv files is through TabularDataset. I did it like this:
train = data.TabularDataset(path='./data.csv',
                            format='csv',
                            fields=[("label", data.Field(use_vocab=True, include_lengths=False)),
                                    ("statement", data.Field(use_vocab=True, include_lengths=True))],
                            skip_header=True)
"label" and "statement" are the header names of the 2 columns in my csv file. I defined them as data.Field, but "label" and "statement" don't seem to actually contain the data from my csv file, despite being recognized as data field objects by the console with no problem. I found out this issue when I tried to build a vocab list with statement.build_vocab(train, max_size=25000). I printed len(statement.vocab), the return is "2", which obviously doesn't reflect the actual data in the csv file. Did I do something wrong when importing the csv data or is my vocab building done wrong? Is there a separate method to put the data in the field objects? Thanks!!
The fields must be defined separately, like this. (Note that tokenize has to be defined before it is used; a simple whitespace tokenizer is assumed here.)

# a simple whitespace tokenizer; any callable (or 'spacy') works here
tokenize = lambda s: s.split()

TEXT = data.Field(sequential=True, tokenize=tokenize, lower=True, include_lengths=True)
LABEL = data.Field(sequential=True, tokenize=tokenize, lower=True)

train = data.TabularDataset(path='./data.csv',
                            format='csv',
                            fields=[("label", LABEL),
                                    ("statement", TEXT)],
                            skip_header=True)

test = data.TabularDataset(path='./test.csv',
                           format='csv',
                           fields=[("label", LABEL),
                                   ("statement", TEXT)],
                           skip_header=True)
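Vocabularies are then built on the module-level field objects, so their sizes reflect the actual csv data (a sketch, assuming the legacy torchtext Field API used above):

# build the vocabularies from the training split
TEXT.build_vocab(train, max_size=25000)
LABEL.build_vocab(train)

print(len(TEXT.vocab))   # vocabulary size of the statement column
print(len(LABEL.vocab))  # distinct labels plus special tokens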