I have the OrderedDict below, in which column1 and column2 represent a relationship.
This produced the following OrderedList:
OrderedDict([('AD',
[['A', 'Q_30', 100],
['A', 'Q_24', 74],
['B', 'Q_28', 37],
['B', 'Q_30', 100],
['C', 'Q_25', 38],
['C', 'Q_30', 100],
['D', 'D_4', 44],
['E', 'D_4', 44],
['F', 'D_5', 44]])])
I would like to iterate over the elements and, for each one, look at the other rows and collect the related column2 values.
For example:
element A contains Q_30 and Q_24, so collect the related members from the other rows
element B contains Q_30, so collect Q_24, Q_28, Q_30 and order them by column3
The expected result is:
OrderedDict([('AD',
[{'Q_30':100, 'Q_24':74, 'Q_25':38, 'Q_28': 37}, {'D_4':44}, {'D_5':44}])])
If I understand this correctly, your "OrderedDict" is really a tuple holding a list of lists, meant to look like this:
OrderedList = ('AD',
[['A', 'Q_30', 100],
['A', 'Q_24', 74],
['B', 'Q_28', 37],
['B', 'Q_30', 100],
['C', 'Q_25', 38],
['C', 'Q_30', 100],
['D', 'D_4', 44],
['E', 'D_4', 44],
['F', 'D_5', 44]])
and you want to convert it into a tuple holding a list of dicts:
OrderedDict = ('AD',
[{'Q_30': 100,
'Q_24': 74,
'Q_25': 38,
'Q_28': 37},
{'D_4': 44},
{'D_5': 44}])
In that case, I suspect you are looking for itertools.groupby():
from itertools import groupby
OrderedList = ('AD',
[['A', 'Q_30', 100],
['A', 'Q_24', 74],
['B', 'Q_28', 37],
['B', 'Q_30', 100],
['C', 'Q_25', 38],
['C', 'Q_30', 100],
['D', 'D_4', 44],
['E', 'D_4', 44],
['F', 'D_5', 44]])
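# groupby() only groups *consecutive* rows with the same key, so the rows
# must already be sorted by column1 (they are here).
for key, group in groupby(OrderedList[1], lambda x: x[0]):
    for thing in group:
        print("%s is a %s." % (thing[1], key))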
Gives:
Q_30 is a A.
Q_24 is a A.
Q_28 is a B.
Q_30 is a B.
Q_25 is a C.
Q_30 is a C.
D_4 is a D.
D_4 is a E.
D_5 is a F.
This is not the full answer, just an example; giving you the complete solution would feel like spoon-feeding.
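For reference, the full transformation can be done by treating the rows as a graph: column1 elements that share a column2 value belong together, transitively (A and B share Q_30, so all of their members end up in one group). Below is a minimal sketch using union-find, a connected-components technique swapped in here instead of groupby(). It assumes a given column2 value always carries the same column3 value; the helper name group_related is hypothetical.

```python
from collections import OrderedDict


def group_related(rows):
    """Group column2 values whose column1 elements are linked, directly or
    through shared column2 values, then sort each group by column3 (descending)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps chains short
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes so an element name can never collide with a member name.
    for element, member, _ in rows:
        union(('element', element), ('member', member))

    # Second pass, after all unions: collect member -> value per component,
    # in the order each component is first seen.
    groups = OrderedDict()
    for _, member, value in rows:
        groups.setdefault(find(('member', member)), {})[member] = value

    return [dict(sorted(g.items(), key=lambda kv: kv[1], reverse=True))
            for g in groups.values()]


data = OrderedDict([('AD',
    [['A', 'Q_30', 100],
     ['A', 'Q_24', 74],
     ['B', 'Q_28', 37],
     ['B', 'Q_30', 100],
     ['C', 'Q_25', 38],
     ['C', 'Q_30', 100],
     ['D', 'D_4', 44],
     ['E', 'D_4', 44],
     ['F', 'D_5', 44]])])

result = OrderedDict((key, group_related(rows)) for key, rows in data.items())
print(result['AD'])
# [{'Q_30': 100, 'Q_24': 74, 'Q_25': 38, 'Q_28': 37}, {'D_4': 44}, {'D_5': 44}]
```

Collecting the groups in a second pass matters: a component's root can change while unions are still being applied, so reading roots only after every row has been linked keeps the grouping consistent.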
I am trying to reshape a DataFrame using pivot. Since the column contains duplicate entries, I tried to add a count column, following the suggestion here (Question 10 in this answer).
import pandas as pd
from pprint import pprint

if __name__ == '__main__':
    d = {
        't': [0, 1, 2, 0, 1, 2, 0, 2, 0, 1],
        'input': [2, 2, 2, 2, 2, 2, 4, 4, 4, 4],
        'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
        'value': [0.1, 0.2, 0.3, 1, 2, 3, 1, 2, 1, 1],
    }
    df = pd.DataFrame(d)
    df = df.drop('t', axis=1)
    df.insert(0, 'count', df.groupby('input').cumcount())
    pd.pivot(df, index='count', columns='type', values='value')
But I still get the same error:
    raise ValueError("Index contains duplicate entries, cannot reshape")
ValueError: Index contains duplicate entries, cannot reshape
Could someone please suggest how to resolve this error?
Since you have more than one value for some combinations of t and 'A'/'B', you have to aggregate the values somehow.
So, if I've understood your issue correctly, a possible solution is the following:
#pip install pandas
import pandas as pd
d = {
    't': [0, 1, 2, 0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4, 4, 4, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1, 2, 1, 1],
}
df = pd.DataFrame(d)
df
# aggfunc='sum' is used here as an example; the default is 'mean'
pd.pivot_table(df, index='t', columns='type', values='value', aggfunc='sum')
This returns one row per t and one column per type, each cell holding the sum of the matching values.
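If you would rather keep the question's pivot approach without aggregating, here is a sketch under the assumption that each input should keep its own rows. The duplicates come from numbering rows per input only, so the same (count, type) pair occurs more than once. Numbering rows within each (input, type) pair and keeping input as an index level makes every (index, column) pair unique. Passing a list to index in DataFrame.pivot needs pandas 1.1 or later.

```python
import pandas as pd

d = {
    't': [0, 1, 2, 0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4, 4, 4, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1, 2, 1, 1],
}
df = pd.DataFrame(d).drop('t', axis=1)

# Number the rows within each (input, type) pair: 0, 1, 2, ...
df['count'] = df.groupby(['input', 'type']).cumcount()

# (input, count) identifies a row and type a column, so no pair repeats.
wide = df.pivot(index=['input', 'count'], columns='type', values='value')
print(wide)
```

Unlike pivot_table, this keeps every original value; if one type has more rows than the other for some input, the missing cells simply become NaN.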
I am trying to do something that seems straightforward, but is giving me endless trouble.
What I would like to do:
1. For each i in nameList:
2. Iterate through each row of aggregatedCSV.
3. If i is a partial match for any value in the current row, append that entire row to a new name-specific CSV.
(Repeat steps 2 and 3 for the remaining i in nameList.)
nameList = ['Jon', 'Bob', 'Tim']
aggregatedCSV = [
[1, '3', 'Bob85'],
[2, 'Jon52', '8'],
['Bob1', '14', 3],
['Tim95', 8, '6'],
['8', 11, 'Tim48'],
[10, 'Jon11', '44'],
[26, '21', 'Jon90'],
[99, '23', 'Bob19'],
[7, '24', 'Tim82']
]
The desired output would ultimately be three new CSV files but, to keep it simple here, I am trying to get something like:
JonList = [[2, 'Jon52', '8'], [10, 'Jon11', '44'],[26, '21', 'Jon90']]
BobList = [[1, '3', 'Bob85'], ['Bob1', '14', 3], [99, '23', 'Bob19']]
TimList = [['Tim95', 8, '6'], ['8', 11, 'Tim48'], [7, '24', 'Tim82']]
Although I have manually created nameList for this example, I will be reading from CSV files that will have an unknown number of rows, with an unknown number of values per row.
Any help is appreciated.
I don't know Python, so there is surely a faster, more efficient way, but this is what I came up with:
from collections import defaultdict

nameSpecificData = defaultdict(list)
for name in nameList:
    for row in aggregatedCSV:
        for item in row:
            if name in str(item):
                nameSpecificData[name].append(row)
                break  # stop after the first match so a row is never added twice
This stores the results in a dictionary keyed on the name, so you don't need to know what is in nameList in advance to create output variables.
When run with your input, it results in:
{
'Jon': [[2, 'Jon52', '8'], [10, 'Jon11', '44'], [26, '21', 'Jon90']],
'Bob': [[1, '3', 'Bob85'], ['Bob1', '14', 3], [99, '23', 'Bob19']],
'Tim': [['Tim95', 8, '6'], ['8', 11, 'Tim48'], [7, '24', 'Tim82']]
}
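Since the end goal is one CSV file per name, here is a minimal sketch of that last step using the standard csv module. It assumes an output file name of <name>.csv in a chosen directory; the function names split_by_name and write_name_csvs and the file naming are hypothetical. The filter uses any() over the cells instead of an explicit inner loop, which has the same effect as the break above.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_by_name(name_list, rows):
    """Map each name to the rows in which any cell contains that name."""
    grouped = defaultdict(list)
    for name in name_list:
        for row in rows:
            # any() stops at the first matching cell, so a row is added at most once
            if any(name in str(item) for item in row):
                grouped[name].append(row)
    return grouped


def write_name_csvs(grouped, out_dir):
    """Write one <name>.csv file per name into out_dir; return the written paths."""
    paths = []
    for name, rows in grouped.items():
        path = os.path.join(out_dir, f"{name}.csv")
        with open(path, 'w', newline='') as fh:
            csv.writer(fh).writerows(rows)
        paths.append(path)
    return paths


nameList = ['Jon', 'Bob', 'Tim']
# With real input, rows of any length could be read with, e.g.:
#   with open('aggregated.csv', newline='') as fh:   # hypothetical file name
#       aggregatedCSV = list(csv.reader(fh))
aggregatedCSV = [
    [1, '3', 'Bob85'],
    [2, 'Jon52', '8'],
    ['Bob1', '14', 3],
    ['Tim95', 8, '6'],
    ['8', 11, 'Tim48'],
    [10, 'Jon11', '44'],
    [26, '21', 'Jon90'],
    [99, '23', 'Bob19'],
    [7, '24', 'Tim82'],
]

out_dir = tempfile.mkdtemp()  # a scratch directory for this demo
paths = write_name_csvs(split_by_name(nameList, aggregatedCSV), out_dir)
print(paths)
```

csv.reader returns every cell as a string and copes with rows of different lengths, which fits input whose row and column counts are unknown in advance.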
If you really, really want to make separate name-specific variables, then this will work:
JonList = []
BobList = []
TimList = []
for name in nameList:
    for row in aggregatedCSV:
        for item in row:
            if name in str(item):
                globals()[name + 'List'].append(row)
                break  # stop after the first match so a row is never added twice
And it produces your desired output:
>>> print(JonList)
[[2, 'Jon52', '8'], [10, 'Jon11', '44'], [26, '21', 'Jon90']]
>>> print(BobList)
[[1, '3', 'Bob85'], ['Bob1', '14', 3], [99, '23', 'Bob19']]
>>> print(TimList)
[['Tim95', 8, '6'], ['8', 11, 'Tim48'], [7, '24', 'Tim82']]
I have the DataFrame below:
import pandas as pd

data = {
    'Code': ['P', 'J', 'M', 'Y', 'P', 'Z', 'P', 'P', 'J', 'P', 'J', 'M', 'P', 'Z', 'Y', 'M', 'Z', 'J', 'J'],
    'Value': [10, 10, 20, 30, 10, 40, 50, 10, 10, 20, 10, 50, 60, 40, 30, 20, 40, 20, 10]
}
example = pd.DataFrame(data)
Using Python 3, I want to create another DataFrame from example that contains, for each Value, the Code that occurs most often with that Value.
The new DataFrame should look like solution below:
output = {'Code': ['J', 'M', 'Y', 'Z', 'P', 'M'],'Value': [10, 20, 30, 40, 50, 50]}
solution = pd.DataFrame(output)
As can be seen, J occurs with Value 10 more often than any other Code, so J is selected, and so on.
You could define a function that returns the most frequently occurring items and apply it to each group. Finally, explode the resulting lists into rows.
>>> from collections import Counter
>>> def most_occurring(grp):
...     res = Counter(grp)
...     highest = max(res.values())
...     return [k for k, v in res.items() if v == highest]
...
>>> example.groupby('Value')['Code'].apply(most_occurring).explode().reset_index()
Value Code
0 10 J
1 20 M
2 30 Y
3 40 Z
4 50 P
5 50 M
6 60 P
If I understood correctly, you need something like this:
# Count how many times each (Code, Value) pair occurs
output = example.groupby(['Code', 'Value']).size().reset_index(name='index_count')
output = output.sort_values(by='index_count', ascending=False).reset_index(drop=True)
output
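A pandas-only variant of the most-frequent-Code lookup, offered as a sketch: count each (Value, Code) pair with size(), then keep the pairs whose count equals the maximum for that Value using groupby().transform('max'). Ties are kept, so Value 50 yields both M and P; unlike the solution frame in the question, Value 60 (which only occurs with P) is kept as well.

```python
import pandas as pd

data = {
    'Code': ['P', 'J', 'M', 'Y', 'P', 'Z', 'P', 'P', 'J', 'P', 'J', 'M', 'P', 'Z', 'Y', 'M', 'Z', 'J', 'J'],
    'Value': [10, 10, 20, 30, 10, 40, 50, 10, 10, 20, 10, 50, 60, 40, 30, 20, 40, 20, 10],
}
example = pd.DataFrame(data)

# How often each Code occurs with each Value (groupby sorts by Value, then Code)
counts = example.groupby(['Value', 'Code']).size().reset_index(name='n')

# Keep the Code(s) whose count equals the maximum count for that Value
is_top = counts['n'] == counts.groupby('Value')['n'].transform('max')
result = counts.loc[is_top, ['Code', 'Value']].reset_index(drop=True)
print(result)
```

transform('max') returns a Series aligned with counts, so the comparison produces a row-wise boolean mask without any Python-level loop.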
I have a double list of this type: dl = [[13, 22, 41], ['c', 'b', 'a']], in which each element dl[0][i] belongs to the value dl[1][i] (with the same index). How can I sort my list using the dl[0] values as the ordering criterion, while keeping both sublists linked? The sublists are "linked data", so dl[0][i] and dl[1][i] must still share an index after the whole parent list is sorted by the values of the first sublist.
I expect something like:
input: dl = [ [14,22,7,17], ['K', 'M', 'F','A'] ]
output: dl = [ [7, 14, 17, 22], ['F', 'K', 'A', 'M'] ]
This was way too much fun to write. I don't doubt that this function can be greatly improved, but this is what I've gotten in a very short amount of time and should get you started.
I've included some tests just so you can verify that this does indeed do what you want.
from unittest import TestCase, main


def sort_by_first(data):
    sorted_data = []
    for seq in data:
        zipped_to_first = zip(data[0], seq)
        sorted_by_first = sorted(zipped_to_first)
        unzipped_data = zip(*sorted_by_first)
        sorted_data.append(list(tuple(unzipped_data)[1]))
    return sorted_data


class SortByFirstTestCase(TestCase):
    def test_sort(self):
        output_1 = sort_by_first([[1, 3, 5, 2, 4], ['a', 'b', 'c', 'd', 'e']])
        self.assertEqual(output_1, [[1, 2, 3, 4, 5], ['a', 'd', 'b', 'e', 'c']])
        output_2 = sort_by_first([[9, 1, 5], [21, 22, 23], ['spam', 'foo', 'bar']])
        self.assertEqual(output_2, [[1, 5, 9], [22, 23, 21], ['foo', 'bar', 'spam']])


if __name__ == '__main__':
    main()
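Updated for what you're looking for: a selection sort, with one extra line that swaps the same positions in the second list so it stays aligned with the first.
dl = [[14, 22, 7, 17], ['K', 'M', 'F', 'A']]
for i in range(len(dl[0])):
    # find the position of the smallest remaining key
    min_idx = i
    for j in range(i + 1, len(dl[0])):
        if dl[0][min_idx] > dl[0][j]:
            min_idx = j
    # swap the key and its linked value together
    dl[0][i], dl[0][min_idx] = dl[0][min_idx], dl[0][i]
    dl[1][i], dl[1][min_idx] = dl[1][min_idx], dl[1][i]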
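You can also solve this with a for loop, as long as every sublist is reordered with the same permutation (sorting each sublist on its own would break the link between dl[0][i] and dl[1][i]):
dl = [[14, 22, 7, 17], ['K', 'M', 'F', 'A']]
order = sorted(range(len(dl[0])), key=lambda k: dl[0][k])  # indices that put dl[0] in order
for i in range(len(dl)):
    dl[i] = [dl[i][k] for k in order]
print(dl)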
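A shorter alternative, as a sketch: transpose the sublists into columns with zip(), sort those columns by their first entry only, and transpose back. Sorting with key=lambda col: col[0] avoids comparing the linked data when keys tie (Python's sort is stable, so tied keys keep their original order). It assumes all sublists have the same length, since zip() stops at the shortest one; sort_linked is a hypothetical name.

```python
def sort_linked(dl):
    """Reorder every sublist of dl by the values in dl[0], keeping positions linked."""
    if not dl or not dl[0]:
        return [list(seq) for seq in dl]
    # zip(*dl) yields one column (dl[0][i], dl[1][i], ...) per index i
    columns = sorted(zip(*dl), key=lambda col: col[0])  # stable; compares keys only
    # zip(*columns) transposes back to one tuple per original sublist
    return [list(row) for row in zip(*columns)]


print(sort_linked([[14, 22, 7, 17], ['K', 'M', 'F', 'A']]))
# [[7, 14, 17, 22], ['F', 'K', 'A', 'M']]
```

This works for any number of linked sublists, not just two, and returns a new list instead of modifying dl in place.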