I have a txt file with 9 columns and 6 rows. The first 8 columns each contain one of the values "1", "2" or "3". I named these columns "A" through "H". I named the last column "class"; its value is always the name "HIGH". Here is the txt file (data.txt):
1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH
I am trying to count the number of each value in each column and print a list with 3 components: the numbers of "1", "2" and "3" values in that column, respectively. For example, in the first column (A) all values are "1", so I expect A : [6,0,0]. For the 8th column (H), where all values are "3", I expect H : [0,0,6]. For the fourth column (D) I have two "1", three "2" and one "3", so I expect D : [2,3,1]. I tried to do this using pandas and collections. Here is what I did:
import pandas as pd
from collections import Counter
df = pd.read_csv('data.txt')
df.columns = ['A','B','C','D','E','F','G','H','class']
X = df.ix[:, 0:8].values
y = df.ix[:, 8].values
deg = ['HIGH']
names = ['A','B','C','D','E','F','G','H']
for j in range(0, 8):
    freqs = Counter(X[y == deg[0], j])
    print(names[j], ':', list(freqs.values()))
The above code returns empty lists. Here is what it prints:
A : []
B : []
C : []
D : []
E : []
F : []
G : []
H : []
How can I modify the above code to get what I want?
Thanks!
Use pandas.Series.value_counts
df.loc[:, :"H"].apply(pd.Series.value_counts).fillna(0).to_dict("list")
Output:
{'A': [6.0, 0.0, 0.0],
'B': [6.0, 0.0, 0.0],
'C': [6.0, 0.0, 0.0],
'D': [2, 3, 1],
'E': [3.0, 3.0, 0.0],
'F': [5.0, 1.0, 0.0],
'G': [6.0, 0.0, 0.0],
'H': [0.0, 0.0, 6.0]}
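Note that value_counts orders its result by frequency, so in general the list order is not guaranteed to be [count of "1", count of "2", count of "3"]; an explicit reindex makes the order deterministic, and astype(int) removes the float artifacts introduced by fillna. A sketch of that variant:

```python
import io
import pandas as pd

txt = """1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH"""
df = pd.read_csv(io.StringIO(txt), names=['A','B','C','D','E','F','G','H','class'])

d = (df.loc[:, :"H"]
       .apply(pd.Series.value_counts)
       .reindex([1, 2, 3])   # force row order: counts of 1, 2, 3
       .fillna(0)
       .astype(int)
       .to_dict("list"))
print(d['D'])   # [2, 3, 1]
```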
Define the following function:
def cntInts(col):
    vc = col.value_counts()
    return [vc.get(i, 0) for i in range(1, 4)]
Then apply it and print results:
for k, v in df.loc[:, 'A':'H'].apply(cntInts).items():
    print(f'{k}: {v}')
For your data sample I got:
A: [6, 0, 0]
B: [6, 0, 0]
C: [6, 0, 0]
D: [2, 3, 1]
E: [3, 3, 0]
F: [5, 1, 0]
G: [6, 0, 0]
H: [0, 0, 6]
Or maybe it is enough to call just:
df.loc[:, 'A':'H'].apply(cntInts)
This time the result is a Series, which when printed yields:
A [6, 0, 0]
B [6, 0, 0]
C [6, 0, 0]
D [2, 3, 1]
E [3, 3, 0]
F [5, 1, 0]
G [6, 0, 0]
H [0, 0, 6]
dtype: object
Edit
Following your comments I suppose that there is something wrong with your data.
To trace the actual reason:
Define a string variable:
txt = '''1,1,1,1,2,1,1,3,HIGH
1,1,1,2,2,1,1,3,HIGH
1,1,1,1,1,1,1,3,HIGH
1,1,1,2,1,1,1,3,HIGH
1,1,1,3,2,1,1,3,HIGH
1,1,1,2,1,2,1,3,HIGH'''
Run:
import io
df = pd.read_csv(io.StringIO(txt), names=['A','B','C','D','E','F','G','H','class'])
Run my code on my data. The result should be just as expected.
Then read your input file (also into df) and run my code again.
Probably there is some difference between your data and mine.
In particular, look for any extra spaces in your input file, and also check the column types (after read_csv).
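One quick way to spot such problems is to print the repr of the values, which makes stray whitespace visible. A sketch, simulating a file whose class column has a trailing space (the hypothetical cause of the empty lists):

```python
import io
import pandas as pd

cols = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'class']
# simulate a file with a stray trailing space after HIGH
txt = "1,1,1,1,2,1,1,3,HIGH \n1,1,1,2,2,1,1,3,HIGH \n"
df = pd.read_csv(io.StringIO(txt), names=cols)

print(df.dtypes)
print(df['class'].map(repr).unique())   # shows "'HIGH '" -- the space breaks y == 'HIGH'

# a quick fix: strip whitespace from the string column after reading
df['class'] = df['class'].str.strip()
print((df['class'] == 'HIGH').all())    # True
```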
Solution with collections: select all columns except the last, convert each column's Counter to a Series so the output is a DataFrame, replace missing values with DataFrame.fillna, convert the values to integers, and finally convert to a dictionary with DataFrame.to_dict:
from collections import Counter
d = (df.iloc[:, :-1].apply(lambda x: pd.Series(Counter(x)))
       .fillna(0)
       .astype(int)
       .to_dict("list"))
print(d)
{'A': [6, 0, 0], 'B': [6, 0, 0],
'C': [6, 0, 0], 'D': [2, 3, 1],
'E': [3, 3, 0], 'F': [5, 1, 0],
'G': [6, 0, 0], 'H': [0, 0, 6]}
Only pandas solution with pandas.Series.value_counts:
d = (df.iloc[:, :-1].apply(pd.Series.value_counts)
       .fillna(0)
       .astype(int)
       .to_dict("list"))
print(d)
{'A': [6, 0, 0], 'B': [6, 0, 0],
'C': [6, 0, 0], 'D': [2, 3, 1],
'E': [3, 3, 0], 'F': [5, 1, 0],
'G': [6, 0, 0], 'H': [0, 0, 6]}
Working within python, since your end result is a dictionary:
from string import ascii_uppercase
from collections import Counter, defaultdict
from itertools import chain, product
import csv
d = defaultdict(list)
fieldnames = ascii_uppercase[:9]
# test.csv is your file above
with open('test.csv') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=list(fieldnames))
    reader = Counter(chain.from_iterable(row.items() for row in reader))
for col, value in product(fieldnames, ("1", "2", "3")):
    if col != fieldnames[-1]:
        d[col].append(reader.get((col, value), 0))
print(d)
defaultdict(list,
{'A': [6, 0, 0],
'B': [6, 0, 0],
'C': [6, 0, 0],
'D': [2, 3, 1],
'E': [3, 3, 0],
'F': [5, 1, 0],
'G': [6, 0, 0],
'H': [0, 0, 6]})
I have two lists:
a = ["A", "B", "B", "C", "D", "A"]
b = [1, 2, 3, 4, 5, 6]
I want to have a dictionary like the one below:
d = {"A":[1, 6], "B":[2, 3], "C":[4], "D":[5]}
Right now I am doing something like this:
d = {i: [] for i in set(a)}
for c in zip(a, b):
    d[c[0]].append(c[1])
Is there a better way to do this?
You can use the dict.setdefault method to initialize each key as a list and then append the current value to it while you iterate:
d = {}
for k, v in zip(a, b):
    d.setdefault(k, []).append(v)
With the sample input, d would become:
{'A': [1, 6], 'B': [2, 3], 'C': [4], 'D': [5]}
Running your code didn't produce the output you wanted. The code below is a bit more verbose, but it makes it easy to see what it is doing.
a = ["A", "B", "B", "C", "D", "A"]
b = [1, 2, 3, 4, 5, 6]
c = {}
for k, v in zip(a, b):
    if k in c:
        if isinstance(c[k], list):
            c[k].append(v)
        else:
            c[k] = [c[k], v]
    else:
        c[k] = v
print(c)
OUTPUT
{'A': [1, 6], 'B': [2, 3], 'C': 4, 'D': 5}
An alternative, less verbose approach is to use a defaultdict with list as the default type; you can then append all items to it. This means even single items end up in a list, which to me is cleaner since you know the data type of every value. However, if you really do want single items not wrapped in a list, you can clean it up with a dict comprehension:
from collections import defaultdict

a = ["A", "B", "B", "C", "D", "A"]
b = [1, 2, 3, 4, 5, 6]
c = defaultdict(list)
for k, v in zip(a, b):
    c[k].append(v)
d = {k: v if len(v) > 1 else v[0] for k, v in c.items()}
print(c)
print(d)
OUTPUT
defaultdict(<class 'list'>, {'A': [1, 6], 'B': [2, 3], 'C': [4], 'D': [5]})
{'A': [1, 6], 'B': [2, 3], 'C': 4, 'D': 5}
I have a python program in which I have a list which resembles the list below:
a = [[1,2,3], [4,2,7], [5,2,3], [7,8,5]]
Here I want to create a dictionary using the middle value of each sublist as keys which should look something like this:
b = {2:[[1,2,3], [4,2,7], [5,2,3]], 8: [[7,8,5]]}
How can I achieve this?
You can do it simply like this:
a = [[1,2,3], [4,2,7], [5,2,3], [7,8,5]]
b = {}
for l in a:
    m = l[len(l) // 2]  # get the middle element
    if m in b:
        b[m].append(l)
    else:
        b[m] = [l]
print(b)
Output:
{2: [[1, 2, 3], [4, 2, 7], [5, 2, 3]], 8: [[7, 8, 5]]}
You could also use a defaultdict to avoid the if in the loop:
from collections import defaultdict
b = defaultdict(list)
for l in a:
    m = l[len(l) // 2]
    b[m].append(l)
print(b)
Output:
defaultdict(<class 'list'>, {2: [[1, 2, 3], [4, 2, 7], [5, 2, 3]], 8: [[7, 8, 5]]})
Here is a solution that uses itertools.groupby with a dictionary comprehension:
from itertools import groupby
a = [[1,2,3], [4,2,7], [5,2,3], [7,8,5]]
def get_mid(x):
    return x[len(x) // 2]
b = {key: list(val) for key, val in groupby(sorted(a, key=get_mid), get_mid)}
print(b)
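Note that itertools.groupby only groups consecutive equal keys, which is why the input is sorted by get_mid first; without the sort, a key that appears non-consecutively produces separate groups, and the dict comprehension silently keeps only the last one. A small sketch of the difference:

```python
from itertools import groupby

a = [[1, 2, 3], [7, 8, 5], [4, 2, 7]]  # middle values: 2, 8, 2 -- not consecutive

def get_mid(x):
    return x[len(x) // 2]

# without sorting, the two key-2 runs become separate groups and
# the dict comprehension keeps only the last one
b_unsorted = {k: list(v) for k, v in groupby(a, get_mid)}
print(b_unsorted)   # {2: [[4, 2, 7]], 8: [[7, 8, 5]]}

b_sorted = {k: list(v) for k, v in groupby(sorted(a, key=get_mid), get_mid)}
print(b_sorted)     # {2: [[1, 2, 3], [4, 2, 7]], 8: [[7, 8, 5]]}
```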
I have a dataframe with empty columns and a corresponding dictionary with which I would like to update those empty columns, based on (index, column):
import pandas as pd
import numpy as np
dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 6, 2], [3, 4, 1]])
dataframe.columns = ['x', 'y', 'z']
additional_cols = ['a', 'b', 'c']
for col in additional_cols:
    dataframe[col] = np.nan
x y z a b c
0 1 2 3
1 4 5 6
2 7 8 9
3 4 6 2
4 3 4 1
for row, column in x.iterrows():
    # calculations to return dictionary y
    y = {"a": 5, "b": 6, "c": 7}
    df.loc[row, :].map(y)
Basically after performing the calculations using columns x, y, z I would like to update columns a, b, c for that same row :)
I could use a function as such but as far as the pandas library and a method for the DataFrame object I am not sure...
def update_row_with_dict(dictionary, dataframe, index):
    for key in dictionary.keys():
        dataframe.loc[index, key] = dictionary.get(key)
The above answer with correct indentation:
def update_row_with_dict(df, d, idx):
    for key in d.keys():
        df.loc[idx, key] = d.get(key)
A shorter version would be:
def update_row_with_dict(df, d, idx):
    df.loc[idx, list(d.keys())] = list(d.values())
For your code snippet, the syntax would be:
import pandas as pd
import numpy as np
dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 6, 2], [3, 4, 1]])
dataframe.columns = ['x', 'y', 'z']
additional_cols = ['a', 'b', 'c']
for col in additional_cols:
    dataframe[col] = np.nan

for idx in dataframe.index:
    y = {'a': 1, 'b': 2, 'c': 3}
    update_row_with_dict(dataframe, y, idx)
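Since df.loc accepts a list of column labels, the per-key loop inside the helper can also be collapsed into a single assignment per row. A sketch of the same update (the dict here is a stand-in for the real per-row calculation):

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['x', 'y', 'z'])
for col in ['a', 'b', 'c']:
    dataframe[col] = np.nan

for idx in dataframe.index:
    y = {'a': 1, 'b': 2, 'c': 3}   # stand-in for the per-row calculation
    # assign all keys of the dict at once for this row
    dataframe.loc[idx, list(y)] = list(y.values())

print(dataframe[['a', 'b', 'c']].iloc[0].tolist())   # [1.0, 2.0, 3.0]
```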
Let's say I have a pandas data frame with 2 columns (column A and column B). For each value in column 'A' there are multiple values in column 'B'. I want to create a dictionary with multiple values for each key, and those values should be unique as well. Please suggest a way to do this.
One way is to group by column A:
In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
In [2]: df
Out[2]:
A B
0 1 2
1 1 4
2 5 6
In [3]: g = df.groupby('A')
Apply tolist on each of the group's column B:
In [4]: g['B'].tolist() # shorthand for .apply(lambda s: s.tolist()) "automatic delegation"
Out[4]:
A
1 [2, 4]
5 [6]
dtype: object
And then call to_dict on this Series:
In [5]: g['B'].tolist().to_dict()
Out[5]: {1: [2, 4], 5: [6]}
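Note that the attribute delegation used above (calling g['B'].tolist() directly on the groupby object) only worked in older pandas versions; in current versions an explicit apply achieves the same result:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
# apply list to each group's column B, then convert the resulting Series to a dict
d = df.groupby('A')['B'].apply(list).to_dict()
print(d)   # {1: [2, 4], 5: [6]}
```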
If you want these to be unique, use unique (Note: this will create a numpy array rather than a list):
In [11]: df = pd.DataFrame([[1, 2], [1, 2], [5, 6]], columns=['A', 'B'])
In [12]: g = df.groupby('A')
In [13]: g['B'].unique()
Out[13]:
A
1 [2]
5 [6]
dtype: object
In [14]: g['B'].unique().to_dict()
Out[14]: {1: array([2]), 5: array([6])}
Other alternatives are to use .apply(lambda s: set(s)), .apply(lambda s: list(set(s))), .apply(lambda s: list(s.unique()))...
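If the unique values are wanted as plain Python lists rather than numpy arrays, one of the apply variants mentioned above does it. A sketch:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
# unique() preserves order of first appearance; wrap in list() to avoid arrays
d = df.groupby('A')['B'].apply(lambda s: list(s.unique())).to_dict()
print(d)   # {1: [2, 4], 5: [6]}
```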
You can actually loop over the df.groupby object and collect the values as lists.
In[1]:
df = pd.DataFrame([[1, 2], [1, 2], [5, 6]], columns=['A', 'B'])
{k: list(v) for k,v in df.groupby("A")["B"]}
Out[1]:
{1: [2, 2], 5: [6]}
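As the output shows, this keeps duplicates. To get unique values with the same approach, deduplicate inside the comprehension; dict.fromkeys preserves order of first appearance:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2], [5, 6]], columns=['A', 'B'])
# dict.fromkeys deduplicates while preserving first-seen order
d = {k: list(dict.fromkeys(v)) for k, v in df.groupby("A")["B"]}
print(d)   # {1: [2], 5: [6]}
```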