I have the following dictionary of lists
d = {1: ['1','B1',['C1','C2','C3']], 2: ['2','B2','C15','D12'], 3: ['3','B3'], 4: ['4', 'B4', 'C4', ['D1', 'D2']]}
Writing that to a CSV using
import csv
headers = ['A', 'B', 'C', 'D']
with open('test.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(d.values())
gives me a CSV that looks like
A B C D
1 B1 ['C1','C2','C3']
2 B2 C15 D12
3 B3
4 B4 C4 ['D1','D2']
If a value contains a nested list with multiple items, I would like that list to be expanded down the column, like this:
A B C D
1 B1 C1
1 C2
1 C3
2 B2 C15 D12
3 B3
4 B4 C4 D1
4 D2
I'm fairly new to Python and can't seem to figure out a way to do what I need after a few days of sifting through forums and banging my head on the wall. I think I may need to break apart the nested lists, but I need to keep them tied to their respective "A" value. Columns A and B will always have 1 entry; columns C and D can have 1 to X entries.
Any help is much appreciated.
Seems like it might be easier to make a list of lists, with appropriately-located empty spaces, than what you're doing. Here's something that might do:
import csv
from itertools import zip_longest
def condense(dct):
    # get the maximum number of columns of any list
    num_cols = len(max(dct.values(), key=len)) - 1
    # Ignore the key, it's not really relevant.
    for _, v in dct.items():
        # first, memorize the index of this list,
        # since we need to repeat it no matter what
        idx = v[0]
        # next, use zip_longest to make a correspondence.
        # We will deliberately make a 2d list,
        # and we will later withdraw elements from it one by one.
        matrix = [([] if elem is None else
                   [elem] if not isinstance(elem, list) else
                   elem[:]  # shallow copy to avoid altering original dict
                   ) for elem, _ in zip_longest(v[1:], range(num_cols), fillvalue=None)
                  ]
        # Now, we output the top row of the matrix as long as it has contents
        while any(matrix):
            # If a column in the matrix is empty, we put an empty string.
            # Otherwise, we pop the column's first element, so that
            # as we output a row, we also remove that row from the matrix,
            # progressively emptying the matrix top-to-bottom.
            # *-notation is more convenient than concatenating these two lists.
            yield [idx, *((col.pop(0) if col else '') for col in matrix)]
            # e.g. for key 0 and a matrix that looks like this:
            #   [['a1', 'a2'],
            #    ['b1'],
            #    ['c1', 'c2', 'c3']]
            # this would yield the following three lists before moving on:
            #   ['0', 'a1', 'b1', 'c1']
            #   ['0', 'a2', '', 'c2']
            #   ['0', '', '', 'c3']
            # where '' should parse into an empty column in the resulting CSV.
The biggest thing to note here is that I use isinstance(elem, list) to check whether an element is a list (which you need to be able to do, one way or another, to flatten or expand nested lists as we do here). If you have more complicated or more varied data structures, you'll need to adapt this check; for example, write a helper function isiterable() that tries to iterate over the value and returns a boolean based on whether doing so raised an error.
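A minimal sketch of such a helper, assuming that strings should count as single cells even though Python can iterate over them. The name is_listlike is hypothetical and is not used by condense() as written:

```python
def is_listlike(value):
    """Return True if value should be expanded down a column."""
    # Assumption: str and bytes are iterable, but here each one is a single cell.
    if isinstance(value, (str, bytes)):
        return False
    try:
        iter(value)  # raises TypeError for non-iterables such as int or None
    except TypeError:
        return False
    return True


print(is_listlike(['C1', 'C2', 'C3']))
print(is_listlike('C15'))
print(is_listlike(('D1', 'D2')))
```

If condense() used this check in place of isinstance(elem, list), the copy elem[:] would probably also need to become list(elem), so that tuples and other iterables support pop(0).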
That done, we can call condense() on d and have the csv module deal with the output.
headers = ['A', 'B', 'C', 'D']
d = {1: ['1','B1',['C1','C2','C3']], 2: ['2','B2','C15','D12'], 3: ['3','B3'], 4: ['4', 'B4', 'C4', ['D1', 'D2']]}
# list(condense(d)) produces (condense is a generator)
# [['1', 'B1', 'C1', '' ],
# ['1', '', 'C2', '' ],
# ['1', '', 'C3', '' ],
# ['2', 'B2', 'C15', 'D12'],
# ['3', 'B3', '', '' ],
# ['4', 'B4', 'C4', 'D1' ],
# ['4', '', '', 'D2' ]]
with open('test.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(condense(d))
Which produces the following file:
A,B,C,D
1,B1,C1,
1,,C2,
1,,C3,
2,B2,C15,D12
3,B3,,
4,B4,C4,D1
4,,,D2
This is equivalent to your expected output. Hopefully the solution is sufficiently extensible for you to apply it to your real (non-MCVE) problem.
I have the following dataframe:
import pandas as pd
raw_data = {'name': ['Willard', 'Nan', 'Omar', 'Spencer'],
            'Last_Name': ['Smith', 'Nan', 'Sheng', 'Poursafar'],
            'favorite_color': ['blue', 'red', 'Nan', 'green'],
            'Statues': ['Match', 'Mis-Match', 'Match', 'Mis-Match']}
df = pd.DataFrame(raw_data, columns=['name', 'Last_Name', 'favorite_color', 'Statues'])
df
I want to do the following tasks:
Separate the rows that contain Match from those that contain Mis-Match.
Make a category that only contains people whose first name and last name are Nan and who love a color (any color except Nan).
Can you help me?
Use boolean indexing:
df1 = df[df['Statues'] == 'Match']
df2 = df[df['Statues'] == 'Mis-Match']
If the missing values are real missing values rather than strings, use Series.isna and Series.notna:
df3 = df[df['name'].isna() & df['Last_Name'].isna() & df['favorite_color'].notna()]
If the missing values are the string 'Nan', compare against that string:
df3 = df[(df['name'] == 'Nan') &
         (df['Last_Name'] == 'Nan') &
         (df['favorite_color'] != 'Nan')]
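If it is unclear which of the two cases applies, one option is to normalise the data first. The sketch below assumes that the literal string 'Nan' marks a missing value in every column; it turns those cells into real missing values with DataFrame.mask and then uses isna and notna. The name cleaned is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Willard', 'Nan', 'Omar', 'Spencer'],
    'Last_Name': ['Smith', 'Nan', 'Sheng', 'Poursafar'],
    'favorite_color': ['blue', 'red', 'Nan', 'green'],
})

# Assumption: the string 'Nan' stands for "missing" everywhere in the frame.
# mask() replaces every cell where the condition is True with a missing value.
cleaned = df.mask(df == 'Nan')

# Rows with no first name, no last name, but a known favorite color.
df3 = cleaned[cleaned['name'].isna()
              & cleaned['Last_Name'].isna()
              & cleaned['favorite_color'].notna()]
print(df3)
```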
I'm having issues with a loop that I want to:
a. see if a value in a DF row is greater than a value from a list
b. if it is, concatenate the variable name and the value from the list as a string
c. if it's not, pass until the loop conditions are met.
This is what I've tried.
import pandas as pd
import numpy as np
df = {'level': ['21', '22', '23', '24', '25', '26', '27', '28', '29', '30'],
      'variable': 'age'}
df = pd.DataFrame.from_dict(df)
knots = [0, 25]
df.assign(key = np.nan)
for knot in knots:
    if df['key'].items == np.nan:
        if df['level'].astype('int') > knot:
            df['key'] = df['variable']+"_"+knot.astype('str')
        else:
            pass
    else:
        pass
However, this leaves the key column with only NaN values. I'm not sure why the concatenation isn't being applied.
You can do something like this inside the for loop. There is no need for any if conditions:
mask = df['level'].astype('int') > 25
df.loc[mask, 'key'] = df.loc[mask, 'variable'] + '_' + df.loc[mask, 'level']
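Put into the loop from the question, the same idea might look like the sketch below. It rests on two assumptions that the thread does not confirm: the suffix should be the knot value (as the question describes) rather than the level, and knots is sorted in ascending order so that a later, larger knot overwrites an earlier one.

```python
import pandas as pd

df = pd.DataFrame({'level': [str(n) for n in range(21, 31)],
                   'variable': 'age'})
knots = [0, 25]

# Convert once instead of on every comparison.
level = df['level'].astype(int)

df['key'] = None  # placeholder until a knot matches
for knot in knots:
    # Boolean mask: True for rows whose level exceeds this knot.
    mask = level > knot
    # Assumption: append the knot value, e.g. 'age_25'.
    df.loc[mask, 'key'] = df.loc[mask, 'variable'] + '_' + str(knot)

print(df)
```

Each row ends up labelled with the largest knot that its level exceeds.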
I am trying my hand at spam filters. I tried several methods to label text files as spam. As a result, I have three dataframes. They basically look like this:
df_method_1 = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['1', '0', '0']})
df_method_2 = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['1', '1', '0']})
df_method_3 = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['1', '1', '0']})
I am now trying to create a dataframe showing whether a file was labeled as spam and, if so, by which method.
In the best case, I can create a dataframe containing the following information:
df_summary = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['All methods', 'Method 2 & Method 3', 'No method']})
Obviously, I am looking for the information. No need for the actual strings.
I tried pandas.DataFrame.isin() to make it happen. But I failed. Any ideas how to do this?
How about merge()?
df_method_1.merge(df_method_2, on="file").merge(df_method_3, on="file")
  file spam_x spam_y spam
0    A      1      1    1
1    B      0      1    1
2    C      0      0    0
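To get from per-method labels to the summary the question asks for, one possible sketch follows. It assumes the labels are the strings '1' and '0' as in the question and that every file appears exactly once in each dataframe; the methods dictionary and the describe helper are hypothetical names introduced for illustration.

```python
import pandas as pd

df_method_1 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '0', '0']})
df_method_2 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '1', '0']})
df_method_3 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '1', '0']})

methods = {'Method 1': df_method_1,
           'Method 2': df_method_2,
           'Method 3': df_method_3}

# One boolean column per method: True where that method flagged the file.
flags = df_method_1[['file']].copy()
for name, frame in methods.items():
    labels = frame.set_index('file')['spam']
    flags[name] = flags['file'].map(labels) == '1'


def describe(row):
    """Turn one row of booleans into a readable label."""
    hits = [name for name, hit in row.items() if hit]
    if not hits:
        return 'No method'
    if len(hits) == len(row):
        return 'All methods'
    return ' & '.join(hits)


flags['spam'] = flags[list(methods)].apply(describe, axis=1)
df_summary = flags[['file', 'spam']]
print(df_summary)
```

The intermediate flags frame keeps the boolean columns, which is likely more useful than the strings if the information itself is what matters.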
I'm trying to find a way to read a matrix from a text file.
For example, a text file would contain
1 2 3
4 5 6
7 8 9
And it would make a matrix with those numbers and put it in matrix = [[1,2,3],[4,5,6],[7,8,9]]
And then this has to be compatible with the way I print the matrix:
print('\n'.join([' '.join(map(str, row)) for row in matrix]))
So far, I tried this:
path = input('enter file location')
f = open(path, 'r')
matrix = [map(int, line.split(',')) for line in f if line.strip() != ""]
All it does is give me map objects, and I get an error when I try to print the matrix.
What am I doing wrong? matrix should contain the matrix read from the text file and not map objects, and I don't want to use an external library such as numpy.
Thanks
You can use a list comprehension like this (note that the entries are strings, not ints):
myfile.txt:
1 2 3
4 5 6
7 8 9
>>> matrix = open('myfile.txt').read()
>>> matrix = [item.split() for item in matrix.split('\n')[:-1]]
>>> matrix
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
>>>
You can also create a function for this:
>>> def matrix(file):
... contents = open(file).read()
... return [item.split() for item in contents.split('\n')[:-1]]
...
>>> matrix('myfile.txt')
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
>>>
This works with both Python 2 (e.g. Python 2.7.10) and Python 3 (e.g. Python 3.6.4):
rows = 3
cols = 3
with open('in.txt') as f:
    data = []
    for i in range(0, rows):
        # list() forces the map object into a real list on Python 3
        data.append(list(map(int, f.readline().split()[:cols])))
print(data)
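If the matrix size is not known in advance, a minimal sketch along the same lines is to read every non-blank line, split it on whitespace, and convert each token to int. It assumes the values are separated by spaces, as in the sample file; read_matrix is a hypothetical helper name, and the temporary file only stands in for the real input file.

```python
import os
import tempfile


def read_matrix(path):
    """Read a whitespace-separated integer matrix, skipping blank lines."""
    with open(path) as f:
        return [[int(token) for token in line.split()]
                for line in f if line.strip()]


# Demo: write the sample matrix to a temporary file, then read it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1 2 3\n4 5 6\n7 8 9\n')

matrix = read_matrix(tmp.name)
os.remove(tmp.name)

# Same printing style as in the question.
print('\n'.join(' '.join(map(str, row)) for row in matrix))
```

Because each row is built as a real list rather than a map object, the print statement from the question works unchanged.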